As quants, we’re all aware that every model has a shelf life. Sooner or later, the ideas behind every “proprietary” analytical technique diffuse into the broader world, at which point that technique no longer provides a competitive edge or alpha. What’s less well appreciated is that a similar pattern applies to the world of data. Rare, unique and proprietary data eventually diffuses and becomes commonplace, easily available, edgeless data. The best analysts constantly reinvent their models and source new data products to avoid their inevitable obsolescence. Today, they’re venturing into the world of alternative data as a new source of alpha.
Information is power. This has always been the case in the market: A trader with unique and important data has an advantage over one without. “Informational edge” is one of the cleanest, simplest, most obvious ways to make money.
Two hundred years ago, geopolitical events suddenly became “edge-worthy” data. In an era before telegrams and news services, knowing the results of battles, elections and campaigns before anybody else was a huge advantage. Today, of course, there is no piece of news that is not disseminated worldwide almost instantly.
One hundred years ago, stock price data fell into this category. This was the era of ticker-readers and bucket shops; Jesse Livermore and J. P. Morgan. Many of the classic price-dependent trading strategies, including all the technical patterns beloved of modern chartists, were developed during this time. Today, of course, every analyst has free or cheap access to high-quality stock price data.
Fifty years ago, corporate financials fell into this category. Until Reuters first digitized company statements in the 1970s, this data was difficult to find and hard to use; the best analysts used it anyway. It’s no coincidence that Benjamin Graham and Warren Buffett cut their teeth in this era; it was the era of fundamentals-based investing. Today, of course, every analyst has free or cheap access to high-quality stock fundamentals data.
Five years ago, social sentiment data fell into this category. Thanks to the internet, for the first time ever investors could tap into the true wisdom of crowds and gauge first-hand the degree of fear and greed, hope and despair, permeating the market. In the early 2010s, only the biggest banks and hedge funds could afford to access the Twitter firehose and similar sources of sentiment data. Today, of course, every analyst has free or cheap access to high-quality sentiment data.
The truth is the landscape of data is constantly changing and analysts have to evolve to keep up with it. Data that was unique and rare even five years ago is now commonplace. If anything, the pace of change is accelerating.
A Data Revolution
We’re in the middle of a data revolution. Business processes everywhere are becoming digitized. Firms like Walmart and Target know exactly what you search for and what you end up buying. Other firms like ADP, Mastercard and FedEx are intimately involved in payrolls, transactions, delivery and every other stage of the commercial pipeline. And every single action that these firms take is recorded and stored for analysis.
Human interactions are also becoming digitized. Social networks, instant messaging and web search paint a dynamic, real-time picture of what people are interested in and who they’re talking to. Again, every single action is recorded and stored for posterity.
Smartphones are ubiquitous. This means an accurate location sensor, audio recorder, still/video camera, radio transponder and internet connection in every pocket. Almost no part of the world is outside the limits of cellular coverage.
Cars and trucks now have embedded sensors, tracking position, velocity, traffic and much more. Satellites and GPS have gone from the preserve of the few (military) to the plaything of the many; imagery and position data are today a public good.
As a result of these technological innovations, we are swimming in a sea of data. Yet this data would be meaningless if it weren’t for another, parallel advancement in the area of computation. Thanks to the relentless progress of Moore’s Law, we have the bandwidth to capture all this data, the memory to store it and the cycles to analyze it and extract commercial value from it. This capacity has transformed industries everywhere.
Today, every company is a data company. Firms produce huge quantities of data and they consume huge quantities of data in the pursuit of profit. One way or another, companies and analysts are seeing – in real time! – the same data that will one day become the content of 10-K filings, economic releases or financial news. The finance industry is calling this phenomenon “alternative data.”
For a quant, this is both an opportunity and a threat. The opportunity is to unlock new alpha from all these new data sources. The potential here is enormous: Every single industry is going to be transformed by data, and early access to that data means an inside view on all those industries and companies.
The threat is that others may beat you to the punch. If you don’t use these new data sources, you will be trading against people who do use them and who have more information than you. That’s a sucker’s game.
Every Company is a Data Company
The Baltic Dry Index, the ADP Payrolls Survey, the ISM Manufacturing Index and the NAHB Housing Survey are all examples of datasets that are not “traditionally” financial in nature (coming as they do from a shipping insurer, a payroll provider, a supply chain group and a constructors association, respectively). Nonetheless, they move markets. In recent years, many such alternative datasets have come to the fore, created by companies in every sector of the economy.
The volume of new car sales is a key driver of performance for auto manufacturers. As you can imagine, this information is tightly guarded by manufacturers. The number of insurance policies issued for new cars, however, correlates almost perfectly with new car sales volume. Auto insurance data can thus provide investors with a daily count of new car insurance policies, sliced by manufacturer, which can be used to estimate total daily new car sales per manufacturer. Anyone trading auto equities would be interested in this kind of data.
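The estimation step described above can be sketched in a few lines. This is a hypothetical illustration, not any vendor’s actual methodology: the policy counts and the calibration ratio are invented, and a real model would account for renewals, reporting lags and regional coverage.

```python
# Hypothetical sketch: estimating daily new-car sales per manufacturer
# from daily counts of newly issued auto insurance policies.
# All figures below are invented for illustration.

# Daily new-policy counts by manufacturer (hypothetical feed)
daily_policies = {"Ford": 4200, "Toyota": 5100, "Honda": 3300}

# Calibration: policies observed per reported sale in a past month
# where official sales figures were published (not every sale
# generates a same-day policy, and some policies are renewals).
policies_per_sale = 0.92

def estimate_sales(policies, ratio):
    """Scale raw policy counts into an estimate of unit sales."""
    return {maker: round(count / ratio) for maker, count in policies.items()}

estimates = estimate_sales(daily_policies, policies_per_sale)
print(estimates)
```

In practice the calibration ratio would be re-fit each time official sales numbers are released, so the daily estimates stay anchored to ground truth.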
Oil rig movements tell important stories about various parts of the oil and gas ecosystem. From inferring the success or failure of an exploration effort to predicting drilling company revenue, there are numerous uses for this kind of data. The importance of shipping and logistics in transporting a commodity across countries cannot be overstated. In the past, investors in such assets relied on monthly reports for production volume and demand; today, they can infer commodity transportation from automatic identification system (AIS) data mapped onto port data.
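Mapping AIS position reports onto port locations is, at its core, a geofencing join. The sketch below is a toy version under simplifying assumptions: port names, bounding boxes and vessel fixes are all invented, and real AIS pipelines use precise port polygons, dwell-time filters and vessel-type codes rather than a single point-in-box test.

```python
# Hypothetical sketch: inferring port calls from AIS position reports.
# A vessel is counted at a port when its reported position falls inside
# that port's bounding box. All names and coordinates are invented.

PORTS = {  # name -> (lat_min, lat_max, lon_min, lon_max)
    "Port A": (29.60, 29.80, -95.10, -94.90),
    "Port B": (51.90, 52.00, 4.00, 4.30),
}

def port_for(lat, lon):
    """Return the port whose bounding box contains this AIS fix, if any."""
    for name, (lat_lo, lat_hi, lon_lo, lon_hi) in PORTS.items():
        if lat_lo <= lat <= lat_hi and lon_lo <= lon <= lon_hi:
            return name
    return None

# A stream of (vessel_id, lat, lon) AIS fixes
fixes = [("V1", 29.70, -95.00), ("V2", 51.95, 4.10), ("V3", 10.00, 10.00)]

calls = {}
for vessel, lat, lon in fixes:
    port = port_for(lat, lon)
    if port:
        calls.setdefault(port, []).append(vessel)
print(calls)  # vessels observed inside each port's geofence
```

Aggregating such port calls by commodity terminal over time yields the transportation-volume estimates the paragraph describes.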
Business Health Metrics
Using inter-company payment patterns — payment amounts, delays and delinquencies — investors can discern whether a company is paying its debts on time. Consistent late payments are often evidence of a company’s strength and the ability to squeeze suppliers. Sudden increases in late payments, however, can be a sign of weakness and cash-flow problems. This data, therefore, can help gauge debtor distress — a leading indicator of stock underperformance.
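The distress signal described above boils down to comparing recent payment delays against a trailing baseline. Here is a minimal sketch under invented assumptions: the delay figures, the window length and the 1.5x threshold are all hypothetical, and a production signal would control for seasonality and industry-wide payment norms.

```python
# Hypothetical sketch: flagging debtor distress from payment delays.
# A sudden rise in average days-past-due relative to a trailing
# baseline is treated as a distress signal. All figures are invented.

def distress_flag(days_past_due, window=3, threshold=1.5):
    """Flag when the recent average delay exceeds the earlier
    baseline average by more than `threshold` times."""
    recent = days_past_due[-window:]
    baseline = days_past_due[:-window]
    if not baseline:
        return False  # not enough history to form a baseline
    recent_avg = sum(recent) / len(recent)
    baseline_avg = sum(baseline) / len(baseline)
    return recent_avg > threshold * baseline_avg

# Monthly average days-past-due for one company's payables:
# steady for four months, then a sharp deterioration.
history = [12, 11, 13, 12, 25, 31, 38]
print(distress_flag(history))
```

Note the asymmetry the paragraph points out: what matters is not the level of delay (strong firms pay late deliberately) but the sudden change in it.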
Alternative Data Sources
We’ve barely scratched the surface of potential alternative data sources. The data sources that will become available in the next decade will dwarf what we have access to right now.
The Internet of Things (IoT) will incorporate sensors that are orders of magnitude more granular and more broadly dispersed than our current smartphone network. Drones and microsatellites will provide real-time physical and logistical data far more detailed than our current coverage. Beacons and embedded chips inside products will allow instantaneous knowledge of transactions, usage, obsolescence and more.
And companies will evolve to make use of all this data. Today, for every Amazon or Target that runs a rigorous program of data collection, analysis and action, there are dozens of companies that still operate by old-school rules. As time goes by, these firms will either adapt or become extinct. Either way, the future belongs to businesses that embrace the data revolution. This means that the amount of business data available to analysts will only continue to grow.
But as new data becomes available, “old” data will become commonplace. This is inevitable and analysts must prepare for this eventuality.
The Spectrum of Diffusion for Alternative Data
Data has a natural life cycle.
- Newly discovered datasets are rare, jealously guarded and valuable because they contain alpha.
- But as time goes by, this data diffuses to a wider audience and its alpha content diminishes. Nonetheless, analysts continue to use the data because not doing so would leave them at an information disadvantage.
- Finally, the data becomes old and obsolete: fully priced in by a market that has moved on to the next source of edge.
Smart analysts who have internalized the spectrum of diffusion are aware of this dynamic and are always ready to adapt.
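The lifecycle above is often modeled as exponential alpha decay: each year of diffusion halves what remains. The sketch below is purely illustrative; the half-life figure is invented, and real decay rates vary widely by dataset and strategy.

```python
# Hypothetical sketch of the diffusion lifecycle: a dataset's alpha
# decays exponentially as it spreads to a wider audience.
# The 12-month half-life is an invented figure for illustration.

def remaining_alpha(initial_alpha, months, half_life=12.0):
    """Alpha remaining after `months` of diffusion, given a half-life."""
    return initial_alpha * 0.5 ** (months / half_life)

# A signal worth 100 units of alpha at discovery
for months in (0, 12, 24, 36):
    print(months, remaining_alpha(100.0, months))
```

On this toy model, a dataset retains half its value after a year and an eighth after three, which is why the hunt for the next source never stops.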
| Looking for this? | Try here: |
| --- | --- |
| Stock History (US) | End-of-Day Stock Prices |
| Fundamentals (US) | Core US Fundamentals |
| Fundamentals (Europe) | Global Fundamentals |
| Stock Options | Option Volatility Surfaces |
| Intraday (US) | S&P, Nasdaq, DJIA Intraday Prices |
| Sentiment Data | FinSentS Web News Sentiment |
| Satellite Imagery Analysis | Ursa Space Oil Tank Measurement |
| Economic Data | FX Volume Data from CLS |
| Transportation | North American Commodities Transport |

Quandl’s alternative data products now provide insights into many of these industries, from maritime data to the Internet of Things, and more.
In evaluating whether to invest in rare data, some questions to ask are: Do the people who are trading against me have this information? Am I willing to risk the possibility that they do? You don’t want people with better information trading against you.

In 2015, investors were surprised by J.C. Penney’s Q2 results. The hedge funds that paid RS Metrics, however, were not. RS Metrics uses satellite images to measure car traffic at retailers’ stores. It reported, in near real time, that traffic at J.C. Penney was rising in April and May. The firm’s clients then traded on this information. In mid-August, J.C. Penney’s shares jumped more than 10%.
It’s a cautionary tale, or an opportunity, depending on how you look at it. Eventually, everybody will have access to satellite data, just like everybody has access to fundamentals data today, and it will no longer be edge-worthy.
But for now, unorthodox alternative data sources like logistics and consumer data are moving the markets. Those who can access the information the fastest will stay ahead of the market.