One of the many interesting things about alternative data is how difficult it is to define. Most people define it by what it is not: “any non-market data is alternative data”. But very few can pinpoint what it actually is.
The reason, I suspect, is that alternative data is a moving target. It comes from different sources, takes different forms, and offers insights on different verticals. For example, satellite imagery can predict oil inventories, online job listings can track company growth, and insurance records can track auto sales: that’s three different data types, from three different sources, about three different subjects.
The very features that give alternative data its value for financial markets — its uniqueness and its obscurity — don’t help the matter. It’s no wonder there is so much misinformation floating around trading floors, newsrooms and corner offices.
As Chief Data Officer at Quandl, I evaluate many, many datasets, and help our clients find value in them. In the process, I have also come across many preconceived notions and myths about alternative data. Here are the five I see most frequently.
1. Alternative data is illegal and/or unethical.
Two legal concerns are inevitably raised during discussions of alternative data: insider trading and abuse of personal information.
Insider trading concerns center around whether investors are using illicit (“material non-public”) information to gain an edge in the market. Meanwhile, personal information is data that can explicitly identify an individual, thus violating their privacy.
Both concerns are misplaced.
“Boots on the ground” have long been part of the financial research process. In the past, investors would send analysts to the shopping mall on the weekend to count the cars in the parking lot. Today, we do that with satellite imagery and pattern recognition software. In the past, traders would read newspapers, follow price action, and use gut instinct to gauge the mood of the market. Today, we do that using social media streams and natural language processing.
Alternative datasets are thus merely the technological descendants of methods known and used in the investment community for decades. They’re the “boots on the ground” of the digital age. And just like anybody can go to a shopping mall or read a newspaper, anybody can use the methods of alternative data. There’s nothing “insider” about it.
As for privacy and personally identifiable information (PII), the good news is that Wall Street simply doesn’t want this information. Wall Street doesn’t care that John Smith from Cleveland bought a lawnmower last month. That kind of granular detail is inconsequential to investors. Instead, Wall Street cares about lawnmower sales throughout the Midwest. Aggregate trends are what matter, not individual consumers.
Indeed, most investors are paranoid about receiving data about individual consumers, even accidentally; they don’t want to take any avoidable legal, regulatory or compliance risk. And the same goes for data vendors. Both vendors and consumers of alternative data have a strong incentive to eliminate PII from the data and are willing to invest resources in that effort. (This is in stark contrast to the advertising and marketing technology worlds, where tracking individual users in the service of ever-more intrusive “personalization” seems to be an inexorable trend.)
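To make the aggregation point concrete, here is a minimal sketch of how individual purchase records might be rolled up into regional totals before they reach an investor. Everything in it is hypothetical: the sample records, the `aggregate_sales` function, and the `min_group_size` threshold are my own illustrative assumptions, not a description of any vendor's actual pipeline.

```python
from collections import defaultdict

# Hypothetical raw records: (customer_name, city, region, product, amount_usd).
# All names and figures are invented for illustration.
transactions = [
    ("John Smith", "Cleveland", "Midwest", "lawnmower", 349.0),
    ("Ann Lee", "Chicago", "Midwest", "lawnmower", 299.0),
    ("Raj Patel", "Detroit", "Midwest", "lawnmower", 410.0),
    ("Mia Cruz", "Austin", "South", "lawnmower", 380.0),
]


def aggregate_sales(records, min_group_size=2):
    """Roll individual purchases up to (region, product) totals.

    Names and cities are discarded. Groups with fewer than
    min_group_size purchases are suppressed, so a lone buyer
    cannot be singled out (the threshold is an assumed parameter).
    """
    totals = defaultdict(lambda: {"units": 0, "revenue": 0.0})
    for _name, _city, region, product, amount in records:
        bucket = totals[(region, product)]
        bucket["units"] += 1
        bucket["revenue"] += amount
    return {
        key: value
        for key, value in totals.items()
        if value["units"] >= min_group_size
    }


summary = aggregate_sales(transactions)
print(summary)
```

The output keeps only the Midwest lawnmower total. John Smith's name and city never leave the function, and the single purchase in the South is dropped rather than reported.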
2. Alternative data is new.
Alternative data is often considered a “new” phenomenon for the markets, but the reality is that it has always been around. Indeed, many of the datasets we consider commonplace today were once obscure, esoteric and “alternative”.
One of my favorite books about the market is Reminiscences of a Stock Operator, the fictionalized biography of a speculator named Jesse Livermore. Livermore made and lost several fortunes over the course of his career. He shorted stocks during the panic of 1907; he was long during the bull market of World War I; and he shorted stocks again during the crash of 1929, making a billion dollars in today’s currency. In between, he lost it all, multiple times.
Livermore used a number of techniques that would not be unfamiliar to traders today. He used technical factors: support, resistance, trends, reversals. He had an intimate knowledge of supply, demand and market positioning. He even knew, intuitively, about fat tails, and used deterministic scaling rules to take advantage of them.
What’s even more interesting is looking at the data that Livermore did not use. Conspicuously absent from his trading style is any mention of financial statements, balance sheets, cash flow analysis, and value investing.
And of course, that’s understandable. Graham and Dodd published Security Analysis only in 1934. It took another 20 years before accountants started codifying GAAP standards for the first time. And it took another 20 years before that information became widely available to the market via a standardized terminal.
What am I getting at here? I’m suggesting that fundamentals data — what most people would consider the absolute bedrock of equity analysis — used to be outside the mainstream. It used to be the province of a couple of obscure academics, and then a bunch of geeky accountants, and some idiosyncratic investors, for literally decades. It was expensive, hard to understand, hard to access, and hard to use.
In short, it was alternative data.
We see this pattern again and again, in what we’ve called the cycle of diffusion. Datasets that are new and unusual, if they truly have value for the market, eventually become more and more widespread, until everyone’s using them. The alternative data of today is simply the table stakes of tomorrow.
3. Alternative data is only valuable for alpha, and alpha decay makes it self-defeating.
There’s a widespread fantasy of finding a dataset that draws a straight line to alpha… just follow the data and make money. I’ve dreamed of it myself.
Such datasets do exist, but they are vanishingly rare. And that’s only natural. Markets are very efficient. It is incredibly unlikely that a single dataset will outperform the combined efforts and insights of dozens of highly intelligent, highly informed analysts, drawing on decades of experience and expertise.
Most of the datasets I evaluate do not have a straight line to alpha in and of themselves. But that doesn’t mean the data has no value. As long as there’s information content, it’s valuable. Modern investors, whether systematic or discretionary or fundamentals-driven or activist or short-biased, are information hounds. So if you can give them data that sheds a new and unique light on some aspect of their portfolio, there’s value.
To use a common analogy, investing is like creating a mosaic and every data point is a tile. You don’t expect to build a complete picture using a single tile. And so alpha is the wrong criterion by which to judge data. Our own testing and research focuses on finding clear and unique information content, but not necessarily alpha.
Having said all that, we do sometimes find datasets that hold clear-cut alpha. And we do try to preserve the alpha for as long as possible, through high prices, limited distribution and so on. But you can only do that for so long. Sooner or later the secret gets out. People devise proxies or find substitutes. And the alpha disappears.
And that’s okay. In fact, it’s expected. Nobody realistically expects to find an alpha source that lasts forever. It doesn’t work that way, for models or technology or data or expertise or anything else. Professional investors recognize this; they know that the trick is to move fast and take advantage of the opportunity while it lasts.
4. Alternative data is only useful for short-term trading and forecasting.
A lot of the early applications for alternative data were focused on the short term because it’s one of the easiest and least risky ways to make money. If you can predict a company’s quarterly sales, or earnings, or costs, and your prediction is consistently better than the market consensus, you will profit. That’s a perfectly legitimate use case, but it’s not the only use case.
We’re increasingly evaluating datasets that offer signals that play out over a longer horizon. For example, suppose you look at customer profiles for two competing products and find that one skews younger, richer, and more urban. That product is probably better positioned for the future than one whose audience is mainly retirees and lower-income consumers.
A retailer may say they’re going to target a particular audience, and you can overlay transaction and demographic data to determine whether or not they are succeeding. This kind of transition can take months for the retailer to accomplish, and the payoff may come over multiple years. If you’re a fundamentals-driven investor, this is precisely the kind of signal you care about. You’re not interested in trading around any given earnings statement; you want to buy and hold for the long term. And if you get the long-term trends right, you can generate P&L that is many multiples of that generated by a short-term price signal.
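As a rough illustration of what overlaying transaction and demographic data can look like, the sketch below tracks the share of a retailer's spend that comes from a target age band, quarter by quarter. The shopper IDs, age bands, amounts, and the `target_share_by_quarter` function are all hypothetical assumptions for the example; a real dataset would be far larger and already anonymized.

```python
# Hypothetical anonymized inputs, invented for illustration.
# demographics maps an opaque shopper ID to an age band.
demographics = {
    "a1": "18-34",
    "a2": "18-34",
    "a3": "55+",
    "a4": "35-54",
    "a5": "55+",
}

# Each transaction is (quarter, shopper_id, amount_usd).
transactions = [
    ("2023Q1", "a3", 100.0),
    ("2023Q1", "a5", 100.0),
    ("2023Q1", "a1", 50.0),
    ("2023Q2", "a1", 100.0),
    ("2023Q2", "a2", 100.0),
    ("2023Q2", "a3", 100.0),
    ("2023Q2", "a4", 100.0),
]


def target_share_by_quarter(transactions, demographics, target_band):
    """Return {quarter: fraction of spend from shoppers in target_band}."""
    spend = {}  # quarter -> (total_spend, target_spend)
    for quarter, shopper_id, amount in transactions:
        total, target = spend.get(quarter, (0.0, 0.0))
        total += amount
        if demographics.get(shopper_id) == target_band:
            target += amount
        spend[quarter] = (total, target)
    return {
        quarter: target / total
        for quarter, (total, target) in sorted(spend.items())
    }


shares = target_share_by_quarter(transactions, demographics, "18-34")
for quarter, share in shares.items():
    print(f"{quarter}: {share:.0%} of spend from the 18-34 band")
```

In this toy data the 18-34 share of spend rises from 20% to 50% between quarters. Sustained over many quarters, a shift like that would be evidence that the retailer's repositioning is actually working.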
5. Alternative data is only valuable to quants.
Quants have been the earliest adopters of modern alternative data, for a few reasons. One is that many of these datasets are hard to work with, so you need quantitative expertise, data science skills and engineering infrastructure. A second is that quants are more likely to allow data to drive decisions. By contrast, most discretionary investors tend to think of data as merely a complement to their core analysis. Divergent views on the centrality of data result in different levels of investment and commitment to data as a profit center.
But that too is changing. Traditional active management is under pressure: from index funds and passive investors at the low end, and from systematic funds and quants at the high end. The industry as a whole is looking for new sources of advantage, and increasingly, that advantage is to be found in alternative data. The rise of so-called ‘quantamental’ traders is a perfect example of how investors who used to rely on fundamentals are expanding their workflow to include new, more quantitative and data-driven methods of analysis.
It’s a hard transition to navigate, but it’s necessary. If there’s one thing I’ve learned in the last few years, it is that alternative data is here to stay. And it will drive an ever-increasing share of investment returns over the next decade. Investors know this, and they are adapting fast; the ones who don’t are being left behind. It’ll be interesting to see what the future holds.