So much data, so little alpha
A number of researchers and consulting shops have published projections of just how big the alternative data industry is destined to be. Deloitte, for example, concluded that the industry will swell to $7 billion next year. The report goes on to note, “There is no shortage of vendors that provide these data sets to Wall Street.”
I would suggest the “no shortage” situation is related to the promise of that $7 billion spend. In fact, “no shortage” understates what’s going on. The amount of data on offer to professional investors is excessive and quickly becoming intractable for practitioners.
That we got to this state was probably inevitable. We, among others, have been guilty of telling data owners how valuable their data might be. We have seen inspirational stories of alternative data used to great success. Add to that the false perception that hedge funds pay vast sums to buy copious amounts of alternative data. (They have so much of it to spend plus their data science robots are insatiable and their AI machines are magic!)
Sellers across the industry are now wise to the hype. So you will find the label “alternative data” attached to an ever-widening selection of products.
The “capital intro” model has been imported to the data industry too. This stokes supply because aspiring publishers need only pay a few thousand dollars to meet and greet hedge fund data buyers at some chic beach hotel. Then there are all the traditional data vendors who are, quite sensibly, looking to monetize their reach by offering open platforms so that anyone can advertise their data wares to the entire industry.
So here we are: producers, brokers, tutors, mentors, platforms, consultants and conference organizers, all contributing to a meteoric rise in supply. More data is, of course, a good thing. But the signal-to-noise ratio is deteriorating. Finding a powerful alternative dataset has always been a needle-in-a-haystack problem. But now the volume of hay is growing faster than the needle count.
For data-driven investors, the opportunity set is growing, but so too is the work required to process it all. There are no easy solutions to this. In fact, the perceived ease with which one becomes a profitable data vendor is actually pernicious for almost everyone. At least 98% of these datasets are not going to sell. (This is a data-driven assertion, by the way: we are the most discriminating platform in the industry, and our success rate is at best 25%, even after culling 95% of what we process.)
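The funnel arithmetic in that parenthetical is worth making explicit: culling 95% of inbound datasets and then selling at best a quarter of the survivors implies that barely 1% of everything processed ever sells, which is consistent with the 98% failure figure. A minimal back-of-envelope sketch (the variable names are mine, purely illustrative):

```python
# Back-of-envelope check of the funnel figures quoted in the text:
# 95% of inbound datasets are culled up front, and at best 25% of
# the surviving 5% actually sell.
culled = 0.95            # share of processed datasets rejected immediately
best_case_sell = 0.25    # best-case sell rate on the remainder

overall_sell_rate = (1 - culled) * best_case_sell
print(f"Overall sell-through: {overall_sell_rate:.2%}")
print(f"Failure rate: {1 - overall_sell_rate:.2%}")
```

Even under the best-case assumption, the overall sell-through rate is about 1.25%, so "at least 98% will not sell" is, if anything, a conservative statement.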
That only about 2% of data has real utility is an example of the Anna Karenina principle (“Happy families are all alike; every unhappy family is unhappy in its own way”). Datasets need to satisfy myriad criteria in order to be valuable; missing the mark on even one of these renders the data useless; hence the high failure rate.
What this means is that many firms will lose time and money on what they will later realize was a mistake that could have been prevented by a dose of sober market research. And as the volume of data labeled “alternative” grows, consumers are stuck sifting through a larger and larger haystack in search of the needles they seek.
Data is not like antiques on eBay. One man’s (data) trash is not another’s (data) treasure. Beauty is not in the eye of the beholder; data is objectively good or bad. Only its applicability is subjective in the sense that the requirements of the customer depend on their circumstances.
For example: our Earnings Quality data is objectively excellent. It’s clean, bias-free and predictive of future prices with statistical certainty. It’s a dataset from the 2%. A customer will buy it if it makes them a better investor. And that depends on what and how they trade and the actual marginal impact of the information we can offer them.
Contrast this with any of the hundreds, perhaps over a thousand by now, of datasets we have rejected. These datasets were rejected because they were objectively flawed in one of a hundred possible ways, which renders them useless. No marketplace or long tail will save them. They simply will not sell; ever.
To imply, as Deloitte does, that we are being served a feast of data is an incomplete characterization. One must add that most of the data on offer is completely inedible. But we food critics have to sample all of it anyway. The more important takeaway here is never to mistake the quantity of data for a measure of anything at all, except perhaps the level of hype around the industry.
So if you are an aspiring data publisher looking for the right partner to maximize sales to institutional investors, what exactly should you do given all the options? I would suggest there is an analogy to be made with the asset management industry: in one case you have to decide whom to entrust your money to; in the other, it’s your data.
I draw this analogy because you can screen the brokers and the conferences and the platforms and us exactly the same way you screen asset managers: check their track records.
The litmus test is not the quantity of data these people have seen or hosted or presented, it’s the outcomes they have achieved. So the question to ask is simple: how many unique data products has this partner sold to more than three buy-side customers?
You might think this a low bar, but it’s not. The organizations buying alternative data today are guided by extremely capable data scientists. They don’t make mistakes. They don’t buy sub-par data. The only way you get a dataset sold to three or more such firms is if that data is good and you know what you’re doing.
The other question worth asking is this: To how many unique industry customers has the partner sold alternative data? You might want to put a minimum price qualifier in there too. This sort of diligence will cut right through quantity, which is today’s vanity metric.
Given that your data is subject to the Anna Karenina principle (which is not your fault, and is sometimes fixable), it is essential to find out whether you are in the 2% or have a path to getting there. You can’t learn much about yourself by joining a club that accepts all applicants.
It’s no coincidence that the most discriminating platforms are also the most successful. They create win-win-win situations. Data publishers get a partner willing to put time and effort into their data. They either earn maximum revenue for their data or learn that it is flawed in some way.
While the former outcome is the best of course, the latter is not the worst; it is much better than continuing forward thinking you have a salable product when you don’t.
So in this period of inedible feasts, one thing is clear to me: more than ever, the deep vetting we do at Quandl is a critical component of our value proposition. We were one of the alternative data pioneers, but we are fast becoming the company with the smallest alternative data catalog. I consider that an accomplishment that serves our customers — and our partners — well.