As discussed in part one of our Data Monetization series, Building a Data Product, in order to sell data to finance professionals, your product must be clean and predictive. Data hygiene and predictiveness, however, are just two of many prerequisites on the path to productization. Though you are not expected to provide trading insights or make stock predictions, analysts and investors don’t want to have to corral your data. They want to be able to consume it easily without doing too much legwork.
In this post we cover two key components to minimize data wrangling:
- How to properly package your data for a Wall Street audience
- Formats in which to deliver your data to a Wall Street audience
Packaging Your Data
Packaging is the difference between a finished article in a glossy magazine and half-formed notes (brilliant as they may be) scribbled onto a napkin. Both articulate the same points, but one is much easier to digest for people who are not familiar with the ideas.
In the context of Wall Street, investors demand well-defined, compact datasets that they can work with using their existing toolkit. These tools are often Excel, Python or R. For this reason, it’s important to package raw data assets into consumable data products that fit these applications.
Some of the factors that go into this process are described below:
Index or Estimate Creation: Raw data assets such as satellite image files, shipping manifests and consumer transaction receipts, while rich in information, are not easy to use for most Wall Street traders and portfolio managers. So these datasets must be reduced to more standardized forms: indexes, estimates or other easily manageable numbers. The key is to achieve this reduction without stripping out the relevant information content.
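As a rough illustration of this kind of reduction, the sketch below collapses raw transaction records into a daily spend index rebased to 100. The function name `build_spend_index`, the record layout and the sample figures are all hypothetical, chosen only for the example; a real index would also need to handle panel changes, seasonality and outliers.

```python
from collections import defaultdict
from datetime import date


def build_spend_index(transactions, base_date):
    """Reduce raw (date, amount) records to a daily index where base_date = 100.

    Both the record layout and the rebasing rule are assumptions made for
    this sketch, not a prescribed methodology.
    """
    daily_totals = defaultdict(float)
    for day, amount in transactions:
        daily_totals[day] += amount

    base = daily_totals.get(base_date, 0.0)
    if base == 0.0:
        raise ValueError("base_date must have non-zero total spend")

    # Sort by date so the index reads as a time series.
    return {
        day: round(100.0 * total / base, 2)
        for day, total in sorted(daily_totals.items())
    }


# Hypothetical sample data: four receipts over three days.
sample_transactions = [
    (date(2024, 1, 1), 40.0),
    (date(2024, 1, 1), 60.0),
    (date(2024, 1, 2), 110.0),
    (date(2024, 1, 3), 95.0),
]
spend_index = build_spend_index(sample_transactions, base_date=date(2024, 1, 1))
```

The output is a handful of numbers per day rather than thousands of receipts, which is the form most analysts can load straight into Excel, Python or R.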
Marketing Collateral: While hedge funds are among the most sophisticated investors in the market, each alternative dataset presents a brand new analytical, economic and trading-strategy challenge. It is critical to educate these customers on the usage and scope of this data, via onboarding documents, white papers, demo spreadsheets, example scripts and other marketing collateral. Empirically, we have found that rich and sophisticated database-level marketing collateral dramatically increases the value of a given dataset.
Compliance Issues: Wall Street investors are justifiably paranoid about issues of compliance: specifically, violating regulations on material non-public information (MNPI) and personally identifiable information (PII). An important part of the data productization process is to ensure that no such issues arise. The data product should be legally and logistically clean: auditable and access-controlled, with clear IP ownership and no question of MNPI/PII violations. The hard part is ensuring compliance without over-scrubbing the data to the point where it loses its signaling power.
Productizing a dataset is a challenging exercise. Information quality, predictive power, commercial value and consumable packaging are all necessary to create a dataset that customers will pay for. And even then, the work is not done; the next step is to build an infrastructure that can deliver this data efficiently and reliably.
Delivering Your Data Product
As Amazon and Uber will attest, a good delivery system can be a product in and of itself. This is no different with data products. Delivering your data to your potential clients, however, isn’t as simple as transferring files via FTP. Quantitative analysts and professional investors use specific tools in their data analysis activities, which means they require a data delivery suite to match. The favored analyst tools are R, Excel and Python, while the preferred method of getting data into these applications is via RESTful API.
APIs should be capable of delivering a variety of different data points that are pertinent to an investor’s trading strategy. This could be anything from sales estimates to GPS coordinates. Moreover, a robust API will need to be compatible with a wide range of use cases — including bulk downloads, data transformations and complex filters — and have no service disruption at any point.
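To make the idea of filters and transformations concrete, here is a minimal sketch of how a client might assemble a filtered request against a RESTful data API. The endpoint `api.example.com`, the dataset code and every parameter name are hypothetical; an actual API would define its own.

```python
from urllib.parse import urlencode

# Hypothetical base URL, used only for illustration.
BASE_URL = "https://api.example.com/v1/datasets"


def build_query_url(dataset, ticker=None, start_date=None,
                    end_date=None, columns=None, transform=None):
    """Build a request URL with optional filters for a hypothetical data API."""
    params = {}
    if ticker:
        params["ticker"] = ticker
    if start_date:
        params["start_date"] = start_date
    if end_date:
        params["end_date"] = end_date
    if columns:
        # Request only the columns the analyst needs.
        params["columns"] = ",".join(columns)
    if transform:
        # For example a server-side resampling or differencing step.
        params["transform"] = transform

    url = f"{BASE_URL}/{dataset}.csv"
    query = urlencode(params)
    return f"{url}?{query}" if query else url


# A filtered pull of two columns for one ticker.
filtered_url = build_query_url(
    "sales_estimates",
    ticker="ACME",
    start_date="2024-01-01",
    columns=["date", "value"],
)

# Omitting every filter amounts to a bulk download of the whole dataset.
bulk_url = build_query_url("sales_estimates")
```

The same URL can be fetched from Python, R or an Excel add-in, which is one reason a plain HTTP interface suits this audience.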
If you want to monetize your data, you will need to commit to delivering the data at every single interval specified (hourly, daily, weekly, etc.) without fail. Investors will take large positions based on your data and will need complete faith that the data will always be where it needs to be. Errors in delivery can potentially cost investors thousands, sometimes millions of dollars in trading opportunities; thus, they need a trustworthy and reliable data source.
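One simple safeguard is to audit the delivery log for gaps before a customer finds them. The sketch below assumes a daily schedule and a list of dates on which data was actually published; the function name `missing_deliveries` and the sample dates are hypothetical.

```python
from datetime import date, timedelta


def missing_deliveries(delivered, start, end, step=timedelta(days=1)):
    """Return every scheduled date between start and end (inclusive)
    that does not appear in the delivered dates."""
    delivered_set = set(delivered)
    missing = []
    current = start
    while current <= end:
        if current not in delivered_set:
            missing.append(current)
        current += step
    return missing


# Hypothetical delivery log with two gaps in a five-day window.
delivery_log = [date(2024, 1, 1), date(2024, 1, 2), date(2024, 1, 4)]
gaps = missing_deliveries(delivery_log, date(2024, 1, 1), date(2024, 1, 5))
```

Passing a different `step`, such as `timedelta(weeks=1)`, adapts the check to a weekly schedule. An hourly schedule would need `datetime` values rather than dates.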
APIs should also be properly documented and described. Likewise, database pages should feature consistent formatting that details the column headers, update frequency, ticker mappings (if needed), history and access. Crucially, this infrastructure and its associated instructions should be updated constantly to account for ever-changing company information, technologies and investment tools.
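One way to keep database pages consistent is to hold their documentation in a structured record and check it for gaps before publishing. The sketch below is only an illustration: the `DatasetDoc` class, its field names and the sample values are hypothetical, loosely mirroring the items listed above.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetDoc:
    """Documentation record for one database page (hypothetical layout)."""
    name: str
    columns: dict            # column header -> short description
    update_frequency: str    # e.g. "daily"
    history_start: str       # first date covered, e.g. "2015-01-01"
    access: str              # e.g. "API key required"
    ticker_mappings: dict = field(default_factory=dict)  # optional

    def missing_fields(self):
        """List the required documentation items that are still empty."""
        required = {
            "columns": self.columns,
            "update_frequency": self.update_frequency,
            "history_start": self.history_start,
            "access": self.access,
        }
        return [label for label, value in required.items() if not value]


# A page that is missing its update frequency.
draft_doc = DatasetDoc(
    name="Sales Estimates",
    columns={"date": "Observation date", "value": "Estimated sales, USD"},
    update_frequency="",
    history_start="2015-01-01",
    access="API key required",
)
incomplete = draft_doc.missing_fields()
```

Ticker mappings are left optional here because the text notes they are only needed for some datasets.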
Wall Street expects your data to be clean, error-free, predictive and unique. You must also have a delivery infrastructure in place to produce reliable feeds 365 days a year in formats investors are accustomed to. In the next post in our Data Monetization series, we will cover privacy issues in greater detail, including personally identifiable information and selling your data anonymously.