This is part one of a series of blog posts exploring the data factory, the premier approach to data management.
In the assembly line model that powers the data factory, as with many other things, well begun is half done. The first stage in any data process is acquiring the right data for your organization’s needs.
The process of acquiring data involves a number of smaller steps which include sourcing; contracting; extracting, transforming and loading the data; and business development. These steps are by no means linear. They can and often do happen simultaneously.
This stage is all about ensuring that the data you’re looking at is useful, is compliant and can generate advantageous insights. You’ve undoubtedly heard the old data adage of “garbage in, garbage out”; that’s why it’s critical to get the process right at a foundational level.
Sourcing the right data for your needs
When sourcing data, it is vital to balance the wealth of data that seems to be out there with considerations that are unique to your firm. Your data strategy should be informed by your institutional strengths and weaknesses, including things like investment horizons, transaction costs and industry expertise.
We’ll take you through how we chose our Air Travel Intent dataset. Different information is associated with each stage of air travel (e.g. trip planning, ticket purchase, day of travel, revenue reported by the airline company). From the trip planning stage, marked by consumers’ airfare searches, to the revenue reported by the airline company in the form of monthly revenue passenger miles (RPM) or quarterly earnings, there is a wealth of information available. Each type of data also comes with its own challenges: accuracy, granularity, detail and sample size can all vary.
There’s no right or wrong answer when it comes to choosing a dataset associated with the air travel industry. Our team was looking for data that was as “early” as possible, so we chose to zoom in on airfare searches, the data associated with trip planning. The dataset has high volume and lags by 1-2 days. As a result, the Air Travel Intent dataset provides advance insight into passenger bookings and airline pricing, inferred from searches and click-throughs on a large global search engine for commercial airfares.
Contracting and due diligence: your compliant data
This stage is all about dotting your i’s and crossing your t’s. While not particularly glamorous, contracting and ensuring compliance with applicable regulatory requirements is vital for any fund seeking to employ data in its decision-making.
With regulatory regimes such as the California Consumer Privacy Act (CCPA) and Europe’s General Data Protection Regulation (GDPR) increasingly influencing data collection, sharing and transfer, it is imperative to ensure that any data you use is compliant with all applicable regulations. Informed by their constituents’ rising interest in data privacy, regulators have cast a keen eye on the industry and the scrutiny is likely here to stay. If you’re interested in further reading on the future of data privacy and compliance, check out a recent interview on the topic with the Vice Chair of Investment Management at Lowenstein Sandler LLP.
We’ve established that the due diligence process is vital. But what does it actually look like? Whatever your source, you have to ensure that the source has the rights to the data and that there are no privacy concerns (for example, personally identifiable information, or PII). You should know how data usage is governed in the source’s Terms of Service with its users and intermediary clients, and confirm that this usage complies with the rules of every relevant jurisdiction.
The above is by no means an exhaustive checklist. Contracting and due diligence is a highly specialized skill, and it is worth the investment to make sure that you’re doing everything by the book.
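As a rough illustration, the checks named above could be tracked as a simple checklist in Python. This is only a sketch: the class name, the field names and the sample answers are hypothetical, and a real due diligence review would cover far more than these four questions.

```python
from dataclasses import dataclass


@dataclass
class DueDiligenceCheck:
    """One question to answer before a dataset is approved (hypothetical structure)."""
    question: str
    passed: bool
    note: str = ""


def outstanding(checks):
    """Return the questions that have not yet been answered satisfactorily."""
    return [c.question for c in checks if not c.passed]


# Illustrative answers only; they do not describe any real data source.
checks = [
    DueDiligenceCheck("Does the source hold the rights to the data?", True),
    DueDiligenceCheck("Is the data free of personally identifiable information (PII)?", True),
    DueDiligenceCheck("Do the source's Terms of Service permit this usage?", False,
                      "awaiting legal review"),
    DueDiligenceCheck("Does the usage comply with every relevant jurisdiction?", False),
]

# A dataset moves forward only when nothing is left outstanding.
for question in outstanding(checks):
    print("Open item:", question)
```

The point of the sketch is simply that every open item stays visible until someone with the right expertise signs it off.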
Extracting, transforming and loading: making your data useful
At this stage, you’ve found the dataset that meets your needs. You’ve checked off all of the contracting and due diligence steps. The next step is to take the data and make it useful and usable.
Take, for example, our Corporate Aviation Intelligence dataset. The data allows subscribers to get advance warning of mergers and acquisitions, partnerships and global expansion. Knowing about these events ahead of time is extremely valuable.
For example, healthcare company Centene flew its private jets to airfields outside of Tampa six times in a span of six weeks prior to announcing its acquisition of Tampa-based WellCare for US$17B in Q1 of 2019. WellCare stock rose 23% on the announcement.
Corporate aviation data such as the above is all publicly available information. But wrangling together and making sense of data from satellites, airport logs, transponders, ownership and leasing agreements and operator paperwork is a non-trivial task. It takes specialist skills and infrastructure to ingest, digitize or otherwise tie together all this raw material before you can begin your analysis.
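To make the idea of tying raw material together more concrete, here is a minimal Python sketch of the transform step. It does not describe how the Corporate Aviation Intelligence dataset is actually built. The records, tail numbers, airport codes, company names and the thresholds (six arrivals within 42 days) are all invented for illustration. The sketch joins flight records to an ownership table and a region table, then flags companies whose aircraft repeatedly visit the same region in a short window.

```python
from collections import defaultdict
from datetime import date, timedelta

# Hypothetical raw inputs: field names and values are made up for this example.
flight_log = [
    {"tail_number": "N100AA", "arrival_airport": "AAA", "arrival_date": "2019-01-05"},
    {"tail_number": "N100AA", "arrival_airport": "AAB", "arrival_date": "2019-01-12"},
    {"tail_number": "N100AA", "arrival_airport": "AAA", "arrival_date": "2019-01-19"},
    {"tail_number": "N100AA", "arrival_airport": "AAA", "arrival_date": "2019-01-26"},
    {"tail_number": "N100AA", "arrival_airport": "AAB", "arrival_date": "2019-02-02"},
    {"tail_number": "N100AA", "arrival_airport": "AAA", "arrival_date": "2019-02-10"},
    {"tail_number": "N200BB", "arrival_airport": "BBB", "arrival_date": "2019-01-08"},
    {"tail_number": "N200BB", "arrival_airport": "BBB", "arrival_date": "2019-02-20"},
    {"tail_number": "N999ZZ", "arrival_airport": "AAA", "arrival_date": "2019-01-15"},
]
ownership = {"N100AA": "ExampleCo", "N200BB": "OtherCo"}  # tail number -> company
airport_region = {"AAA": "Region A", "AAB": "Region A", "BBB": "Region B"}


def transform(flight_log, ownership, airport_region):
    """Join each flight to a company and a destination region, sorted by date."""
    rows = []
    for rec in flight_log:
        company = ownership.get(rec["tail_number"])
        region = airport_region.get(rec["arrival_airport"])
        if company is None or region is None:
            continue  # skip flights that cannot be attributed
        rows.append((company, region, date.fromisoformat(rec["arrival_date"])))
    return sorted(rows, key=lambda row: row[2])


def flag_clusters(rows, window_days=42, min_visits=6):
    """Return (company, region) pairs with min_visits arrivals inside window_days."""
    visits = defaultdict(list)
    for company, region, day in rows:
        visits[(company, region)].append(day)
    flagged = set()
    for key, days in visits.items():
        days.sort()
        start = 0
        for end in range(len(days)):
            # Shrink the window until it spans at most window_days.
            while days[end] - days[start] > timedelta(days=window_days):
                start += 1
            if end - start + 1 >= min_visits:
                flagged.add(key)
                break
    return flagged


rows = transform(flight_log, ownership, airport_region)
flagged = flag_clusters(rows)
print(flagged)
```

In practice the hard part is everything this sketch assumes away: obtaining the logs, digitizing paperwork and keeping the ownership and leasing mappings accurate over time.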
Business development: developing your data funnel
Last but not least, it’s vital to consistently develop your business so that your fund has a healthy data funnel. This stage should be ever-present. Build relationships in the industry with primary data sources, partners, expert networks, consultants and marketplaces so that you have a constant stream of interesting data to work on. It helps to know what’s going on in the industry, where the next sources of signals could come from and what information could be valuable to you in the future.
Our Patent Value Estimates dataset provides monthly patent valuations for a broad range of international publicly traded companies, with history spanning more than a decade. When Quandl set out to acquire data relating to patents, we discovered data that gave insight into a company’s willingness to innovate but that was updated only once a year.
By building a relationship with the source, we worked out a way to update the data more frequently. This extracts maximum value for investors who rely on the data as an actionable trading signal, giving them a heads up on potential future legal monopolies for inventions and on competitive advantage.
Up next: transforming your hard-won data
Once the data is acquired, it’s time to ensure the quality and usability of your data. Stay tuned for the next part of our data factory series.
Further reading in the Data Factory series:
- Part Two: Transforming your data
- Part Three: Applying your data
- Part Four: Deploying your data