This is part three of a series of posts exploring the Data Factory, the premier approach to data management. If you missed it, catch up on Part Two: Transforming your data here.
In our previous post, we explored the second stage of data management: transforming your data into usable, clean information. The third stage in our data factory journey is where we begin to unleash the potential of the data that we’ve worked painstakingly to acquire and transform.
How to apply your data
The application stage is where portfolio managers, research heads and quantitative analysts begin to determine how they want to use their data. Data has become table stakes for most funds, and with it being leveraged across a variety of investing styles, it’s natural that applications would differ.
Let us consider how a quant and a non-quant might approach a dataset like Air Travel Intent, which offers advance insight into passenger bookings and airline pricing, inferred from searches and click-throughs on a large global search engine for commercial airfares.
A quantitative analyst might be interested in applying this data to a hypothesis such as this one: clicks are strongly correlated with revenue passenger miles, and revenue passenger miles are predictive of subsequent stock performance. On its own, that is the seed of a potential quantitative strategy or information that could be layered onto a fund’s existing strategies.
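That hypothesis reduces to two correlation checks: clicks against revenue passenger miles, and changes in revenue passenger miles against subsequent returns. The sketch below is a minimal illustration using synthetic stand-in series, not the actual Air Travel Intent dataset; the column names and coefficients are assumptions made for the example.

```python
import numpy as np
import pandas as pd

# Synthetic stand-ins for the Air Travel Intent feed (hypothetical values).
rng = np.random.default_rng(0)
n = 120  # e.g. monthly observations

clicks = rng.normal(100, 10, n).cumsum()            # search/click volume proxy
rpm = 5.0 * clicks + rng.normal(0, 50, n)           # revenue passenger miles, noisily tied to clicks
fwd_return = 0.01 * np.diff(rpm, prepend=rpm[0]) + rng.normal(0, 0.5, n)

df = pd.DataFrame({"clicks": clicks, "rpm": rpm, "fwd_return": fwd_return})

# Step 1: are clicks correlated with revenue passenger miles?
clicks_rpm_corr = df["clicks"].corr(df["rpm"])

# Step 2: do changes in RPM line up with subsequent stock performance?
rpm_return_corr = df["rpm"].diff().corr(df["fwd_return"])

print(f"clicks vs RPM: {clicks_rpm_corr:.2f}")
print(f"delta-RPM vs forward return: {rpm_return_corr:.2f}")
```

In practice the second correlation would be estimated with lagged returns and proper point-in-time alignment; the sketch only shows the shape of the test.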
A non-quant might unearth something completely different. For example, if they delved into the market share of each airline at each hub based on search and click activity, they might find that in the first half of 2019, American Airlines’ market share at its hub of Dallas/Fort Worth dropped from nearly forty percent to roughly thirty percent, while Delta Air Lines’ share at its hub of Atlanta didn’t budge.
What could explain this? Well, American Airlines flies the 737 MAX, while Delta does not. You’ll remember that the 737 MAX was first grounded in March 2019. How did American react? They dropped their prices. In the graph entitled “AAL Price Premium vs. Click Shares,” you can see that there is an inverse relationship between the price premium they charged and their market share of clicks.
When you dig a little deeper and look at American’s money-spinner route between JFK and LAX airports (“JFK-LAX”), you can see that there is no relationship between market share and price premium. On this route, the passengers are business travelers who are not as price-sensitive.
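The hub-versus-route contrast above amounts to computing the premium–share correlation separately for each route. Below is a minimal sketch with synthetic data; the route labels, elasticities and share levels are hypothetical stand-ins, not values from the dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical weekly data: price premium vs. competitors, and share of clicks.
rng = np.random.default_rng(1)
weeks = 52

def make_route(name, elasticity):
    premium = rng.normal(0.05, 0.02, weeks)                       # price premium vs. competitors
    share = 0.35 + elasticity * premium + rng.normal(0, 0.01, weeks)
    return pd.DataFrame({"route": name, "price_premium": premium, "click_share": share})

df = pd.concat([
    make_route("DFW-hub", elasticity=-1.0),   # price-sensitive leisure traffic
    make_route("JFK-LAX", elasticity=0.0),    # price-insensitive business traffic
])

# Correlation of premium and click share, computed per route.
corr_by_route = df.groupby("route").apply(
    lambda g: g["price_premium"].corr(g["click_share"])
)
print(corr_by_route)
```

The competitive hub shows a strong inverse relationship by construction, while the business route shows essentially none, mirroring the pattern described in the text.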
As a fundamental investor, you can use the same data to ask very different questions about customer loyalty, brand value, price elasticity, pricing power, market share, route topology and so on. This is, naturally, different from what a quantitative investor would do.
The same data can be applied in a myriad of ways, and your domain expertise will make a difference in how you choose to apply your data.
Research and development
We’ve discussed how some investors may approach the theorizing and hypothesizing part of the application stage. Any resultant theories and hypotheses have to be rigorously tested before being employed in a meaningful capacity.
The research and development part of this stage is akin to a classic data science approach. This is where backtesting and forward testing have their moment. With so many datasets out there, it’s easy to data mine yourself into a conclusion that is not actually valid, and it is precisely this outcome that the research and development stage seeks to avoid.
The goal of research and development is to achieve statistically sound, robust out-of-sample results. While it may depend on your investment style, professionals often apply machine learning and learning loops of various kinds at this stage.
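One common way to pursue out-of-sample robustness is a walk-forward loop: estimate on one window of history, then evaluate only on the following, unseen window. The sketch below uses a synthetic signal with real predictive power baked in; the window sizes and variable names are assumptions for illustration, not a prescribed methodology.

```python
import numpy as np

# Synthetic aligned series: a signal and the returns it is meant to predict.
rng = np.random.default_rng(2)
n = 500
signal = rng.normal(0, 1, n)
returns = 0.3 * signal + rng.normal(0, 1, n)   # hypothetical: signal genuinely predicts returns

window = 100
oos_corrs = []
for start in range(0, n - 2 * window + 1, window):
    train = slice(start, start + window)               # estimate on this window only
    test = slice(start + window, start + 2 * window)   # evaluate on the next, unseen window
    beta = np.polyfit(signal[train], returns[train], 1)[0]  # fit slope on train data
    pred = beta * signal[test]
    oos_corrs.append(np.corrcoef(pred, returns[test])[0, 1])

print(f"mean out-of-sample correlation: {np.mean(oos_corrs):.2f}")
```

Fitting only on the training window is the whole point: a signal that survives repeated evaluation on data it never saw is far less likely to be an artifact of data mining.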
Take, for example, the insights that we discovered when working with our Supplier-Implied Risk Attributes dataset. The dataset provides detailed business-to-business behavior for companies and their suppliers, including payment amounts, dates, delays, amounts past due, etc.
As we know, late payments can be predictive of company stress, but they can also be predictive of market power.
It’s true that companies that pay late tend to underperform in the market. But the asterisk to this conclusion is that the correlation flips when you look at the very largest companies—in their instance, paying later is a sign of market power (e.g. a certain large retailer squeezing their suppliers to optimize their conversion pipeline).
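The size-conditioned sign flip can be checked by bucketing companies and computing the payment-delay/return correlation within each bucket. The sketch below uses synthetic data standing in for the Supplier-Implied Risk Attributes feed; the size cutoff and effect magnitudes are assumptions made to reproduce the shape of the result, not estimates from the dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical cross-section of companies.
rng = np.random.default_rng(3)
n = 2000

market_cap = rng.lognormal(10, 2, n)
is_mega = market_cap > np.quantile(market_cap, 0.95)   # assumed cutoff for "very largest"
delay_days = rng.normal(10, 5, n)                      # average payment delay to suppliers

# Late payment hurts returns for most firms, but helps for the very largest
# (market power) -- the non-linearity described in the text, baked in here.
effect = np.where(is_mega, 0.02, -0.02)
fwd_return = effect * delay_days + rng.normal(0, 0.2, n)

df = pd.DataFrame({"is_mega": is_mega, "delay": delay_days, "ret": fwd_return})

# Delay-return correlation within each size bucket: the sign flips.
corr = df.groupby("is_mega").apply(lambda g: g["delay"].corr(g["ret"]))
print(corr)
```

Running the signal unconditionally would average the two regimes together and wash out both; conditioning on size is what exposes the flip.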
This is the type of exciting, non-linear signal that clever data science can reveal. But in order to derive maximum value from your data, it has to be presented so that it is usable and understandable to all who are expected to work with it.
Not everyone who needs to work with data is comfortable working with Python scripts and raw tables. That’s why it’s important to present the data in a useful format such as a dashboard, monitor, mini-app, etc.
While this can take place in a number of formats, the most important factor is that the visualization makes the dataset more accessible to those who need to work with it. After all, we’ve now done the hard work of acquiring, transforming and applying the data, so we should ensure that we get the most out of it.
Up next: Deploying your data
You’ve decided how to apply your data and put it through the rigorous testing process. The next stage in the data factory is, in many ways, the most critical component of the assembly line.
Stay tuned for the next stage in the Data Factory: deploying your data.
Catch up with and look ahead to the other articles in this series: