Sentiment data and analysis have long been part of the toolkit of the successful investor. If you know what other people think about a stock, you can profit from their positioning.
In the past, sentiment had to be gleaned piecemeal from newspaper reports, anecdotes, and personal observation. But in recent years, the internet has unleashed a flood of new sentiment sources, from real-time news websites to social media posts to self-published blogs and reviews. The first wave of internet sentiment analysis companies succeeded by capturing this data and selling it to financial investors.
Predictably, the market has been quick to absorb this new source of alpha. Today, “traditional” social sentiment analysis is all but tapped out: little predictive power remains in the wisdom of crowds, and many signals have long since been arbitraged away.
A new approach to sentiment is needed, and METRICLE is one of the companies leading the way. METRICLE is expanding the frontiers of sentiment analysis with a new social graph that identifies “sentiment influencers” who move markets, custom lexicons that flag sector-specific keywords, a big data pipeline that can handle rapid events “in memory”, a machine-learning toolkit for iterative refinement, and much more.
Quandl Head of Partnerships Chris Stevens talked with Vishal Goklani, founder and CEO of METRICLE, to learn more about his innovative alternative data company.
CS: Why did you start METRICLE?
VG: I am both a trader and a scientist, and I personally found it very difficult to identify real-time, market-moving information within seconds of a source event. At METRICLE, we do this across several thousand tickers simultaneously, from different media (both social and news). And, because we’re real gluttons for punishment, we added sentiment to the mix.
Sentiment analysis seems to be around every corner these days. Why is METRICLE different?
Most social graphs identify a network’s influencers by connecting users through follower/following relationships. At METRICLE, we built a unique social graph with a different connection methodology, one that benchmarks users on their performance in broadcasting financial alerts.
Our social graph identifies thousands of financial users on Twitter and scores them using graph-theoretic models from social network analysis. We algorithmically cluster users into 100 discrete groups, where users in each group share similar characteristics (e.g., journalists, activists, pharma experts). Clustering lets us discard irrelevant user groups automatically. We then select influencers and benchmark them on their performance in posting breaking information.
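To make the cluster-then-benchmark idea concrete, here is a minimal sketch in Python. METRICLE’s actual graph models are not described in the interview, so this assumes cluster labels have already been assigned by some earlier step and uses a simple hypothetical benchmark: the share of a user’s alerts that were followed, within a fixed window, by an abnormal price move in the same ticker. `Alert`, `benchmark_users`, the cluster ids, and the 60-second window are all illustrative assumptions, not METRICLE’s API or parameters.

```python
from dataclasses import dataclass


@dataclass
class Alert:
    user: str
    ticker: str
    time: int  # seconds since some reference point


def benchmark_users(alerts, clusters, irrelevant_clusters, moves, window=60):
    """Hypothetical benchmark: fraction of each user's alerts that preceded
    an abnormal price move in the same ticker by at most `window` seconds.

    clusters: user -> cluster id (assumed precomputed by a clustering step)
    irrelevant_clusters: cluster ids to discard (e.g. off-topic groups)
    moves: ticker -> list of timestamps of abnormal price moves
    """
    totals, hits = {}, {}
    for a in alerts:
        # Drop users whose cluster was judged irrelevant.
        if clusters.get(a.user) in irrelevant_clusters:
            continue
        totals[a.user] = totals.get(a.user, 0) + 1
        # A "hit" means the alert came before the move, within the window.
        if any(0 <= t - a.time <= window for t in moves.get(a.ticker, [])):
            hits[a.user] = hits.get(a.user, 0) + 1
    return {u: hits.get(u, 0) / n for u, n in totals.items()}


alerts = [
    Alert("alice", "MRK", 100),
    Alert("alice", "MRK", 500),
    Alert("bob", "TSLA", 200),
    Alert("carol", "AAPL", 300),
]
clusters = {"alice": 3, "bob": 7, "carol": 42}
moves = {"MRK": [130], "TSLA": [900]}
scores = benchmark_users(alerts, clusters, irrelevant_clusters={42}, moves=moves)
ranked = sorted(scores, key=scores.get, reverse=True)
print(scores)  # {'alice': 0.5, 'bob': 0.0}
print(ranked)  # ['alice', 'bob']
```

Hit rate alone ignores posting volume and false alarms; a production benchmark would presumably account for both, but the structure (filter by cluster, then score each remaining user against realized market moves) is the part the sketch is meant to show.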
So you’re saying that not all opinions are equal.
Precisely. In the financial markets, some voices matter more than others, and you have to identify the people who consistently deliver real-time news or opinion that moves asset prices. If you do that, it turns out you obtain a very strong signal.
Once you’ve identified your influencers, how do you determine their actual opinions?
We initially used a few lexicons from the literature, but quickly realized they were inadequate for modelling social conversations. To address this, our team built multiple classifiers using L1-regularized logistic regression and then worked carefully on domain-specific feature construction.
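As a rough illustration of what “L1 logistic regression” means here, the sketch below trains an L1-regularized logistic regression with proximal gradient descent (ISTA) in pure Python on a tiny phrase vocabulary. The vocabulary, documents, labels, and hyperparameters are invented for illustration; in practice one would more likely reach for a library implementation such as scikit-learn’s `LogisticRegression(penalty="l1", solver="liblinear")`. The point it demonstrates is that the L1 penalty should drive an uninformative feature (here, `lol`, which appears equally in bullish and bearish posts) to zero, which is one reason the penalty suits sparse, hand-constructed text features.

```python
import math

# Hypothetical phrase vocabulary for feature construction.
VOCAB = ["fda approves", "beats estimates", "recall", "misses estimates", "lol"]


def featurize(text):
    """Binary phrase-presence features over the hypothetical vocabulary."""
    t = text.lower()
    return [1.0 if phrase in t else 0.0 for phrase in VOCAB]


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(w, t):
    """Proximal operator of t * |w|: shrinks w toward zero, clipping at zero."""
    if w > t:
        return w - t
    if w < -t:
        return w + t
    return 0.0


def train_l1_logreg(X, y, lam=0.01, lr=0.5, epochs=500):
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(d):
                grad_w[j] += err * xi[j] / n
            grad_b += err / n
        # Gradient step on the mean log-loss, then the L1 proximal step.
        # The bias term is left unpenalized.
        w = [soft_threshold(wj - lr * gj, lr * lam) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, text):
    z = b + sum(wj * xj for wj, xj in zip(w, featurize(text)))
    return int(sigmoid(z) > 0.5)


docs = [
    ("FDA approves new drug lol", 1),
    ("$XYZ beats estimates", 1),
    ("airbag recall announced lol", 0),
    ("$XYZ misses estimates", 0),
]
X = [featurize(text) for text, _ in docs]
y = [label for _, label in docs]
w, b = train_l1_logreg(X, y)
print(dict(zip(VOCAB, (round(v, 3) for v in w))))
print([predict(w, b, text) for text, _ in docs])
```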
Our classifiers identify AND distinguish market-moving terminology within individual sectors. For example, the drivers of the pharma sector (e.g. “FDA approves”) are fundamentally different from those of the automotive sector (e.g. “airbag recall”). Our specialized classifiers also differentiate between markets such as U.S. equities, commodities, and forex, the last of which is heavily influenced by geopolitical events.
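A minimal sketch of sector-specific term matching, built around the two drivers named above. The `SECTOR_TERMS` table, its polarity values, and the extra `fda rejects` entry are hypothetical; METRICLE describes trained classifiers rather than a fixed list, so this only illustrates why a phrase that is a strong driver in one sector can carry no signal in another.

```python
# Hypothetical per-sector driver phrases with assumed polarities (+1 / -1).
SECTOR_TERMS = {
    "pharma": {"fda approves": 1, "fda rejects": -1},
    "automotive": {"airbag recall": -1},
}


def sector_signals(text, sector):
    """Return (phrase, polarity) pairs for `sector`'s driver phrases found in `text`."""
    t = text.lower()
    return [(p, s) for p, s in SECTOR_TERMS.get(sector, {}).items() if p in t]


headline = "FDA approves $XYZ's new cancer drug"
print(sector_signals(headline, "pharma"))      # [('fda approves', 1)]
print(sector_signals(headline, "automotive"))  # []
```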
And we didn’t just stop there. Our proprietary classifiers simultaneously assess the credibility of each information source, using results from our social graph. For example, users who frequently tweet about pharma are statistically less credible when it comes to technology and macro events. This comprehensive methodology allows us to build high-performance classifiers.
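One simple way to fold source credibility into a score is to weight each user’s opinion by how much of their history falls in the sector being scored. The sketch below assumes that weighting scheme (topic share as a credibility proxy) purely for illustration; METRICLE derives credibility from its social graph, and those details are not given here. `topic_credibility` and `weighted_sentiment` are hypothetical names.

```python
from collections import Counter


def topic_credibility(history):
    """Share of a user's past posts in each sector (a crude credibility proxy)."""
    counts = Counter(history)
    total = sum(counts.values())
    return {sector: c / total for sector, c in counts.items()}


def weighted_sentiment(posts, histories, sector):
    """Credibility-weighted mean of per-post scores in [-1, 1] for one sector.

    posts: list of (user, score) pairs about `sector`
    histories: user -> list of sectors the user has posted about
    """
    num = den = 0.0
    for user, score in posts:
        weight = topic_credibility(histories.get(user, [])).get(sector, 0.0)
        num += weight * score
        den += weight
    return num / den if den else 0.0


histories = {
    "pharma_guru": ["pharma"] * 9 + ["tech"],
    "tech_guru": ["tech"] * 8 + ["macro"] * 2,
}
posts = [("pharma_guru", -1.0), ("tech_guru", 1.0)]
score = weighted_sentiment(posts, histories, "tech")
print(round(score, 3))  # 0.778
```

An unweighted average of these two tech posts would be exactly neutral; weighting by topical focus lets the tech specialist dominate, which mirrors the pharma-versus-technology example in the answer above.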
What are some of the biggest challenges you face with respect to your algorithms?
Everything: from finding sources, to parsing, to our infrastructure IP, to our algorithms. This wouldn’t be interesting if it were easy. We use a streaming pipeline that ingests real-time data feeds with very low latency. The pipeline has several layers of redundant processing; I like to call it our twin-turbo engine. Working with streaming data is very tricky, as you have to look for patterns among different tickers in memory, all in real time. I love the challenge, because it brings me back to the idea that there are patterns in everything.
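The in-memory, per-ticker pattern search described above can be illustrated with a sliding-window counter that flags a ticker when its mention count spikes. This is a single-process sketch under stated assumptions (events arrive in timestamp order; a “spike” is simply `threshold` mentions within `window` seconds); it has none of the redundancy or latency engineering of a production pipeline, and `MentionSpikeDetector` is a hypothetical name.

```python
from collections import defaultdict, deque


class MentionSpikeDetector:
    """Flags a ticker when it receives `threshold` mentions within `window` seconds.

    Assumes events arrive in non-decreasing timestamp order.
    """

    def __init__(self, window=60, threshold=3):
        self.window = window
        self.threshold = threshold
        self.recent = defaultdict(deque)  # ticker -> timestamps inside the window

    def observe(self, ticker, ts):
        q = self.recent[ticker]
        q.append(ts)
        # Evict mentions older than the window so only recent events stay in memory.
        while q and ts - q[0] > self.window:
            q.popleft()
        return len(q) >= self.threshold


detector = MentionSpikeDetector(window=60, threshold=3)
stream = [("NFLX", 0), ("MOMO", 5), ("NFLX", 10), ("NFLX", 20), ("NFLX", 200)]
flags = [(ticker, ts, detector.observe(ticker, ts)) for ticker, ts in stream]
print([f for f in flags if f[2]])  # [('NFLX', 20, True)]
```

Each ticker keeps its own deque, so work per event is proportional to the number of evictions rather than to the size of the stream; detecting patterns *across* tickers would need an additional layer on top of these per-ticker flags.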
How has the use of sentiment analysis/opinion mining changed over the last decade?
Historically, most sentiment analysis algorithms were modelled on the language of veteran journalists, who used very systematic terminology to describe financial events. This standard lexicon is ubiquitous throughout the literature. Social media was a game-changer. Twitter (short-form content) and blogs (long-form content) brought in more opinionated data, with a very different style of writing and a new set of thought leaders, most of whom use more colloquial terminology to describe financial events (e.g. “$NFLX whack”, “$MOMO getting smushed”). This requires different types of models for analysis. At METRICLE, we run different algorithms in production depending on the broadcasting medium.
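A toy illustration of why the broadcasting medium matters: the same colloquial post scores differently depending on which lexicon is applied. The `$TICKER` cashtag pattern is a common Twitter convention; the two lexicons, their polarities (including treating “whack” and “smushed” as bearish), and the `analyze` dispatcher are assumptions for illustration, not METRICLE’s production models.

```python
import re

CASHTAG = re.compile(r"\$([A-Za-z]{1,5})\b")

# Hypothetical lexicons: formal newswire vocabulary vs. colloquial social slang.
NEWS_LEXICON = {"downgrade": -1, "upgrade": 1, "plunges": -1, "surges": 1}
SOCIAL_LEXICON = {"whack": -1, "smushed": -1, "mooning": 1}

LEXICON_BY_MEDIUM = {"news": NEWS_LEXICON, "social": SOCIAL_LEXICON}


def analyze(text, medium):
    """Extract cashtags and score the text with the lexicon for its medium."""
    tickers = [t.upper() for t in CASHTAG.findall(text)]
    lexicon = LEXICON_BY_MEDIUM[medium]
    # Remove cashtags first so ticker symbols are not scored as words.
    words = re.findall(r"[a-z]+", CASHTAG.sub(" ", text).lower())
    return tickers, sum(lexicon.get(w, 0) for w in words)


print(analyze("$MOMO getting smushed", "social"))  # (['MOMO'], -1)
print(analyze("$MOMO getting smushed", "news"))    # (['MOMO'], 0)
```

A newswire-trained model simply has no opinion about “smushed”, which is the gap the answer above describes; routing each post by medium is one straightforward way to close it.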
Are you looking for information sources beyond Twitter, blogs, and news?
Yes, that is something we are working on right now. There are several interesting datasets that we plan to incorporate into our pipeline, including transactional data (credit card data and email receipts), shipping invoices, cellular location data, and weather data.
Who are your primary customers?
We work with quant funds and algorithmic traders. I think it’s interesting to note that by and large, they don’t use our data in a vacuum. They combine our sentiment data with other datasets they have on-hand to produce something truly unique.
What kind of talent is needed to run a company like METRICLE?
You need a combination of quantitative investment experience and data science. At METRICLE we’re all senior quants: people who studied graduate-level math and physics, have strong programming skills, and have solid financial experience. We mainly look for scientists with experience building simulations and working with data.
I noticed that you yourself have a background in physics, astronomy, and math. That is quite the trifecta. What got you interested in science?
I had a keen interest in astronomy from a young age. Much to my parents’ chagrin, I think my first love was Saturn. As I started to study the universe more seriously, I began to understand its origin, evolution and ultimately the scientific laws that govern the cosmos. I was also able to identify and interpret patterns, which allowed me to transform seemingly random data into something much more systematic, uniform, and predictable.
I used my background in astrophysics as the foundation to build METRICLE. Just as when investigating the universe, our team at METRICLE analyzes big data in real time, methodically and systematically, to identify patterns and actionable alerts in finance. Our methodology can actually be applied to a variety of fields, from sports to geopolitics and more.
We are now focused on deep learning and its applications to NLP. There is a lot of interesting work being done on both Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) applied to classification (e.g. sentiment analysis and topic categorization). Prior to deep learning, most data scientists used a “bag of words” approach to constructing classifiers, but now it’s all about word vectors and sequence-based models like LSTMs. These approaches are much more generalizable, and indeed we have some interesting products in the pipeline that will leverage these tools further.
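The weakness of the bag-of-words representation mentioned above is easy to demonstrate: two posts with opposite meanings can produce identical word counts. The snippet below shows this with `collections.Counter`, and shows that even crude order information (bigrams) separates them. Bigrams are only a stand-in here; sequence models such as LSTMs learn order-dependent representations rather than relying on fixed n-grams. The example sentences are invented.

```python
from collections import Counter


def tokens(text):
    return text.lower().replace(",", "").split()


def bag_of_words(text):
    """Word counts with all ordering discarded."""
    return Counter(tokens(text))


def bigrams(text):
    """Counts of adjacent word pairs: a minimal amount of order information."""
    toks = tokens(text)
    return Counter(zip(toks, toks[1:]))


a = "not bad, actually good"
b = "not good, actually bad"
print(bag_of_words(a) == bag_of_words(b))  # True: word order is lost
print(bigrams(a) == bigrams(b))            # False: adjacent pairs differ
```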
So that’s the science side. What about the financial side? I see that you spent some time at Markit pricing derivatives. Did something catalyze the transition towards a career in quantitative finance?
Markit was an amazing place to work, and I was fortunate enough to be exposed to so many different types of financial products. I found it very interesting that we could use ideas from stochastic calculus to price derivatives. But eventually I discovered algorithmic trading, and never looked back.
How do you see the future of sentiment analysis and alternative data in general? Do you think its relevance is overstated?
With the success of social media, information has become more accessible, but harder to interpret or validate. We are also very sensitive to the compliance issues our clients face. Broadcasting news is no longer the exclusive domain of traditional outlets, so non-traditional sources need to be validated. Moreover, alternative datasets are becoming more mainstream; as with anything else, it will become very difficult for traders to maintain an edge. So there are definitely big challenges ahead, but I wouldn’t say that the relevance is overstated. This world of data ubiquity is going to produce some signals; we just don’t know how many. But it’s worth spending the time to figure it out.
What does the future hold for METRICLE?
We continue to enhance our feed and web-based terminal, which let customers receive relevant real-time alerts based on our algorithms. We’re also planning to launch an iPhone version soon. We believe this will be a unique product in the marketplace, putting real-time information at everyone’s fingertips.