Data Trends for Investment Professionals

# An Interview with Vishal Goklani, Founder & CEO of METRICLE

Sentiment data and analysis have long been part of the toolkit of the successful investor. If you know what other people think about a stock, you can profit from their positioning.

In the past, sentiment had to be gleaned piecemeal from newspaper reports, anecdotes and personal observation. But in recent years, the internet has unleashed a flood of new sentiment sources, from real-time news websites to social media posts to self-published blogs and reviews. The first wave of internet sentiment analysis companies achieved success capturing this data and selling it to financial investors.

Predictably, the market has been quick to assimilate and incorporate this new source of alpha. And today, “traditional” social sentiment analysis is all but tapped out. There’s limited predictive power remaining in the wisdom of crowds; many signals have long been arbitraged away.

A new approach to sentiment is needed, and METRICLE is one of the companies leading the way. METRICLE is expanding the frontiers of sentiment analysis with a new social graph that identifies “sentiment influencers” who move markets; custom lexicons to flag sector-specific keywords; a big data pipeline that can handle rapid events “in memory”; a machine-learning toolkit for iterative refinement; and much more.

Quandl Head of Partnerships Chris Stevens talked with Vishal Goklani, founder and CEO of METRICLE, to learn more about his innovative alternative data company.

CS: Why did you start METRICLE?

VG: I am both a trader and a scientist, and I personally found it very difficult to identify real-time, market-moving information within seconds of a source event. At METRICLE, we do this across several thousand tickers simultaneously, from different mediums (both social and news). And, as we’re real gluttons for punishment, we added sentiment to the mix.

Sentiment analysis seems to be around every corner these days. Why is METRICLE different?

Most social graphs are constructed by using a standard approach of connecting users based on follower/following methodologies to identify influencers of a social network. At METRICLE, we built a unique social network graph, with a different type of connection methodology that benchmarks users for their performance in broadcasting financial alerts.

The company’s social graph identifies thousands of Twitter financial users, scoring them based on graph theoretical models from social network analysis. We algorithmically cluster users into 100 discrete groups. Users in each group share similar characteristics (e.g., journalists, activists, pharma experts, etc.). By clustering the users, METRICLE is able to algorithmically discard irrelevant user groups. Influencers are then algorithmically selected and benchmarked for their performance in posting breaking information.
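The interview does not say which graph models METRICLE uses, so the following is only a minimal sketch of one well-known option from social network analysis: PageRank computed by power iteration. The edge definition (an edge from A to B means A amplified a post by B) and the user names are assumptions made for illustration.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Score the nodes of a directed graph by power iteration.

    edges: list of (source, target) pairs. Here an edge means `source`
    amplified (e.g. retweeted) a post by `target`. That edge definition is
    an assumption for this sketch, not METRICLE's connection methodology.
    """
    nodes = sorted({node for edge in edges for node in edge})
    out_links = {node: [] for node in nodes}
    for source, target in edges:
        out_links[source].append(target)

    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing edges is spread uniformly.
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        new_rank = {
            node: (1.0 - damping) / n + damping * dangling / n for node in nodes
        }
        for source in nodes:
            if out_links[source]:
                share = damping * rank[source] / len(out_links[source])
                for target in out_links[source]:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical amplification graph: three users amplify "carol".
edges = [
    ("alice", "carol"),
    ("bob", "carol"),
    ("dave", "carol"),
    ("carol", "alice"),
]
scores = pagerank(edges)
top_user = max(scores, key=scores.get)
```

In this toy graph the most amplified user receives the highest score. A production system would presumably combine a score like this with the benchmarking of alert performance described above.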

So you’re saying that not all opinions are equal.

Precisely. In the financial markets, some voices matter more than others, and you have to identify the people who consistently deliver real-time news or opinion that moves asset prices. If you do that, it turns out you obtain a very strong signal.

Once you’ve identified your influencers, how do you determine their actual opinions?

We initially employed a few different lexicons from the literature, but quickly realized they were inadequate for modelling social conversations. To address this, our team built multiple classifiers using L1 Logistic Regression, and then worked carefully on domain-specific feature construction.
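To make the idea of L1-regularized logistic regression concrete, here is a small self-contained sketch that fits one by proximal gradient descent, using only the standard library. The vocabulary, the training rows and the hyperparameters are invented for illustration; METRICLE's features and training setup are not described in the interview.

```python
import math


def soft_threshold(value, amount):
    """Shrink value toward zero by amount (the proximal step for an L1 penalty)."""
    if value > amount:
        return value - amount
    if value < -amount:
        return value + amount
    return 0.0


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(weights, bias, x):
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


def fit_l1_logistic(rows, labels, l1=0.05, lr=0.5, epochs=500):
    """Fit logistic regression with an L1 penalty on the weights (not the bias)."""
    n_features = len(rows[0])
    n = len(rows)
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            error = predict_proba(weights, bias, x) - y
            grad_b += error / n
            for j, xi in enumerate(x):
                grad_w[j] += error * xi / n
        bias -= lr * grad_b
        # Gradient step followed by soft-thresholding, which yields exact zeros.
        weights = [
            soft_threshold(w - lr * g, lr * l1) for w, g in zip(weights, grad_w)
        ]
    return weights, bias


# Hypothetical binary features: does the post contain each term?
vocabulary = ["approves", "recall", "today"]
rows = [[1, 0, 1], [1, 0, 0], [0, 1, 1], [0, 1, 0]]
labels = [1, 1, 0, 0]  # 1 = positive, 0 = negative

weights, bias = fit_l1_logistic(rows, labels)
```

The L1 penalty drives the weight of the uninformative term ("today" in this toy data) to exactly zero, which is why the technique is popular for sparse, high-dimensional text features.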

Our classifiers identify AND distinguish market-moving terminology within individual sectors. For example, the drivers of the pharma sector (e.g. “FDA approves”) are fundamentally different than those of the automotive sector (e.g. “airbag recall”). Moreover, our highly specialized classifiers also differentiate between various markets, from U.S. equities to commodities and forex, the latter of which is heavily influenced by geopolitical events.

And we didn’t just stop there. Our proprietary classifiers simultaneously discern the credibility of the information source, by leveraging the results from our social graph. For example, users who frequently tweet about pharma are statistically less credible when it comes to technology and macro events. This comprehensive methodology allows us to build high-performance classifiers.
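One simple way to picture source credibility is as a per-sector weight on each user's sentiment score. The sketch below is an assumption about how such weighting could work, not a description of METRICLE's proprietary classifiers; the user names, sectors and weights are hypothetical.

```python
def weighted_sentiment(posts, credibility):
    """Credibility-weighted mean of sentiment scores.

    posts: list of (user, sector, score) with score in [-1, 1].
    credibility: dict mapping (user, sector) to a weight in [0, 1].
    Users with no known credibility in a sector get weight 0 (an assumption).
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for user, sector, score in posts:
        weight = credibility.get((user, sector), 0.0)
        weighted_sum += weight * score
        total_weight += weight
    return weighted_sum / total_weight if total_weight else 0.0


# Hypothetical weights: a pharma specialist is less credible on tech.
credibility = {
    ("pharma_pro", "pharma"): 0.9,
    ("pharma_pro", "tech"): 0.2,
    ("tech_guru", "tech"): 0.8,
}
posts = [
    ("pharma_pro", "tech", -1.0),
    ("tech_guru", "tech", 1.0),
]
signal = weighted_sentiment(posts, credibility)
```

Here the two posts disagree, but the aggregate leans positive because the more credible tech source carries more weight.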

What are some of the biggest challenges you face with respect to your algorithms?

Everything from finding sources, to parsing, to our infrastructure IP, to our algorithms. This wouldn’t be interesting if it was easy. We use a streaming pipeline that ingests real-time data feeds, with very low latency. The pipeline has several layers of redundant processing; I like to call it our twin-turbo engine. Working with streaming data is very tricky, as you have to look for patterns amongst different tickers in memory, all in real-time. I love the challenge, because it brings me back to that idea that there are patterns in everything.
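As a rough illustration of what looking for patterns across tickers "in memory" can involve, the sketch below keeps a sliding time window of recent events per ticker. The class name, window length and sample events are hypothetical, and it assumes timestamps arrive in non-decreasing order; METRICLE's actual pipeline is not described in the interview.

```python
from collections import defaultdict, deque


class TickerWindows:
    """Keep the last `window_seconds` of (timestamp, score) events per ticker."""

    def __init__(self, window_seconds=60.0):
        self.window_seconds = window_seconds
        self._events = defaultdict(deque)

    def _evict(self, events, now):
        # Drop events that have fallen out of the window.
        cutoff = now - self.window_seconds
        while events and events[0][0] < cutoff:
            events.popleft()

    def add(self, ticker, timestamp, score):
        events = self._events[ticker]
        events.append((timestamp, score))
        self._evict(events, timestamp)

    def mean_score(self, ticker):
        events = self._events.get(ticker)
        if not events:
            return None
        return sum(score for _, score in events) / len(events)

    def busy_tickers(self, now, min_events):
        """Tickers with at least `min_events` events inside the window at `now`."""
        busy = []
        for ticker, events in self._events.items():
            self._evict(events, now)
            if len(events) >= min_events:
                busy.append(ticker)
        return sorted(busy)


# Hypothetical event stream: (ticker, seconds since start, sentiment score).
windows = TickerWindows(window_seconds=60.0)
windows.add("NFLX", 0.0, -0.5)
windows.add("NFLX", 30.0, -1.0)
windows.add("MOMO", 40.0, -0.8)
windows.add("NFLX", 70.0, -0.5)

nflx_mean = windows.mean_score("NFLX")
busy = windows.busy_tickers(now=70.0, min_events=2)
```

Using a deque keeps both appending a new event and evicting an expired one cheap, which matters when every incoming message touches the window.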

How has the use of sentiment analysis/opinion mining changed over the last decade?

Historically, most of the sentiment analysis algorithms were modelled from the language of veteran journalists, who used a very systematic terminology for describing financial events. This standard lexicon is ubiquitous throughout the literature. Social media is a game-changer. Twitter (short-form content) and blogs (long-form content) brought in more opinionated data, with a very different style of writing and a new set of thought leaders, most of whom used a more colloquial terminology for describing financial events (e.g. “$NFLX whack”, “$MOMO getting smushed”, etc.). This requires different types of models for analysis. At METRICLE, we run different algorithms in production, based on the broadcasting medium.

Are you looking for information sources beyond Twitter, blogs, and news?

Yes, that is something we are working on right now. There are several interesting datasets that we plan to incorporate in our pipeline. Some of these include transactional data (credit card data and email receipts), shipping invoices, cellular location data, and weather data, to name a few.

We work with quant funds and algorithmic traders. I think it’s interesting to note that by and large, they don’t use our data in a vacuum. They combine our sentiment data with other datasets they have on-hand to produce something truly unique.

What kind of talent is needed to run a company like METRICLE?

You need a combination of quantitative investment experience and data science. At METRICLE we’re all senior quants: people who studied graduate-level math and physics, have strong programming skills, and solid financial experience. We mainly look for scientists, who have experience building simulations and working with data.

I noticed that you, yourself, have a background in physics and astronomy and math. That is quite the trifecta. What got you interested in science?

I had a keen interest in astronomy from a young age. Much to my parents’ chagrin, I think my first love was Saturn. As I started to study the universe more seriously, I began to understand its origin, evolution and ultimately the scientific laws that govern the cosmos. I was also able to identify and interpret patterns, which allowed me to transform seemingly random data into something much more systematic, uniform, and predictable.

I used my background in astrophysics as the foundation to build METRICLE. Similar to investigating the universe, our team at METRICLE analyzes big data in real-time, methodically and systematically, to identify patterns and actionable alerts in finance. Our methodology can actually be applied to a variety of fields, from sports to geopolitics and more.

We are now focused on Deep Learning, and its applications to NLP. There is a lot of interesting work being done on both Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) applied to classification (e.g. sentiment analysis and topic categorization). Prior to deep learning, most data scientists used a “bag of words” style approach for constructing classifiers, but now it’s all about word vectors and building sequence-based models like LSTMs. These approaches are much more generalizable, and indeed we have some interesting products in the pipeline that will leverage these tools further.
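A tiny example shows why word order matters and what a “bag of words” throws away. The two sentences are invented for illustration: they contain exactly the same words, so a bag-of-words representation cannot tell them apart, while a sequence-based model sees two different inputs.

```python
from collections import Counter


def bag_of_words(text):
    """Count tokens, ignoring their order."""
    return Counter(text.lower().split())


def token_sequence(text):
    """Keep tokens in order, as a sequence model would consume them."""
    return text.lower().split()


# Hypothetical headlines with opposite meanings.
first = "upgrade expected not downgrade"
second = "downgrade expected not upgrade"

same_bag = bag_of_words(first) == bag_of_words(second)
same_sequence = token_sequence(first) == token_sequence(second)
```

Here `same_bag` is true and `same_sequence` is false, which is the gap that sequence models such as LSTMs are meant to close.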

So that’s the science side. What about the financial side? I see that you spent some time at Markit pricing derivatives. Did something catalyze the transition towards a career in quantitative finance?

Markit was an amazing place to work, and I was fortunate enough to be exposed to so many different types of financial products. I found it very interesting that we could use ideas from Stochastic Calculus to price derivatives. But eventually I discovered algorithmic trading, and never looked back.

How do you see the future of sentiment analysis and alternative data in general? Do you think its relevance is overstated?

With the success of social media, information has become more accessible, but harder to interpret or validate. We are also very sensitive to the compliance issues our clients face. Broadcasting news is no longer exclusive, and the non-traditional sources need to be validated. Moreover, alternative datasets are becoming more mainstream; like anything else it will be very difficult for traders to maintain an edge. So there are definitely big challenges ahead, but I wouldn’t say that the relevance is overstated. This world of data ubiquity is going to produce some signals – we just don’t know how many. But it’s worth it to spend the time to figure it out.

What does the future hold for METRICLE?

We continue to enhance our feed and web-based terminal; this allows customers to receive relevant real-time alerts based on our algorithms. We’re also planning to launch an iPhone version soon. We believe this will be a unique product in the marketplace, giving everyone real-time information at their fingertips.

Visit METRICLE on the web.

• Anthony Smith says:

Hello Raquel:

I agree with Dmitry, there is limited predictive power remaining in the wisdom of the crowds.

I used to work at a big news provider company and I have seen most of the methodologies used in Twitter sentiment.
This company does not have any white papers or analytics supporting any of the 5 points mentioned above (there is no info on their website, as there is for other companies). This is strange to me since all the companies that show datasets in Quandl are quite serious.

I did not find any datasets in Quandl about those strategies. I wonder if you would be so kind as to include some methodologies from this company in the Quandl datasets so we can understand the five points mentioned above.

Kind regards,

Anthony

• Raquel Sapnu says:

Hi Anthony,

• Paul Drogba says:

There is a difference between what could be done with the data and what actually has been done. I don’t see a single statistically driven case study conducted by Metricle, and as someone above pointed out: have they even traded off their data, and where are the results?
There are many startups like Metricle competing in this space, and I see some, like Accern, that have generated statistical case studies on Quantopian. I feel Metricle should come up with something similar.

Anyway keep up the good work.

• Raquel Sapnu says:

Hi Paul,

• The following statement catches the eye and is not developed in the subsequent discussion: “There’s limited predictive power remaining in the wisdom of crowds; many signals have long been arbitraged away.”

It obviously contradicts Jim Surowiecki’s 2004 book “The Wisdom of Crowds: Why the Many Are Smarter Than the Few and How Collective Wisdom Shapes Business, Economies, Societies and Nations”, which proves just the opposite.

My personal observation is that, paradoxically, algorithmic trading creates and intensifies arbitrage opportunities in the market rather than “arbitrages them away”.

• Raquel Sapnu says:

Thanks for your comment Dmitry! In this particular sentence, we were merely referring to the “first generation” of social media sentiment data tools. Most of the alpha detected by those tools has now been captured, creating the need for a “second generation” set of tools such as Metricle which use slightly more advanced data-mining and sampling techniques.

It’s not that there is no wisdom in crowds; it’s that some of the most obvious applications of that wisdom have already been tapped, so one needs to be more sophisticated in parsing this wisdom.

• Swamy says:

Could you please share answers to the questions asked on this blog, so that there is no need to spam their email?

• Raquel Sapnu says:

Hi Swamy,

Thank you for requesting this. We’ve spoken to Vishal Goklani at Metricle and below are some answers he’s happy to share.

From Vishal Goklani at Metricle:

Thank you all for your comments. We work with a variety of different clients, all of whom utilize our data in fundamentally different ways. I’ve outlined some of these use-cases below, and would be happy to provide any additional information via email: info@metricle.com

Sentiment data is special because it measures market “emotion” independently of price and trading volume. It serves as an important gauge for measuring the strength of a price move. Sometimes it’s leading, in anticipation of an event such as earnings or an upcoming conference call, sometimes it’s lagging, as a reaction to an event, and sometimes it’s just purely speculative (i.e. anti-correlated).

These are a few common strategies for sentiment data, all of which can be used at different time-horizons (intraday vs daily vs weekly, etc):

1. Directional Indicators – a binary prediction that forecasts whether a ticker’s price is likely to increase or decrease, across different time horizons. These indicators are built by combining historical sentiment data with time-series data (i.e. price/volume), and fundamental data.

2. Earnings Predictions – one of the more common use cases of sentiment data is to forecast earnings, by analyzing the aggregate sentiment data 1-3 months prior to earnings.

4. Stat-Arb techniques – building a sentiment-portfolio where you take a long position on the top M% of stocks from the prior X days with the highest positive sentiment, and a short position on the bottom N% from the prior Y days with the highest negative sentiment, and re-balance daily. Optimize your values for X, Y, M and N.

5. Early-Early event detection – finding early warning tweets, sometimes days in advance of a price move.

METRICLE’s approach has been to make sentiment analysis a systematic process, where one could use a statistical approach for building forecasting models.
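The stat-arb strategy described above can be sketched as a daily basket selection. This is a minimal illustration under stated assumptions: sentiment is averaged over the lookback window, every ticker name and score is invented, and the parameter values are arbitrary rather than optimized. It says nothing about how any client actually trades.

```python
def mean_sentiment(history, lookback_days):
    """Average each ticker's most recent `lookback_days` daily scores.

    history: dict mapping ticker to a list of daily scores, most recent last.
    """
    means = {}
    for ticker, scores in history.items():
        recent = scores[-lookback_days:]
        if recent:
            means[ticker] = sum(recent) / len(recent)
    return means


def long_short_basket(history, x_days, y_days, top_pct, bottom_pct):
    """Pick the top M% by X-day sentiment to buy and the bottom N% by Y-day to sell."""
    long_scores = mean_sentiment(history, x_days)
    short_scores = mean_sentiment(history, y_days)
    n_long = max(1, int(len(long_scores) * top_pct / 100))
    n_short = max(1, int(len(short_scores) * bottom_pct / 100))

    longs = sorted(long_scores, key=long_scores.get, reverse=True)[:n_long]
    # Never short a ticker that is already in the long basket.
    candidates = [t for t in sorted(short_scores, key=short_scores.get) if t not in longs]
    shorts = candidates[:n_short]
    return longs, shorts


# Hypothetical daily sentiment scores in [-1, 1], most recent last.
history = {
    "AAA": [0.1, 0.8, 0.9],
    "BBB": [0.2, 0.1, 0.0],
    "CCC": [-0.5, -0.6, -0.7],
    "DDD": [0.3, -0.9, -0.8],
    "EEE": [0.0, 0.1, 0.2],
}
longs, shorts = long_short_basket(history, x_days=2, y_days=3, top_pct=20, bottom_pct=40)
```

Re-running the selection each day with fresh scores gives the daily re-balance, and X, Y, M and N would be tuned on historical data.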

• James Chen says:

The social media space is very crowded with a lot of firms fighting to get an edge.
To our understanding, most of the breaking news events hit the mainstream financial newswires well before Twitter.

We have explored data from several firms and we would like to take a look at Metricle’s unique approach to produce practical results.

I do not see any dataset (or white paper) with strategies from them. I wonder if Quandl could include some working strategies from this startup.

Best regards,

James

• Raquel Sapnu says:

Hi James, thanks very much for your comment. We’ve forwarded it to Vishal at Metricle and he’d be happy to speak with you. Could you please email him at info@metricle.com?

• Bastin Ozil says:

Sounds interesting but there are many startups like this one. Have you utilized the sentiment to trade? If yes, can you share with us your performance statements?

• Raquel Sapnu says:

Hi Bastin,

Thanks for your comment. We’ve forwarded your question to Vishal at Metricle. He’ll be able to share results and strategies with you. Could you please email him directly at info@metricle.com with some more details on what you were looking for?
