Unbundling Social Media Sentiment Investing
Reddit, Twitter, and Wikipedia - Using Multiple Factors Increases Returns
The first fund we made live on Quantbase in 2021 was the r/WallStreetBets tracker, which invested based on the engagement around tickers mentioned in the 11-million-member community, as well as the sentiment toward those tickers. The strategy, rebalancing weekly, picked up on obvious inclusions like GME and AMC, as well as many other stocks right before an upswing that was often 20% or more - RKT, CLOV, CLNE, and WKHS, among a few others. It did really, really well in 2021.
It's also, sadly, a fund that we retired in July 2022. The fund relied on an underlying set of assumptions:
If people are talking a lot more than usual about a stock on an investing forum, the stock is likely to be trading with more volatility than usual.
If that investing forum's users are known for making large, risky bets, and the forum is huge and getting covered by traditional media, the stocks it talks about are much more likely to be trading with more volatility.
Analyzing this specific forum will yield an informational edge, because the conversations lead to future trades by the participants themselves as well as by people who make bets based on the conversation.
It is possible to gauge from all the forum's conversations about the stock what the average sentiment around that stock is.
It is possible to do this using an algorithm.
Assumption 1 made sense in 2021, intuitively - WallStreetBets (WSB) was on everyone's radar. It grew 20x in size after the GameStop movement started, and everything people said on the forum was amplified by the people watching it. Assumptions 2 and 3 are what caused inflows into the fund - there was an expectation that WSBers might know what they're doing, and that being in this community all day, every day, in tune with its members and their daily moves, would make you money. That expectation drove a not-insignificant part of the 20x community growth.
Assumption 4 is the big bet an algorithmic strategy makes, and the big draw its rules-based nature might have over a fundamental WSB-based stockpicker that you might manage your money with. But assumption 4 is harder to get to intuitively. Sentiment is hard to track - we spent hundreds of hours getting something to work, and that was trained on one specific community with very specific lingo ("can't go tits up" is not a phrase that your standard ML library understands out of the box). It takes humans in online communities days to weeks to feel like an actual, participating member of the community, and to be able to accurately understand and reflect the sentiment of that community.
Because a sentiment-analysis algorithm is scraping data rather than creating it, it is:
Not participating in the community, and therefore,
Unable to get instantaneous community sentiment feedback (does saying "GME is not a buy" get upvotes?), and because of this,
Lacking the dynamic understanding of community sentiment that a human might have.
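To make the lingo problem concrete, here is a toy sketch in Python. It is not VADER and not what we ran in the fund: the word list, the phrase list, and every valence number are made up for illustration. It only shows how a generic word-level lexicon can read a WSB phrase as neutral until community-specific phrases are added.

```python
import re

# Hypothetical generic lexicon (word -> valence); values are illustrative only.
GENERIC_LEXICON = {
    "buy": 1.0, "sell": -1.0,
    "good": 1.5, "bad": -1.5,
    "gain": 1.5, "loss": -1.5,
}

# Hypothetical community phrases (phrase -> valence); values are illustrative only.
WSB_PHRASES = {
    "can't go tits up": 2.0,
    "diamond hands": 1.5,
    "bag holder": -2.0,
}


def score(text, phrases=None):
    """Sum phrase valences first, then word valences on whatever text is left."""
    text = text.lower()
    total = 0.0
    for phrase, valence in (phrases or {}).items():
        total += text.count(phrase) * valence
        # Remove matched phrases so their words are not scored a second time.
        text = text.replace(phrase, " ")
    for word in re.findall(r"[a-z']+", text):
        total += GENERIC_LEXICON.get(word, 0.0)
    return total


post = "GME can't go tits up"
generic_only = score(post)               # no known words, so it reads as neutral
with_lingo = score(post, WSB_PHRASES)    # the phrase override makes it positive
```

Even with overrides like these, the scorer still only reads posts after the fact. It never gets the upvote-or-downvote feedback a participating member would.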
VADER, the open-source sentiment analysis engine built in Python, was a start - see our original article on sentiment analysis and how we used it to build version 1 of the WallStreetBets tracking fund. It showed us that, in a year like 2021, it may have been useful to use WSB sentiment analysis to predict short-term future returns. It may have also just told us that buying hype in general in a bullish year worked, which would make Assumption 2 above false, and make Assumptions 3 and 4 moot, even if they were true.
This is part of the reason we decided to scrap the WallStreetBets tracker - it focused on too small a niche of stocks. A single-factor approach (i.e. sentiment from one community, engagement/mentions from one community, etc.) is prone to single-source virality bias. Stuff gets popular on WallStreetBets all the time for reasons way different from "this is a good/fun company - I'm going to make a huge bet on it tomorrow": memes, bad news, more memes, and cults of personality. We need a broader approach that does the equivalent of diversifying exposure to social media across a variety of sources.
This is the reason we decided to cycle out the WallStreetBets fund (as well as our old single-factor Social Media strategy) into a more robust fund that tracks a wider cross-section of the internet to generate returns: the QuiverQuant-powered Quantbase Social Media Flagship. We look at three sources, each with more wide-ranging datapoints than either of the previous funds looked at.
Twitter
Companies that get a lot of new followers in a time period - aka the ones with a more positive weekly/monthly followership delta - must have something going well for them, right? Our intuition here was that these companies must be getting wider recognition, and the more a company's followership increases relative to its former base, the bigger that recognition ought to be. But is this recognition good or bad? That is, is it a directionally positive or negative bet to invest in companies with increasing follower counts? In general, our hypothesis was that more follows = good. So we use the Twitter factor to rank stocks by their weekly/monthly followership delta, most positive first - this is a large-magnitude, positive directional bet.
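Here is a minimal sketch of that ranking in Python. The tickers and follower counts are placeholders, not real data, and the function names are ours for illustration. Measuring growth relative to the starting base is our reading of "relative to its former base" above.

```python
def follower_growth(previous, current):
    """Relative change in follower count over the period."""
    return (current - previous) / previous


def rank_by_follower_growth(counts):
    """Return tickers ordered from the largest relative follower growth to the smallest.

    `counts` maps ticker -> (followers at period start, followers at period end).
    Tickers with no starting followers are skipped to avoid dividing by zero.
    """
    growth = {
        ticker: follower_growth(previous, current)
        for ticker, (previous, current) in counts.items()
        if previous > 0
    }
    return sorted(growth, key=growth.get, reverse=True)


# Placeholder tickers and counts, purely illustrative.
counts = {
    "AAA": (100_000, 110_000),      # +10,000 followers, +10% on its base
    "BBB": (2_000_000, 2_050_000),  # +50,000 followers, but only +2.5%
    "CCC": (50_000, 49_000),        # lost followers
}
ranking = rank_by_follower_growth(counts)
```

In this made-up data, BBB adds the most followers in absolute terms but still ranks below AAA, because the factor looks at growth relative to the existing base.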
Reddit
This one's a classic. We'll still look at WSB (knowing it's still just a single community, but the largest investing-based online community in the world). We look at two things to rank stocks on the Reddit factor: engagement and sentiment. Stocks with lots of mentions likely have lots of social media excitement - this is a magnitude play. Sentiment analysis then classifies all that engagement as positive or negative - this is the direction play. Lots of good talk about a stock ranks it highest, bad talk about a stock ranks it low, and not very much talk at all about a stock, good or bad, means it likely won't get picked up or ranked high.
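One simple way to combine the two, sketched in Python below, is to multiply the mention count (magnitude) by the average sentiment of those mentions (direction). This is an illustrative assumption, not necessarily the exact formula behind the fund, and the tickers and per-mention sentiment values (on a -1 to 1 scale) are made up.

```python
from statistics import mean


def reddit_score(sentiments):
    """Mention count (magnitude) times mean sentiment (direction).

    `sentiments` holds one sentiment value per mention, each in [-1, 1].
    A ticker with no mentions scores zero.
    """
    if not sentiments:
        return 0.0
    return len(sentiments) * mean(sentiments)


def rank_by_reddit_score(mentions):
    """Return tickers ordered from highest Reddit score to lowest."""
    scores = {ticker: reddit_score(values) for ticker, values in mentions.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Placeholder data, purely illustrative.
mentions = {
    "AAA": [0.8, 0.6, 0.7, 0.9],  # lots of good talk
    "BBB": [-0.5, -0.7, -0.6],    # lots of bad talk
    "CCC": [0.9],                 # one glowing post, very little talk
}
reddit_ranking = rank_by_reddit_score(mentions)
```

With this scoring, the heavily and positively discussed ticker comes first and the heavily and negatively discussed one comes last. The barely mentioned one lands in between, however glowing its single post.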
There's a lot we're interested in on Reddit, and we're not done with just this factor. A future fund may look into social media gems - stocks that aren't being discussed widely on alpha-generating channels, but that may have some good fundamental research behind them (like GME before it blew up). Future plans also include expanding to these other communities: r/investing, r/stocks, and r/portfolios.
Wikipedia
If you've heard of Wikipedia as a stock market factor, you're probably in the minority. In fact, I think our friends at QuiverQuant may be the first ones to have uncovered this dataset for retail investors. And as it turns out, it works super well as a factor. Companies with an increase in Wikipedia pageviews over the last week/month have tended to see an increase in their stock's value over the next period, at least since our backtest started in January 2019. Intuitively again, this makes sense. Companies with an increase in Wikipedia pageviews (a large positive delta in pageviews) are a magnitude play. The directional play - what makes it a good thing for a company that people are looking for it on Wikipedia - was less clear to us.
We think of Wikipedia as a factor that flattens the single-factor virality bias that might come from memes, etc. shared on the other platforms. A company that is going positive-sentiment viral on Reddit and getting more follows on Twitter is more likely to be getting genuine interest as a company - vs. a funny social media account or a high-quality meme - if people are also looking the company up on Wikipedia.
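To illustrate the flattening effect, here is a hedged Python sketch that blends the three factors by averaging each ticker's rank across them. Rank averaging is a common, simple way to combine factors, and we use it here only as an assumption for illustration. It is not a statement of how the flagship actually weights its sources. Factor names, tickers, and values are all made up.

```python
def ranks(scores):
    """Map ticker -> rank (1 = best) for a higher-is-better score."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {ticker: position for position, ticker in enumerate(ordered, start=1)}


def composite_ranking(factors):
    """Order tickers by their average rank across all factors (lower is better).

    `factors` maps factor name -> {ticker: score}. Only tickers present in
    every factor are ranked; ties are broken alphabetically.
    """
    per_factor = [ranks(scores) for scores in factors.values()]
    tickers = set.intersection(*(set(r) for r in per_factor))
    average = {t: sum(r[t] for r in per_factor) / len(per_factor) for t in tickers}
    return sorted(average, key=lambda t: (average[t], t))


# Placeholder factor values, purely illustrative.
factors = {
    "twitter_follower_growth": {"AAA": 0.10, "BBB": 0.02, "CCC": 0.30},
    "reddit_score": {"AAA": 4.0, "BBB": -1.8, "CCC": 3.0},
    "wikipedia_pageview_growth": {"AAA": 0.25, "BBB": 0.05, "CCC": -0.10},
}
flagship_ranking = composite_ranking(factors)
```

In this toy data, CCC has the fastest follower growth and a strong Reddit score, but its Wikipedia pageviews are falling, so it ends up behind AAA, which scores well on all three.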
This post's focus was more on the intuition behind the fund - there were few numbers or statistical analyses. It makes a lot of sense to get exposure to the different ways people express their interest across social media - conversations with peers (Reddit), conversations with corporations (Twitter), and information-gathering (Wikipedia). Next week, we'll do some more of the math: the statistics and hedging of this flagship, the statistical validity of the factors, and how they measure against factors we didn't use and against the all-important SPY benchmark. We'll also talk about the decision to use a hedge at all, and what went into the selection of that specific hedge.
Overall, we're excited to see the new social media flagship play out.