What is Sentiment Analysis?
A quick dive into the methodology we use to determine positive vs. negative sentiment in text
[This article and all figures are originally from Quantbase, June 2021]
In short, it’s figuring out how positive or negative a snippet of text is. For example, when I tell another human “I really liked dinner last night!”, chances are I’m conveying positive sentiment about that experience. If you’ve heard of the 7–38–55 rule, though, you know it’s not always that easy: the rule holds that in face-to-face communication, only 7% of meaning is conveyed through the actual words spoken, 38% through tone of voice, and 55% through body language (sarcasm, for example, is notoriously hard to read from text alone).
Any bot trying to do sentiment analysis from the text alone, then, really has its work cut out for it. Luckily, there’s VADER sentiment analysis.
How does it work? A closer look.
VADER (Valence Aware Dictionary and sEntiment Reasoner) is a model for scoring sentiment along two dimensions: the polarity of a text’s emotion (positive vs. negative) and its intensity (LOVED vs. liked). It works from a dictionary that maps words, phrases, and emojis to polarity and intensity values, then combines those values into an overall sentiment score. VADER sits on the simpler end of the spectrum, summing up the overall intensity of phrases; other models, packages, and strategies go further, performing more complex calculations on phrases and taking cultural vernacular and similar context into account.
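To make the mechanism concrete, here is a minimal, purely illustrative sketch of lexicon-based scoring. The lexicon values and the booster handling are invented for this example; the real VADER implementation (available as the `vaderSentiment` Python package) uses a much larger, empirically validated lexicon plus rules for punctuation, capitalization, negation, and more.

```python
# Toy lexicon-based sentiment scoring in the spirit of VADER.
# These dictionary values are made up for illustration only.
LEXICON = {"loved": 3.2, "liked": 1.8, "good": 1.9, "bad": -2.5, "terrible": -3.1}
BOOSTERS = {"really": 0.3, "very": 0.3}  # intensifiers bump the next word's score

def sentiment_score(text):
    total, boost = 0.0, 0.0
    for raw in text.lower().split():
        word = raw.strip("!.,?")
        if word in BOOSTERS:
            boost = BOOSTERS[word]        # remember the intensifier for the next word
        elif word in LEXICON:
            value = LEXICON[word]
            # A booster strengthens the sentiment in whichever direction it points.
            total += value + (boost if value > 0 else -boost)
            boost = 0.0
        else:
            boost = 0.0                   # a booster only applies to the very next word
    return total

print(round(sentiment_score("I really liked dinner last night!"), 2))  # → 2.1
print(round(sentiment_score("The food was bad"), 2))                   # → -2.5
```

The key idea this captures is that “really liked” scores higher than “liked” alone, which is exactly the polarity-plus-intensity behavior described above.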
The strategy we’re looking at today targets the WallStreetBets community on Reddit. It takes as input a list of tickers to analyze (we scrape tickers from recent posts before running the algorithm), then goes through posts and comments, running the VADER process on each one and aggregating a total sentiment score per ticker, weighted by upvotes. The idea: if a highly positive-sentiment comment about a ticker is getting a lot of upvotes, that comment likely reflects the sentiment of the community at large. If a highly positive-sentiment comment isn’t getting upvoted, it may only voice the opinion of a minority.
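Here’s a hedged sketch of that upvote-weighted aggregation. The `posts` list and the naive ticker-matching rule are invented for illustration; the real strategy pulls live posts and comments via Reddit’s API, and per-text sentiment would come from VADER rather than being hard-coded.

```python
from collections import defaultdict

# Invented stand-ins for scraped Reddit content; in practice, `sentiment`
# would be a VADER score for the post's text and `upvotes` its Reddit score.
posts = [
    {"text": "GME to the moon", "sentiment": 0.75, "upvotes": 1200},
    {"text": "GME is overvalued", "sentiment": -0.5, "upvotes": 40},
    {"text": "AMC looking strong", "sentiment": 0.25, "upvotes": 300},
]

def mentions(text, ticker):
    # Naive whole-word match; real code needs care with "$GME", casing, etc.
    return ticker in text.split()

def aggregate(posts, tickers):
    # Sum sentiment * upvotes per ticker, so heavily upvoted opinions dominate.
    totals = defaultdict(float)
    for post in posts:
        for ticker in tickers:
            if mentions(post["text"], ticker):
                totals[ticker] += post["sentiment"] * post["upvotes"]
    return dict(totals)

print(aggregate(posts, ["GME", "AMC"]))  # → {'GME': 880.0, 'AMC': 75.0}
```

Note how the single heavily upvoted GME post dominates the lightly upvoted negative one, which is exactly the “upvotes as a proxy for community agreement” idea.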
Backtesting and Results
Backtesting is the process of running an algorithm over a historical period, using past price movements to determine how the strategy would have performed. There are a number of ways you can do this yourself; consider this comprehensive list of backtesting software. We backtested our algorithm from 3/2/2021 to 6/18/2021 (a window chosen so we could benchmark it against the BUZZ social sentiment ETF, which launched in early March). Here are the results we achieved over that period, after accounting for slippage:
Annualized Return: 172.65% (this unrealistic number largely comes from GME and AMC’s meteoric rise in the time period we’ve backtested)
Max Drawdown (the largest peak-to-trough decline over the period, as a percentage of the peak): -9.1%
Sharpe Ratio (expected return in excess of the risk-free rate, divided by the standard deviation of returns): 2.74
Profit/Loss Ratio (average profit on winning investments divided by average loss on losing investments): 2.77
Average Win (when an investment is sold at a gain, the average of that gain): 9.11%
Average Loss: -3.29%
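To make two of these metrics concrete, here is an illustrative calculation of max drawdown and a (non-annualized) Sharpe ratio on a made-up equity curve. The numbers above come from the full backtest, not from this sketch.

```python
import statistics

def max_drawdown(equity):
    # Largest peak-to-trough decline, as a (negative) fraction of the peak.
    peak, worst = equity[0], 0.0
    for value in equity:
        peak = max(peak, value)
        worst = min(worst, (value - peak) / peak)
    return worst

def sharpe(returns, risk_free=0.0):
    # Mean excess return divided by the standard deviation of excess returns.
    # A real backtest would annualize this; we skip that here for simplicity.
    excess = [r - risk_free for r in returns]
    return statistics.mean(excess) / statistics.pstdev(excess)

equity = [100, 108, 103, 115, 110, 124]  # invented portfolio values over time
returns = [equity[i] / equity[i - 1] - 1 for i in range(1, len(equity))]

print(round(max_drawdown(equity) * 100, 2))  # → -4.63 (the dip to 103 after the 108 peak)
print(round(sharpe(returns), 2))             # positive, since the curve trends up
```

On the invented curve, the worst drawdown is the fall from the 108 peak to 103, not the later 115-to-110 dip, because drawdown is measured relative to the running peak.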
We’ve seen a massive increase in interest in fast-paced stocks over just the last six months; it’s what made us realize that something like Quantbase needs to exist! Communities like WallStreetBets have more than 10x’d in size since January, and most communities focused on algorithmic, crypto, and leveraged trading have done the same, some growing as much as 20x. Clearly there’s a lot of interest in beating the market through active trading, and people are putting real effort and time into discussing the assets they believe in. With our sentiment-analyzing indices, we’re able to capitalize on that immediately.
We’re homing in on WallStreetBets because that’s where a lot of the active, engaged discussion about individual tickers happens, and it’s a large community. There are others we’re looking into, but WSB is one of the most exciting communities to gauge the sentiment of, and it’s a community I’ve personally been part of since 2015.
There are thousands of tickers listed on US exchanges alone. To automate the sentiment analysis, we can’t enter these tickers individually or by hand; that would be tedious and would make the codebase a chore to read and edit. The quick and easy solution I used was Selenium to scrape the tickers from a website listing US stocks filtered by market cap, then store them in a dictionary for the sentiment analysis to act on. I then used PRAW (the Python Reddit API Wrapper) to pull text from posts and comments and run the VADER sentiment algorithm on it. Outputting the resulting sentiment score per ticker lets us pick the top 15 stocks by weekly sentiment score and send them to our order executor, which rebalances our portfolio based on the weighted sentiment scores of those top picks.
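As a hedged sketch of that last step, here is roughly what turning aggregated scores into portfolio weights can look like. The `scores` dict is invented; in the real pipeline it would be filled by the scraper, PRAW, and VADER steps described above, with `top_n` set to 15.

```python
def target_weights(sentiment_by_ticker, top_n=15):
    # Keep only positive-sentiment tickers (so the normalization below stays
    # well defined), take the highest scorers, and weight proportionally.
    positive = {t: s for t, s in sentiment_by_ticker.items() if s > 0}
    top = sorted(positive.items(), key=lambda kv: kv[1], reverse=True)[:top_n]
    total = sum(score for _, score in top)
    return {ticker: score / total for ticker, score in top}

# Invented weekly aggregate scores; PLTR's negative score drops it entirely.
scores = {"GME": 880.0, "AMC": 75.0, "TSLA": 420.0, "PLTR": -30.0}
print(target_weights(scores, top_n=3))
```

The output is a ticker-to-weight mapping summing to 1, which is the shape of input an order executor needs to rebalance a portfolio; how often to rebalance and how to handle all-negative weeks are separate design decisions not shown here.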
Invest in the WallStreetBets-tracking (and slightly more generalizable social media sentiment) strategy on Quantbase.