Scrape Reddit data and get stock tickers | Wallstreetbets | Crypto

Scrape Reddit data

Jun 12, 2021

Introduction

As a long-term investor, I find it tough to stay on the sidelines when wallstreetbets/meme stocks deliver huge returns. This article discusses how to read Reddit data, retrieve the stocks discussed in the comments, and buy them. I will cover two approaches:

For non-programmers: how to read Reddit data manually and count the mentions
For programmers: the API code and how to retrieve the tickers

Embrace wallstreetbets stocks

Unless you are living under a rock, you have probably heard hashtags like #wallstreetbets, #meme stocks, #huge returns, #crypto bets, #Gamestop, #AMC, #Clover health, #DOGE, etc. If not, please google wallstreetbets!

Initially, I hated wallstreetbets/meme stocks because of their huge returns, which I was not getting from my portfolio. However, hating them was not going to change anything. So, instead of hating them, I decided to embrace them and see if I could trade those stocks.

What happens in wallstreetbets?

A group of retail investors gang up to find stocks that are heavily shorted and undervalued. They accumulate a ton of that particular stock, and demand soars to new highs. That squeezes the big investors who are shorting the stock. There is a lot more going on behind the scenes, and I highly recommend reading up on it.

Recently, I have noticed that this ganging up on particular stocks happens in many international markets; only the platform differs from country to country. Here we will focus only on the wallstreetbets subreddit, which focuses on US stocks.

How to read wallstreetbets?

If you navigate to this URL, you need a Reddit login and must join the forum. Once you join, you can see the different posts.

If you go into a thread, you can see the comments related to its topic. As you can see, this particular day's thread has 252 comments, which is minimal compared to a big day (100K comments).

Different discussion threads (source: Reddit)

Below is one particular comment from the forum. For the sake of privacy, I have not added the user's name.

Here is how it works:

The moderators start a topic.
There is a daily discussion thread, plus threads on other topics.
Moderators have added a stock in the self-text (subtitle) of the posts.
Although moderators set a specific title, users decide which stocks need to pop.
You need to read through all the comments to find out which stocks are being discussed.
Then you can buy that particular stock from your brokerage account.
Gains can range from 100% to 8000% on that particular stock.

For non-technical readers, this is where it ends. You can check the threads manually and trade on them, or check out the service that sends the popular stocks to your mailbox.

For technical folks, let's continue by fetching the comments and parsing them.

Setup

As we will use Python here, I assume you already have a working coding setup.

Reddit API key

We need to create a web app on Reddit and get its key. Check out this link to create one.

Packages

We could scrape the data from Reddit's pages ourselves. However, I found an awesome package called praw (Python Reddit API Wrapper) that does all the heavy lifting for us.

Also, to parse the text, you need nltk or spacy.

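```python
import praw  # Python Reddit API Wrapper
import spacy
import nltk
import re  # used for the ticker regex
import difflib  # used for close matching
from collections import Counter  # used to count mentions
```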

Coding

To initiate the connection, we need the client_id, client_secret, and user_agent (the web app name).
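A minimal sketch of the connection step, assuming you keep the three credentials in environment variables rather than in source code. The variable names REDDIT_CLIENT_ID, REDDIT_CLIENT_SECRET, and REDDIT_USER_AGENT, the default user agent string, and the helper reddit_credentials are all hypothetical; the keyword names client_id, client_secret, and user_agent are the ones praw.Reddit accepts.

```python
import os
from typing import Dict


def reddit_credentials() -> Dict[str, str]:
    """Collect the three values praw.Reddit needs for a read-only connection.

    The environment variable names and the fallback user agent are
    hypothetical; adapt them to your own setup.
    """
    return {
        "client_id": os.environ["REDDIT_CLIENT_ID"],
        "client_secret": os.environ["REDDIT_CLIENT_SECRET"],
        "user_agent": os.environ.get("REDDIT_USER_AGENT", "wsb-ticker-counter"),
    }


# Usage, once praw is installed (pip install praw):
#   import praw
#   reddit = praw.Reddit(**reddit_credentials())
#   print(reddit.read_only)  # True when no username/password is supplied
```

Keeping the secret out of the script means the same code can be shared or scheduled without leaking the key.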

Read post

The above code shows how to use praw to retrieve post information from Reddit. Here I have set the limit to only 1; it can be raised as needed.
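A minimal sketch of reading posts, assuming the daily thread shows up among the subreddit's "hot" posts (the original may have used another listing, such as new() or a search). fetch_posts is a hypothetical helper; subreddit(), hot(limit=...), and the Submission attributes id, title, selftext, and num_comments are part of praw. The reddit argument is typed as Any so the function can be exercised without praw installed.

```python
from typing import Any, Dict, List


def fetch_posts(reddit: Any, subreddit: str = "wallstreetbets", limit: int = 1) -> List[Dict[str, Any]]:
    """Return basic context for the current 'hot' posts of a subreddit.

    `reddit` is expected to be a praw.Reddit instance.
    """
    posts = []
    for submission in reddit.subreddit(subreddit).hot(limit=limit):
        posts.append(
            {
                "id": submission.id,
                "title": submission.title,
                "selftext": submission.selftext,  # the post body ("subtitle")
                "num_comments": submission.num_comments,
            }
        )
    return posts
```

Usage: `fetch_posts(praw.Reddit(...), limit=1)` returns a one-element list describing the top hot post; raising limit returns more.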

Output from Reddit (image from author)

Now we have the context about the post.

Read comments

Now it is time to read the comments. The below code will help to retrieve the comments from the post.

If we provide a limit of 0, it fetches only the initial list of comments (around 500). If we provide None, it retrieves all the comments from the post.
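A sketch of this step, assuming the limit described above is passed to praw's CommentForest.replace_more, which is how praw expands (or drops) the "load more comments" stubs. fetch_comment_bodies is a hypothetical helper; replace_more(limit=...), CommentForest.list(), and Comment.body are praw APIs. The submission argument is duck-typed so the sketch runs without praw.

```python
from typing import Any, List, Optional


def fetch_comment_bodies(submission: Any, more_limit: Optional[int] = 0) -> List[str]:
    """Return the text of every loaded comment in a praw Submission.

    more_limit=0 removes the "load more comments" stubs, keeping only the
    comments from the first response; more_limit=None keeps expanding the
    stubs until every comment is fetched (slow on busy threads).
    """
    submission.comments.replace_more(limit=more_limit)
    # .list() flattens the comment tree: top-level comments and all replies.
    return [comment.body for comment in submission.comments.list()]
```

With limit=None, each stub costs an extra API request, so on a 100K-comment day it can take a long time; limit=0 is the quick option.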

Get stock tickers

Now it is time to extract the tickers from all the comments. We loop through the comments one by one and pull out the ticker symbols.

```python
import re
candidates = list(filter(None, [x.strip() for x in re.findall(r"\b[A-Z\s]+\b", comment)]))  # `comment` is one comment's text
```

The above piece of code retrieves the words written in capitals (candidate tickers). Then we remove stopwords using the nltk library.
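One way to combine that regex with the stopword step, as a sketch. Because the pattern also matches whitespace, a run such as "BUY GME" comes back as a single string, so the sketch splits each run into words before filtering. To stay dependency-free, STOPWORDS is a small hand-picked stand-in for nltk's English list (the commented lines show the nltk call it replaces), and candidate_tickers is a hypothetical name.

```python
import re
from typing import List

# Stand-in for nltk's stopword list. With nltk installed you could use:
#   from nltk.corpus import stopwords  # after nltk.download("stopwords")
#   STOPWORDS = set(stopwords.words("english"))
STOPWORDS = {"i", "a", "the", "and", "or", "to", "is", "it", "in", "on", "of", "for", "all", "be", "so"}

# Same pattern as the snippet above: runs of capital letters and whitespace.
CAPS_RUN = re.compile(r"\b[A-Z\s]+\b")


def candidate_tickers(comment: str) -> List[str]:
    """Split all-caps runs into words and drop English stopwords."""
    words: List[str] = []
    for run in CAPS_RUN.findall(comment):
        # A run like " GME AND CLNE " holds several words, so split it up.
        words.extend(run.split())
    # nltk's stopwords are lowercase, so compare case-insensitively.
    return [w for w in words if w.lower() not in STOPWORDS]


print(candidate_tickers("I think GME AND CLNE go up"))  # ['GME', 'CLNE']
```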

We then loop over the results again and filter out common Reddit and wallstreetbets words.
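A sketch of that filtering step. WSB_WORDS is a hypothetical, hand-picked blocklist of all-caps slang that usually is not a ticker; the author's actual word list is not shown, so treat it as a starting point to extend while reading threads. drop_reddit_words is likewise a hypothetical name.

```python
from typing import FrozenSet, Iterable, List

# Hypothetical blocklist of wallstreetbets slang that is not meant as a ticker.
WSB_WORDS: FrozenSet[str] = frozenset(
    {"WSB", "YOLO", "DD", "HODL", "FOMO", "ATH", "IMO", "LOL", "CEO", "USA", "EOD", "OTM", "ITM"}
)


def drop_reddit_words(tokens: Iterable[str], blocklist: FrozenSet[str] = WSB_WORDS) -> List[str]:
    """Remove tokens that are common Reddit/wallstreetbets words rather than tickers."""
    return [t for t in tokens if t.upper() not in blocklist]
```

Some slang collides with real symbols (DD, for example, is also DuPont's ticker), so a blocklist trades a little recall for much better precision.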

I have also added close matching between the tickers and the comment words using the difflib library. String-distance functions (such as Levenshtein distance) would work too.
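A sketch of close matching with difflib.get_close_matches, which returns the candidates whose similarity ratio to the input is at least cutoff, best first. KNOWN_TICKERS is a hypothetical reference list (in practice you would load the full list of US-listed symbols), and match_ticker and its 0.8 cutoff are assumptions, not the author's settings.

```python
import difflib
from typing import List, Optional

# Hypothetical reference list; load the full list of US-listed symbols in practice.
KNOWN_TICKERS: List[str] = ["GME", "AMC", "CLNE", "CLOV", "BB", "WKHS"]


def match_ticker(token: str, known: List[str] = KNOWN_TICKERS, cutoff: float = 0.8) -> Optional[str]:
    """Return the closest known ticker to `token`, or None if nothing is close enough."""
    matches = difflib.get_close_matches(token, known, n=1, cutoff=cutoff)
    return matches[0] if matches else None


print(match_ticker("GMEE"))  # 'GME' (ratio 2*3/7, about 0.86)
print(match_ticker("XYZ"))   # None
```

A high cutoff keeps typos like "GMEE" while avoiding mapping unrelated words to a ticker; for example, "CLOV" and "CLNE" only score 0.5 against each other, so they stay distinct.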

Finally, append all the tokens to a list and run a Counter over it.
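The counting step can be sketched with collections.Counter. count_mentions and its once_per_comment switch are hypothetical additions: by default every occurrence counts, while once_per_comment=True counts a ticker at most once per comment, so one enthusiastic comment repeating "GME GME GME" does not inflate the total.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def count_mentions(
    tokens_per_comment: Iterable[List[str]],
    top: int = 10,
    once_per_comment: bool = False,
) -> List[Tuple[str, int]]:
    """Flatten the per-comment ticker lists and return the most-mentioned tickers."""
    all_tokens: List[str] = []
    for tokens in tokens_per_comment:
        # Optionally collapse repeats inside a single comment.
        all_tokens.extend(set(tokens) if once_per_comment else tokens)
    return Counter(all_tokens).most_common(top)


print(count_mentions([["GME", "CLNE"], ["CLNE"], ["AMC", "CLNE", "GME"]]))
# [('CLNE', 3), ('GME', 2), ('AMC', 1)]
```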

The above results mean CLNE was mentioned 48 times in the comments section. The counts for the other stocks read the same way.

Depending on feedback, I will write another article that adds sentiment analysis of the comments.

I have to admit that some of these stocks really worked: a couple of them gave me returns of more than 150%. So I have now set up a schedule with Airflow that sends the results directly to my inbox, to keep tabs on wallstreetbets.

Underappreciated workflow tool — Airflow
towardsdatascience.com

Final thoughts

Learned how wallstreetbets works and how it can help with trading.
Fetched the Reddit posts via the API.
Parsed the comments and extracted stock tickers from them.
Counted the number of mentions of each stock ticker.
Scheduled it via Airflow to get daily updates.

Portfolio Bytes Email

Please subscribe here to get a daily email with the latest Reddit stock tickers.

Get Code

Please subscribe to my newsletter to get the full working code for my articles and other updates.

Code Sprout
