As part of QuantCon, I gave a presentation introducing the typical workflow of a quant researcher. My posts in this thread provide a slimmed-down narrative of that presentation. James Christopher did the vast majority of the work creating the supporting notebooks and algorithm.
The purpose of the talk was to show how a quant approaches creating and iterating on an idea. We start with a naive theory, prompted by the availability of data from three sentiment vendors in the Quantopian data program.
Our naive hypothesis is that negative sentiment with regard to a stock will lead to a drop in its price over the following days. Conversely, positive sentiment on a particular day will lead to a positive price movement for the stock in the following days.
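To make the hypothesis concrete, here is a minimal sketch of how it could be checked with plain pandas, assuming daily closing prices and a daily per-stock sentiment score are each stored as a DataFrame with dates as rows and tickers as columns. The helper names `forward_returns` and `mean_return_by_sign`, the 5-day default horizon, and the toy numbers are hypothetical illustrations, not part of the original research or the Quantopian API.

```python
import numpy as np
import pandas as pd


def forward_returns(prices: pd.DataFrame, days: int) -> pd.DataFrame:
    """Simple return from each day's close to the close `days` rows later."""
    # The last `days` rows have no future price yet, so they come out NaN.
    return prices.shift(-days) / prices - 1.0


def mean_return_by_sign(sentiment: pd.DataFrame, prices: pd.DataFrame,
                        days: int = 5) -> pd.Series:
    """Average forward return over all (day, stock) cells with negative vs. positive sentiment.

    Hypothetical helper: assumes `sentiment` and `prices` share the same
    date index and ticker columns.
    """
    fwd = forward_returns(prices, days)
    # where() keeps a return only where the condition holds; other cells become NaN.
    negative = fwd.where(sentiment < 0).to_numpy()
    positive = fwd.where(sentiment > 0).to_numpy()
    # nanmean ignores the masked-out cells and the trailing NaN forward returns.
    return pd.Series({
        "negative": float(np.nanmean(negative)),
        "positive": float(np.nanmean(positive)),
    })


# Toy data (made up for illustration): two stocks over four days.
dates = pd.date_range("2016-01-04", periods=4, freq="D")
prices = pd.DataFrame({"AAA": [100.0, 110.0, 121.0, 121.0],
                       "BBB": [50.0, 45.0, 45.0, 40.5]}, index=dates)
sentiment = pd.DataFrame({"AAA": [2, 1, -1, 0],
                          "BBB": [-3, 0, -2, 1]}, index=dates)

result = mean_return_by_sign(sentiment, prices, days=1)
print(result)
```

If the naive hypothesis holds, the "negative" average should come out below zero and the "positive" average above it; a real test would of course use far more stocks and days than this toy example.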
First, we simply explore the data to confirm that it is of reasonable quality, at least at the surface level (notebook not shown). Once that is done, we test our simple hypothesis by creating three pipeline factors, one for each sentiment vendor's data set.
We use the factor tear sheet notebook created by Andrew Campbell to assess the quality of these factors. To use it, we code up the simple, naive factors inside the notebook itself. Running the notebook's analyses then lets us take the first steps toward answering the basic question: “Do these factors predict price movement?”
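One number commonly used to answer this question is the daily rank information coefficient (IC): the Spearman correlation, across stocks on a single day, between the factor value and the subsequent return. The sketch below is a generic pandas illustration, not the tear sheet's own code; the function name `daily_rank_ic`, the three-stock minimum, and the toy data are assumptions. It expects wide DataFrames (dates as rows, tickers as columns) for the factor and for forward returns already computed over the chosen horizon.

```python
import pandas as pd


def daily_rank_ic(factor: pd.DataFrame, fwd_returns: pd.DataFrame,
                  min_stocks: int = 3) -> pd.Series:
    """Spearman rank correlation between factor and forward return, one value per day."""
    ics = {}
    for date in factor.index.intersection(fwd_returns.index):
        # Pair each stock's factor value with its forward return; drop incomplete pairs.
        pair = pd.DataFrame({"f": factor.loc[date],
                             "r": fwd_returns.loc[date]}).dropna()
        if len(pair) < min_stocks:
            continue  # too few stocks that day for a meaningful correlation
        # The Pearson correlation of ranks is the Spearman rank correlation.
        ics[date] = pair["f"].rank().corr(pair["r"].rank())
    return pd.Series(ics, dtype=float)


# Toy data (made up): the factor orders returns perfectly on day 1,
# inversely on day 2, and day 3 has too few stocks to score.
dates = pd.to_datetime(["2016-01-04", "2016-01-05", "2016-01-06"])
factor = pd.DataFrame({"AAA": [1.0, 1.0, 1.0],
                       "BBB": [2.0, 2.0, None],
                       "CCC": [3.0, 3.0, None]}, index=dates)
fwd = pd.DataFrame({"AAA": [0.01, 0.03, 0.02],
                    "BBB": [0.02, 0.02, 0.01],
                    "CCC": [0.03, 0.01, 0.00]}, index=dates)

ic = daily_rank_ic(factor, fwd)
print(ic)
print("mean IC:", ic.mean())
```

A factor with real predictive power would show a mean IC consistently above zero (or consistently below it, for a contrarian signal); a mean near zero suggests the factor carries little information about future returns.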
Digging into the analysis below, we find that there is indeed some potential. Although the Sentdex data set likely wouldn’t work across all stocks, within one particular sector, “Basic Materials,” we have a clue that the Sentdex sentiment score may have predictive power. Stocks in the bottom 20% of a given day’s sentiment scores correlate with a negative price movement, and, similarly, stocks in the top 20% correlate with a positive price movement.
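The quintile comparison above can be sketched in a few lines: within one sector, bucket each day's stocks into five groups by sentiment rank, then average the forward returns in each bucket. This is a hedged illustration, not the tear sheet's implementation; the sector mapping, the function name `quantile_mean_returns`, and the toy data are hypothetical, and the factor and forward-return DataFrames are assumed to share the same dates and tickers.

```python
import numpy as np
import pandas as pd


def quantile_mean_returns(factor: pd.DataFrame, fwd_returns: pd.DataFrame,
                          sectors: pd.Series, sector: str,
                          quantiles: int = 5) -> pd.Series:
    """Mean forward return per factor quantile (1 = lowest scores), within one sector."""
    # Keep only tickers that belong to the requested sector.
    cols = [c for c in factor.columns if sectors.get(c) == sector]
    f = factor[cols]
    r = fwd_returns[cols]
    # Only rank stocks that also have a forward return on that day.
    f = f.where(r.notna())
    # Rank across stocks within each day; ties are broken by column order.
    ranks = f.rank(axis=1, method="first")
    counts = f.notna().sum(axis=1)
    # Map ranks 1..n onto buckets 1..quantiles (quintiles when quantiles=5).
    buckets = np.ceil(ranks.mul(quantiles).div(counts, axis=0))
    means = {}
    for q in range(1, quantiles + 1):
        vals = r.to_numpy()[(buckets == q).to_numpy()]
        means[q] = float(vals.mean()) if vals.size else float("nan")
    return pd.Series(means)


# Toy data (made up): five hypothetical "Basic Materials" tickers plus one
# ticker from another sector that should be ignored.
dates = pd.to_datetime(["2016-01-04", "2016-01-05"])
tickers = ["A1", "A2", "A3", "A4", "A5", "XX"]
sectors = pd.Series(["Basic Materials"] * 5 + ["Technology"], index=tickers)
sentiment = pd.DataFrame([[5, 4, 3, 2, 1, 9],
                          [1, 2, 3, 4, 5, 9]],
                         index=dates, columns=tickers, dtype=float)
fwd = pd.DataFrame([[0.05, 0.04, 0.03, 0.02, 0.01, -0.50],
                    [0.01, 0.02, 0.03, 0.04, 0.05, -0.50]],
                   index=dates, columns=tickers)

by_quintile = quantile_mean_returns(sentiment, fwd, sectors, "Basic Materials")
print(by_quintile)
```

For a sentiment factor behaving as described, the bottom quintile's mean return would sit below zero and the top quintile's above it; restricting to one sector first is what lets a signal that is invisible in the full universe show up.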