This is the first post in a multi-part series that will introduce you to the PsychSignal dataset. PsychSignal is a data analytics firm that provides real-time Trader Mood metrics for US equities. PsychSignal uses its natural language processing (NLP) engine to analyze millions of social media messages and produce two quantified sentiment scores for each security.
My motivation for creating this series came from Vinesh Jha, CEO of ExtractAlpha, during our crowdsourced earnings webinar. In that webinar, Vinesh broke down the main steps he uses to find and extract alpha from data. What surprised me was that his process is about 90% research, with only the final 10% reserved for out-of-sample testing and backtesting.
For those familiar with my work, I tend to work the other way around: 90% backtesting and 10% research. So, with help from Dr. Jess Stauth, VP of Quant Strategy at Quantopian, I’m creating this series to learn how smart and successful quants operate. Here’s an outline of the series:
1. Introduction - Examining the data. My goal here is simply to look at the dataset and understand what it looks like. I’ll be answering basic questions like “How many stocks are covered?”, “Which sectors have the most coverage?”, and “What’s the distribution of sentiment scores?”. These questions are simple but fundamentally important, and they lay the groundwork for all further development.
2. Research Design - Here, I’ll set up my environment for hypothesis testing and define my in-sample and out-of-sample datasets, both cross-sectionally and through liquidity thresholds.
3. Hypothesis Testing - This is where I’ll set up a number of hypotheses about my data and test them through event studies and cross-sectional studies. The Factor Tearsheet and Event Study notebooks will be used heavily. The goal is to develop an alpha factor to use for strategy creation.
4. Strategy Creation - After I’ve developed a hypothesis and seen that it holds up consistently across different liquidity and sector partitions of my in-sample dataset, I’ll finally begin developing my trading strategy. I’ll be asking questions like “Is my factor strong enough by itself?” and “What is its correlation with other factors?”. Once these questions have been answered, I’ll construct the trading strategy and move on to the next step.
5. Out-of-Sample Test - Here, my main goal is to verify the work of steps 1 through 4 against my out-of-sample dataset. This will involve repeating many of the steps in 2 through 4, as well as using the backtester. (Notice that only step 5 involves the backtester.)
As this is my first time working through this flow, the steps above are subject to change as I learn and iterate on my mistakes. Feel free to post feedback and questions.
Strategies with Data
Our allocation process places high value on algorithms that use alternative datasets. While many datasets contain a premium section that is only accessible via subscription, strategies developed on the sample portions of these datasets will also be evaluated.
For questions on accessing this data, please email [email protected]