Research design is a fundamental and often overlooked part of the algorithm creation process. It is the point where the grounded quant asks, "What is my universe?" and "What is my training/testing dataset?" These two simple questions provide a solid framework for interpreting and validating results from factor research and backtesting.
Here's an overview of what you'll learn from this notebook:
- How to break down a list of securities into liquidity baskets
- How to set guidelines for your universe of securities based on capital base and data coverage
- How to validate your universe constraints with your in-sample datasets.
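To make the first two points concrete, here is a minimal sketch in plain pandas (not the Quantopian API). It assumes liquidity is measured by average dollar volume, splits securities into equal-sized baskets by rank, and applies one possible capital-base rule: a position should not exceed a small fraction of a stock's daily dollar volume. The function names, the 5-basket choice, the 1% participation cap, and the randomly generated data are all hypothetical and only for illustration.

```python
import numpy as np
import pandas as pd


def liquidity_baskets(avg_dollar_volume, n_baskets=5):
    """Assign each security to a basket; 1 = least liquid, n_baskets = most liquid."""
    # Rank first so ties cannot produce duplicate bin edges in qcut.
    ranks = avg_dollar_volume.rank(method="first")
    codes = pd.qcut(ranks, q=n_baskets, labels=False)
    return codes + 1


def tradable_universe(avg_dollar_volume, capital_base, n_positions, max_participation=0.01):
    """Keep securities where an equal-weight position stays under the participation cap.

    Assumption: capital is split equally across n_positions names.
    """
    position_size = capital_base / n_positions
    return avg_dollar_volume[avg_dollar_volume * max_participation >= position_size]


# Hypothetical data: 100 made-up symbols with lognormal average dollar volume.
rng = np.random.default_rng(0)
symbols = ["SYM{:03d}".format(i) for i in range(100)]
adv = pd.Series(rng.lognormal(mean=15, sigma=2, size=100), index=symbols)

baskets = liquidity_baskets(adv)
counts = baskets.value_counts().sort_index()
universe = tradable_universe(adv, capital_base=1000000, n_positions=50)

print(counts)
print("Tradable names:", len(universe))
```

Running the same factor analysis within each basket is one way to check whether a result depends on illiquid names that a larger capital base could not actually trade.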
I highly recommend following this series in chronological order. The first part of this series covers the first and most important step of strategy creation: data examination. You can find links to each section below as they become available.
And in terms of pacing, the bolded section is where you are now:
- Introduction - Examining the data. My goal here is to simply look at the dataset and understand what it looks like. I’ll be answering simple questions like, “How many stocks are covered?”; “Which sectors have the most coverage?”; and “What’s the distribution of sentiment scores?”. These are very basic but fundamentally important questions that lay the groundwork for all further development.
- **Research Design** - Here, I’ll be setting up my environment for hypothesis testing and defining my in-sample and out-of-sample datasets, both cross-sectionally and through liquidity thresholds.
- Hypothesis Testing - This is where I’ll be setting up a number of different hypotheses for my data and testing them through event studies and cross-sectional studies. The Factor Tearsheet and Event Study notebooks will be used heavily. The goal is to develop an alpha factor to use for strategy creation.
- Strategy Creation - After I’ve developed a hypothesis and seen that it holds up consistently over different liquidity and sector partitions in my in-sample dataset, I’ll finally begin the process of developing my trading strategy. I’ll be asking questions like “Is my factor strong enough by itself?” and “What is its correlation with other factors?”. Once these questions have been answered, the trading strategy will be constructed and I’ll move on to the next section.
- Out-Of-Sample Test - Here, my main goal is to verify the work of steps 1-4 with my out-of-sample dataset. It will involve repeating many of the steps in 2-4, as well as the use of the backtester (notice how only step 5 involves the backtester).
As this is my first time working through this flow, the steps above are subject to change as I learn and iterate through my mistakes. Feel free to post feedback and questions.
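As a rough illustration of the in-sample and out-of-sample split described in the Research Design step, here is a minimal sketch that partitions a date-indexed dataset at a cutoff date. It uses plain pandas rather than the Quantopian API; the function name, the cutoff date, and the placeholder sentiment values are hypothetical and only stand in for the real dataset.

```python
import pandas as pd


def split_in_out_of_sample(data, cutoff):
    """Split a date-indexed frame: rows before cutoff are in-sample, the rest out-of-sample."""
    cutoff = pd.Timestamp(cutoff)
    in_sample = data.loc[data.index < cutoff]
    out_of_sample = data.loc[data.index >= cutoff]
    return in_sample, out_of_sample


# Hypothetical data: one placeholder score per business day over two years.
dates = pd.date_range("2014-01-01", "2015-12-31", freq="B")
scores = pd.DataFrame({"sentiment": range(len(dates))}, index=dates)

# Assumed cutoff; the right date depends on how much history the dataset has.
in_sample, out_of_sample = split_in_out_of_sample(scores, "2015-01-01")

print(len(in_sample), "in-sample rows;", len(out_of_sample), "out-of-sample rows")
```

Fixing the cutoff before any hypothesis testing, and leaving the out-of-sample rows untouched until the final step, is what keeps the last test honest.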
Quick Notes:
- I’ll be using the Twitter & StockTwits dataset throughout this series. You can import these datasets through:
Pipeline: from quantopian.pipeline.data.psychsignal import aggregated_twitter_withretweets_stocktwits_free
Research: from quantopian.interactive.data.psychsignal import aggregated_twitter_withretweets_stocktwits_free
- The sample version is shown in the attached notebook and is available for both backtesting and research through January 7th, 2016.
- The full version of this dataset is updated daily and includes availability for backtesting, research, and live trading.