Creating a new strategy with pipeline factors [QuantCon presentation]

As part of QuantCon, I gave a presentation introducing the typical workflow of a quant researcher. My posts in this thread provide a slimmed down narrative from that presentation. James Christopher did the vast majority of work creating the supporting notebooks and algorithm.

The purpose of the talk was to show how a quant approaches creating and iterating on an idea. We start with a naive theory based on the presence of three sentiment vendors in the Quantopian data program.

Our naive hypothesis is that negative sentiment with regards to a stock will lead to a drop in price over the course of the days that follow. Conversely, positive sentiment on a particular day will lead to a positive price movement for a stock in the following days.

First, we explore the data to confirm that its quality is reasonable, at least at the surface level (notebook not shown). Once that is done, to test our simple hypothesis against the three sentiment vendor data sets, we create three pipeline factors, one for each data vendor.

We use the factor tear sheet notebook created by Andrew Campbell to assess the quality of these factors. To use the notebook, we code up the simple, naive factors within it. Then, by running its analyses, we can take the first steps in answering the basic question: “Do these factors predict price movement?”

Digging into the analysis below, we find that there is indeed some potential. Though the Sentdex data set likely wouldn’t work across all stocks, for one particular sector code, “Basic Materials”, there are signs that the Sentdex sentiment score has predictive power. The bottom 20% of a particular day’s sentiment scores correlates with negative price movement, and the top 20% correlates with positive price movement.
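To make the top 20% / bottom 20% comparison concrete, here is a minimal sketch of the bucketing idea in plain Python. It is not the tear sheet notebook's code (which does far more); the function name `quantile_forward_returns` and the toy numbers are made up purely for illustration, and it assumes a single day's cross-section of (sentiment score, forward return) pairs.

```python
from statistics import mean


def quantile_forward_returns(records, n_quantiles=5):
    """Bucket (sentiment, forward_return) pairs by sentiment rank.

    Returns the mean forward return per bucket; index 0 is the
    lowest-sentiment bucket and index -1 is the highest.
    """
    ranked = sorted(records, key=lambda r: r[0])
    n = len(ranked)
    buckets = [[] for _ in range(n_quantiles)]
    for i, (_, fwd_return) in enumerate(ranked):
        # Map position i in the ranking to a bucket 0..n_quantiles-1.
        q = min(i * n_quantiles // n, n_quantiles - 1)
        buckets[q].append(fwd_return)
    return [mean(b) if b else float("nan") for b in buckets]


# Toy, made-up data: (sentiment score, return over the following days).
toy = [(-3, -0.02), (-2, -0.01), (-1, 0.00), (0, 0.00), (1, 0.005),
       (2, 0.00), (3, 0.01), (4, 0.01), (5, 0.02), (6, 0.03)]

means = quantile_forward_returns(toy)
# Spread between the top 20% and bottom 20% buckets.
spread = means[-1] - means[0]
print(means, spread)
```

A factor with predictive power in the sense described above would show a negative mean in the bottom bucket, a positive mean in the top bucket, and so a positive spread. In practice you would average this over many days rather than look at one.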

We take a step back at this point and think about whether this finding is consistent with an economic hypothesis. Sentdex uses popular, mainstream news sources (WSJ, CNBC, Forbes, etc.). In this case, there might be signal in the sentiment of these news articles when it comes to companies in industries such as mining and materials manufacturing (like aluminum). These companies aren’t typically interesting to a mainstream viewer, and they’re certainly not of interest to the typical consumer the way Apple is. There aren’t editors saying “Get me another Alcoa story, those things go viral on Facebook!” So there probably won’t be a lot of noise in the sentiment data for these companies. When a “basic materials” company is on CNBC or in the WSJ, it’s probably for solid, newsworthy reasons, and the sentiment of these stories might help predict a future price movement.

The next step is to flesh out the pipeline methodology with this new insight. In addition to using Sentdex and the Morningstar sector code, we can make our stock selection more robust by ensuring we are working with liquid stocks and aren’t duplicating the same company in our algorithm via multiple share classes. In the following notebook, we prototype the key pipeline logic. We can then visualize its output to assure ourselves that the logic executes as intended.
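The selection logic described here (sector, liquidity, one share class per company) can be sketched outside of pipeline in plain Python. This is only an illustration of the filtering idea, not the pipeline code from the notebook: the `Stock` fields, the `select_universe` helper, the tickers, and the dollar-volume threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stock:
    symbol: str
    company: str             # hypothetical company identifier
    sector: str
    is_primary_share: bool
    avg_dollar_volume: float


def select_universe(stocks, sector="Basic Materials", min_dollar_volume=1e6):
    """Keep liquid, primary-share stocks in one sector, one per company."""
    seen_companies = set()
    selected = []
    # Visit the most liquid listings first so they win any duplicate check.
    for s in sorted(stocks, key=lambda s: s.avg_dollar_volume, reverse=True):
        if s.sector != sector:
            continue
        if not s.is_primary_share:
            continue
        if s.avg_dollar_volume < min_dollar_volume:  # threshold is an assumption
            continue
        if s.company in seen_companies:
            continue
        seen_companies.add(s.company)
        selected.append(s.symbol)
    return selected


# Made-up tickers for illustration only.
toy_stocks = [
    Stock("AAA", "Alpha Mining", "Basic Materials", True, 5e6),
    Stock("AAA.B", "Alpha Mining", "Basic Materials", False, 2e6),
    Stock("BBB", "Beta Metals", "Basic Materials", True, 2e5),
    Stock("CCC", "Gamma Tech", "Technology", True, 9e6),
    Stock("DDD", "Delta Chem", "Basic Materials", True, 3e6),
]

universe = select_universe(toy_stocks)
print(universe)
```

Here the second share class, the illiquid name, and the out-of-sector name are all dropped. Printing the survivors like this is a crude version of the "visualize it" check.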

Next, we might validate our factors in other ways, such as checking whether they correlate with other factors like company size or other well-known fundamental factors.
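One simple way to run that kind of check is a rank (Spearman) correlation between the sentiment factor and the other factor across stocks. The sketch below is a plain-Python illustration and not something from the original notebooks; the helper names and the toy values are made up.

```python
def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Toy, made-up cross-section: one value per stock.
sentiment = [1, 2, 3, 4, 5]
market_cap = [10, 40, 20, 50, 30]

rho = spearman(sentiment, market_cap)
print(rho)
```

A value near zero suggests the sentiment factor is not just a proxy for size. A value near +1 or -1 would suggest the apparent signal may really be the other factor in disguise.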

Once we are past those steps, it is time to plug our pipeline code into an algorithm and start backtesting.

After some trial and error with respect to rebalance frequency, we get the following:

Next we can analyze the results in pyfolio:

If we’re shooting for an allocation from Quantopian, we’ve got a lot going for us. It’s hedged, it trades frequently, and its beta is close to zero. The Sharpe Ratio is decent but not great. On the downside, it doesn't hold very large baskets of stocks due to the focus on a relatively small sector. It’s a good start but there is definitely more work to be done. For example, we haven’t used the more recent data available via the premium version of the Sentdex data set.
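For readers unfamiliar with the two headline numbers, here is a small sketch of how beta and an annualized Sharpe ratio are conventionally computed from daily returns. It uses the textbook definitions with a 252-day year and a zero risk-free rate by default, which may not match pyfolio's calculation in every detail. The return series are made up for illustration.

```python
from statistics import mean, stdev

TRADING_DAYS = 252  # conventional annualization factor (an assumption)


def annualized_sharpe(daily_returns, daily_risk_free=0.0):
    """Mean excess daily return over its sample std dev, annualized."""
    excess = [r - daily_risk_free for r in daily_returns]
    return mean(excess) / stdev(excess) * TRADING_DAYS ** 0.5


def beta(strategy_returns, market_returns):
    """Covariance of strategy with market divided by market variance."""
    ms, mm = mean(strategy_returns), mean(market_returns)
    cov = sum((s - ms) * (m - mm)
              for s, m in zip(strategy_returns, market_returns))
    var = sum((m - mm) ** 2 for m in market_returns)
    return cov / var


# Toy, made-up daily returns.
market = [0.01, -0.01, 0.02, -0.02]
strategy = [0.001, 0.001, 0.003, 0.003]

strategy_beta = beta(strategy, market)
strategy_sharpe = annualized_sharpe(strategy)
print(strategy_beta, strategy_sharpe)
```

In this toy series the strategy's returns do not move with the market at all, so beta comes out at essentially zero, which is what a hedged long-short book is aiming for.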

It’s time to iterate and continue to improve this algo. Can we combine it with other factors? Where can we take it next?

[Edit: removed the statement that we weren't taking slippage and commissions into account]

Thanks for the soup to nuts example!

This is awesome!

Tremendous work, Josh!

But why do you say that the algo has yet to be tested with slippage and commission? I do not see any non-default values of the two being used in the source that you show. Or is it that in Q2 commission and slippage are set to zero by default?

Josh -

In theory, doesn't the Sentdex data feed give us the information about the news announcement on date + 1? So is day 1 in the notebook really day 2?

@Tim, good point! I guess I was mixing up this algo with another algo I was working on. I'll update the text above.

The full talk using this content and more has been posted to YouTube: https://www.youtube.com/watch?v=Yr0s7r-Nups