Factor Tear Sheet

As I have worked more on the 101 Alphas project, it has become clear to me that there is a need for a toolset to evaluate Pipeline factors in the research environment.

When implementing a factor in a trading algorithm, the complexity and wide range of parameters involved in basket selection and trading logic hinder our ability to evaluate the factor's alpha signal in isolation. Before we proceed to implementing an algorithm, we want to know whether the factor has any predictive value.

In this analysis, we'll measure a factor's predictive value using the Spearman rank correlation between the factor value and the forward price movement over various N-day windows, across a large universe of stocks. This correlation is called the Information Coefficient (IC).
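As a rough illustration of the IC described above, here is a minimal sketch on simulated data. The function name daily_ic, the toy prices, and the two toy factors are assumptions made for this example; they are not taken from the tear sheet's code. It relies on the fact that the Spearman correlation is the Pearson correlation of the ranks.

```python
import numpy as np
import pandas as pd


def daily_ic(factor, prices, n_days):
    """Daily cross-sectional Spearman rank correlation (IC).

    factor, prices: DataFrames indexed by date, one column per stock.
    Assumes no missing values within a date (hypothetical helper).
    """
    # N-day forward price movement: percent change from day t to day t + n_days.
    forward = prices.shift(-n_days) / prices - 1.0
    # Spearman correlation is the Pearson correlation of the ranks.
    factor_ranks = factor.rank(axis=1)
    forward_ranks = forward.rank(axis=1)
    # Row-wise correlation; the last n_days rows have no forward data and drop out.
    return factor_ranks.corrwith(forward_ranks, axis=1).dropna()


# Toy data (simulated, not real prices): 60 business days, 20 stocks.
rng = np.random.default_rng(0)
dates = pd.date_range("2015-01-01", periods=60, freq="B")
stocks = ["S%02d" % i for i in range(20)]
daily_returns = rng.normal(0.0, 0.01, size=(len(dates), len(stocks)))
prices = pd.DataFrame(100.0 * np.cumprod(1.0 + daily_returns, axis=0),
                      index=dates, columns=stocks)

# A factor with perfect foresight of the 5-day forward move, and a pure-noise factor.
perfect_factor = prices.shift(-5) / prices - 1.0
noise_factor = pd.DataFrame(rng.normal(size=prices.shape), index=dates, columns=stocks)

ic_perfect = daily_ic(perfect_factor, prices, n_days=5)
ic_noise = daily_ic(noise_factor, prices, n_days=5)
print("mean IC, perfect factor:", ic_perfect.mean())
print("mean IC, noise factor:  ", ic_noise.mean())
```

The perfect-foresight factor should give an IC of 1 on every date, and the noise factor a mean IC near zero. Real factors usually land much closer to the second case, which is why the tear sheet looks at rolling means rather than single days.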

This tear sheet takes a pipeline factor and attempts to answer the following questions, in order:

  • What is the sector-neutral rolling mean IC for our different forward price windows?
  • What are the mean returns for each factor decile?
  • How much do the contents of the top and bottom quintiles change each day?
  • What is the autocorrelation in sector-wise factor rankings?
  • What is the IC decay (the difference in IC across forward price windows) for each sector?
  • What is the IC decay for each sector over time?
  • What are the factor quintile returns for each sector?
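Two of the questions in the list above (mean returns by factor quantile, and how much the top quantile changes from day to day) can be sketched as follows. This is only an illustration on simulated data: the helper names add_quantile and quantile_turnover are hypothetical, and the column names follow the DataFrame layout described later in this thread.

```python
import numpy as np
import pandas as pd


def add_quantile(df, n_quantiles=5):
    """Label each row with its factor quantile within its date (1 = lowest)."""
    out = df.copy()
    codes = out.groupby("date")["factor_value"].transform(
        lambda values: pd.qcut(values, n_quantiles, labels=False)
    )
    out["quantile"] = codes.astype(int) + 1
    return out


def quantile_turnover(df, quantile):
    """Fraction of names in a quantile that were not in it on the previous date."""
    in_quantile = df[df["quantile"] == quantile]
    members = {date: set(day["equity"]) for date, day in in_quantile.groupby("date")}
    dates = sorted(members)
    turnover = {
        today: len(members[today] - members[yesterday]) / len(members[today])
        for yesterday, today in zip(dates, dates[1:])
    }
    return pd.Series(turnover)


# Toy data (simulated): 10 business days, 20 stocks, one row per (date, equity).
rng = np.random.default_rng(1)
dates = pd.date_range("2015-01-01", periods=10, freq="B")
equities = ["S%02d" % i for i in range(20)]
frame = pd.MultiIndex.from_product(
    [dates, equities], names=["date", "equity"]
).to_frame(index=False)

# Slowly varying factor: a fixed per-stock level plus tiny noise, so ranks are stable.
level = np.tile(np.arange(20, dtype=float), len(dates))
frame["factor_value"] = level + rng.normal(0.0, 0.01, size=len(frame))
# Forward move that rises with the factor level, plus noise.
frame["forward_price_movement"] = 0.001 * level + rng.normal(0.0, 0.005, size=len(frame))

frame = add_quantile(frame)
mean_by_quantile = frame.groupby("quantile")["forward_price_movement"].mean()
top_turnover = quantile_turnover(frame, quantile=5)
print(mean_by_quantile)
print(top_turnover)
```

Because the toy factor barely moves, the top quintile never changes and its turnover is zero on every date. A factor whose top quintile reshuffles heavily each day would be expensive to trade even if its IC looked good.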

Please feel free to clone the notebook and share your feedback below. How do your favorite factors look? Are there any more plots or figures you'd want to see?

For more information on Spearman rank correlation, check out this notebook from the Quantopian lecture series.


15 responses

Great work.
Hopefully Q can post a master thread that covers the entire workflow, from factor brainstorming all the way to this tear sheet.

Any conclusions? For the securities analyzed, is the factor good/bad/can't tell? Why?

One thought is that it would be nice to run it on a known-good factor and a known-bad factor, for comparison and testing the tool. Or maybe the one you selected is known-good?

To see how the tool works, would it be feasible to gin up some data for pipeline that would result in the good/bad/ugly cases? In other words, simulate the inputs, to test the tool (since at this point, we may not have any known good/bad/ugly examples). It's kinda like if we were to design a new thermometer. We'd stick it in some ice water to see if we got 0 C, and some boiling water for 100 C (and maybe dry ice and liquid nitrogen, too). And then proceed to measure unknown temperatures.

Within the research platform, is it possible to simulate the input to pipeline? Is it a matter of replacing:

from quantopian.pipeline.data import morningstar  
from quantopian.pipeline.data.builtin import USEquityPricing  

with some blocks of code?

Grant,

Let me clarify: you want to create a fake factor dataset and two fake price datasets.

The factor is highly predictive for one dataset and not predictive at all for the other dataset.

Run 2 factor tear sheet simulations to see the difference in results.

@Miles,

That's the concept. Rather than "fake" I'd say "simulated" data, but it is a matter of word choice (sometimes, I think the term "toy data set" is used). Andrew is looking at free cash flow yield and EBITA/EV, to see if they have any predictive value as factors (the null hypothesis is that they are useless, and the alternative hypothesis is that they provide useful signals). I'm not sure yet, but I think the answer is "can't tell" (cannot reject the null hypothesis) but I don't yet understand the tool well enough. What is your conclusion and why?

I'm not yet clear whether this can be done in the research platform. Rather than bog down this thread with the nitty-gritty details, I've posted the question separately: https://www.quantopian.com/posts/possible-to-simulate-inputs-to-pipeline-in-the-research-platform

@Grant

Most of the plotting functions in the factor tear sheet take a DataFrame with date, equity, sector_code, factor_value, and forward_price_movement columns. You could either build a fake pipeline output DataFrame in this form from scratch or generate a pipeline output using the construct_factor_history and add_forward_price_movement functions and replace the values in the factor and forward_price_movement columns as you see fit. Putting random floats into each column would simulate a bad factor. Identical factor and forward_price_movement columns would yield a perfectly predictive factor.
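The "from scratch" option described above might look like the sketch below. The column names come from the post; the sector codes, stock names, and the mean_ic helper are made up for illustration, and the real tear sheet functions may expect details (index layout, dtypes) that this toy frame does not reproduce.

```python
import numpy as np
import pandas as pd


def mean_ic(df):
    """Mean of the daily Spearman rank correlations (hypothetical helper)."""
    daily = [
        day["factor_value"].rank().corr(day["forward_price_movement"].rank())
        for _, day in df.groupby("date")
    ]
    return float(np.mean(daily))


# Simulated frame: 30 business days, 50 stocks, one row per (date, equity).
rng = np.random.default_rng(42)
dates = pd.date_range("2015-01-01", periods=30, freq="B")
equities = ["STOCK_%02d" % i for i in range(50)]
base = pd.MultiIndex.from_product(
    [dates, equities], names=["date", "equity"]
).to_frame(index=False)
# Hypothetical sector codes 1..5, fixed per stock.
base["sector_code"] = np.tile(np.arange(50) % 5 + 1, len(dates))

# "Bad" factor: independent random floats in both columns.
bad = base.copy()
bad["factor_value"] = rng.random(len(bad))
bad["forward_price_movement"] = rng.random(len(bad))

# "Perfect" factor: the factor column is identical to the forward price movement.
good = bad.copy()
good["factor_value"] = good["forward_price_movement"]

print("mean IC, bad factor: ", mean_ic(bad))
print("mean IC, good factor:", mean_ic(good))
```

These two frames are the ice water and boiling water from Grant's thermometer analogy: the bad factor should show a mean IC near 0 and the perfect factor a mean IC of 1, so any plot that cannot tell them apart is suspect.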

Thanks Andrew. Modifying the pipeline output might be the way to go. --Grant

[EDIT:] I'd like to be able to modify/simulate the input data to pipeline. Is there any way to do this?

Andrew,

Are you going to update this module now that open prices are available in the pipeline?

I have built a backtesting framework similar to yours, and it takes a little brain work to make sure that your data lines up.

I would be happy to assist.

My framework has noticeable speed improvements. I am not sure if updating the module to get returns from the backtester is necessary.

Miles,

This tearsheet is built to accept any Pipeline factor as an input. The analysis uses get_pricing to pull in the data needed to compute forward price movements. Are you suggesting using open prices in the calculation of forward price movement?

Is your framework in an IPython notebook? If you are willing to share it, I'd love to take a look.

Yeah, I went ahead and now load the returns through Pipeline using open prices. It seemed to be a lot more efficient.

I would happily share it, except I just reached my vacation destination. You'll have to give me a week.

Andrew, I've been using your great NB for a while and I've found it really useful. I added some features that I believe could improve the NB. I would really appreciate it if you could review my changes:

  • added plot of average cumulative returns for each factor quantile
  • added plot of fwd price vs factor distribution to possibly spot a non-linear relationship between the two
  • added box plot of mean return by factor quantile (this highlights the volatility of those returns compared to the simple bar plot)
  • added option to use market neutral returns (returns in excess of market performance) and beta neutral returns (returns in excess of beta component) instead of simple stock price
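One plausible reading of the last item in the list above is sketched below, on simulated returns. This is an assumption about what "market neutral" and "beta neutral" mean here, not a copy of the notebook's code: market-neutral subtracts the market return, and beta-neutral subtracts beta times the market return, with beta estimated as a full-sample OLS slope. The function names are hypothetical.

```python
import numpy as np
import pandas as pd


def market_neutral(stock_returns, market_returns):
    """Stock returns in excess of the market return on the same day."""
    return stock_returns.sub(market_returns, axis=0)


def beta_neutral(stock_returns, market_returns):
    """Stock returns minus beta times the market return.

    Beta is the full-sample OLS slope cov(stock, market) / var(market).
    """
    market_dev = market_returns - market_returns.mean()
    stock_dev = stock_returns - stock_returns.mean()
    betas = stock_dev.mul(market_dev, axis=0).sum() / (market_dev ** 2).sum()
    beta_component = pd.DataFrame(
        np.outer(market_returns.to_numpy(), betas.to_numpy()),
        index=stock_returns.index, columns=stock_returns.columns,
    )
    return stock_returns - beta_component, betas


# Simulated daily returns: one market series and three stocks with known betas.
rng = np.random.default_rng(7)
dates = pd.date_range("2015-01-01", periods=250, freq="B")
market = pd.Series(rng.normal(0.0005, 0.01, size=len(dates)), index=dates)
true_betas = pd.Series({"LOW": 0.5, "MID": 1.0, "HIGH": 1.5})
noise = rng.normal(0.0, 0.005, size=(len(dates), len(true_betas)))
stocks = pd.DataFrame(
    np.outer(market.to_numpy(), true_betas.to_numpy()) + noise,
    index=dates, columns=true_betas.index,
)

excess = market_neutral(stocks, market)
residual, betas = beta_neutral(stocks, market)
print(betas)
```

Note that a beta fitted over the whole sample uses information from the future of each date. A rolling or trailing-window beta would avoid that look-ahead, at the cost of a noisier estimate.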

This is really cool. How come you guys don't have a blog where you post nice notebook tools and news? A lot of people can miss this on the forum.

@Luca Thanks for the additions! I think the plots you added do a great job depicting the exact nature of a factor's signal decay and the variance in forward returns.

I particularly like the condensed version of the cumulative return plot (the one with all the quantiles on a single axis). I see that there are some error bands around those cumulative return lines. Should those bands be the same width as the error bars you show in the cumulative return plots broken apart by factor quantile?

I find the mean return by factor quantile boxplot a little difficult to read, though I think it makes an important point: we care about the variance in the forward return of a factor quantile because it tells us about the significance of the observed mean forward return. Adding error bars to the mean return by quantile bar chart would allow us to capture the significance of the observed mean return while maintaining some readability. While it is interesting to know about higher moments in the forward returns distribution for each factor quantile (skewness, kurtosis, tail events, etc.), I'm not sure how to display them in a way that isn't overwhelming.
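The error-bar idea above can be sketched with a groupby that reports the mean and its standard error per quantile. The data here is simulated and the column names are assumptions; the t-statistic is rough because it treats every observation as independent, which overlapping forward windows are not.

```python
import numpy as np
import pandas as pd

# Simulated forward moves: a small drift rising with quantile, buried in noise.
rng = np.random.default_rng(3)
n_per_quantile = 200
quantile = np.repeat(np.arange(1, 6), n_per_quantile)
forward = 0.0002 * quantile + rng.normal(0.0, 0.02, size=quantile.size)
df = pd.DataFrame({"quantile": quantile, "forward_price_movement": forward})

# Mean, standard error of the mean (std / sqrt(n)), and sample size per quantile.
stats = df.groupby("quantile")["forward_price_movement"].agg(["mean", "sem", "count"])
# Rough t-statistic of the mean against zero.
stats["t_stat"] = stats["mean"] / stats["sem"]
print(stats)
```

With matplotlib installed, stats["mean"].plot(kind="bar", yerr=stats["sem"]) would draw the bar chart with error bars in one call.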

The kernel density plots of factor value vs forward return are really neat. The regression plot is interesting. The scatterplot is tough to look at, though. Would it make sense to put a regression line on top of a KDE plot? Good catch with the outlier removal.

@Andrew

The cumulative return plot uses seaborn.tsplot, which shows the standard error of the mean. I added an option to plot the standard deviation of the values too, as bars. In that particular case I disabled the standard deviation bars in the "condensed" plot (because it is impossible to read with all the standard deviation bars) and enabled the bars in the single quantile plots. All in all, it depends on how you call the function that does the plotting: you can toggle this option (and others) via function arguments.

Regarding the boxplot, I agree that it is difficult to read, but this is because that particular factor does not have much power to predict the forward price. Looking only at the simple bar plot might mislead us into believing it has strong predictive power when it doesn't. Looking at both plots (simple bar and boxplot) should avoid ambiguity.

I tried to put a regression line on top of a KDE plot but seaborn doesn't allow that (at least I wasn't able to do that).

What do you think about using market neutral returns and beta neutral returns instead of the simple stock price?

Updated with bug fixes:
- Daily Mean Return box plot: force the fwd_price bars to be in the same order in each plot
- beta excess returns (abnormal returns): small improvement to take the null hypothesis into account after the linear regression, plus bug fixes
- improved selection of the high-volume stock universe (credit Seong Lee, event study NB)