How to tell a factor is effective?

How do I know whether a factor is effective?
If I use only IC, it will sometimes reject factors that distinguish the top from the bottom well.
Is there any standard or method I can follow?

12 responses

One way I just thought of is to treat the total variance of returns as a combination of factor variances. The factors whose variances contribute most to the variance of returns are the effective ones. Of course, this assumes that the factors are independent. Alternatively, you could use power-law distributions to identify the ones that have the maximum impact.
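One reading of this idea can be sketched as follows. If returns are a linear combination of uncorrelated factors, the variance of returns splits into one term per factor plus a residual, and each factor's share equals its squared correlation with returns. The linear model, the function name `variance_shares`, and the toy data are assumptions made for illustration; they are not something stated in the post.

```python
from statistics import mean

def variance_shares(returns, factors):
    """Share of return variance attributed to each factor.

    Assumes (hypothetically) a linear model with mutually uncorrelated
    factors, so each univariate slope is cov(r, f) / var(f) and the
    factor's contribution is slope**2 * var(f) = cov(r, f)**2 / var(f).
    """
    r_mean = mean(returns)
    var_r = sum((r - r_mean) ** 2 for r in returns)
    shares = {}
    for name, f in factors.items():
        f_mean = mean(f)
        cov = sum((r - r_mean) * (x - f_mean) for r, x in zip(returns, f))
        var_f = sum((x - f_mean) ** 2 for x in f)
        shares[name] = (cov ** 2 / var_f) / var_r
    return shares

# Toy data: two uncorrelated factors and returns built as 2*f1 + 1*f2.
f1 = [1.0, -1.0, 1.0, -1.0]
f2 = [1.0, 1.0, -1.0, -1.0]
rets = [2 * a + b for a, b in zip(f1, f2)]
shares = variance_shares(rets, {"f1": f1, "f2": f2})
print(shares)
```

When the factors are correlated, the shares no longer add up cleanly, which is the independence caveat mentioned above.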

Have you looked at the Alphalens tool? It is a research tool that helps you analyse the ability of a factor to predict future returns.
If you are instead asking how an algorithm can dynamically detect good factors from bad ones and use the good ones for asset weighting, then you might be interested in this thread.

I recommend these resources:

A tutorial on using alphalens
https://www.quantopian.com/tutorials/getting-started
An academic lecture on how to interpret alphalens
https://www.quantopian.com/lectures/factor-analysis

There's no 'perfect' method, but looking at IC and restricting the universe based on your hypothesis is generally a good idea.

I also recently outlined these steps at high level here: https://www.youtube.com/watch?v=ArbIM0vhSYQ

Generally you want to come up with a hypothesis and then check IC. Trying to dig deeper can run the risk of overfitting.

@Delaney - What is the formula used for IC in alphalens? Is it based on correlation or on (2 * correct_estimation_proportion - 1)? I ask because I want to find out whether the IC reported by alphalens assumes linearity.

It uses the Spearman rank correlation; see here.
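To make the difference between the two candidate definitions concrete, here is a minimal sketch in plain Python. It does not call Alphalens. The helper names are hypothetical, and the hit-rate version assumes a call is "correct" when the factor and the forward return sit on the same side of their cross-sectional means, which is only one possible reading of correct_estimation_proportion.

```python
from statistics import mean

def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    ordered = sorted(values)
    return [ordered.index(v) + (ordered.count(v) + 1) / 2 for v in values]

def pearson(x, y):
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def spearman_ic(factor, fwd_returns):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(factor), average_ranks(fwd_returns))

def hit_rate_ic(factor, fwd_returns):
    """2 * (proportion of correct calls) - 1, under the assumption above."""
    fm, rm = mean(factor), mean(fwd_returns)
    hits = sum((f > fm) == (r > rm) for f, r in zip(factor, fwd_returns))
    return 2 * hits / len(factor) - 1

# Toy cross-section: returns rise monotonically but not linearly with the factor.
factor = [1, 2, 3, 4, 5]
fwd_returns = [0.01, 0.02, 0.04, 0.08, 0.50]
rank_ic = spearman_ic(factor, fwd_returns)
linear_ic = pearson(factor, fwd_returns)
hit_ic = hit_rate_ic(factor, fwd_returns)
print(rank_ic, linear_ic, hit_ic)
```

On this toy data the rank-based IC is 1 because the ordering is perfect, while the plain Pearson correlation is roughly 0.79 and the hit-rate version is 0.6. The rank-based measure rewards monotonicity and does not require a linear relationship.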

How does Alphalens incorporate the various constraints described on https://www.quantopian.com/posts/a-new-contest-is-coming-more-winners-and-a-new-scoring-system? In order for a factor to be effective, it needs to pass through the Optimize API with the official constraints unscathed, and not be killed by the commissions and slippage, as well. For example, if a factor has a lot of exposure to the short_term_reversal risk factor (which I guess is just the built-in Pipeline RSI...we don't actually have access to the risk model code), then it won't be at all effective, in the sense that winning a contest prize or getting an allocation will be relatively difficult.

Some food for thought...regarding linearity, one curiosity is that the relationship between a stock's returns and the market (SPY) is given by beta, which is found by linear OLS regression. If one looks at the formula for the slope, it is cov(x,y)/var_x, where x is the SPY returns (the independent variable). The Pearson correlation coefficient, by contrast, is cov(x,y)/(s_x*s_y), which is symmetric in x & y; if x & y are ranked data, then we call it Spearman's rank correlation coefficient, which is also symmetric in x & y, and tests for monotonicity (and thus does not assume a linear relationship/model). Additionally, we have total least squares (TLS) and PCA regression (which I think ends up with the same result for the slope as TLS), which can give a different result for the slope from OLS.
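The three quantities above can be compared side by side. This is a minimal sketch with made-up return series; the function names are hypothetical, and the TLS slope uses the standard closed form for orthogonal regression with a single regressor.

```python
from math import sqrt
from statistics import mean

def moments(x, y):
    """Sums of squared deviations and cross-deviations: (sxx, syy, sxy)."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxx, syy, sxy

def ols_slope(x, y):
    """Slope of y regressed on x: cov(x, y) / var(x). Not symmetric in x, y."""
    sxx, _, sxy = moments(x, y)
    return sxy / sxx

def pearson(x, y):
    """cov(x, y) / (s_x * s_y). Symmetric in x, y."""
    sxx, syy, sxy = moments(x, y)
    return sxy / sqrt(sxx * syy)

def tls_slope(x, y):
    """Orthogonal (total least squares) slope; requires cov(x, y) != 0."""
    sxx, syy, sxy = moments(x, y)
    return (syy - sxx + sqrt((syy - sxx) ** 2 + 4 * sxy ** 2)) / (2 * sxy)

# Made-up daily returns for illustration only.
spy = [-0.02, -0.01, 0.0, 0.01, 0.02]
stock = [-0.03, -0.005, 0.002, 0.01, 0.035]

beta = ols_slope(spy, stock)               # stock regressed on SPY
inverse_beta = 1 / ols_slope(stock, spy)   # SPY regressed on stock, inverted
tls = tls_slope(spy, stock)
print(beta, tls, inverse_beta)
print(pearson(spy, stock), pearson(stock, spy))
```

Swapping the arguments changes the OLS slope but not the correlation. With positively correlated data the TLS slope falls between the OLS slope and the inverted reverse-regression slope, which is why it can differ from beta.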

@Grant - thanks. Good to see that you are starting to look at other statistical techniques besides OLMAR :)

The biggest problem with long-short versions of OLMAR and similar is that they are killed by the short_term_reversal risk constraint. I'd be interested in hearing from Delaney how the risk model style constraints can be accounted for early in the quant workflow.

Alphalens does indeed use Spearman rank correlation. You can also find a lecture on it here: https://www.quantopian.com/lectures/spearman-rank-correlation

I'm not sure of the math of how well calibrated Spearman rank correlation is for non-linear models, but it's certainly less sensitive to outlier behavior, which is effectively what you can get with non-linear models that cause super-linear growth. Non-linear models that introduce additional complexity, like non-monotonicity, are much more difficult to detect in practice, and you probably want to build that into your alpha model. In fact, most science is about getting to residuals that are linearly predictive; all the modeling of non-linear stuff should probably happen before you compute your IC. If you find non-linearity in your residuals, then you gotta go back and reevaluate your model.
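A small illustration of the non-monotonicity point, using toy numbers rather than real data: when returns depend on the size of the factor and not its sign, the rank IC of the raw factor is zero even though the relationship is exact. Moving the non-linearity into the factor (here, taking the absolute value, an assumed transformation chosen for the example) restores the IC. The helper names are hypothetical.

```python
from statistics import mean

def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    ordered = sorted(values)
    return [ordered.index(v) + (ordered.count(v) + 1) / 2 for v in values]

def spearman_ic(factor, fwd_returns):
    """Pearson correlation computed on ranks."""
    x, y = average_ranks(factor), average_ranks(fwd_returns)
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# U-shaped toy relationship: forward return = factor squared.
raw_factor = [-2, -1, 0, 1, 2]
fwd_returns = [f ** 2 for f in raw_factor]

raw_ic = spearman_ic(raw_factor, fwd_returns)

# Build the non-linearity into the factor before computing IC.
transformed_factor = [abs(f) for f in raw_factor]
transformed_ic = spearman_ic(transformed_factor, fwd_returns)
print(raw_ic, transformed_ic)
```

The raw factor would be rejected on IC alone, while the transformed factor ranks the same assets perfectly.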

We're working on tools which will allow you to look at your factor's risk exposure in the research environment. You're totally correct that you want to be able to check that early rather than waiting to build a whole backtest. This webinar that I recently did, especially the section starting where I link here, may be helpful. https://youtu.be/ArbIM0vhSYQ?t=33m8s

What is the short side in Alphalens? Would it be possible to use SPY as the short side?

Alphalens constructs a long-short portfolio based on the alpha signal only. If you want to examine a differently constructed portfolio, I recommend using the functions within alphalens to generate the quantile-specific returns, then looking at top quantile minus SPY for your hypothetical portfolio.

http://quantopian.github.io/alphalens/alphalens.html#module-alphalens.performance

You're correct that many factors have signal only on the long or short sides, so that's something which is important to investigate.
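The arithmetic behind that suggestion can be sketched without Alphalens. The code below is a plain-Python stand-in for a single date: it buckets assets into quantiles by factor value and compares the top and bottom buckets with a benchmark return. The function names, the equal-weighted buckets, and all numbers are assumptions for illustration; in practice the quantile returns would come from the Alphalens performance functions linked above.

```python
from statistics import mean

def quantile_labels(factor_by_asset, n_quantiles=5):
    """Label each asset 1..n_quantiles by its factor rank (1 = lowest)."""
    ordered = sorted(factor_by_asset, key=factor_by_asset.get)
    n = len(ordered)
    return {asset: pos * n_quantiles // n + 1 for pos, asset in enumerate(ordered)}

def quantile_spreads(factor_by_asset, fwd_return_by_asset, benchmark_return,
                     n_quantiles=5):
    """Equal-weighted spreads for one date (hypothetical helper)."""
    labels = quantile_labels(factor_by_asset, n_quantiles)
    top = mean(fwd_return_by_asset[a] for a, q in labels.items() if q == n_quantiles)
    bottom = mean(fwd_return_by_asset[a] for a, q in labels.items() if q == 1)
    return {
        "top_minus_bottom": top - bottom,                  # usual long-short spread
        "top_minus_benchmark": top - benchmark_return,     # long side vs. SPY
        "benchmark_minus_bottom": benchmark_return - bottom,  # short side vs. SPY
    }

# Made-up cross-section of ten assets and a made-up SPY return.
assets = "ABCDEFGHIJ"
factor = {name: i for i, name in enumerate(assets, start=1)}
returns = dict(zip(assets, [-0.01, 0.0, 0.01, 0.01, 0.012,
                            0.011, 0.02, 0.02, 0.03, 0.05]))
spy_return = 0.01

spreads = quantile_spreads(factor, returns, spy_return)
print(spreads)
```

Splitting the long-short spread into a long leg and a short leg against the benchmark shows which side carries the signal.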

Hi Delaney -

One of the challenges is that I'm not sure I've ever seen an end-to-end example of an effective historical factor (i.e. it was profitable...someone actually made money on it). It would be really engaging and instructive if you could get your hands on an actual historical factor that was traded by a long-short equity hedge fund, similar to the fund you are trying to build.

I realize that you can't just go down to Greenwich or NYC or wherever and knock on the door and ask for a factor, but I'm wondering if some of the industry connections that Q has established might be willing to inject some reality into the discourse.

The other angle, which perhaps is more accessible to you, is whether anyone at Q (employee/consultant/etc.) has had any success at all in applying the workflow and actually making money. In every field, there are litmus tests of theory, so I'm wondering if you have a "pass" yet.

If we are talking "science" here, then empirical evidence needs to be provided that the methodology works in real-world practice. This prevents gross miscalculations (e.g. Ptolemaic system, luminiferous aether, etc.).