@Alan:
- I'll answer that below with Joakim's question.
- The quantile statistics are not what you want to look at there. They just tell you how much data is in your quantiles and what range the quantiles cover, not what is actually happening to the stock returns. Because of the way I constructed my factor here (ranking and then z-scoring), the quantile stats are pretty meaningless: I know that by construction I will get uniform quantiles over a specific range. What to look at instead are the performance metrics, like alpha, which is positive for even quarters but negative for odd ones; the same goes for mean IC. The mean returns of the quantiles also go nicely from negative to positive for even quarters but are all over the place for odd ones.
- As outlined in 2, that is bad: you want a factor that works fairly evenly across time. Although that is very hard in general, if you find that your factor only works on the time period you looked at but not on your testing set, it's a pretty strong sign you overfit.
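To illustrate why the quantile stats are uninformative for a rank-then-z-score factor, here is a minimal sketch (with made-up random data, not the actual factor) showing that the quantile buckets come out perfectly uniform by construction:

```python
import numpy as np
import pandas as pd

# Made-up raw factor values; the distribution doesn't matter.
rng = np.random.default_rng(0)
raw = pd.Series(rng.normal(size=1000))

# Rank, then z-score the ranks, as described above.
ranked = raw.rank()
factor = (ranked - ranked.mean()) / ranked.std()

# Bucket into 5 quantiles: every bucket holds exactly the same
# number of assets, no matter what the raw data looked like.
counts = pd.qcut(factor, 5, labels=False).value_counts()
print(counts)  # each of the 5 buckets contains 200 names
```

Since the bucket counts and ranges are fixed by construction, only the return-based metrics (alpha, mean IC, quantile mean returns) can tell you anything.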
@Blue Seahawk: [How would one do that in a backtest?]
I don't think it's currently possible to do this in a backtest, unfortunately. However, I would treat the backtest as a last step where you already know from the research environment and alphalens that your factor works. The backtest is then mainly there to make sure the factor isn't killed by turnover and doesn't have other undesirable properties like high exposures. Ideally you'd spend 90% of your time in research designing the factor and, when you're done, run a single backtest to make sure it also works when actually placing trades. If you don't want to do that, you can leave out the last 2 years when you run backtests and only test that period once at the very end. Personally, I'm quite excited that the FactSet fundamentals enforce a 1-year hold-out period we can use to evaluate strategies on.
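The "leave out the last 2 years" idea can be sketched in a few lines of pandas. The dates and the (flat) returns series here are purely illustrative:

```python
import pandas as pd

# Hypothetical daily returns series; in practice this would come
# from your research environment.
returns = pd.Series(
    0.0, index=pd.date_range("2010-01-01", "2018-12-31", freq="B")
)

# Reserve the final 2 years as a single-use hold-out.
holdout_start = returns.index[-1] - pd.DateOffset(years=2)
design = returns[returns.index <= holdout_start]   # iterate freely here
holdout = returns[returns.index > holdout_start]   # touch exactly once

print(design.index[-1], holdout.index[0])
```

The key discipline is behavioral, not technical: once you have looked at the hold-out, it stops being a hold-out.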
@James Villa: [Tried it on his own factor]
Thanks for trying this out! I think you already know this and just wanted to test it out, which is great, but I'll say it anyway: I assume you developed that factor beforehand, which renders this test less meaningful because you already tweaked it on the test set, and the test set can only be used once. Even then, though, the factor doesn't seem to be significant (by p-value) in either period. The mean returns of the top and bottom quantiles also seem to flip between train and test, which is not a good sign either.
@Zenothestoic: [Does being market neutral pay off for corrections like we just experienced?]
This is definitely a market period for which market-neutral was developed. In theory, the market tanks but because you are well hedged your portfolio should not be influenced. Those periods can even be especially lucrative because that's when opportunities open up. So I would hope that our fund would do especially well if the market were to drop 50%. Unfortunately, it seems that for most funds that hasn't been true in this most recent correction: https://twitter.com/robinwigg/status/1055802622739968000?s=21
Thanks for the feedback! I agree that the tools have improved a lot, and even more good stuff is coming :).
@Joakim
Would training on even quarters every year be prone to 'over-training' on seasonal trends (e.g. sell-in-May, the January effect, etc), and if so, is there a better way of dividing up the training and testing sets?
This is similar to Alan's question above. I think the key thing to ask is whether you designed your factor to exploit any of these effects. If it does exploit seasonal patterns, then this type of testing might not be the right choice. However, if that's not the case, there is no good reason the factor should behave that way, and probabilistically speaking, the probability that it is overfit is then much higher. Finally, would you even want a factor that only works in certain time periods? We certainly wouldn't. Having said all that, if seasonality is a concern you could also sample your quarters randomly, or flip from even to odd every year.
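Both splitting schemes (even/odd quarters and random quarter sampling) are easy to sketch on a date index. This is a toy example on a made-up date range, not Quantopian API code:

```python
import numpy as np
import pandas as pd

# Made-up daily trading calendar for illustration.
dates = pd.date_range("2014-01-01", "2017-12-31", freq="B")
quarters = dates.to_period("Q")

# 1) Even vs odd quarters: Q2/Q4 for training, Q1/Q3 for testing.
train_even = dates[quarters.quarter % 2 == 0]
test_odd = dates[quarters.quarter % 2 == 1]

# 2) Alternatively, sample half of the quarters at random into the
#    training set, which avoids any fixed seasonal alignment.
rng = np.random.default_rng(42)
unique_q = quarters.unique()
idx = rng.choice(len(unique_q), size=len(unique_q) // 2, replace=False)
train_q = unique_q[idx]
train_rand = dates[quarters.isin(train_q)]

print(len(train_even), len(test_odd), len(train_rand))
```

The random variant breaks the link between the split and the calendar, so a factor that only works because of, say, a January effect can't score well on both sets by luck of the split alone.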
I 'tweaked' your simple_momentum factor (quite a bit) to get the below figures on the 'training set' (all else in the NB is the same); I haven't run the 'testing set' yet.
Thank you, that is excellent. Yes, this looks like it could be a great factor just from looking at the stats, which makes it an even better example. I like your proposal of also looking for p < .05 for the test set performance.
@all
One question a few of you touched on is how to actually decide one way or the other. Certainly if the factor performance goes from positive to negative between train and test it should be rather obvious, but what if it's not as clear-cut? What if it's still positive, just not quite as strongly? This is something I haven't done much thinking or experimenting on yet, so I hope to have a better answer at some point. For now, probably the simplest reasonable rule is to require the p-value to be < 0.05 in both periods, as suggested by Joakim.
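A minimal sketch of that rule, assuming you have daily long-short factor returns for each period (the returns here are simulated, and the helper name `is_significant` is my own):

```python
import numpy as np
from scipy import stats

# Simulated daily long-short returns standing in for the real thing.
rng = np.random.default_rng(1)
train_returns = rng.normal(0.0005, 0.01, 252)  # ~1 year, train period
test_returns = rng.normal(0.0005, 0.01, 252)   # ~1 year, test period

def is_significant(returns, alpha=0.05):
    """True if the mean return differs from zero at the given level
    under a one-sample t-test."""
    _, pvalue = stats.ttest_1samp(returns, 0.0)
    return pvalue < alpha

# Keep the factor only if it clears the bar in BOTH periods.
keep_factor = is_significant(train_returns) and is_significant(test_returns)
print(keep_factor)
```

Note the t-test assumes roughly independent returns; with strongly autocorrelated daily returns the p-values will be optimistic, so treat this as a first filter rather than a verdict.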