In-sample vs. out-of-sample test

Hello Guys!

I'm interested in how you approach in-sample and out-of-sample testing.

For the 2016-2019 period (36 months), I generated 3,000 strategies using a random generation method.
I did not set aside an out-of-sample period; the 3 years mentioned above were used entirely as in-sample.
Over this three-year period, all 3,000 strategies have an expectancy of more than $5
and a stability greater than 0.90.

Then I tested all 3,000 strategies in "retest" mode
on periods outside the in-sample window, sized at 10, 20, 30, 50, 100, 200, and 400% of the in-sample period.

10% OOS: 3.6 months: 2015.11.01 - 2016.02.15
20% OOS: 7.2 months: 2015.07.02 - 2016.02.15
30% OOS: 10.8 months: 2015.04.01 - 2016.02.15
50% OOS: ...
100% OOS: ...
200% OOS: ...
400% OOS: 144 months: 2004.02.15 - 2016.02.15

I noticed that as the size of the out-of-sample period increased,
fewer and fewer of the 3,000 strategies
still had an expectancy of more than $5 and a stability greater than 0.90.

Am I seeing this correctly?
Why is that?
How should I set the in-sample and out-of-sample periods?
Should the in-sample be the most recent data and the out-of-sample the older data,
or vice versa?
What ratio should I use for IS and OOS?

Thanks!

1 response

I don't know of any hard-and-fast rules for in-sample (IS) vs. out-of-sample (OOS). Some people use 50% IS and 50% OOS. Others use 80% IS and 20% OOS. Personally, I try to save at least 10-20% of the data for OOS, sometimes more.
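
To make the ratios concrete, here is a minimal sketch of a chronological IS/OOS split. It assumes your data is just a time-ordered list of rows; the function name `chronological_split` and its `oos_fraction` / `oos_first` parameters are made up for illustration and are not from any Quantopian API.

```python
def chronological_split(rows, oos_fraction=0.2, oos_first=False):
    """Split time-ordered rows into (in_sample, out_of_sample) without shuffling.

    Hypothetical helper for illustration only.
    oos_fraction: share of rows held back as out-of-sample (e.g. 0.2 for 80/20).
    oos_first:    if True, the OLDEST rows are held out; otherwise the NEWEST.
    """
    if not 0 < oos_fraction < 1:
        raise ValueError("oos_fraction must be strictly between 0 and 1")
    if len(rows) < 2:
        raise ValueError("need at least two rows to split")

    # Keep at least one row on each side of the split.
    n_oos = min(len(rows) - 1, max(1, round(len(rows) * oos_fraction)))

    if oos_first:
        return rows[n_oos:], rows[:n_oos]
    return rows[:-n_oos], rows[-n_oos:]


if __name__ == "__main__":
    months = list(range(1, 37))  # 36 monthly observations, oldest first
    in_sample, out_of_sample = chronological_split(months, oos_fraction=0.2)
    print(len(in_sample), len(out_of_sample))
```

Whichever end you hold out, the important part is that the OOS rows are never touched while the strategies are being generated or selected.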

Should IS data be the latest and OOS data older? Not necessarily, I think, but there's nothing wrong with that either. You still have to deal with non-stationarity though, so what 'worked' most recently may not have worked in the distant past and may not work in the future. For this reason, I like to use Thomas' notebook (NB) that can be found on this post. I really should use it more often.

Lastly, you might want to look at the 'law of large numbers' and survivorship bias. Roughly half of 3,000 people 'predicting' a coin flip will guess correctly. About 750 will guess correctly two times in a row. About 375 will get it right three times in a row. And about 3 people out of the 3,000 might guess correctly 10 times in a row. Does that make those 3 people any better at predicting coin flips? Or were they just randomly lucky?
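
The coin-flip numbers above are easy to check. Below is a rough sketch, assuming fair coins and guessers with no skill at all; the function names are just illustrative. The first function gives the expected number of people still unbeaten after k flips (3000 × 0.5^k), and the second runs one random simulation of the same thing.

```python
import random


def expected_survivors(n_guessers, n_flips):
    """Expected number of guessers who call n_flips fair coin flips correctly in a row."""
    return n_guessers * 0.5 ** n_flips


def simulate_survivors(n_guessers, n_flips, seed=0):
    """Return how many purely random guessers are still unbeaten after each flip."""
    rng = random.Random(seed)
    alive = n_guessers
    counts = []
    for _ in range(n_flips):
        # Each remaining guesser is right with probability 0.5, regardless of history.
        alive = sum(1 for _ in range(alive) if rng.random() < 0.5)
        counts.append(alive)
    return counts


if __name__ == "__main__":
    for k in (1, 2, 3, 10):
        print(k, expected_survivors(3000, k))
    print(simulate_survivors(3000, 10))
```

The same logic applies to 3,000 randomly generated strategies: some will pass any in-sample filter by luck alone, and the longer the OOS window, the fewer of those lucky ones keep passing.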