Statistical Time Frames

Hi all.
I was wondering what time scales I should use for my analyses (regressions, correlations, etc.). Should I use all of the data I can find at once? A rolling window? Several rolling windows? How long should the window be: short compared to the series' typical time scale, or long?
If I use a rolling window, how do I treat the distribution of all values? I didn't see any reference to this in the lecture series, and I would like to hear insights from experts or guesses from other newbies.
Thanks!

2 responses

There's no single answer to this; it depends on what you want to do. If you are trying to detect short-term correlations, use a shorter window; for longer-term relationships, use a longer one.
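
To make the window-length trade-off concrete, here is a minimal sketch using only the Python standard library. The data is synthetic and the names `pearson` and `rolling_corr` are made up for illustration; the 20- and 250-observation windows are arbitrary choices, not recommendations.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")  # correlation is undefined for a constant series
    return cov / math.sqrt(var_x * var_y)


def rolling_corr(x, y, window):
    """Correlation over each trailing block of `window` observations."""
    return [
        pearson(x[i - window:i], y[i - window:i])
        for i in range(window, len(x) + 1)
    ]


# Synthetic "returns" (an assumption for the demo): y tracks x for the
# first 250 observations, then becomes independent of it.
rng = random.Random(0)
x = [rng.gauss(0, 1) for _ in range(500)]
y = [
    xi + rng.gauss(0, 0.5) if i < 250 else rng.gauss(0, 1)
    for i, xi in enumerate(x)
]

short = rolling_corr(x, y, window=20)    # reacts quickly, but noisy
long_ = rolling_corr(x, y, window=250)   # smooth, but slow to notice the break

print("long window, first value:", round(long_[0], 2))
print("long window, last value: ", round(long_[-1], 2))
```

In this toy setup the short window should pick up the break within a few dozen observations but fluctuate a lot from one window to the next, while the long window gives a steadier estimate that takes much longer to reflect the change.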

If there's no specific reason to prefer a particular time frame, a good place to start is to partition the dataset into training and test sets (e.g. k-fold cross-validation). Build a model on the training set and pick a metric to measure its performance; if that metric is significantly different on the test set, that could imply that some of the model's assumptions are violated.
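
As a rough illustration of that check, the sketch below fits a one-variable least-squares line on a training segment and compares its mean squared error on a held-out segment. Note the assumptions: it uses a single chronological split rather than k-fold cross-validation (a common simplification for time-ordered data), the data is synthetic with a deliberate change in the relationship, and `fit_ols` and `mse` are hypothetical helper names.

```python
import random


def fit_ols(x, y):
    """Least-squares (intercept, slope) for y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def mse(x, y, intercept, slope):
    """Mean squared error of the fitted line on (x, y)."""
    errors = [(yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y)]
    return sum(errors) / len(errors)


# Synthetic data (an assumption for the demo): the true slope is 2.0 for
# the first 300 observations and 0.5 afterwards.
rng = random.Random(1)
x = [rng.gauss(0, 1) for _ in range(400)]
y = [
    (2.0 if i < 300 else 0.5) * xi + rng.gauss(0, 0.3)
    for i, xi in enumerate(x)
]

split = 300  # chronological split: train on the past, test on the future
intercept, slope = fit_ols(x[:split], y[:split])
train_mse = mse(x[:split], y[:split], intercept, slope)
test_mse = mse(x[split:], y[split:], intercept, slope)

print("train MSE:", round(train_mse, 3))
print("test MSE: ", round(test_mse, 3))
```

A test error far above the training error, as this setup is built to produce, is the kind of gap that suggests the relationship was not stable across the two periods.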

Here are some resources for thinking about time scales.

When using statistical methods, the concept of statistical stationarity is fundamental. This blog post talks about this issue.
The discussion is rather technical, but the two main points are that the data under consideration should be stationary over the time scales of interest, and that using integer differentiation (e.g. converting prices to returns) to achieve stationarity may remove information content.
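
A crude way to see both points is to compare a random-walk "price" series with its first differences. The sketch below uses lag-1 autocorrelation as an informal diagnostic; it is not a formal stationarity test (such as an augmented Dickey-Fuller test), the series is synthetic, and `lag1_autocorr` and `difference` are illustrative names.

```python
import random


def lag1_autocorr(series):
    """Sample autocorrelation at lag 1."""
    n = len(series)
    mean = sum(series) / n
    num = sum(
        (series[t] - mean) * (series[t - 1] - mean) for t in range(1, n)
    )
    den = sum((v - mean) ** 2 for v in series)
    return num / den


def difference(series):
    """First (integer) difference: x[t] - x[t-1]."""
    return [b - a for a, b in zip(series, series[1:])]


# Synthetic random walk (an assumption for the demo): the level is the
# running sum of independent Gaussian steps.
rng = random.Random(2)
level = []
total = 0.0
for _ in range(1000):
    total += rng.gauss(0, 1)
    level.append(total)

changes = difference(level)

print("lag-1 autocorrelation of levels: ", round(lag1_autocorr(level), 3))
print("lag-1 autocorrelation of changes:", round(lag1_autocorr(changes), 3))
```

The levels are strongly persistent (non-stationary), while the differenced series looks close to memoryless. That is the trade-off mentioned above: differencing gives a series that is easier to model, but it discards the information carried by the level itself.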

This paper broadly classifies market time scales into two regimes: one where trends develop and persist (trading signals are important), and another where trends revert back to 'reality' (fundamental values are important for long-term investments).

This video breaks down economic cycles into various components.