Quantopian's community platform is shutting down. Please read this post for more information and download your code.
Back to Community
How to compute autocorrelation?

I have a question about auto-correlation, in fact quite technical.

By definition we have $$R(t) = \frac{\langle x_{t'} x_{t'-t}\rangle_{t'} }{\langle x_{t'} \rangle_{t'} \langle x_{t'-t}\rangle_{t'} }$$

Or explicitly:

$$R(t) = \frac{\sum_{n=0}^{N} (x_n - \bar{x}) (x_{n+t} - \bar{x} ) } {\left( \sum_{n=0}^{N} (x_n - \bar{x})^2 \sum_{n=0}^N (x_{n+t} - \bar{x} )^2 \right)^{1/2} } $$

If the signal is periodic, one can easily compute the auto-correlation using the power spectrum, this implies using fourier transforms but will be less computationally expensive (\( O(NlogN) \) vs \( O(N^2) \) )

Formaly:
$$R(t) = \frac{{F}^{-1}(|{F}(x_t - \bar{x})|^2)}{\sigma^2}$$ where \({F}\) is the fourier transform operator, and \(\sigma^2\) is the variance of \(x_t\).

But obvioulsy in finance we do not have any chance of this periodicity. So my question is: Is there a good and efficient way to build auto-correlation? Is there a library which compute it efficiently? (Pandas one will give the output of one given lag, so it would need to be called N time (and looking at the panda code that would lead to a \( O(N^2) \) method)).

4 responses

scipy.signal.correlate

https://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.signal.correlate.html#scipy.signal.correlate

I'm not aware of any efficient approximations to auto-correlation.

But you can't control the the "depth" I mean the maximum lag you want to compute, and the window on which you want to compute the correlator.

Hihi, my journey with correlators in finance look more complicated than in cosmology ;-).

So i played with it at lunch time :-)

So as expected SciPy use the brute force method, so I implemented quickly the power spectra approach. See attached notebook.

The result are exactly what one should expect:
For a small array:
SciPy is a bit faster (10%) and the FFT method is a "bad approximation" (at least for white noise).
For a large array:
THe FFT method is around 10 times faster than the SciPy correlate method, both method agree quite well.

lol I just tested with 50000 point and 200 lags, FFT is 100 times faster :-).

I just edited the first post to include the formula of the power sepctra method.
(while writing it, I realize how stupid I can be, so I edited the notebook (no need to substract the mean, one can simply set the 0-mode to 0..., it save the computation of the mean...)

Then one comment, why I do devise by len(x) the final answer. This is just because applying the forward FFT then the backward will multiply the result by len(x). This is quite often the case in FFT library (you can get more info on this by looking at the doc of FFTW).