Perhaps someone can provide guidance on the best way to implement the binning described in "3.1 Dimensionality Reduction Via PAA" in the paper below. For example, starting with minute-level data, I'd like to create 15-minute-wide bins, compute the average for each bin, and store the results in a vector. It would also be nice to store a corresponding datetime stamp centered on each bin, but this is not strictly necessary. As a first pass, it would be fine to ignore the datetime stamps (that is, as an approximation, to ignore that the market is closed evenings, weekends, and holidays).
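To make the idea concrete, here is a minimal NumPy sketch of the binning described above, ignoring timestamps entirely. The function name `paa`, the choice to drop a trailing partial bin, and the sample data are my own assumptions for illustration; they are not taken from the paper.

```python
import numpy as np

def paa(values, bin_width):
    """Average consecutive, non-overlapping bins of `bin_width` samples.

    Hypothetical helper: any trailing samples that do not fill a
    complete bin are dropped (an assumption, not from the paper).
    """
    values = np.asarray(values, dtype=float)
    n_bins = len(values) // bin_width
    # Keep only the samples that fill whole bins.
    trimmed = values[:n_bins * bin_width]
    # One row per bin, then average across each row.
    return trimmed.reshape(n_bins, bin_width).mean(axis=1)

# Made-up data: 60 one-minute samples, binned into four 15-minute averages.
minute_prices = np.arange(60, dtype=float)
binned = paa(minute_prices, 15)
```

Because this works on sample counts rather than clock time, it treats the series as contiguous, which matches the approximation of ignoring market closures.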
Does pandas have a convenient way of doing this, so that I could keep the data returned by the batch transform in its native format? Or NumPy/SciPy? Something else?
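On the pandas side, one possible approach is `Series.resample`, sketched below under some assumptions: the data is a `Series` with a `DatetimeIndex`, the helper name `bin_means` is hypothetical, and the timestamps and prices are made up. I have not checked how this interacts with whatever the batch transform actually returns.

```python
import numpy as np
import pandas as pd

def bin_means(series, rule="15min"):
    """Average a datetime-indexed Series over fixed-width time bins.

    Hypothetical helper. Bins with no data (e.g. overnight gaps)
    come back as NaN from resample and are dropped here.
    """
    binned = series.resample(rule).mean().dropna()
    # resample labels each bin by its left edge by default; shift the
    # labels by half a bin so they sit at the midpoint of the interval.
    binned.index = binned.index + pd.Timedelta(rule) / 2
    return binned

# Made-up data: 60 one-minute samples starting at 09:30.
index = pd.date_range("2013-01-02 09:30", periods=60, freq="min")
prices = pd.Series(np.arange(60, dtype=float), index=index)
binned = bin_means(prices)
```

Unlike the count-based approach, this bins by clock time, so a bin that straddles a gap in the data will average fewer than 15 samples.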
I'm interested in pattern recognition in time series, and as discussed in the paper, the binning is the first step in coding the time series.
Lin, Jessica, et al. "A symbolic representation of time series, with implications for streaming algorithms." Proceedings of the 8th ACM SIGMOD workshop on Research issues in data mining and knowledge discovery. ACM, 2003.
www.cs.ucr.edu/~stelo/papers/DMKD03.pdf