Hi all,
I thought I'd share an implementation of non-negative matrix factorization (NNMF). That's a mouthful, so perhaps an explanation is due.
First and foremost, credit for the original algorithm design in the source code goes to Toby Segaran, author of Programming Collective Intelligence, an excellent book on machine learning. My only real contribution is the implementation on this platform (which, by the way, makes it much easier to test and experiment with Segaran's example).
Motivation
We are looking to identify dates on which some event appears to have driven up trading volume across a group of stocks. It is important to differentiate this from correlation. Whereas a correlation between two time series of volume averages together the periods of high co-movement and the periods of low co-movement, this algorithm searches just for the periods in which trading volumes moved together. It is not a trading strategy on its own, but it would be a good base for researching one.
For instance, let's say you found a series of dates on which Apple's stock traded at high volume and, on the same dates, so did Microsoft's. Yet Microsoft's and Apple's volumes don't always move together: often Apple will have a spike in volume that Microsoft doesn't experience, and vice versa. So what's happening on the days when they do move together? Perhaps Apple's trading volume spikes on days its board makes announcements, but only certain kinds of announcements affect Microsoft's trading. With some research (i.e., finding out what announcements were made on the dates the algorithm spits out), this algorithm is one way to identify which sorts of announcements affect both stocks and which don't.
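For anyone who wants a feel for the idea before opening the book, here is a minimal sketch of NNMF using the standard multiplicative update rule. It is not necessarily identical to the source code shared with this post: the function name `factorize`, its parameters, the ticker labels and the volume numbers are all made up for illustration. The matrix has one row per date and one column per stock, and the factorization approximates it as `w @ h`, where each row of `h` is a "feature" (a group of stocks whose volume tends to rise together) and each column of `w` says how strongly that feature shows up on each date.

```python
import numpy as np

def factorize(v, n_features=2, n_iter=200, seed=0):
    """Approximate a non-negative matrix v (dates x stocks) as w @ h.

    Hypothetical helper for illustration; uses multiplicative updates,
    which keep w and h non-negative at every step.
    """
    rng = np.random.default_rng(seed)
    n_rows, n_cols = v.shape
    # Random positive starting points for both factors.
    w = rng.random((n_rows, n_features)) + 1e-3
    h = rng.random((n_features, n_cols)) + 1e-3
    eps = 1e-9  # guards against division by zero
    for _ in range(n_iter):
        h *= (w.T @ v) / (w.T @ w @ h + eps)
        w *= (v @ h.T) / (w @ h @ h.T + eps)
    return w, h

# Made-up daily volumes (in millions); rows are dates, columns are stocks.
tickers = ["AAA", "BBB", "CCC", "DDD"]  # hypothetical tickers
dates = ["day1", "day2", "day3", "day4", "day5", "day6"]
volumes = np.array([
    [9.0, 8.0, 1.0, 1.0],
    [1.0, 1.0, 7.0, 9.0],
    [1.0, 1.0, 1.0, 1.0],
    [8.0, 9.0, 1.0, 2.0],
    [2.0, 1.0, 8.0, 8.0],
    [1.0, 2.0, 1.0, 1.0],
])

w, h = factorize(volumes, n_features=2)
error = np.linalg.norm(volumes - w @ h)  # reconstruction error

for k in range(h.shape[0]):
    # Dates on which this feature is strongest, and the stocks it loads on.
    top_dates = [dates[i] for i in np.argsort(w[:, k])[::-1][:2]]
    top_stocks = [tickers[j] for j in np.argsort(h[k])[::-1][:2]]
    print(f"feature {k}: dates {top_dates}, stocks {top_stocks}")
```

The dates printed for each feature are the ones you would then go and research, to see what event might have moved that group of stocks together.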
I think you are best served experimenting with this algorithm after you read Chapter 10 of Segaran's Programming Collective Intelligence. I believe the book is now available for free as a PDF, and it is an excellent primer on machine learning.