Working with Pandas DataFrame returned by Pipeline

My aim is to perform some operations on the DataFrame returned by the Pipeline, but I am struggling to write efficient code for it.

As I understand it, Pipeline returns a MultiIndexed pandas DataFrame (indexed on datetime and symbol) like this:
-----------------------------Equity (xyz1)
-----------------------------Equity (xyz2)
2015-11-01----------------Equity (xyz3)
-----------------------------Equity (xyz4)
-----------------------------Equity (xyz5)

Let's say I want to operate on a 10-day window for each stock. What is the most efficient way to do that? Using groupby(['symbol']).apply(lambda x: something) is too slow, and iterating with .xs(symbol) seems redundant (looping over pandas objects is a big no-no).

Can someone point me to a resource where I can learn how to operate on the MultiIndexed DataFrame returned by Pipeline, particularly how to apply a function to the data of each symbol?
I am fairly advanced with Python, so pointers to snippets in the zipline project will also work.

2 responses

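I use a combination of df.unstack() and df.rolling(). unstack() pivots the symbol level into columns, so rolling() then works down each symbol's column in a single call. Note that I used the mean function for simplicity in my example, but you can substitute a custom function of your choosing with df.rolling().apply(some_function).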
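
A minimal sketch of the unstack-then-rolling approach, for illustration only. The `close` column, the symbol names, and the synthetic values are assumptions standing in for real Pipeline output; this is not the code from the notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for Pipeline output: a (date, symbol) MultiIndex
# with a single made-up "close" column.
dates = pd.date_range("2015-11-02", periods=15, freq="B")
symbols = ["xyz1", "xyz2", "xyz3"]
index = pd.MultiIndex.from_product([dates, symbols], names=["date", "symbol"])
df = pd.DataFrame({"close": np.arange(len(index), dtype=float)}, index=index)

# Pivot the symbol level into columns: one row per date, one column per symbol.
wide = df["close"].unstack(level="symbol")

# 10-day rolling mean, computed for every symbol column in one call.
# The first 9 rows are NaN because the window is not yet full.
rolling_mean = wide.rolling(window=10).mean()

# A custom function is applied to each 10-day window of each column.
# Here: the range (max minus min) of the window.
rolling_range = wide.rolling(window=10).apply(lambda w: w.max() - w.min())

print(rolling_mean.tail())
```

Because each symbol ends up in its own column, no explicit loop or per-group Python call is needed for the built-in aggregations such as mean(); a custom function passed to apply() is still called once per window, so it will be slower than the built-ins.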

You can also take a look at the Pandas MultiIndex/Advanced Indexing documentation for a general overview. Note, Quantopian currently uses Pandas version 0.18.1.
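
As a rough illustration of the kind of selection that documentation covers, the sketch below pulls cross-sections out of a (date, symbol) frame. The frame, the `close` column, and the symbols are made up for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical (date, symbol) frame shaped like Pipeline output.
dates = pd.date_range("2015-11-02", periods=3, freq="B")
symbols = ["xyz1", "xyz2"]
index = pd.MultiIndex.from_product([dates, symbols], names=["date", "symbol"])
df = pd.DataFrame({"close": np.arange(6, dtype=float)}, index=index)

# All dates for a single symbol (the symbol level is dropped from the result).
one_symbol = df.xs("xyz1", level="symbol")

# All symbols for a single date (the date level is dropped from the result).
one_day = df.xs(dates[0], level="date")

# Label-based slicing across both levels; the index must be sorted for this.
idx = pd.IndexSlice
subset = df.loc[idx[dates[0]:dates[1], "xyz2"], :]
```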

Thank you so much, Michael. I completely missed pandas' unstack() function.
And thanks for the really helpful notebook.