Converting pandas dataframe to numpy array

Hello all,

I've read that NumPy arrays can be much faster than pandas DataFrames or Series. Being relatively new, I was wondering whether anyone out there in the Quantopian universe converts the pipeline time-series data to NumPy arrays before running it through calculations or functions. Any insight or assistance would be appreciated.

Thanks!

3 responses

Were you ever able to figure this out?

An attempt to be helpful:
This might be relevant: https://stackoverflow.com/questions/13187778/convert-pandas-dataframe-to-numpy-array-preserving-index

From that page, I applied the .values tip in the backtest at https://www.quantopian.com/posts/slope-calculation#5a3daa96c2682c2362bd0bfd.
The problem it solved was that the slope function, which uses a statsmodels.api regression, is happy with an ndarray (or list) but not with a pandas DataFrame. Converting the history output to an ndarray first made it possible to obtain slopes for all stocks at once. The security objects were lost, since they are not present in the ndarray; however, the return from slope() was then pieced back together, in order, with the stocks index and their slope values.
Perhaps the return from data.history(context.output.index, ...) with multiple stocks as input is the type of time-series DataFrame you're referring to.

In that backtest, around line 48, the slope values are reunited with their securities. Even though it works, I'm not happy with the process, as you can see in my comments. It starts with a DataFrame that has the stocks as its index and all NaN values, then plugs in the values returned from slope(), then switches to a Series for simplicity.
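
For anyone reading along, here is a minimal sketch of the round trip described above: drop to an ndarray with .values, compute a slope per stock, then reattach the labels in one step. It is not the code from the backtest. The prices frame is a made-up stand-in for data.history() output, and np.polyfit is swapped in for the statsmodels regression so the example runs on its own.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for data.history() output:
# rows are dates, columns are securities (plain strings here, not Quantopian objects).
prices = pd.DataFrame(
    {
        "AAA": [1.0, 2.0, 3.0, 4.0],
        "BBB": [10.0, 8.0, 6.0, 4.0],
        "CCC": [5.0, 5.0, 5.0, 5.0],
    },
    index=pd.date_range("2017-01-02", periods=4),
)

values = prices.values        # 2-D ndarray; the row and column labels are dropped
x = np.arange(len(values))    # 0, 1, 2, ... as the regression x-axis

# np.polyfit accepts a 2-D y and fits every column independently.
# The result has one row per coefficient, highest degree first, so row 0 holds the slopes.
slopes = np.polyfit(x, values, 1)[0]

# Column order is unchanged by .values, so the labels can be reattached directly.
slope_by_stock = pd.Series(slopes, index=prices.columns)
```

Because .values keeps the column order, building the Series straight from prices.columns avoids the intermediate all-NaN DataFrame.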

That Stack Overflow page also covers the NumPy structured array. I would love to know what you wind up with.
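
In case it helps, this is roughly what the structured-array route looks like using pandas' to_records(), which keeps the index as a named field alongside the values. The symbols and slope numbers below are invented for illustration only.

```python
import numpy as np
import pandas as pd

# Hypothetical slope values keyed by symbol (illustrative data only).
slopes = pd.Series(
    [1.0, -2.0, 0.0],
    index=pd.Index(["AAA", "BBB", "CCC"], name="symbol"),
    name="slope",
)

# to_records() returns a NumPy record array; by default the index is
# included as the first field, named after the index.
records = slopes.to_frame().to_records()

symbols = records["symbol"]      # the labels survive as a field
slope_values = records["slope"]  # plain float64 array
```

Note that string labels are stored with object dtype, so this keeps the labels next to the numbers but does not by itself make label lookups faster.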

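I suggest looking into pandas apply, which applies a function along an axis of the DataFrame. By default the function receives each column as a Series; with raw=True it receives the underlying ndarray instead.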
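
A small sketch of that suggestion, assuming a price DataFrame with one column per stock. The slope helper and the data are hypothetical, and np.polyfit is used only to keep the example self-contained.

```python
import numpy as np
import pandas as pd

# Illustrative data: one column per stock.
prices = pd.DataFrame(
    {"AAA": [1.0, 2.0, 3.0, 4.0], "BBB": [10.0, 8.0, 6.0, 4.0]}
)

def slope(column):
    # With raw=True, `column` arrives as a 1-D ndarray rather than a Series.
    x = np.arange(len(column))
    return np.polyfit(x, column, 1)[0]

# axis=0 (the default) calls slope() once per column; the result is a
# Series indexed by the column labels, so nothing has to be pieced back together.
slopes = prices.apply(slope, raw=True)
```

The labels never leave pandas this way, while the function itself still works on plain NumPy arrays.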