Quantopian's community platform is shutting down. Please read this post for more information and download your code.
Back to Community
How do I get monthy data from daily data.

Hi, I am using the get_pricing method from the research API.

I was returned a Pandas data "Panel" like so ...


Dimensions: 6 (items) x 3857 (major_axis) x 1435 (minor_axis)
Items axis: open_price to price
Major_axis axis: 2003-01-02 00:00:00+00:00 to 2018-04-27 00:00:00+00:00
Minor_axis axis: Equity(2 [ARNC]) to Equity(51955 [CDAY])

The 6 items are open, high, low etc.
The 3857 major axis are the dates from 01/01/2003 to present
The 1435 Minor Axis are the Symbols I was getting prices for.

I am trying to get a monthly sample instead of the daily frequency that the API provides.

So I was poking around and I read that I should use pandas "resample" method.

So I want to resample via a monthly timeframe by date,

so I guess I would use the major axis of my panel,

also the documentation says that

"data is returned as a DataFrame with a DatetimeIndex and a columns given by the passed symbols."

So I am just wondering how I would do a resample for monthly data, given this Panel data structure ?

3 responses

The trick with panels (rather than a plain dataframe) is to specify the axis. Something like this.

prices_1m = prices.resample('1M', axis='major').last()

Note that the parameters default to "closed='right', label='right'" which is probably what one wants, but these can be explicitly specified.

See attached notebook.

Good luck.

Dan's method works well. The only problem is last is not the correct resampling method for the open_price, high, low, or volume. Here is a way to deal with that.

The below method will use the first open price of the month, the maximum high of the month, the minimum low of the month, the last close price of the month, and the sum of the volume over the month.

resample_methods={'open_price': 'first', 'high':'max', 'low': 'min',  
                  'close_price':'last', 'volume': 'sum', 'price':'last'}

prices_df = prices.to_frame()  
grouper = [pd.Grouper(level=0, freq='M')] +[prices_df.index.get_level_values(1)]  
prices_1m = prices_df.groupby(grouper).agg(resample_methods).to_panel()  

Note, a simpler, but slower method is by using a loop:

fields = ['open_price', 'high', 'low', 'close_price', 'volume', 'price']  
prices_monthly = {}

for field in fields:  
    prices_monthly[field] = prices[field].resample('M').agg(resample_methods[field])  
prices_1m = pd.Panel(prices_monthly)  

Thanks all for the replies. I ended up using Michael's solution. It worked perfectly. Thanks.