Quantopian's community platform is shutting down. Please read this post for more information and download your code.
Back to Community
Testing accuracy of .resample('1W') for weekly history dataframe

Hi Quantopians,

I've been looking for the best way to manipulate history data frames into time frames other than 1d and 1m (I need weekly closes, for example). It seems like a simple resampling of the history dataframe is the Q's preferred method:

  1. Accessing different timeframes
  2. Convert 1m pricing data into 3m pricing data
  3. Convert 1m bars into 15m bars
  4. Working with history dataframes
  5. Any plan to provide weekly data via history function?

However after testing .resample('1W') myself, I'm worried this method might be too good/easy to be true. Below is an example of AAPL weekly closes from the beginning of 2015 until now. All 3 handle_data functions are scheduled to run at week_start (timestamps included for reference). wkly_price applies resampling to price, thus wkly_price[stock][-1] printed on Monday morning should be the previous week's close.

However, as you can see, this .resample value is off considerably from 3 different controls I set up:
2015-01-05 09:31:00-05:00 1W resample: 107.97
2015-01-05 09:31:00-05:00 Iterated: 109.3
2015-01-05 09:31:00-05:00 Control A: 109.3
2015-01-05 09:32:00-05:00 Control B: 109.3
2015-01-12 09:31:00-05:00 1W resample: 112.12
2015-01-12 09:31:00-05:00 Iterated: 112.0
2015-01-12 09:31:00-05:00 Control A: 112.0
2015-01-12 09:32:00-05:00 Control B: 112.0
2015-01-20 09:31:00-05:00 1W resample: 107.34
2015-01-20 09:31:00-05:00 Iterated: 105.98
2015-01-20 09:31:00-05:00 Control A: 105.98
2015-01-20 09:32:00-05:00 Control B: 105.98

  1. The first control iterates over the original history dataframe, only adding the price to weekly_closes[stock.symbol] if the following .weekday() value is lower (Monday==0, Tuesday==1, etc., so a value lower than the day before means the day before was the week's close. This is complicated but avoids any Friday holiday 'gotchas' of only recording if .weekday() == 4)
  2. The second control runs once Monday morning at 9:31 and captures price[stock][-2] on a 1d frequency, which will be last week's close since price[stock][-1] is the forward-filled Monday value
  3. The third control runs once Monday morning at 9:32 and captures price[stock][-3] on a 1m frequency, which will be last week's close since price[stock][-1] is the forward-filled price at 9:32, and price[stock][-2] is the forward-filled price at 9:31

Not only are all 3 control values identical, but they come within a penny or two of AAPL's weekly close values according to Google Finance. As you can see, the values found via .resample('1W') are sometimes dangerously wrong.

I'm basically looking for a way to search the history dataframe for weekly closes so my algo doesn't have to record 9 months of data before it can take action. My i,j iterator (Control #1) is a feasible workaround, but if anyone has a better or more Pythonic way to accomplish this it would be much appreciated (or if you see any flaw in my analysis of .resample!)

from datetime import date, datetime  
from collections import defaultdict, deque  
import pandas as pd

EMAWINDOW       = 26  
weekly_closes   = defaultdict(lambda: deque(maxlen=EMAWINDOW*3))  
run_handle_data = False #Makes sure handle_data only runs once at the beginning of the week

def initialize(context):  
    context.stock = symbols('AAPL')  
    schedule_function(handle_data_wkly, date_rules.week_start(), time_rules.market_open(minutes=1))  
    schedule_function(handle_data_daly, date_rules.week_start(), time_rules.market_open(minutes=1))  
def handle_data_wkly(context, data):  
    global run_handle_data  
    price      = history(bar_count=100, frequency='1d', field='price')  
    wkly_price = price.resample('1W')  
    now        = pd.Timestamp(get_datetime()).tz_convert('US/Eastern')  
    for stock in data:  
        #Iterate over dates (index) in price  
        j = price.index[0]  
        for i in price.index:  
            #If current weekday() is < recorded weekday(), it means recorded date was the last  
            #day of the previous week. Add recorded date's close to dict weekly_closes  
            if i.weekday() < j.weekday():  
                weekly_closes[stock.symbol].append(price[stock][j])  
            #Record current date for comparison  
            j = i  
        print "%s 1W resample: %s" % (now, wkly_price[stock][-1])  
        print "%s Iterated:    %s" % (now, weekly_closes[stock.symbol][-1])  
    run_handle_data = True

def handle_data_daly(context, data):  
    price      = history(bar_count=40, frequency='1d', field='price')  
    now        = pd.Timestamp(get_datetime()).tz_convert('US/Eastern')  
    for stock in data:  
        print "%s Control A:   %s" % (now, price[stock][-2])

def handle_data (context, data):  
    global run_handle_data  
    price      = history(bar_count=40, frequency='1m', field='price')  
    now        = pd.Timestamp(get_datetime()).tz_convert('US/Eastern')  
    if run_handle_data == True:  
        run_handle_data = False  
        for stock in data:  
            print "%s Control B:   %s" % (now, price[stock][-3])  
9 responses

Have you tried to resample with the "how" parameter? According to the pandas manual "default to ‘mean’ for downsampling". What you want is not 'mean' but 'last'. Parameters that how can take is: sum, mean, std, sem, max, min, median, first, last, ohlc.

Another thing is that Weekly resampling is the same as weekly frequency from sundays. In this case is there maybe out of hours trading involved?

According to a link in Stack Overflow you also have to add label and closed:

for resampling using how='last' and how='first': don't forget to add closed='left', label='left'.

Thank you, Peter! I was not aware of a mean default at all. I'll have to give that a shot and post an update

Hi Joe and Peter,

Any luck with this?

I've tried "wkly_price = price.resample('1W', how='last', closed='left', label='left')" and the results are still as you described. I've even played around with offsetting "W-FRI" etc (from http://pandas.pydata.org/pandas-docs/stable/timeseries.html#offset-aliases ) to no avail. I agree it would be nice to find a solution to this that doesn't involve waiting so long into an algorithm's lifetime for data to populate.

  • sniderweb

Hi Sniderweb,

Unfortunately no, I haven't had much luck. However my workaround (lines 20-28 in the source code above) has been working flawlessly, so I'm planning on just sticking with that. It's basically comparing the numeric weekday value (0=Monday, 4=Friday, etc.) and only adding the previous weekday's close if the current weekday value is less than the previous weekday value (IOW, a new week has begun). This approach works regardless of any holidays or early closes.

Admittedly, this would only help you if you're looking for weekly values. Other windows (15-minute, hourly) might have more elegant solutions.

  • Joe

Reviving an old post now that I've had a few months to learn more. Peter was unsurprisingly dead-on. The following code works:

highs = history(bar_count=520, frequency='1d', field='high')  
lows = history(bar_count=520, frequency='1d', field='low')  
closes = history(bar_count=520, frequency='1d', field='price')  
highs_wkly = highs.resample('1W', how=np.max, closed='left', label='left')  
lows_wkly = lows.resample('1W', how=np.min, closed='left', label='left')  
closes_wkly = closes.resample('1W', how='last', closed='left', label='left')  

Note that the last value in the arrays are forward-filled to the date (eg, if run on a Tuesday, the last weekly close value would be Tuesday's close. The second-to-last value would be the previous Friday's close).

@Joe
I am not sure if your solution meaningful or not. I've tried the same way as you but I got weekly datas on Sundays. See my attached notebook.

One can adjust the label offset using the 'loffset' parameter in the resample method. If one wants the labels to be all Fridays instead of Sundays then do something like this

prices_week_s = prices_daily.resample('1W', closed='right', label='right', loffset='-2d').last().dropna()

That will shift the labels from Sunday by 2 days to the previous Friday. This just shifts the labels. Not a problem though with typical securities data which will never have Saturday or Sunday data.

Probably a more 'correct' method is to shift the actual end date of the week and not simply the labels. The default week ends on Sunday but can be changed to end on Friday using the following notation

prices_week_s = prices_daily.resample('1W-FRI', closed='right', label='right').last().dropna()

Many thanks!