Hi Quantopians,
I've been looking for the best way to manipulate history data frames into time frames other than 1d and 1m (I need weekly closes, for example). It seems like a simple resampling of the history dataframe is the Q's preferred method:
- Accessing different timeframes
- Convert 1m pricing data into 3m pricing data
- Convert 1m bars into 15m bars
- Working with history dataframes
- Any plan to provide weekly data via history function?
However after testing .resample('1W') myself, I'm worried this method might be too good/easy to be true. Below is an example of AAPL weekly closes from the beginning of 2015 until now. All 3 handle_data functions are scheduled to run at week_start (timestamps included for reference). wkly_price applies resampling to price, thus wkly_price[stock][-1] printed on Monday morning should be the previous week's close.
However, as you can see, this .resample value is off considerably from 3 different controls I set up:
2015-01-05 09:31:00-05:00 1W resample: 107.97
2015-01-05 09:31:00-05:00 Iterated: 109.3
2015-01-05 09:31:00-05:00 Control A: 109.3
2015-01-05 09:32:00-05:00 Control B: 109.3
2015-01-12 09:31:00-05:00 1W resample: 112.12
2015-01-12 09:31:00-05:00 Iterated: 112.0
2015-01-12 09:31:00-05:00 Control A: 112.0
2015-01-12 09:32:00-05:00 Control B: 112.0
2015-01-20 09:31:00-05:00 1W resample: 107.34
2015-01-20 09:31:00-05:00 Iterated: 105.98
2015-01-20 09:31:00-05:00 Control A: 105.98
2015-01-20 09:32:00-05:00 Control B: 105.98
- The first control iterates over the original history dataframe, only adding the price to weekly_closes[stock.symbol] if the following .weekday() value is lower (Monday==0, Tuesday==1, etc., so a value lower than the day before means the day before was the week's close. This is complicated but avoids any Friday holiday 'gotchas' of only recording if .weekday() == 4)
- The second control runs once Monday morning at 9:31 and captures price[stock][-2] on a 1d frequency, which will be last week's close since price[stock][-1] is the forward-filled Monday value
- The third control runs once Monday morning at 9:32 and captures price[stock][-3] on a 1m frequency, which will be last week's close since price[stock][-1] is the forward-filled price at 9:32, and price[stock][-2] is the forward-filled price at 9:31
Not only are all 3 control values identical, but they come within a penny or two of AAPL's weekly close values according to Google Finance. As you can see, the values found via .resample('1W') are sometimes dangerously wrong.
I'm basically looking for a way to search the history dataframe for weekly closes so my algo doesn't have to record 9 months of data before it can take action. My i,j iterator (Control #1) is a feasible workaround, but if anyone has a better or more Pythonic way to accomplish this it would be much appreciated (or if you see any flaw in my analysis of .resample!)
from datetime import date, datetime
from collections import defaultdict, deque
import pandas as pd
EMAWINDOW = 26
weekly_closes = defaultdict(lambda: deque(maxlen=EMAWINDOW*3))
run_handle_data = False #Makes sure handle_data only runs once at the beginning of the week
def initialize(context):
context.stock = symbols('AAPL')
schedule_function(handle_data_wkly, date_rules.week_start(), time_rules.market_open(minutes=1))
schedule_function(handle_data_daly, date_rules.week_start(), time_rules.market_open(minutes=1))
def handle_data_wkly(context, data):
global run_handle_data
price = history(bar_count=100, frequency='1d', field='price')
wkly_price = price.resample('1W')
now = pd.Timestamp(get_datetime()).tz_convert('US/Eastern')
for stock in data:
#Iterate over dates (index) in price
j = price.index[0]
for i in price.index:
#If current weekday() is < recorded weekday(), it means recorded date was the last
#day of the previous week. Add recorded date's close to dict weekly_closes
if i.weekday() < j.weekday():
weekly_closes[stock.symbol].append(price[stock][j])
#Record current date for comparison
j = i
print "%s 1W resample: %s" % (now, wkly_price[stock][-1])
print "%s Iterated: %s" % (now, weekly_closes[stock.symbol][-1])
run_handle_data = True
def handle_data_daly(context, data):
price = history(bar_count=40, frequency='1d', field='price')
now = pd.Timestamp(get_datetime()).tz_convert('US/Eastern')
for stock in data:
print "%s Control A: %s" % (now, price[stock][-2])
def handle_data (context, data):
global run_handle_data
price = history(bar_count=40, frequency='1m', field='price')
now = pd.Timestamp(get_datetime()).tz_convert('US/Eastern')
if run_handle_data == True:
run_handle_data = False
for stock in data:
print "%s Control B: %s" % (now, price[stock][-3])