resampling, other timeframes

Back to Community

posted Jul 12, 2017

Hi,

I think many people on Q are struggling with the issue of timeframes. There is no easy/clear explanation on how to make different size bars, like 1 hour, 2 hours, etc. But, the community is always great to help out, so that is really great.

I'm trying to make 2 hour bars. Assuming NYSE, I want close prices for 11:30, 1:30, 3:30, and 4:00.

I have read that data in Q is in UTC and therefor is not affected by daylight savings time, but I'm not sure I understand that completely, because I have this line of code:

prices = data.history(context.stock, "price", 1550, "1m").resample('120T',  closed='right', label='right', base=90) .last().dropna()

That gives me what I want when DST is active (march to november), but does not give me the proper data when during november to march when there is not DST in effect. This code can be used to determine if DST is in effect:

import pytz  
from datetime import datetime, timedelta

tz = pytz.timezone('America/New_York')  
now = pytz.utc.localize(datetime.utcnow())  
is_dst = now.astimezone(tz).dst() != timedelta(0)

But how should I alter my code above to get the same data (close of 11:30, 1:30, 3:30, and 4:00) when there is no DST? Or, is there an easier way to do all this that I am not seeing?

Thank you.

6 responses

MA K

Jul 12, 2017

I tried setting the base=30 when is_dst==False, I thought that should work, but it doesn't seem to. I'm stuck :/

Dan Whitnable

Jul 12, 2017

@ MA K

"There is no easy/clear explanation on how to make different size bars, like 1 hour, 2 hours, etc. "

Well, maybe take a look at these posts...

https://www.quantopian.com/posts/how-to-use-the-resample-correctly
https://www.quantopian.com/posts/algorithm-takes-time-to-warm-up

And of course, you could always look at the Pandas documentation https://pandas.pydata.org/pandas-docs/version/0.18.1/generated/pandas.DataFrame.resample.html

Those should be pretty pretty easy/clear. Let me know if not and I'll try to make it more clear (and easy).

You should be able to get a rock solid resampled dataframe with code similar to what you have (don't use the base parameter). Now, you did throw a wrench into things with wanting a resample of 120 minutes EXCEPT the last hour of the day which is only 90 minutes . I'll let you figure that little conundrum out.

prices = data.history(context.stock, "price", 1550, "1m").resample('120T',  closed='right', label='right') .last().dropna()

Now, why are you screwing around with timezones? Don't go there. You will be going down Alice's proverbial rabbit hole. Just don't do it! Don't use the 'now' function. Don't compare times. ONLY use the schedule function to run your code at times you request. ONLY think of the fetched data from the history command as relative to the current time. Don't ever look at absolute time and you will be good to go.

Can't make it any clearer than that. Just don't use absolute times and you won't get yourself wound up around the whole daylight saving time issues.

Another whole question I'd like to pose is why? Why do you feel you need to use the resample method to get the prices at these exact times? The resample method is intended to 'smooth over' or fill in the gaps of time series data. If you want to just selectively pick specific times could 't you do it like this.

prices = data.history(context.stock, "price", 1200, "1m")  
my_specific_prices = my_prices.iloc[[-240, -480, -720]]

That would return the price 240 minutes ago, 480 minutes ago, and 720 minutes ago. Adjust these values and put them inside a scheduled function to get the specific prices from the specific times you want. The way cool thing is that this approach is unaffected by daylight savings time.

Good luck.

Dan Whitnable

Jul 12, 2017

If one really wants to start looking at absolute times, the most straightforward is to use the built-in pandas method 'tz_convert'. Something like this.

# Set the US market timezone. Note that this is daylight saving time aware.  
from pytz import timezone  
MARKET_TIMEZONE = timezone('US/Eastern')

# Fetch any data one wants. The results are in a pandas dataframe  
# Dataframes have a built in method '.tz_convert' to easily adjust the datetime index  
data_utc_time = data.history(symbol('SPY'),fields='price', bar_count=2, frequency='1m')  
data_market_time = data_utc_time.tz_convert(MARKET_TIMEZONE)

The datafrme 'data_market_time' will now have all the index values adjusted for the actual market time including magical adjustments for daylight saving time. See the attached algorithm and look at the logs. One can see that the 'data_market_time' always displays 9:31 for the open while the 'data_utc_time' changes from 14:31 to 13:31 when daylight saving time occurs.

The dataframe 'data_market_time' can now be indexed and queried by market time and will always have time values between 9:31-16:00 (all year round and accounting for daylight saving time)

MA K

Jul 12, 2017

Hi Dan,

Thanks for your reply. I have looked at those links which you posted (on Q and resampling documentation) before I posted this, and not for lack of effort, I am having a hard time getting it to work (the way I want). I greatly appreciate your help.

Please see this example below of what I expect, and what I get. Below we have the data in 30 minute intervals (using resampling), this works great. The timestamp + price is output from the algo, and I added the times in my local time (EST), and an asterik (*) beside the prices I want to see for 2 hour bars (as I mentioned above, ending at 11:30, 1:30, 3:30, and 4:00)

2011-08-01 14:00:00+00:00 16.410 10am
2011-08-01 14:30:00+00:00 15.930 10:30am
2011-08-01 15:00:00+00:00 15.940 11am
2011-08-01 15:30:00+00:00 15.950 11:30am *
2011-08-01 16:00:00+00:00 15.730 12pm
2011-08-01 16:30:00+00:00 15.860 12:30pm
2011-08-01 17:00:00+00:00 15.980 1pm
2011-08-01 17:30:00+00:00 15.980 1:30pm *
2011-08-01 18:00:00+00:00 16.140 2pm
2011-08-01 18:30:00+00:00 16.110 2:30pm
2011-08-01 19:00:00+00:00 16.330 3pm
2011-08-01 19:30:00+00:00 16.450 3:30pm *
2011-08-01 20:00:00+00:00 16.670 4pm *

Now, let's take a look at this same interval using your suggested code:

2011-08-01 14:00:00+00:00 16.410
2011-08-01 16:00:00+00:00 15.730
2011-08-01 18:00:00+00:00 16.140
2011-08-01 20:00:00+00:00 16.670

If we look at this output, we can see that I am getting the close of 10am, 12pm, 2pm, and 4pm, which is not what I want! Adding base=90 gives me the data as I want it during DST, but then breaks when we revert to standard timing.

I hope this helps explain my confusion a little more.

Thanks again for all your help, you are what makes this community great.

Blue Seahawk

Jul 13, 2017

prices = data.history(context.stock, "price", 1550, "1m").resample('120T',  closed='right', label='right') .last().dropna()

Be careful if switching over to data.history() with input of a list instead of a single stock when '1m' since dropna would drop all minutes for everybody where even one stock was nan. Instead dropna(axis=1) can sometimes be ok, or .ffill().bfill() to replace nans with neighboring data.
Whether it applies in this situation or not, seemed like an ok time to mention at least to visitors of the future.

MA K

Jul 16, 2017

Hi Blue,

Thanks for your comment, I will keep that in mind when I am using a list of stocks.

Dan, I guess what I'm after is not very easy then? The way I have gotten it to work is to get the 30 minute bars, then manually 'build' the 2 hour bars as I need them. This is not very efficient but it seems to work..

You've successfully submitted a support ticket.

Our support team will be in touch soon.