Community Poll: Expected Semantics of get_datetime() with a timezone argument in daily mode?

Hi all,

We recently added the ability to pass a timezone string to get_datetime(). This is mostly a convenience feature to save the boilerplate of calling tz_convert('US/Eastern') when you want to think in market hours instead of UTC, which is the default timezone returned by get_datetime() with no arguments.

One issue that a user brought up recently is how get_datetime() should handle a timezone parameter when run in daily mode. Right now, get_datetime() in daily mode returns midnight UTC of the date being processed by the algorithm. This aligns with midnight UTC as the canonical representation of a day in much of the numpy/scipy ecosystem (see, for instance, pandas.tseries.tools.normalize_date). If you pass a timezone string like "US/Eastern" to get_datetime() here, you end up getting the equivalent of get_datetime().tz_convert("US/Eastern"), which resolves to 7 or 8 PM the previous day, depending on daylight saving time. This is particularly confusing if you immediately call .date() on the output, because you get the unintuitive result that

get_datetime().date() != get_datetime('US/Eastern').date()  
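
Here's a quick standalone pandas sketch of why the two dates disagree (using 2014-11-10 as an example date):

import pandas as pd

# Midnight UTC: the canonical daily-mode timestamp for 2014-11-10.
dt_utc = pd.Timestamp('2014-11-10 00:00:00', tz='UTC')

# Converting to US/Eastern moves the clock back 5 hours (EST),
# landing at 7 PM on the previous calendar day.
dt_eastern = dt_utc.tz_convert('US/Eastern')

print(dt_utc.date())      # 2014-11-10
print(dt_eastern.date())  # 2014-11-09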

As I see it, there are three sane things we could be doing to handle this case better.

  1. Continue with the current semantics, but show an appropriate warning if a timezone is passed in daily mode.
  2. Ignore the timezone parameter in daily mode (possibly also showing a warning that this is happening.)
  3. Throw an exception if a timezone parameter is passed in daily mode.

Of these three, I'm most inclined to go with the second option, but I'd like to get some feedback from the community about what your expectations would be here.

Thanks,
-Scott


21 responses

I'm interested in what others say, since time confuses me.
So the only reason get_datetime().date() in daily mode matches the date of the trades is that the clock in Greenwich, England (the UTC anchor) happens to have just ticked over to midnight (00:00:00) by the closing bell in New York City? If the test starts on 11-05 and the debugger says 11-05, I would think it was 11-05 all day in NYC. Was the bar actually 11-04 prices/activity?

Does pandas only handle UTC?

People,

At the risk of drifting this thread a bit off topic, I suggest that get_datetime() in daily mode should return the datetime AT MARKET CLOSE.

It should not return midnight.

Midnight has two problems. First, the market does not close at midnight. Second, it causes cognitive load: exactly which day does midnight refer to?

Does it refer to the day before or the day after?

So, if get_datetime() in daily mode returns the datetime at market close, that behavior solves both those problems.

It would also solve the problem posed in this thread.

Following the convention used by the numpy ecosystem for market data is a bad idea.

Also, if get_datetime() in daily mode returns the true time rather than midnight, it is easier to blend with minutely-based observations.

And we are lucky that when the market closes at 4 PM New York time on a Monday, it is still Monday in London.
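
For reference, here is a rough sketch (plain pandas, outside the backtester; it assumes a regular 4 PM close and ignores half days and holidays) of turning the midnight-UTC daily timestamp into a market-close timestamp:

import pandas as pd

# Daily-mode timestamp: midnight UTC as a placeholder for the trading day.
bar_dt = pd.Timestamp('2014-11-10 00:00:00', tz='UTC')

# Reinterpret that calendar date as a 4:00 PM US/Eastern close.
close_eastern = pd.Timestamp(str(bar_dt.date()) + ' 16:00', tz='US/Eastern')

print(close_eastern)                    # 2014-11-10 16:00:00-05:00
print(close_eastern.tz_convert('UTC'))  # 2014-11-10 21:00:00+00:00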

Dan Bikle

Dan raises a valid point. I've attached a backtest to illustrate another reason to re-think the problem. The backtest was run on minutely data with:

prices = history(3,'1d','price')  

And for 'print prices' I get:

Security(8554 [SPY])
2014-11-06 21:00:00+00:00 203.18
2014-11-07 21:00:00+00:00 203.37
2014-11-10 14:31:00+00:00 203.40

There are two trailing daily closing prices, and the opening minutely price for the current day. Generally, history with frequency = '1d' will return the trailing daily OHLCV bars time-stamped at the last trade available for SPY for the given day.

When I run the same backtest on daily bars, I get:

Security(8554 [SPY])
2014-11-06 00:00:00+00:00 203.18
2014-11-07 00:00:00+00:00 203.37
2014-11-10 00:00:00+00:00 203.97

Very confusing.
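
Here's a rough sketch (plain pandas, recreating the minute-mode output above) of how the two could be lined up by date in user code, by normalizing the minute-mode timestamps to midnight UTC; note this only aligns the index, and the partial-day 11-10 value still differs from the daily bar:

import pandas as pd

# Recreate the minute-mode history() index and values shown above.
idx = pd.DatetimeIndex(['2014-11-06 21:00', '2014-11-07 21:00', '2014-11-10 14:31'], tz='UTC')
prices = pd.Series([203.18, 203.37, 203.40], index=idx)

# Drop the time-of-day component so the index matches daily mode.
prices.index = prices.index.normalize()
print(prices)
# 2014-11-06 00:00:00+00:00    203.18
# 2014-11-07 00:00:00+00:00    203.37
# 2014-11-10 00:00:00+00:00    203.40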

I think that if Quantopian is going to maintain the ability to backtest on daily bars, then re-jigger the database so that the datetime stamps are the same for daily and minutely backtests. Then the whole get_datetime() issue would disappear. And does it really matter if you break some (all?) daily backtests, since they are not deployable anyway?

Or don't modify the database, but at the user interface level, ensure that the datetime stamps are consistent, when comparing daily to minutely backtests.

Alternatively, speed up minutely backtests by ~390X and deprecate daily backtests for eventual obsolescence (the best solution in my mind, since you don't support trading on daily bars anyway).

Grant

Maybe it's just me, but I don't see any confusion here. get_datetime().date() returns the UTC date, and get_datetime('US/Eastern').date() converts the UTC time to US/Eastern first. I also find it reasonable that the UTC time in daily mode refers to midnight of that day. My vote is to keep everything the same, but maybe add a warning or better docs for get_datetime to explain this behavior.

Consider the meaning of a daily backtest when (eventually??) Quantopian has multiple exchanges: US, UK, Japan, Australia, etc. The timezone difference is significant.
Would you want to convert the timezone to work out what day something was actually traded on?
A 'daily' backtest doesn't seem to make a lot of sense in this case.

Q-People,

I've changed my mind on this.

Since we are talking about an API I'd vote to extend the API rather than change it.

Changing an API breaks things belonging to people who committed early to the API.

If I don't like how get_datetime() behaves, I should just avoid it.

Perhaps it would be easy to add calls like:

get_actual_datetime()
get_daily_truncated_datetime()
get_daily_rounded_datetime()
get_actual_nytime()
get_actual_utctime()

Dan

Assuming that Quantopian doesn't deprecate daily backtests (not in the game plan, I gather), I suppose one approach would be to provide a helper function within a daily backtest that would give the minutely datetime stamp of the daily closing (and opening) for a given security. For example:

get_minutely_datetime(security,event='close')

The event parameter would take values of 'open' or 'close'. Then one could make a 1:1 correspondence between daily and minutely backtests, albeit inelegantly. I gather that it wouldn't bog down the backtest, since handle_data would still only be called once per trade day.

Or maybe something like history(dataset='minutely') that would provide access to the minutely data as a daily backtest is run? Then, the backtest could still be run over daily bars, but the minutely data would be available with their datetime stamps.

Grant

Hi All,

Thanks for the feedback! There were enough interesting points raised here that I'm not sure what the correct decision on this is...I'll continue to mull this over and post again when I have a concrete proposal. Just wanted to let everyone know that we're still thinking about this.

-Scott

Thank you Scott. And great feedback. I'd just like to try to summarize/clarify a couple things from my point of view and offer some code to play around with.
Goal -- To be able to use 'US/Eastern' seamlessly between Daily and Minute modes.

Minute mode always does the right thing.
Daily mode incorrectly returns the previous date with get_datetime('US/Eastern').date() and get_datetime().astimezone(timezone('US/Eastern')), not the current bar date.

The functions history() and fetch_csv() for example only know UTC because pandas dataframes only do UTC.
I like the suggestion of a new function such as get_nyc_datetime() or get_datetime(locale='nyc').
(Or, for the future when you expand to, say, Tokyo and Shanghai exchanges, maybe make that something like get_datetime('exchange_locale'). No matter which exchange, it would return the date and time that a trade thinks it is, so to speak, as if you asked a trade to look at its watch and tell you what time it is there.)

Daily and Minute mode differences, using the attached backtest code; 'E' is where 'US/Eastern' was used.

I second having the bars dated at the time of market close, for what it's worth. The problem will only get worse if/when you start trading in Tokyo: if you keep using midnight UTC of the bar's date, you are liable to start pre-dating the bar, which means algos that backtest spreads between Tokyo and New York in daily mode might have access to the Tokyo data before it has happened.

Gary, a few clarifications:

Daily mode incorrectly returns the previous date with get_datetime('US/Eastern').date() and get_datetime().astimezone(timezone('US/Eastern')), not the current bar date.

get_datetime().astimezone(timezone('US/Eastern')) returns the "correct" time, in the sense that it represents the same instant as a naked get_datetime(). The confusion occurs when you subsequently call the .date() method.

The functions history() and fetch_csv() for example only know UTC because pandas dataframes only do UTC.

To be clear, pandas is perfectly capable of working with non-UTC datetimes, but it's preferable to work in UTC for a number of reasons. The most important of these is that subtracting UTC datetimes always has a single well-defined result, which is not the case for US/Eastern because some times appear twice due to daylight saving time. (For example, 1:30 AM US/Eastern on Sunday, November 2, 2014 occurred twice this year. If you try to subtract that time from an earlier datetime, there are potentially two "valid" return values, an hour apart.)
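
A quick pandas sketch of that ambiguity (the ambiguous keyword to tz_localize picks which of the two occurrences you mean):

import pandas as pd

# 1:30 AM on the 2014 fall-back date, localized both ways.
naive = pd.Timestamp('2014-11-02 01:30')
first = naive.tz_localize('US/Eastern', ambiguous=True)    # 01:30 EDT (UTC-4)
second = naive.tz_localize('US/Eastern', ambiguous=False)  # 01:30 EST (UTC-5)

earlier = pd.Timestamp('2014-11-01 12:00', tz='US/Eastern')
print(first - earlier)   # 0 days 13:30:00
print(second - earlier)  # 0 days 14:30:00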

I don't think it makes sense for us to add a whole bunch of date-related functions to our API (e.g. the above proposed get_nyc_datetime or get_daily_truncated_datetime()), since it's straightforward enough to write those functions yourself if you want something specific, and there's value in keeping our API small and comprehensible.
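
For instance, something along these lines inside your own algorithm would cover the common case (nyc_datetime is just an illustrative name, not part of the API):

def nyc_datetime():
    # Current simulation time converted to US/Eastern, built on the existing built-in.
    return get_datetime().tz_convert('US/Eastern')

def handle_data(context, data):
    log.info(nyc_datetime())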

Simon, I'm not sure I understand your proposal w/r/t future international exchanges. In a hypothetical future world where we supported both NYC and Tokyo, "the market close" is no longer a single time, but two times: the close in NYC and the close in Tokyo. If anything that seems to me like an argument for using UTC midnight as an arbitrary standin for the date. Or is your point that, since your algorithm executes conceptually at the (NYC) market close of a given day, that we should return that date?

TIL our blockquote format is gigantic...

I guess it will depend on how handle_data will eventually work for multi-timezone daily bars. The point I was trying to make is that UTC as a stand-in for Eastern is safer, since UTC midnight is after the NYC close. UTC as a stand-in for Tokyo is riskier for backtesting, since UTC midnight is prior to the Tokyo close, so if the backtester itself isn't aware of the actual timezones of the daily bars, it might inadvertently provide the Tokyo close to an algo that is still able to trade New York afterwards, which would be a data-snooping bias.

I think I made that more confusing than it is. Dating bars later than their true time is safe, but dating them earlier is a risk, in the same way that a fetch_csv of data which is pre-dated is also a risk.

Conceptually, the backtester executes your daily handle_data at the NYC market close. It will only ever be supplied data from before that instant in time. The question here is more about how we represent that information to the user. No matter what, we're going to return a datetime that's in UTC; the question is whether we choose to represent that moment with the actual time at which it logically occurs, or whether we return UTC midnight, which is really a placeholder for the entire date.

I see, so internally, your bars will be correctly dated to the instant that information was truly available, but you are debating how to wipe out the time information from the datetime?

I see, so internally, your bars will be correctly dated to the instant that information was truly available, but you are debating how to wipe out the time information from the datetime?

Precisely. The question, essentially, is:

"Is it better to represent the current time in daily mode as the closing minute of that day (which could be misleading for values like open_price or volume which actually span the entire day), or as a sentinel value that represents the entire day (which has confusing interactions with timezones and truncation to a date object)?"

My personal opinion is that the bars should carry the instant of the close, along with the exchange on which they trade, and a helper to get the timezone of the exchange. With those two pieces of information, and proper date-time code backed by the Olson TZ database, one can get the "date" of the bar if one needs it, or anything else.

I don't think it's misleading; it's consistent with minute bars: the open price is the first price of that time span, and the volume is the aggregate volume of all the trades in that bar. It might be helpful if the bar had an "open time" too.

This extends smoothly to multi-timezone daily backtesting, when daily bars for different exchanges can be fed to handle_data in sequence throughout the 24hr cycle, avoiding data-snooping and still providing monotonic time-stamps.

EDIT: to be clear, I don't have any particular vested interest in this proposal or the status quo; this is just my opinion based on having dealt with these issues in the past.

Hi Scott,

When 'history' returns daily bars from minutely data, what datetimes do you apply to the daily bars? Just do the same for daily data, so that there is no inconsistency in your definition of a daily bar.

Also, do you ever anticipate offering live trading on daily bars? If so, that should be rolled into the discussion here, since there may be additional considerations.

Grant

Grant,

History returns UTC-midnight bars. That's actually a pretty big strike against the arguments for returning market close in my opinion, since it'd be unfortunate for this to be an error:

prices = history(bar_count=N, frequency='1d', field='price')  
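# This lookup only works because history's daily index and get_datetime() are both keyed to midnight UTC.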
todays_price = prices.loc[get_datetime()]  

We could of course change both history and get_datetime, but that's a far-reaching change for a pretty minor benefit; our development time would almost certainly be better spent building more substantive features for the community.

I think for the time being I'm going to leave the implementation as it stands and add a warning if a non-empty tz parameter is passed in daily mode. In the future, we might revisit the possibility of having get_datetime and history be keyed to market close instead of midnight. I think that's probably ultimately the best solution, but also the most time-consuming for our dev team.

Thanks for the feedback to everyone who participated in this discussion!

Thanks Scott,

Yeah, I figured it might be too big of a change. My simple-minded view is that since the backtester is event-driven, you'd just need to create a new set of events that are emitted at the daily market close, with minutely timestamps. Effectively, it would be like running the minutely backtester on a set of securities that only trade at the daily close, so that handle_data would be called only once per day (except that the OHLCV bars would be constructed with values from the day).

Another angle on the problem: why does the minutely backtester require a call to handle_data every minute? If one could call it just at the close (or at any arbitrary time, for that matter), would it run nearly as fast as the daily backtester? Then, my naive thinking is that the daily backtester could be deprecated.

Grant