Community Poll: Expected Semantics of get_datetime() with a timezone argument in daily mode?

Hi all,

We recently added the ability to pass a timezone string to get_datetime(). This is mostly a convenience feature to save the boilerplate of calling tz_convert('US/Eastern') when you want to think in market hours instead of UTC, which is the default timezone returned by get_datetime() with no arguments.

One issue that a user brought up recently is how get_datetime() should handle a timezone parameter when run in daily mode. Right now, get_datetime() in daily mode returns midnight UTC of the date being processed by the algorithm. This aligns with midnight UTC as the canonical representation of a day in much of the numpy/scipy ecosystem (see, for instance, pandas.tseries.tools.normalize_date). If you pass a timezone string like "US/Eastern" to get_datetime() here, you end up getting the equivalent of get_datetime().tz_convert("US/Eastern"), which resolves to 7 or 8 PM the previous day, depending on daylight saving time. This is particularly confusing if you immediately call .date() on the output, because you get the unintuitive result that

get_datetime().date() != get_datetime('US/Eastern').date()  
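
Here's a quick standalone pandas sketch of why the two dates disagree (using 2014-11-10 as an example date):

import pandas as pd

# Midnight UTC: the canonical daily-mode timestamp for 2014-11-10.
dt_utc = pd.Timestamp('2014-11-10 00:00:00', tz='UTC')

# Converting to US/Eastern moves the clock back 5 hours (EST),
# landing at 7 PM on the previous calendar day.
dt_eastern = dt_utc.tz_convert('US/Eastern')

print(dt_utc.date())      # 2014-11-10
print(dt_eastern.date())  # 2014-11-09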

As I see it, there are three sane things we could be doing to handle this case better.

  1. Continue with the current semantics, but show an appropriate warning if a timezone is passed in daily mode.
  2. Ignore the timezone parameter in daily mode (possibly also showing a warning that this is happening.)
  3. Throw an exception if a timezone parameter is passed in daily mode.

Of these three, I'm most inclined to go with the second option, but I'd like to get some feedback from the community about what your expectations would be here.

Thanks,
-Scott


21 responses

I'm interested in what others say, since time confuses me.
So the only reason get_datetime().date() in daily mode matches the date of the trades is that the clock in Greenwich, England (the UTC anchor) happens to have just ticked over to midnight (00:00:00) by the closing bell in New York City? If the test starts on 11-05 and the debugger says 11-05, I would think it was 11-05 all day in NYC. Was the bar actually 11-04 prices/activity?

Does pandas only handle UTC?

People,

At the risk of drifting this thread a bit off topic, I suggest that get_datetime() in daily mode should return the datetime AT MARKET CLOSE.

It should not return midnight.

Midnight has two problems. First, the market does not close at midnight. Second, it causes cognitive load: exactly which day does midnight refer to?

Does it refer to the day before or the day after?

So, if get_datetime() in daily mode returns the datetime at market close, that behavior solves both those problems.

It would also solve the problem posed in this thread.

Following the convention used by the numpy ecosystem for market data is a bad idea.

Also, if get_datetime() in daily mode returns the true time rather than midnight, it is easier to blend with minutely-based observations.

And we are lucky that when the market closes at 4 PM New York time on a Monday, it is still Monday in London.
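
For reference, here is a rough sketch (plain pandas, outside the backtester; it assumes a regular 4 PM close and ignores half days and holidays) of turning the midnight-UTC daily timestamp into a market-close timestamp:

import pandas as pd

# Daily-mode timestamp: midnight UTC as a placeholder for the trading day.
bar_dt = pd.Timestamp('2014-11-10 00:00:00', tz='UTC')

# Reinterpret that calendar date as a 4:00 PM US/Eastern close.
close_eastern = pd.Timestamp(str(bar_dt.date()) + ' 16:00', tz='US/Eastern')

print(close_eastern)                    # 2014-11-10 16:00:00-05:00
print(close_eastern.tz_convert('UTC'))  # 2014-11-10 21:00:00+00:00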

Dan Bikle

Dan raises a valid point. I've attached a backtest to illustrate another reason to re-think the problem. The backtest was run on minutely data with:

prices = history(3,'1d','price')  

And for 'print prices' I get:

Security(8554 [SPY])
2014-11-06 21:00:00+00:00 203.18
2014-11-07 21:00:00+00:00 203.37
2014-11-10 14:31:00+00:00 203.40

There are two trailing daily closing prices, and the opening minutely price for the current day. Generally, history with frequency = '1d' will return the trailing daily OHLCV bars time-stamped at the last trade available for SPY for the given day.

When I run the same backtest on daily bars, I get:

Security(8554 [SPY])
2014-11-06 00:00:00+00:00 203.18
2014-11-07 00:00:00+00:00 203.37
2014-11-10 00:00:00+00:00 203.97

Very confusing.
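
Here's a rough sketch (plain pandas, recreating the minute-mode output above) of how the two could be lined up by date in user code, by normalizing the minute-mode timestamps to midnight UTC; note this only aligns the index, and the partial-day 11-10 value still differs from the daily bar:

import pandas as pd

# Recreate the minute-mode history() index and values shown above.
idx = pd.DatetimeIndex(['2014-11-06 21:00', '2014-11-07 21:00', '2014-11-10 14:31'], tz='UTC')
prices = pd.Series([203.18, 203.37, 203.40], index=idx)

# Drop the time-of-day component so the index matches daily mode.
prices.index = prices.index.normalize()
print(prices)
# 2014-11-06 00:00:00+00:00    203.18
# 2014-11-07 00:00:00+00:00    203.37
# 2014-11-10 00:00:00+00:00    203.40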

I think that if Quantopian is going to maintain the ability to backtest on daily bars, then re-jigger the database so that the datetime stamps are the same for daily and minutely backtests. Then the whole get_datetime() issue would disappear. And does it really matter if you break some (all?) daily backtests, since they are not deployable anyway?

Or don't modify the database, but at the user interface level, ensure that the datetime stamps are consistent, when comparing daily to minutely backtests.

Alternatively, speed up minutely backtests by ~390X and deprecate daily backtests for eventual obsolescence (the best solution in my mind, since you don't support trading on daily bars anyway).

Grant

Maybe it's just me, but I don't see any confusion here. get_datetime().date() returns the UTC date, and get_datetime('US/Eastern').date() converts the UTC time to US/Eastern first. I also find it reasonable that the UTC time in daily mode refers to midnight of that day. My vote is to keep everything the same, but maybe add a warning or better docs for get_datetime to explain this behavior.

Consider the meaning of a daily backtest when (eventually??) Quantopian has multiple exchanges: US, UK, Japan, Australia, etc. The timezone difference is significant.
Would you want to convert the timezone to work out what day something was actually traded on?
A 'daily' backtest doesn't seem to make a lot of sense in this case.

Q-People,

I've changed my mind on this.

Since we are talking about an API I'd vote to extend the API rather than change it.

Changing an API breaks things belonging to people who committed early to the API.

If I don't like how get_datetime() behaves, I should just avoid it.

Perhaps it would be easy to add calls like:

get_actual_datetime()
get_daily_truncated_datetime()
get_daily_rounded_datetime()
get_actual_nytime()
get_actual_utctime()

Dan

Assuming that Quantopian doesn't deprecate daily backtests (not in the game plan, I gather), I suppose one approach would be to provide a helper function within a daily backtest that would give the minutely datetime stamp of the daily closing (and opening) for a given security. For example:

get_minutely_datetime(security,event='close')

The event parameter would take values of 'open' or 'close'. Then one could make a 1:1 correspondence between daily and minutely backtests, albeit inelegantly. I gather that it wouldn't bog down the backtest, since handle_data would still only be called once per trade day.

Or maybe something like history(dataset='minutely') that would provide access to the minutely data as a daily backtest is run? Then, the backtest could still be run over daily bars, but the minutely data would be available with their datetime stamps.

Grant

Hi All,

Thanks for the feedback! There were enough interesting points raised here that I'm not sure what the correct decision on this is...I'll continue to mull this over and post again when I have a concrete proposal. Just wanted to let everyone know that we're still thinking about this.

-Scott

Thank you Scott. And great feedback. I'd just like to try to summarize/clarify a couple things from my point of view and offer some code to play around with.
Goal -- To be able to use 'US/Eastern' seamlessly between Daily and Minute modes.

Minute mode always does the right thing.
Daily mode incorrectly returns the previous date with get_datetime('US/Eastern').date() and get_datetime().astimezone(timezone('US/Eastern')), not the current bar date.

The functions history() and fetch_csv() for example only know UTC because pandas dataframes only do UTC.
I like the suggestion of a new function such as get_nyc_datetime() or get_datetime(locale='nyc').
(Or, for the future when you expand to, say, Tokyo and Shanghai exchanges, maybe make that something like get_datetime('exchange_locale'). No matter which exchange, it would return the date and time that a trade thinks it is, so to speak, as if you asked a trade to look at its watch and tell you what time it is there.)

Daily and Minute mode differences, using the attached backtest code; 'E' is where 'US/Eastern' was used.

I second having the bars dated at the time of market close, for what it's worth. The problem will only get worse if/when you start trading in Tokyo: if you keep using midnight UTC of the bar's date, you are liable to start pre-dating the bar, which means algos that backtest spreads between Tokyo and New York in daily mode might have access to the Tokyo data before it has happened.

Gary, a few clarifications:

Daily mode incorrectly returns the previous date with get_datetime('US/Eastern').date() and get_datetime().astimezone(timezone('US/Eastern')), not the current bar date.

get_datetime().astimezone(timezone('US/Eastern')) returns the "correct" time, in the sense that it represents the same instant as a naked get_datetime(). The confusion occurs when you subsequently call the .date() method.

The functions history() and fetch_csv() for example only know UTC because pandas dataframes only do UTC.

To be clear, pandas is perfectly capable of working with non-UTC datetimes, but it's preferable to work in UTC for a number of reasons. The most important of these is that subtracting UTC datetimes always has a single well-defined result, which is not the case for US/Eastern because some times appear twice due to daylight saving time. (For example, 1:30 AM US/Eastern on Sunday, November 2, 2014 occurred twice this year. If you try to subtract that time from an earlier datetime, there are potentially two "valid" return values, an hour apart.)
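
A quick pandas sketch of that ambiguity (the ambiguous keyword to tz_localize picks which of the two occurrences you mean):

import pandas as pd

# 1:30 AM on the 2014 fall-back date, localized both ways.
naive = pd.Timestamp('2014-11-02 01:30')
first = naive.tz_localize('US/Eastern', ambiguous=True)    # 01:30 EDT (UTC-4)
second = naive.tz_localize('US/Eastern', ambiguous=False)  # 01:30 EST (UTC-5)

earlier = pd.Timestamp('2014-11-01 12:00', tz='US/Eastern')
print(first - earlier)   # 0 days 13:30:00
print(second - earlier)  # 0 days 14:30:00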

I don't think it makes sense for us to add a whole bunch of date-related functions to our API (e.g. the above proposed get_nyc_datetime or get_daily_truncated_datetime()), since it's straightforward enough to write those functions yourself if you want something specific, and there's value in keeping our API small and comprehensible.
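
For instance, something along these lines inside your own algorithm would cover the common case (nyc_datetime is just an illustrative name, not part of the API):

def nyc_datetime():
    # Current simulation time converted to US/Eastern, built on the existing built-in.
    return get_datetime().tz_convert('US/Eastern')

def handle_data(context, data):
    log.info(nyc_datetime())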

Simon, I'm not sure I understand your proposal w/r/t future international exchanges. In a hypothetical future world where we supported both NYC and Tokyo, "the market close" is no longer a single time, but two times: the close in NYC and the close in Tokyo. If anything that seems to me like an argument for using UTC midnight as an arbitrary standin for the date. Or is your point that, since your algorithm executes conceptually at the (NYC) market close of a given day, that we should return that date?

TIL our blockquote format is gigantic...

I guess it will depend on how handle_data will eventually work for multi-timezone daily bars. The point I was trying to make is that UTC as a stand-in for Eastern is safer, since UTC midnight is after the NYC close. UTC as a stand-in for Tokyo is riskier for backtesting, since UTC midnight is prior to the Tokyo close, so if the backtester itself isn't aware of the actual timezones of the daily bars, it might inadvertently provide the Tokyo close to an algo that is still able to trade New York afterwards, which would be a data-snooping bias.

I think I made that more confusing than it is. Dating bars later than their true time is safe, but dating them earlier is a risk, in the same way that a fetch_csv of data which is pre-dated is also a risk.

Conceptually, the backtester executes your daily handle_data at the NYC market close. It will only ever be supplied data from before that instant in time. The question here is more about how we represent that information to the user. No matter what, we're going to return a datetime that's in UTC; the question is whether we choose to represent that moment with the actual time at which it logically occurs, or whether we return UTC midnight, which is really a placeholder for the entire date.

I see, so internally, your bars will be correctly dated to the instant that information was truly available, but you are debating how to wipe out the time information from the datetime?

I see, so internally, your bars will be correctly dated to the instant that information was truly available, but you are debating how to wipe out the time information from the datetime?

Precisely. The question, essentially, is:

"Is it better to represent the current time in daily mode as the closing minute of that day (which could be misleading for values like open_price or volume which actually span the entire day), or as a sentinel value that represents the entire day (which has confusing interactions with timezones and truncation to a date object)?"

My personal opinion is that the bars should carry the instant of the close, along with the exchange on which they trade, and a helper to get the timezone of the exchange. With those two pieces of information, and proper date-time code backed by the Olson TZ database, one can get the "date" of the bar if one needs it, or anything else.

I don't think it's misleading; it's consistent with minute bars: the open price is the first price of that time span, and the volume is the aggregate volume of all the trades in that bar. It might be helpful if the bar had an "open time" too.

This extends smoothly to multi-timezone daily backtesting, when daily bars for different exchanges can be fed to handle_data in sequence throughout the 24hr cycle, avoiding data-snooping and still providing monotonic time-stamps.

EDIT: to be clear, I don't have any particular vested interest in this proposal or the status quo; this is just my opinion based on having dealt with these issues in the past.

Hi Scott,

When 'history' returns daily bars from minutely data, what datetimes do you apply to the daily bars? Just do the same for daily data, so that there is no inconsistency in your definition of a daily bar.

Also, do you ever anticipate offering live trading on daily bars? If so, that should be rolled into the discussion here, since there may be additional considerations.

Grant

Grant,

History returns UTC-midnight bars. That's actually a pretty big strike against the arguments for returning market close in my opinion, since it'd be unfortunate for this to be an error:

prices = history(bar_count=N, frequency='1d', field='price')  
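# This lookup only works because history's daily index and get_datetime() are both keyed to midnight UTC.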
todays_price = prices.loc[get_datetime()]  

We could of course change both history and get_datetime, but that's a far-reaching change for a pretty minor benefit; our development time would almost certainly be better spent building more substantive features for the community.

I think for the time being I'm going to leave the implementation as it stands and add a warning if a non-empty tz parameter is passed in daily mode. In the future, we might revisit the possibility of having get_datetime and history be keyed to market close instead of midnight. I think that's probably ultimately the best solution, but also the most time-consuming for our dev team.

Thanks for the feedback to everyone who participated in this discussion!

Thanks Scott,

Yeah, I figured it might be too big of a change. My simple-minded view is that since the backtester is event-driven, you'd just need to create a new set of events that are emitted at the daily market close, with minutely timestamps. Effectively, it would be like running the minutely backtester on a set of securities that only trade at the daily close, so that handle_data would be called only once per day (except that the OHLCV bars would be constructed with values from the day).

Another angle on the problem: why does the minutely backtester require a call to handle_data every minute? If one could call it just at the close (or at any arbitrary time, for that matter), would it run nearly as fast as the daily backtester? Then, my naive thinking is that the daily backtester could be deprecated.

Grant