Proposal for specifying window length

Hi all,

We recently did some thinking about how the transforms should work. I think I have a clearer view of what the issues are but I wanted to run this by the community to make sure it is the right thing to do.

The central idea is that we never care about absolute time when specifying a window length; what we really care about is trading time. For example, when I say "give me a 2-day moving average" and it's Monday, I want Friday and Thursday.

That's the way transforms have been working. However, we also want to support finer granularity. So the proposed interface would look something like this:

data[sid(24)].mavg(days=3)  
data[sid(24)].mavg(minutes=10)  

The question is how this should behave in certain corner cases (market open 9:30am, close: 4pm). Some examples:
1. Current: 9:40 am Fri, 20 minute window.
reaches back until: 3:50pm, Thurs
2. Current: 2pm Fri, 1 day window
reaches back until: 2pm Thurs
3. Current 2pm Fri, 1 day window. However, Thurs is half-day, market closed at 12pm.
reaches back until: Thurs 12pm

The algorithm to implement this (in pseudo code) would be as follows:

t1 = cur_time - convert_to_calendar_days(cur_time, days) 

t2 = t1 - minutes

reach_back_until = move_to_earliest_trading_time(t2)  

where days and minutes are the user-supplied parameters,
cur_time is the current time,
convert_to_calendar_days() calculates how many actual calendar days must elapse to cover 'days' trading days, and
move_to_earliest_trading_time() clips a timestamp outside market hours back onto trading time, e.g. 9:20am -> 3:50pm on the previous trading day.
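The pseudocode above could be sketched concretely as follows. This is a simplified sketch: it assumes a fixed 9:30am to 4:00pm session and skips weekends only; holidays and half-days (as in example 3) would need a real trading calendar.

```python
from datetime import datetime, timedelta

MARKET_OPEN = timedelta(hours=9, minutes=30)
MARKET_CLOSE = timedelta(hours=16)

def prev_trading_day(day):
    # Step back one day, skipping Saturday/Sunday (holidays ignored here).
    day -= timedelta(days=1)
    while day.weekday() >= 5:  # Sat=5, Sun=6
        day -= timedelta(days=1)
    return day

def reach_back(cur_time, days=0, minutes=0):
    """Earliest timestamp covered by a window of `days` trading days
    plus `minutes` trading minutes, counted back from cur_time."""
    t = cur_time
    # Walk back the requested number of trading days, keeping time of day.
    for _ in range(days):
        t = datetime.combine(prev_trading_day(t.date()), t.time())
    t -= timedelta(minutes=minutes)
    # If we fell before the open, wrap the shortfall onto the previous
    # session's close (e.g. 9:20am -> 3:50pm the day before).
    open_t = datetime.combine(t.date(), datetime.min.time()) + MARKET_OPEN
    if t < open_t:
        shortfall = open_t - t
        close_t = (datetime.combine(prev_trading_day(t.date()),
                                    datetime.min.time()) + MARKET_CLOSE)
        t = close_t - shortfall
    return t
```

With this, reach_back(datetime(2012, 11, 9, 9, 40), minutes=20) lands on 3:50pm the previous Thursday, matching example 1, and a 1-day window from Monday 2pm reaches back to Friday 2pm.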

Now, maybe you don't always want to specify a strict time window. Especially for a non-liquid stock that only trades, say, every 10 minutes, a 10-minute mavg would be kind of senseless. Instead, we could add an option that says, "give me 10 minutes' worth of data." So if you run a minute simulation, you'd always have 10 events in the window. We'd be counting bars instead of time. Maybe mavg(minutes=10, count_bars=True)?
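The count_bars idea amounts to a fixed-length buffer of events rather than a time window. A minimal sketch (the class name and interface here are illustrative, not a proposed API):

```python
from collections import deque

class BarCountMavg:
    """Moving average over the last N bars (events), regardless of how
    much wall-clock time those bars span -- the count_bars=True behavior.
    An illiquid stock still fills the window with N events eventually."""

    def __init__(self, num_bars):
        self.prices = deque(maxlen=num_bars)  # old bars drop off automatically

    def update(self, price):
        self.prices.append(price)
        return sum(self.prices) / len(self.prices)
```

Because deque(maxlen=N) evicts the oldest element on append, the window length in bars is constant irrespective of liquidity.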

Any form of feedback is greatly appreciated.


9 responses

I like the sound of this, particularly the last feature you mention. As you said, very useful for non-liquid, small-cap stocks.

Perhaps I'm digressing, but I would find it very useful to keep track of (and perhaps return, in a data panel?) all the dates, minutes, or bars currently in the window. Or does this already exist? In my matrix factorization algorithm I worked around this by defining a list, dates[], then arbitrarily picking a stock and appending the "current" date to it on each loop. I imagine there could be some problems, though, if I were doing this minute by minute and there were gaps in trading of the particular stock I chose? Or does this exist / would gaps still return the current time?

    #keep track of every date we run through...  
    context.dates.append(data[24].datetime)  
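One way to make such gaps visible, without depending on any particular stock's bars, would be to keep the timestamps themselves in a fixed-size window. A sketch (not an existing Quantopian feature):

```python
from collections import deque

class WindowDates:
    """Keep the datetimes of the bars currently in a fixed-size window,
    so gaps in an illiquid stock's trading become visible as a span
    longer than the nominal window."""

    def __init__(self, num_bars):
        self.dates = deque(maxlen=num_bars)

    def record(self, dt):
        self.dates.append(dt)

    def span(self):
        # Wall-clock time actually covered by the bars in the window.
        return self.dates[-1] - self.dates[0]
```

A 3-bar window whose bars arrived at 9:31, 9:32, and 9:45 would report a 14-minute span, flagging the trading gap.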

Hello Thomas,

I'll have to read through your ideas again, but one thing to keep in mind here is that we are dealing with time series data, so the datetime stamp matters. For example, if I need to make a decision before market open on Weds. based on data from Mon. & Tues., the information incorporated into the market is at most two days old. However, if I need to make a decision before market open on Mon., the market data from Thurs. & Fri. is up to four days old, with no information incorporated from the weekend. In this case, it would be more risky to make a trading decision on Mon. than on Weds.--absolute time does matter. It seems that a proper moving average based on a trailing window would weight recent tic data more heavily than older tic data, right?

Grant

Thomas,
my input on this is that it makes perfect sense in the daily case. that is, it makes sense to use thurs & fri data coming into monday. to address Grant's concern that the data is stale, i'd answer that both theoretically and empirically, that is not so. weekends are globally acknowledged from the perspective of markets (even though quantopian is currently using us equity data, presumably it will branch out and it should be consistent). news on the weekends is generally limited -- though there are significant exceptions, usually sovereign-type news such as g10-type meetings or big gov't news. even if there were news, one could liken it to data releases outside of market hours, such as non-farm payrolls at 8:30 est or earnings after the close. the real question then becomes: are the market moves from fri to mon more alike or different than during the week (e.g., tues to wed)? the data suggests weekends are more like single nights. this is based on empirical volatility studies, and most dealers in options use business days (i.e., treating the weekend as a single overnight period) rather than calendar days.
this goes into a larger discussion as to whether volatility is driven by trading, which i believe is the general consensus of the literature.

on the other hand, i'd argue that mixing together trading data from 3:50-4 with 9:30-9:40, to use your example, is not a good idea. in this case, there is a very real possibility that there is information that is new to the market that was injected between 4p the prior day and 9:30 this morning. further, even if there were not, the open and the close are special periods and the difference of a day might make a difference in investor trading patterns. one can imagine, for example, that a late selloff in a stock might trigger further selling based on margin calls the next day or an influx of value investors buying from momentum investors bailing out.

i like the idea of being able to choose business time vs. just using calendar time. i was not very clear about your proposal to do so.

hope that made sense.

Hello Thomas,

It seems that the trailing window should always contain the same number of tics (bars/events). As I see it, a simplistic approach is to ignore time gaps (nights/weekends/market closures) and to just treat the market as continuously sampled every minute (or every 24 hours). So, we'd have:

data[sid(24)].mavg(tics=390)  

Then, every trading decision is based on the same trailing window size.

Is your question also meant to address the batch transform?

Grant

Thanks for the feedback everyone!

It seems that, to satisfy every use case, there should be 3 ways to specify the window length of a transform (either iterative or batch):

  1. passed trade-time (option I described at length at the top)
  2. number of bars (e.g. give me 10 minutes worth of data -> leads to constant window length irrespective of liquidity)
  3. pass absolute-time (this is in response to Grant, but I think this is lowest on the list as knuckledragger pointed out).
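The three options could share a single entry point that normalizes whichever one the user passed. A sketch (parameter and mode names are illustrative only, not a committed API):

```python
def window_spec(days=None, bars=None, delta=None):
    """Normalize the three proposed ways of specifying a window
    (trading days, bar count, absolute timedelta) into a
    (mode, amount) pair. Exactly one must be given."""
    given = [(mode, value) for mode, value in
             [('trade_days', days), ('bars', bars), ('absolute', delta)]
             if value is not None]
    if len(given) != 1:
        raise ValueError("specify exactly one of days, bars, delta")
    return given[0]
```

A transform could then dispatch on the returned mode, keeping the validation in one place rather than in every transform.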

It might not be too hard to actually implement that now that I have a clear sight of what needs to be done.

Hello Thomas,

If I understand "1. passed trade-time" correctly, basically, you want to grab all of the tics available in the database for a specified trailing window of total "trading time" for a given security. However, given the Quantopian backtest tic-only data set, you have no way of knowing if the market was open during a given calendar minute for a specified security. Have you considered adding a boolean "market open" flag to the database? For every calendar minute (24/7/365) for every security, you'd indicate if the market was open. Does your vendor supply such data?

Regarding "2. number of bars," I suggest in the code, it should be explicit that the trailing window is in bars (i.e. "give me 10 bars worth of data" not "give me 10 minutes worth of data"). Then, there will be no confusion regarding trading versus calendar time, and it'll work under both the daily and minutely backtester modes.

Also, regarding "2. number of bars" and the batch transform, presumably both the bar data and their associated datetime stamps will be available, correct?

Regarding the batch transform, how is your work coming along to make it more efficient? And when do you expect to fix the problem with "filling holes" in the data, as described on the help page? I think that as a basic functionality of the Quantopian backtester, one needs to be able to get at the raw data efficiently to apply custom analyses. My recent thinking has been to avoid using any canned transforms provided by Quantopian and just write my own via the batch transform. This way, I know what's going on "under the hood" and I can customize the code (e.g. a weighted moving average).

Regarding "3. pass absolute-time" I wouldn't worry about this one. It is covered under "2. number of bars," since with access to the datetime stamps via the batch transform, a member can incorporate calendar (absolute) time into analyses, if necessary.

Cheers,

Grant

I agree with Grant that absolute time does matter depending on the use case, and it must be possible to access the timestamps as well. The majority of people will just be looking for the trade-time idea, though.

Parameter names are important. For minutely data a 200-day average is bars=78000, but the same algo run with daily data becomes a 300-year average. Unfortunately, mavg(mins=20) doesn't make any sense with daily data either.

You could make it illegal to call mavg(mins=20) for daily data, but as of yet (?) there is no way to query the environment for the backtest frequency, so you end up coding specifically for daily or specifically for minutely.
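Until the backtest frequency is queryable, the bar-count arithmetic above could at least be centralized in one helper, so an algo hard-codes the frequency in a single place. A sketch (assumes 390 minute bars per regular US session):

```python
BARS_PER_DAY = 390  # minute bars in a regular 9:30am-4:00pm US session

def bars_for_days(n_days, data_frequency):
    """Translate an N-trading-day window into a bar count for the
    given backtest frequency ('minute' or 'daily')."""
    if data_frequency == 'minute':
        return n_days * BARS_PER_DAY
    return n_days
```

This reproduces the 200-day example: 78000 bars in minute mode, 200 bars in daily mode.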

I've been scratching my head a bit on how to handle spectral analysis of market data (e.g. FFT, cross-correlation, etc.). In this case, it seems that the datetime stamp is critical, right? Unless confined to intraday data, one has to deal with bursts of time series data. Rather than hack together some analysis, there should be a systematic way of addressing the burstiness--is there any precedent in the quantitative trading world?

Datetime stamp is critical for other analysis as well, like psychological ones. E.g. anything looking at time-of-day / time-of-year effects.

as for FFT and cross-correlation... not a clue! but I did ponder the idea of putting 200 days' worth of volume data into a 2D "image" (matrix), filling any gaps with the average, and then running an FFT on it and selecting the most prominent signals. Unfortunately I don't have the time for such things.
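The gap-filling idea can be sketched in one dimension with NumPy: fill missing bars with the series mean so they add no spurious power, then take the magnitude spectrum. Purely illustrative; a real analysis would still need to address the overnight/weekend burstiness raised above.

```python
import numpy as np

def spectrum_with_gaps(values):
    """Magnitude spectrum of a series where missing bars are NaN:
    gaps are filled with the series mean before transforming, so
    they contribute nothing beyond mild spectral leakage."""
    x = np.asarray(values, dtype=float)
    mean = np.nanmean(x)
    x = np.where(np.isnan(x), mean, x)
    # Subtract the mean to remove the DC component before the FFT.
    return np.abs(np.fft.rfft(x - mean))
```

For a sine wave with a few missing samples, the dominant bin still falls at the sine's frequency, which is the point of filling gaps with the mean rather than zeros.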