Help Needed With Batch Transform

Hello All,

I don't fully understand batch_transform, or I'm using it incorrectly.

I'm trying to get my EMA calculation to match TA-Lib as a learning exercise, so I want to start with a 22-day window of prices for which I calculate the simple average. But I can't seem to get the average of days 1 to 22, only days 2 to 23 and rolling on from there.

To me it seems that on day 22 handle_data decides we don't have a full 22-day window, so there is no result until day 23.

This is a contrived example so please don't tell me better ways to calculate an average. My issue is understanding how to get a value on day t for a window of length t. I want to use the first 22 days of AAPL prices i.e. 2002-01-03 (Day 1) to 2002-02-04 (Day 22) but I only get the first average when I change the end date to 2002-02-05 (Day 23).
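
Roughly, what I'm attempting looks like this (a simplified sketch, not my exact code; the decorator arguments and the get_average name are just illustrative):

    # Sketch: 22-day simple average via batch_transform (illustrative only).
    @batch_transform(window_length=22, refresh_period=1)
    def get_average(datapanel):
        # datapanel['price'] is a DataFrame with one column per security
        return datapanel['price'].mean()

    def initialize(context):
        context.aapl = sid(24)  # AAPL

    def handle_data(context, data):
        average = get_average(data)  # None until the window is full
        if average is not None:
            log.info(average[context.aapl])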

Regards,

Peter

35 responses

Hello Peter,

I don't have time now to fiddle with it, but I attached some code that might be useful to you. I'm not quite clear on the problem you are trying to resolve...

Grant

Hello Peter,

You might have a look at this line:

average = np.sum(prices[sid(24)][0:22] / 22)  

Seems that it should be:

average = np.sum(prices[sid(24)][0:21] / 22)  

Right?

Grant

Hello Grant,

Thanks for the indexing suggestion. I thought [0:22] was the right way to sum 22 elements:


In [42]: a
Out[42]: [3, 4, 5]

In [43]: np.sum(a[0:2])
Out[43]: 7

In [44]: np.sum(a[0:3])
Out[44]: 12

The problem I am not explaining very well is that I want a calculation on a sliding window of t items to be available as the output of the batch_transform at interval t. It seems the event-driven nature of zipline/Quantopian means the result only becomes available at interval t+1. Something like:

" Orders placed in the handle_data invocation for bar T will be
processed by the model in T+1"

from https://www.quantopian.com/posts/my-overdraft. Maybe that's just the way it is.

Regards,

Peter

That comment clarified some things for me.

The "price" is the same thing as closing price. So on 2/4, during trading, you can't know the closing price on 2/4 - that would be look-ahead bias. On 2/5 you can have knowledge of the last 22 closing prices.


Also, just FYI, the code you have above is calculating the simple/arithmetic moving average, not the EMA (exponential moving average). For an exponential moving average you don't even need a batch transform...
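
Something like this would do it (a rough sketch; seeding is up to you - note that TA-Lib seeds its EMA with the simple average of the first 22 closes rather than with the first price):

    # Rough sketch: a recursive 22-day EMA kept in context, no batch transform needed.
    def initialize(context):
        context.stock = sid(24)            # AAPL
        context.ema = None
        context.alpha = 2.0 / (22 + 1)     # smoothing factor for a 22-day EMA

    def handle_data(context, data):
        price = data[context.stock].price
        if context.ema is None:
            context.ema = price            # naive seed; TA-Lib uses the 22-day SMA instead
        else:
            context.ema = context.alpha * price + (1 - context.alpha) * context.ema
        log.info(context.ema)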

Hello Simon,

The example was simplified in order to ask the question and preclude the obvious answers.

(0) Quantopian data

03/01/2002 11.775
04/01/2002 11.845
07/01/2002 11.455
08/01/2002 11.32
09/01/2002 10.83
10/01/2002 10.615
11/01/2002 10.53
14/01/2002 10.575
15/01/2002 10.84
16/01/2002 10.39
17/01/2002 11.24
18/01/2002 11.085
22/01/2002 10.91
23/01/2002 11.505
24/01/2002 11.605
25/01/2002 11.65
28/01/2002 11.625
29/01/2002 11.535
30/01/2002 12.07
31/01/2002 12.36
01/02/2002 12.25
04/02/2002 12.675
05/02/2002 12.73

(i) talib

In [47]: talib.EMA(data['Close'][0:23],22)
Out[47]:
array([ nan, nan, nan, nan,
nan, nan, nan, nan,
nan, nan, nan, nan,
nan, nan, nan, nan,
nan, nan, nan, nan,
nan, 11.39477273, 11.51087945])

(ii) Excel (date, close, 22-day EMA)

03/01/2002 11.775
04/01/2002 11.845
07/01/2002 11.455
08/01/2002 11.32
09/01/2002 10.83
10/01/2002 10.615
11/01/2002 10.53
14/01/2002 10.575
15/01/2002 10.84
16/01/2002 10.39
17/01/2002 11.24
18/01/2002 11.085
22/01/2002 10.91
23/01/2002 11.505
24/01/2002 11.605
25/01/2002 11.65
28/01/2002 11.625
29/01/2002 11.535
30/01/2002 12.07
31/01/2002 12.36
01/02/2002 12.25
04/02/2002 12.675 11.39477273
05/02/2002 12.73 11.51087945

(iii) Quantopian

2002-02-05 PRINT Seed =
2002-02-05 PRINT 11.4381818182
2002-02-05 PRINT PRICE
2002-02-05 PRINT 12.73
2002-02-05 PRINT EMA =
2002-02-05 PRINT 11.4381818182

Although (iii) is probably wrong now, because I've been playing around trying to reproduce the biased answers, you get the idea.
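
For reference, (i) boils down to something like this outside Quantopian (a sketch, using the 23 closes listed above):

    # Reproducing the TA-Lib numbers from the 23 closes listed above (sketch).
    import numpy as np
    import talib

    closes = np.array([11.775, 11.845, 11.455, 11.32, 10.83, 10.615, 10.53,
                       10.575, 10.84, 10.39, 11.24, 11.085, 10.91, 11.505,
                       11.605, 11.65, 11.625, 11.535, 12.07, 12.36, 12.25,
                       12.675, 12.73])
    print(talib.EMA(closes, timeperiod=22))
    # last two values: 11.39477273, 11.51087945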

Regards,

Peter

Peter,

My mistake on the indexing...it works differently in MATLAB. Have you resolved your problem?

Dan,

I'm not clear on your comments above. For the attached algorithm, I get:

2002-02-05 PRINT 12.73
2002-02-05 PRINT [[ 11.845] [ 11.455] [ 11.32 ] [ 10.83 ] [ 10.615] [ 10.53 ] [ 10.575] [ 10.84 ] [ 10.39 ] [ 11.24 ] [ 11.085] [ 10.91 ] [ 11.505] [ 11.605] [ 11.65 ] [ 11.625] [ 11.535] [ 12.07 ] [ 12.36 ] [ 12.25 ] [ 12.675] [ 12.73 ]]
2002-02-05 PRINT 11.4381818182
End of logs.  

In my example above, is 12.73 the closing price for 2/5 or the prior market day? The datetime stamp of 2002-02-05 implies that 12.73 was the closing price on 2/5, right?

Grant

I'm happier now:

(i) Excel

01/02/2002 12.25
04/02/2002 12.675 11.39477273
05/02/2002 12.73 11.51087945
06/02/2002 12.335 11.58254210

(ii) talib

nan, 11.39477273, 11.51087945, 11.5825421 ,

(iii) Quantopian

2002-02-04 PRINT 11.39477273
2002-02-05 PRINT 11.51087945
2002-02-06 PRINT 11.58254210

Regards,

Peter

I'm glad you were able to reproduce those values.

Any outstanding questions I can help with?

Dan

Hello Dan,

Thanks - I'm fine for now. I'm just finding it hard to think in terms of the 'event' model and dealing with - as it seems to me - results of some computations only being available in the time period after they are computed.

Maybe it's an age thing....my first attempts at programming were in BASIC on a teletype in 1978!

Regards,

Peter

The event model is different, and I don't think it's an age thing! Static analysis has a lot of uses and has served us well. It's just that backtesting is so well suited to an event model - it tears out look-ahead bias at the root. Putting static tools in an event model like this will make lots of sense once you get used to it.

(We're not that far apart - this was my first programming: an IBM XT.)


Grant, replying to your question from 4/26.

Insert a new line 32

    print(str(datapanel['price']))  

The output starts

2002-01-04 00:00:00+00:00  11.845  
2002-01-07 00:00:00+00:00  11.455  

Which is to say, yes, your reading is correct: the datetime stamp of 2002-02-05 means that 12.73 was the closing price on 2/5.

All that said, it looks like the issue that Peter first brought up is something we should change. If we have the 22 data points, we should emit the frame - it's the close of the 22nd day, so there is no look-ahead bias. That change is forthcoming.

The other thing coming as part of that is a change to batch transforms, making them rolling. (Thomas has alluded to this in other posts.)

When those changes come out, they will moderately change the behavior of the batch transform. When we ship it, I'll post about the changes.

In general, I think data events and batch transforms should process "in time" for a market order that executes at open the next day. Good change.

Hello Dan,

Is there a Quantopian change log? If not, could we have one? It doesn't have to be overly detailed or even that obvious - perhaps just a link from the API page.

Regards,

Peter

Hello all,

As I understand it, in "daily" mode we have access to the closing price, etc. during the current "event." When the batch transform is called during the current event, the trailing window includes the current closing price, etc. Any order placed during the current event should be fulfilled at the end of the next trading day (using the closing price for the next trading day), right? I say "should" since the order fulfillment could be delayed if the security does not trade every day.

All correct, or have I missed something?

Grant

Ah...now I understand...I had never noticed that the batch transform does not start returning values until one day after it has a full window of data. In effect, it drops the data from the first day of the backtest:

2002-01-03 PRINT 1
2002-01-03 PRINT 11.775
2002-01-04 PRINT 2
2002-01-04 PRINT 11.845
2002-01-07 PRINT 3
2002-01-07 PRINT 11.455
2002-01-08 PRINT 4
2002-01-08 PRINT 11.32
2002-01-09 PRINT 5
2002-01-09 PRINT 10.83
2002-01-10 PRINT 6
2002-01-10 PRINT 10.615
2002-01-11 PRINT 7
2002-01-11 PRINT 10.53
2002-01-14 PRINT 8
2002-01-14 PRINT 10.575
2002-01-15 PRINT 9
2002-01-15 PRINT 10.84
2002-01-16 PRINT 10
2002-01-16 PRINT 10.39
2002-01-17 PRINT 11
2002-01-17 PRINT 11.24
2002-01-18 PRINT 12
2002-01-18 PRINT 11.085
2002-01-22 PRINT 13
2002-01-22 PRINT 10.91
2002-01-23 PRINT 14
2002-01-23 PRINT 11.505
2002-01-24 PRINT 15
2002-01-24 PRINT 11.605
2002-01-25 PRINT 16
2002-01-25 PRINT 11.65
2002-01-28 PRINT 17
2002-01-28 PRINT 11.625
2002-01-29 PRINT 18
2002-01-29 PRINT 11.535
2002-01-30 PRINT 19
2002-01-30 PRINT 12.07
2002-01-31 PRINT 20
2002-01-31 PRINT 12.36
2002-02-01 PRINT 21
2002-02-01 PRINT 12.25
2002-02-04 PRINT 22
2002-02-04 PRINT 12.675
2002-02-05 PRINT 23
2002-02-05 PRINT 12.73
2002-02-05 PRINT [[ 11.845] [ 11.455] [ 11.32 ] [ 10.83 ] [ 10.615] [ 10.53 ] [ 10.575] [ 10.84 ] [ 10.39 ] [ 11.24 ] [ 11.085] [ 10.91 ] [ 11.505] [ 11.605] [ 11.65 ] [ 11.625] [ 11.535] [ 12.07 ] [ 12.36 ] [ 12.25 ] [ 12.675] [ 12.73 ]]

On a separate note, there is still something potentially wonky with the Quantopian backtester handling of daily data and orders. The daily data are not available until after market close (by definition, there is no closing price until after market close). So, in "daily" mode, the backtester is accepting orders after the market has closed, to be fulfilled at the closing price of the next market day. Is this a realistic scenario, under live trading with Interactive Brokers? As Simon points out above (assuming that an order can be submitted after hours), wouldn't the order get fulfilled first-thing the next morning? Or can the orders with Interactive Brokers be directed to use the closing price?

In summary, am I understanding correctly that in "daily" mode, the backtester accepts after-hours orders, and then fulfills them effectively as market-on-close (MOC) orders the following market day?

Grant

Hello Grant,

The prices for AAPL at the start of the data are:

Date          Open     Close
01/03/2002    11.5     11.775
01/04/2002    11.67    11.845


If I buy 1 AAPL share on 2002-01-03, the algorithm could:

(i) fill me at the opening of 3rd
(ii) fill me at the close of the 3rd
(iii) fill me at the opening of the 4th
(iv) fill me at the close of the 4th

It actually fills me at the close of the 4th. I think this is the worst of the four options.

Regards,

Peter

Hello Peter,

Yep...the backtester requires one "event" to place an order and a follow-on "event" for it to be fulfilled. So, in daily mode, a full day has to pass before a submitted order is fulfilled.

Grant

Hello Grant,

Surely there is something wrong here? We have to make some assumptions somewhere, so in the daily model why can't we assume we trade at the closing price for the day? Once the 'order' statement has been executed we would then have a price and a new portfolio position on that day. Where's the bias in that?

I don't want to say "if today's closing price is > yesterday's then buy", I want to say "sell at today's close, print the price and print the portfolio position."

Regards,

Peter

I think with daily-data backtests, they could allow MOC orders on the day-of. They definitely couldn't allow MOO orders the day-of; that would be a data-snooping problem. MOO day-after is what I would have expected from the start. MOC day-after, which is what they do, is no safer, but is extra penalizing for sensitive strategies.

I worked around this in some of my earlier strategies by only using minutely backtests and then testing my conditions at 15:58.
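
Roughly like this (a sketch; the time-zone conversion is an assumption about how you turn get_datetime()'s UTC timestamp into exchange time):

    # Sketch: run a minutely backtest, but only act a couple of minutes before the close.
    import pytz

    def handle_data(context, data):
        exchange_time = get_datetime().astimezone(pytz.timezone('US/Eastern'))
        if not (exchange_time.hour == 15 and exchange_time.minute == 58):
            return
        # evaluate the "daily" conditions here, just before the close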

Peter,

In my opinion slippage is an important concept in backtesting. I wholly support the notion of orders placed in one event not filling until the next event.

But I agree that the slippage model could be enhanced to choose a random price from the OHLC quadruplet of the next event.
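
Purely as an illustration of the idea (this is not wired into Quantopian's actual slippage hooks; the bar fields are assumed):

    # Sketch: pick a fill price at random from a bar's OHLC values.
    import random

    def random_ohlc_fill(bar):
        # bar is assumed to expose open/high/low/close prices
        return random.choice([bar['open'], bar['high'], bar['low'], bar['close']])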

Dennis

Hmm, well, it should at least be customizable. If I am placing a 10,000-share order for SPY at the open, I don't expect it to slip to the closing price; I expect it to be filled in a few milliseconds.

While we're on the topic of slippage, the volume share slippage model might be appropriate for stocks, but I don't think it's appropriate for ETFs, where there is much more invisible liquidity via participating dealers. I always find myself turning off the volume slippage model when it thinks that thinly traded ETFs are illiquid.
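
i.e. something like this in initialize (assuming the documented FixedSlippage model; the zero spread is my choice):

    # Sketch: replace the default volume-share model with fixed, zero-spread slippage.
    def initialize(context):
        set_slippage(slippage.FixedSlippage(spread=0.00))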

Hello Simon,

Treating orders on daily backtests as next-day MOO feels much more right than treating them as next-day MOC.

Any thoughts on how to handle LOC orders?

Regards,

Peter

I think the only way to handle them so that they are meaningful would be within a minutely backtest, for LOC orders placed during the day. From the perspective of a daily backtest, by the time the day's event hits handle_data, the close is already known and has happened/is happening. Slipping in a market-on-close order assumes that in production one's entire system can process in the final minute of the day, in time to slip an order in under the wire. In a daily backtest, if the closing price has happened/is happening, there's no value in a limit-on-close order, since you technically already know whether it will or will not have (likely) executed.

My 2c.

When we provide live trading, it will be running on minute bar data. Obviously, one can still code a daily strategy using minute bar data, and I expect that many people will. But the question of how the order gets filled in real life in daily mode will be pretty much moot because the order will be filled in real life "in minute mode."

Peter, we don't have a formal change log. You can see all of the changes that are made to the guts of the backtester here, but I suspect that's a really low signal-to-noise ratio.

I make posts in the forums about new features and changes to old features. Obviously some changes don't get alerts (like last week, when I updated the FAQ to remove the word "beta," which I had forgotten to remove in January). On the other hand, when we whitelist a new Python module we generally make a comment about it here. We're particularly careful around anything that changes the results of backtests. We want backtests to be reproducible. We only make changes there when we're sure of them, and with an announcement.

Thanks Dan,

A few questions:

  1. What is the plan for bringing the backtester into alignment with the upcoming real-world trading at Interactive Brokers (IB)? Based on a variety of comments I've seen, there are some gaps that could result in misleading backtests. Or will you more-or-less leave the backtester as-is and have users rely on forward live paper trading to validate algorithms?
  2. Will live paper trading require an IB account? If so, will there be an associated cost and/or minimum balance?

Grant

Hey Grant

1) The backtester is mostly ready for live trading in terms of features. There are some order types (stop, limit, stop limit) coming. There is a change to some of the transforms and how they are calculated that is important. Is there something else you had in mind? We've got a great list of feature requests, of course, but I can't think of anything else that's a blocker.

2) There are two ways to do paper trading. The difference is in who "fills" the paper orders. Paper trading using Zipline to fill the orders doesn't require an IB account. Really, that is very similar to backtesting, but with live data rather than canned data. The second way is to have IB "fill" the orders. Paper trading on IB requires money deposited for most people, but there are free accounts for students and such.

Thanks Dan,

Nothing in particular...I've just seen comments that some performance-impacting real-world trading details are difficult to capture in a backtester. If I come up with specifics, I'll let you know.

Grant

Hi Dan,

Perhaps this deserves a separate post, but I wanted to follow up on our discussion above. Might the Quantopian backtester be overly restrictive in only accepting order submissions during market hours, and only when there is a historical trade in the database for the security? Correct me if I'm wrong, but I figure that Interactive Brokers will accept orders anytime, but will fill them only when the market is open, and there is an opportunity to trade in the security. So, it would seem that the backtester should allow an order to be submitted 24/7/365, but only fill it when there is a historical bar in the database.

Grant

Hey Grant,

I think there is a limitation there, but I don't think it's that big. Exploring the ideas further:

1) For orders while the market is closed, there is no difference between an order placed at 4:31PM or 11PM or 9AM. So long as that order is placed before the market opens, it's all the same, right? Then, what would make an algo want to place an order at 11PM or 9AM? It would need some new information in order to trigger a new order. There are no trades during that time to provide new information. No new information, no trades. As our support of data sources gets more sophisticated we'll need to handle new, non-trade information accumulated in off hours. (Maybe run handle_data once, pre-market-open?). Until then, the algo trades whenever it has new information, which is effectively only during market hours.

2) Let's say you have a thinly traded stock, XYZZ. You have an algo that looks at only XYZZ. XYZZ doesn't get traded for 5 minutes, so the backtester skips handle_data 5 times. If I understood you correctly, that's concerning because you might have wanted to place orders for XYZZ in those 5 minutes. But, there is no new data in those five minutes - the algo has no data that would cause it to place a trade. I think that makes it a non-issue. Now let's say you have an algo that looks at XYZZ and SPY. SPY pretty much never has trading gaps. XYZZ doesn't trade for 5 minutes, but handle_data is called because SPY is present. Perhaps the change in SPY triggers an order in XYZZ. The algo places that order; the fact that XYZZ didn't trade is not an obstacle to placing the order. The backtester fills that order the next time XYZZ trades.

Side note: Thinly traded stocks are always a challenge, and our handle_data/slippage model is going to be better for some securities than others. As we go along we will refine them.

Thanks Dan,

Both good points. I agree that submitting (placing) an order at close is effectively the same as submitting it during off-market hours. Also, I did not realize, as you point out in your example, that the backtester would accept an order for XYZZ without an event in the window for XYZZ; an SPY event is sufficient. So, to allow orders to be submitted 24/7/365, you'd just need to add a dummy SID to the database that has an event every minute, right? Then orders could be submitted for actual securities every minute, regardless of the time of day. Correct?

Grant

Grant, you're right in principle, but we deliberately clip out data that is before and after trading hours in the current implementation; there's no SID in our database that you could find with 24/7/365 data!

We'll remove that limitation eventually. Forex, for instance, will require 24/7/365 support. But we're focused on live trading of US stocks first, and expanding the set of tradeable items later.

Hello All,

Following tonight's upgrade, the batch transform now works as I originally hoped it would, i.e. on the 22nd day I can get the average of 22 days. Thank you.

2002-01-03 PRINT Price: 11.775
2002-01-03 PRINT Average = None
2002-01-04 PRINT Price: 11.845
2002-01-04 PRINT Average = None
.
.
2002-02-01 PRINT Price: 12.25
2002-02-01 PRINT Average = None
2002-02-04 PRINT Price: 12.675
2002-02-04 PRINT Average = 11.3947727273
End of logs.  

Regards,

Peter

Glad you like it. Sorry for not catching that sooner. It's funny - you found it at approximately the same time someone else did in Zipline, and then it was a matter of pushing the fix through. It was linked to a lot of core issues in the code, and it took a while to get the fix out.