batch transform confusion

Please have a look at the attached implementation of the batch transform. My confusion is that with Apple, the batch transform does not seem to accumulate ticks as I'd expect. It shouldn't output anything until it has a full window of 15*390 ticks, right?
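For readers without the attachment, the algorithm is roughly this shape (a simplified sketch, not the exact attachment; get_prices is just a name I made up, and batch_transform, sid, and log are Quantopian builtins):

    @batch_transform(refresh_period=0, window_length=15)
    def get_prices(datapanel):
        # datapanel is a pandas Panel; ['price'] is a DataFrame with
        # one column per sid and one row per minutely bar
        return datapanel['price']

    def initialize(context):
        context.stock = sid(24)  # AAPL; swap in sid(8554) for SPY

    def handle_data(context, data):
        prices = get_prices(data)
        if prices is None:
            return  # window not yet available
        log.info(prices.shape)  # expecting (15 * 390, 1) = (5850, 1)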

When I use SPY, the behavior is as expected.

Any idea what's going on?

Grant


Hello Grant,

With sid(24) and that start date, batch_transform returns a DataFrame that initially has 5676 rows and which then grows to the expected 5850 rows. With sid(8554) and that start date, the DataFrame has 5850 rows from the beginning.

I've been trying to discuss this for a couple of weeks now, but there has been no feedback. I believe batch_transform in a minutely backtest with a 15-day window should ALWAYS return a DataFrame with 15 * 390 = 5850 rows, with the not-yet-populated rows filled with NaNs.

(Change the start date to 2013-01-07 with sid(24) and you also get 5850 rows from the start.)
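In other words, once the transform starts returning data, I'd expect to be able to rely on something like this inside handle_data (a sketch, reusing the get_prices name from Grant's example above):

    minutely = get_prices(data)
    if minutely is not None:
        # should hold on every bar, with leading rows as NaNs
        assert len(minutely) == 15 * 390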

P.

Thanks Peter,

Yes, that's what I see too. I'd naively thought that with refresh_period=0 the batch transform accumulates minutely bar data until the window is full, but the behavior seems to depend on the start/stop dates of the backtest.

Grant

Peter,

I sent an e-mail to the Quantopian feedback address, alerting them to this thread.

Grant

Hi Guys,

I'm checking this out. A quick look shows that the batch transform is returning the same start/end time frame for AAPL and SPY (see the log statements below, from the first bar on which the batch window is not None).

AAPL
2013-09-10 PRINT DatetimeIndex: 5676 entries, 2013-08-20 13:31:00+00:00 to 2013-09-10 20:00:00+00:00

SPY
2013-09-10 PRINT DatetimeIndex: 5850 entries, 2013-08-20 13:31:00+00:00 to 2013-09-10 20:00:00+00:00
Data columns (total 1 columns):

The end time in both cases is 15 trading days from the start time, so this seems consistent with the specified window length. The discrepancy in the number of entries seems like it might be something weird about how we're filling in missing values in between. I will try to track down why the filling isn't behaving as expected and post back here when I have a good answer.
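(For reference, the log lines above come from printing the window from inside the transform, roughly like this; the actual print statements may differ, and old pandas prints the summary repr shown above:)

    @batch_transform(refresh_period=0, window_length=15)
    def log_window(datapanel):
        prices = datapanel['price']
        print prices  # logs e.g. "DatetimeIndex: 5676 entries, ..."
        return prices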

-Jess


Hello Jess,

Thanks. This may no longer be an issue once batch_transform is refactored (how? when?), but at present working with the (potentially variable-length) output from batch_transform is frustrating, as we have to pad the DataFrame before attempting fairly simple operations like Grant's

pd.rolling_mean(minutely, 390)
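A padding workaround would look something like this (a sketch; pad_to is a made-up helper, and pd.rolling_mean is the current pandas API):

    import numpy as np
    import pandas as pd

    def pad_to(df, n_rows):
        # prepend NaN rows so the frame always has n_rows rows
        missing = n_rows - len(df)
        if missing <= 0:
            return df
        pad = pd.DataFrame(np.nan, index=np.arange(missing),
                           columns=df.columns)
        return pd.concat([pad, df], ignore_index=True)

    means = pd.rolling_mean(pad_to(minutely, 15 * 390), 390)

but we shouldn't have to do this ourselves.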

P.

Hi guys -

We've tracked down why you're seeing this. As you say, Peter, the desired behavior is a consistent window length, and that's how the behavior is described in the help documentation.

What's happened here (and it exposes a fragility in the batch_transform method) is that we had a non-standard trading day on August 22, 2013 (you may recall this: http://www.reuters.com/article/2013/08/29/us-nasdaq-halt-glitch-idUSBRE97S11420130829 ) on which we accumulated only 216 minutes of the trading day for AAPL and, I believe, all NASDAQ-traded stocks.

Our 'padding' mechanism for filling in the batch_transform window turns out to be brittle, in the sense that it fills only when handle_data is called. In Grant's example, which passes in ONLY a NASDAQ stock, handle_data is not called when there is no trade data, so no filling is done.

The short-term fix is to pass SPY in as a context stock as well. If you do this, you will see the expected behavior for collecting data on AAPL:

2013-09-10 PRINT DatetimeIndex: 5850 entries, 2013-08-20 13:31:00+00:00 to 2013-09-10 20:00:00+00:00
Data columns (total 2 columns):
8554 5850 non-null values
24 5850 non-null values
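In code, the workaround is just to include SPY in the set of stocks your algorithm touches, e.g. (a sketch):

    def initialize(context):
        # SPY trades every minute, so handle_data fires on every bar
        # and the batch window gets padded as expected
        context.stocks = [sid(8554), sid(24)]  # SPY, AAPL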

In the longer term we will make sure to add this case to our spec for refactoring access to trailing historical data. We definitely understand the use case for having a consistent window length to operate on.

Thanks for catching this issue and raising it!
-Jess

Hello Jess,

Thank you - I'm relieved that the issue has finally been recognised. As soon as I find a day with missing data for SPY, I'll let you know!

P.

Thanks Jess,

I think that another use case is returning N ticks for each of M securities, without filling missing data and without NaNs. The datetime stamps won't necessarily line up across securities, but each security will have a full window of data available. For thinly traded securities you'll need something like this; otherwise there will be lots of filling/NaNs.
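Something like this per-security view is what I have in mind (a sketch; last_n_ticks is a made-up name, and prices is the DataFrame returned by the batch transform):

    def last_n_ticks(prices, n):
        # drop bars where a security didn't trade, then keep its most
        # recent n ticks; timestamps may differ across securities
        return dict((s, prices[s].dropna().tail(n)) for s in prices.columns)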

Grant