minute backtest not working with research platform

Hi, I tried to run a backtest in the research platform, but it does not work.

To reproduce the problem, I changed "daily" to "minute" in the tutorial named "Tutorial (Advanced) - Backtesting with Zipline", and changed the bodies of the "initialize" and "handle_data" functions to "pass". The minute data loaded correctly; however, running "perf_manual = algo_obj.run(data.transpose(2,1,0))" crashed:

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
 in ()
     13
     14 # Run algorithm
---> 15 perf_manual = algo_obj.run(data.transpose(2,1,0))

/home/qexec/src/qexec_repo/zipline/algorithm.pyc in run(self, source, overwrite_sim_params, benchmark_return_source)
    471         # perf dictionary
    472         perfs = []
--> 473         for perf in self.gen:
    474             perfs.append(perf)
    475

/home/qexec/src/qexec_repo/zipline/gens/tradesimulation.pyc in transform(self, stream_in)
    109                     date,
    110                     snapshot,
--> 111                     self.algo.instant_fill,
    112                 )
    113             # Perf messages are only emitted if the snapshot contained

/home/qexec/src/qexec_repo/zipline/gens/tradesimulation.pyc in process_snapshot(self, dt, snapshot, instant_fill)
    249
    250         if any_trade_occurred:
--> 251             new_orders = self._call_handle_data()
    252             for order in new_orders:
    253                 perf_process_order(order)

/home/qexec/src/qexec_repo/zipline/gens/tradesimulation.pyc in _call_handle_data(self)
    278             self.algo,
    279             self.current_data,
--> 280             self.simulation_dt,
    281         )
    282         orders = self.algo.blotter.new_orders

/home/qexec/src/qexec_repo/zipline/utils/events.pyc in handle_data(self, context, data, dt)
    192     def handle_data(self, context, data, dt):
    193         for event in self._events:
--> 194             event.handle_data(context, data, dt)
    195
    196

/home/qexec/src/qexec_repo/zipline/utils/events.pyc in handle_data(self, context, data, dt)
    210         """
    211         if self.rule.should_trigger(dt):
--> 212             self.callback(context, data)
    213
    214

/home/qexec/src/qexec_repo/zipline/algorithm.pyc in handle_data(self, data)
    275             self.history_container.update(data, self.datetime)
    276
--> 277         self._handle_data(self, data)
    278
    279         # Unlike trading controls which remain constant unless placing an

 in handle_data(context, data)
     22     # history() has to be called with the same params
     23     # from above and returns a pandas dataframe.
---> 24     short_mavg = history(100, '1d', 'price').mean()
     25     long_mavg = history(300, '1d', 'price').mean()
     26

/home/qexec/src/qexec_repo/zipline/utils/api_support.pyc in wrapped(*args, **kwargs)
     49     def wrapped(*args, **kwargs):
     50         # Get the instance and call the method
---> 51         return getattr(get_algo_instance(), f.__name__)(*args, **kwargs)
     52     # Add functor to zipline.api
     53     setattr(zipline.api, f.__name__, wrapped)

/home/qexec/src/qexec_repo/zipline/algorithm.pyc in history(self, bar_count, frequency, field, ffill)
   1010             ffill,
   1011         )
-> 1012         return self.history_container.get_history(history_spec, self.datetime)
   1013
   1014     ####################

/home/qexec/src/qexec_repo/zipline/history/history_container.pyc in get_history(self, history_spec, algo_dt)
    916             digest_frame,
    917             self.last_known_prior_values,
--> 918             raw=True
    919         )
    920         last_period = self.frame_to_series(field, buffer_frame, self.sids)

/home/qexec/src/qexec_repo/zipline/history/history_container.pyc in ffill_buffer_from_prior_values(freq, field, buffer_frame, digest_frame, pv_frame, raw)
     55     buffer_values = buffer_frame.values
     56
---> 57     nan_sids = pd.isnull(buffer_values[0])
     58     if np.any(nan_sids) and len(digest_values):
     59         # If we have any leading nans in the buffer and we have a non-empty

IndexError: index 0 is out of bounds for axis 0 with size 0

6 responses

I've just been trying to get around this... it seems to happen when there are NaNs in the data.
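The last frame of the traceback shows what goes wrong: `buffer_values` comes back as a zero-row array, so indexing its first row fails. A minimal sketch of the same failure in plain NumPy (outside zipline, just to illustrate the error):

```python
import numpy as np

# A zero-row buffer, like the one zipline ends up with here.
buffer_values = np.empty((0, 3))

# zipline then does pd.isnull(buffer_values[0]); taking row 0 of a
# zero-row array raises the IndexError seen in the traceback.
try:
    buffer_values[0]
except IndexError as e:
    print(e)  # index 0 is out of bounds for axis 0 with size 0
```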

Try this:

data = get_pricing(  
    ['SPY'],  
    start_date='2014-10-29',  
    end_date = '2014-10-31',  
    frequency='minute'  
)

which contains some non-trading entries.

# print indexes where SPY had a NaN price (i.e. didn't trade)  
trans = data.transpose(2,1,0)  
k = trans.keys()[0]  
idxs = trans[k]['close_price'].index[ np.isnan(trans[k]['close_price']) ]  
for idx in idxs:  
    print idx  

which gives:

2014-10-30 17:09:00+00:00  
2014-10-30 17:10:00+00:00  
2014-10-30 17:11:00+00:00  
2014-10-30 17:12:00+00:00  
2014-10-30 17:13:00+00:00  
2014-10-30 17:14:00+00:00  
2014-10-30 17:15:00+00:00  
2014-10-30 17:16:00+00:00  
2014-10-30 17:17:00+00:00  
2014-10-30 17:18:00+00:00  
2014-10-30 17:19:00+00:00  
2014-10-30 17:20:00+00:00  
2014-10-30 17:21:00+00:00  
2014-10-30 17:22:00+00:00  
2014-10-30 17:23:00+00:00  
2014-10-30 17:24:00+00:00  
2014-10-30 17:25:00+00:00  
2014-10-30 17:26:00+00:00  
2014-10-30 17:27:00+00:00  
2014-10-30 17:28:00+00:00  
2014-10-30 17:29:00+00:00  
2014-10-30 17:30:00+00:00  
2014-10-30 17:31:00+00:00  
2014-10-30 17:32:00+00:00  
2014-10-30 17:33:00+00:00  
2014-10-30 17:34:00+00:00  
2014-10-30 17:35:00+00:00  
2014-10-30 17:36:00+00:00  
2014-10-30 17:37:00+00:00  

and then replace those entries with 0:

# replace NaN with zero (so you can hopefully detect it in your algo)  
for sid in trans:  
    nan_mask = np.isnan(trans[sid]['close_price'])  
    for column in trans[sid]:  
        trans[sid][column][nan_mask] = 0  # set open, high, low, close, volume = 0  
algo_obj = TradingAlgorithm(  
    initialize=initialize,  
    handle_data=handle_data  
)
perf_manual = algo_obj.run(trans)  

which seems to make the error go away.

This probably needs a better long-term solution, of course.

Hope it helps!

Thanks! That must be the problem. I'm using dropna to handle it (you may also want to try fillna):

data.dropna(axis=1, inplace=True)  

That's probably quicker ;)
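For anyone weighing the two options, here's a small self-contained sketch (using a plain pandas DataFrame rather than the 3-D panel returned by get_pricing, so the axis argument differs) of how dropna and fillna treat the non-trading bars:

```python
import numpy as np
import pandas as pd

# Two traded bars with a non-trading (NaN) bar between them.
df = pd.DataFrame({
    'open_price':  [10.0, np.nan, 10.2],
    'close_price': [10.1, np.nan, 10.3],
})

# dropna discards the non-trading rows entirely...
dropped = df.dropna()
print(len(dropped))  # 2

# ...while fillna keeps them, marked with a sentinel value
# that your algo can check for.
filled = df.fillna(0)
print(filled['close_price'].tolist())  # [10.1, 0.0, 10.3]
```

dropna shrinks the index (fewer bars reach handle_data), while fillna preserves the bar count but leaves you to treat zero prices specially.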

Unfortunately there are still some snags to get minutely backtests working in research.

Q devs: We need the sim parameters factory from zipline!

Here's another nasty hack to work around it in the meantime:

import datetime  
import pytz  
import pandas as pd  
from zipline.utils import tradingcalendar  

class SimParams(object):  
    '''This is an absolute hack at copying what is in zipline. Try not to use it.  
    '''  
    def __init__(self, trans, start_date_txt, end_date_txt, money):  
        first_sid = trans[trans.keys()[0]]  
        days = {}  
        for dt in first_sid.index:  
            days[dt.date()] = 1  
        self.period_start = datetime.datetime(int(start_date_txt[0:4]), int(start_date_txt[5:7]), int(start_date_txt[8:10]), tzinfo=pytz.utc)  
        self.period_end = datetime.datetime(int(end_date_txt[0:4]), int(end_date_txt[5:7]), int(end_date_txt[8:10]), tzinfo=pytz.utc)  
        self.first_open = first_sid.index[0].to_datetime()  
        self.last_close = first_sid.index[-1].to_datetime()  
        self.days_in_period = len(days)  
        self.capital_base = float(money)  
        self.emission_rate = 'minute'  
        self.data_frequency = 'minute'  
        self.arena = 'backtest'  
    def _update_internal(self):  
        start_index = tradingcalendar.trading_days.get_loc(pd.Timestamp(self.first_open.date()))  
        end_index = tradingcalendar.trading_days.get_loc(pd.Timestamp(self.last_close.date()))  
        self.trading_days = tradingcalendar.trading_days[start_index:end_index + 1]  

Some of that is probably not even correct. I can feel a menacing crowd of pythons hissing at me right now.

So then I've been doing:

start_date = '2006-01-01'  
end_date = '2014-04-10'  
data = get_pricing(  
    ['SPY'],  
    start_date,  
    end_date,  
    frequency='minute'  
)
trans = data.transpose(2,1,0)  
# replace NaN with zero (so you can hopefully detect it in your algo)  
for sid in trans:  
    nan_mask = np.isnan(trans[sid]['close_price'])  
    for column in trans[sid]:  
        trans[sid][column][nan_mask] = 0  # set open, high, low, close, volume = 0  

and finally...

params = SimParams(trans, start_date, end_date, 10000.0)  
algo_obj = TradingAlgorithm(  
            initialize=initialize,  
            handle_data=handle_data,  
            sim_params=params  
)
# run it:  
perf_manual = algo_obj.run(trans)  

There's probably a better way of doing this, and one which is less likely to explode without warning.

Hi all,

Is there any update on this from any other thread/discussion outside this post? Minute-level data in the research environment is really needed for optimising minute-level algorithms, and I can't seem to get anything working. Very frustrating!

Has anyone heard from @Quantopian about this, have they got a fix in the pipeline?

Marcus

Minute-level backtesting is already available in Zipline in Research. Perhaps I'm not understanding the issue, but I've noticed that none of the code posted above includes the parameter data_frequency='minute' in the TradingAlgorithm constructor. Here is a link to the GitHub page that contains the source for the TradingAlgorithm object.

algo_obj = TradingAlgorithm(  
    initialize=initialize,  
    handle_data=handle_data,  
    data_frequency='minute'  
)

Hi James,

I didn't realise that data_frequency needed defining in the TradingAlgorithm constructor, thanks for this! I believe this was my issue.

Thanks,

Marcus