CSV File For Minute Data Event Series Error

Hello!

I attempted to create a CSV file of minute data, per Eddie's suggestion:

Hi, Andy:
We have been putting some effort into making the Quantopian script syntax work in Zipline. (The first step was just added to master today: https://github.com/quantopian/zipline/commit/b69590a2f709c70dd14d817d1a6bee0b1bb0e7b0)
For minute data, unfortunately a good public source of minute data is something that is currently missing from the ecosystem.
Yahoo! et al. only provide daily data.
If you have access to minute data, you should be able to use the data if you:
- Create a DataFrame source using your data set
- Set the data_frequency of your Zipline algorithm to 'minute'
Hope that helps!
- Eddie
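In code, Eddie's two steps amount to roughly the following. This is only a sketch against the zipline 0.6-era API (where TradingAlgorithm.run() accepts a price DataFrame, as in the buyapple example); the file name and column choice here are mine:

import pandas as pd

# Hypothetical file name; any CSV indexed by UTC timestamps works the same way
bars = pd.read_csv('aapl_minute.csv', index_col='Date', parse_dates=True)
if bars.index.tz is None:  # localize only if the stamps carried no UTC offset
    bars.index = bars.index.tz_localize('UTC')

# This zipline version's run() takes a price DataFrame keyed by symbol;
# data_frequency is then set to 'minute' on the simulation parameters.
data = pd.DataFrame({'AAPL': bars['Close']})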

Following Eddie's suggestion:
1) created a minute intraday file named "AAPL-2014-01-09 14-30-00+00-00-2014-01-10 21-00-00+00-00.csv"
2) located in ".....zipline\cache\"
3) containing the following:
Date,Open,High,Low,Close,Volume,Adj Close
2014-01-08 00:00:00+0000,546.8,546.86,535.35,536.53,1000000,536.52
2014-01-09 14:30:00+0000,546.8,546.86,535.35,536.53,1000100,536.52
2014-01-09 14:31:00+0000,546.8,546.86,535.35,536.53,1000200,536.52
2014-01-09 14:32:00+0000,546.8,546.86,535.35,536.53,1000300,536.52
2014-01-09 14:33:00+0000,546.8,546.86,535.35,536.53,1000400,536.52
.....
2014-01-09 20:55:00+0000,546.8,546.86,535.35,536.53,1038600,536.52
2014-01-09 20:56:00+0000,546.8,546.86,535.35,536.53,1038700,536.52
2014-01-09 20:57:00+0000,546.8,546.86,535.35,536.53,1038800,536.52
2014-01-09 20:58:00+0000,546.8,546.86,535.35,536.53,1038900,536.52
2014-01-09 20:59:00+0000,546.8,546.86,535.35,536.53,1039000,536.52
2014-01-09 21:00:00+0000,546.8,546.86,535.35,536.53,1039000,536.52
2014-01-10 00:00:00+0000,539.83,540.8,531.11,532.95,1000000,532.94
2014-01-10 14:30:00+0000,539.83,540.8,531.11,532.95,1000100,532.94
2014-01-10 14:31:00+0000,539.83,540.8,531.11,532.95,1000200,532.94
2014-01-10 14:32:00+0000,539.83,540.8,531.11,532.95,1000300,532.94
2014-01-10 14:33:00+0000,539.83,540.8,531.11,532.95,1000400,532.94
2014-01-10 14:34:00+0000,539.83,540.8,531.11,532.95,1000500,532.94
2014-01-10 14:35:00+0000,539.83,540.8,531.11,532.95,1000600,532.94
....
2014-01-10 20:55:00+0000,539.83,540.8,531.11,532.95,1038600,532.94
2014-01-10 20:56:00+0000,539.83,540.8,531.11,532.95,1038700,532.94
2014-01-10 20:57:00+0000,539.83,540.8,531.11,532.95,1038800,532.94
2014-01-10 20:58:00+0000,539.83,540.8,531.11,532.95,1038900,532.94
2014-01-10 20:59:00+0000,539.83,540.8,531.11,532.95,1039000,532.94
2014-01-10 21:00:00+0000,539.83,540.8,531.11,532.95,1039100,532.94

4) Received the following error (line numbers differ slightly because of added print statements):

Traceback (most recent call last):
File "C:\afc_working\0_zipline\zipline-master\zipline\examples\buyapple_minute.py", line 57, in <module>
results = simple_algo.run(data)
File "C:\Python27\lib\site-packages\zipline-0.6.0-py2.7.egg\zipline\algorithm.py", line 333, in run
for perf in self.gen:
File "C:\Python27\lib\site-packages\zipline-0.6.0-py2.7.egg\zipline\gens\tradesimulation.py", line 156, in transform
self.process_event(event)
File "C:\Python27\lib\site-packages\zipline-0.6.0-py2.7.egg\zipline\gens\tradesimulation.py", line 96, in process_event
self.algo.perf_tracker.process_event(event)
File "C:\Python27\lib\site-packages\zipline-0.6.0-py2.7.egg\zipline\finance\performance\tracker.py", line 279, in process_event
self.all_benchmark_returns[midnight] = event.returns
File "C:\Python27\lib\site-packages\pandas\core\series.py", line 845, in __setitem__
raise KeyError('%s not in this series!' % str(key))
KeyError: '2014-01-09 00:00:00+00:00 not in this series!'

I have been researching this all the way down into pandas.tseries.index.DatetimeIndex, and still have no additional insight.

Please HELP! :)

Thanks,
Andy


Hi Andy,

(Note: This is an answer for local Zipline, not a script on the Quantopian IDE.)

I'd need to see your algorithm code to be sure, but it looks like you have both "data_frequency" and "emission_rate" set to 'minute' while still using daily benchmark data. An "emission_rate" of 'minute' creates an index of market minutes, not the midnight timestamps that the benchmarks use.
If that's not the case, could you send a link to your algorithm code (a gist, or however else you want to share it) to [email protected]? Seeing exactly how you set up your algorithm would help me get a better bead on it.
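(For illustration only, with made-up timestamps: a quick pandas check of the mismatch the KeyError points at.)

import pandas as pd

# A minute-stamped emission index versus the midnight key the perf tracker uses
minutes = pd.date_range('2014-01-09 14:31', periods=5, freq='T', tz='UTC')
midnight = pd.Timestamp('2014-01-09 00:00', tz='UTC')
print(midnight in minutes)  # False: the midnight lookup has nothing to hit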

Hi Eddie,

My stated objective was/is to generate orders on a minute basis, executed on the fly: NOT one per day, and NOT all at once at the end of the day or postponed to the next day.

You are correct that I have both "data_frequency" and "emission_rate" set to 'minute'. The reason is that setting only "data_frequency" to 'minute' did not produce minute orders; it produced just two orders, one for each day of a two-day minute CSV file. The algo is simply the "buyapple" example, modified as follows:

import matplotlib.pyplot as plt
from datetime import datetime
import pytz

from zipline.algorithm import TradingAlgorithm
from zipline.utils.factory import load_from_yahoo

import zipline.utils.factory as factory


class BuyApple(TradingAlgorithm):  # inherit from TradingAlgorithm
    """This is the simplest possible algorithm that does nothing but
    buy 1 apple share on each event.
    """
    def handle_data(self, data):  # overload handle_data() method
        self.order('AAPL', 1)  # order SID (=0) and amount (=1 shares)


if __name__ == '__main__':
    #start = datetime(2008, 1, 1, 0, 0, 0, 0, pytz.utc)
    start = datetime(2014, 1, 9, 14, 30, 0, 0, pytz.utc)

    #end = datetime(2010, 1, 1, 0, 0, 0, 0, pytz.utc)
    end = datetime(2014, 1, 10, 21, 0, 0, 0, pytz.utc)

    data = load_from_yahoo(stocks=['AAPL'], indexes={}, start=start,
                           end=end)

    sim_params = factory.create_simulation_parameters(start=start, end=end,
                                                      capital_base=10000)
    sim_params.data_frequency = 'minute'
    #sim_params.emission_rate = 'minute'

    #simple_algo = BuyApple()
    simple_algo = BuyApple(sim_params=sim_params)

    results = simple_algo.run(data)

    ax1 = plt.subplot(211)
    results.portfolio_value.plot(ax=ax1)
    ax2 = plt.subplot(212, sharex=ax1)
    data.AAPL.plot(ax=ax2)
    plt.gcf().set_size_inches(18, 8)

Portions of the fabricated CSV file "AAPL-2014-01-09 14-30-00+00-00-2014-01-10 21-00-00+00-00.csv" (located in C:\Theano-0.5.0.zipline\cache) follow:

Date,Open,High,Low,Close,Volume,Adj Close
2014-01-08 00:00:00+0000,546.8,546.86,535.35,536.53,1000000,536.52
2014-01-09 14:30:00+0000,546.8,546.86,535.35,536.53,1000100,536.52
2014-01-09 14:31:00+0000,546.8,546.86,535.35,536.53,1000200,536.52
2014-01-09 14:32:00+0000,546.8,546.86,535.35,536.53,1000300,536.52
2014-01-09 14:33:00+0000,546.8,546.86,535.35,536.53,1000400,536.52
.....
2014-01-09 20:57:00+0000,546.8,546.86,535.35,536.53,1038800,536.52
2014-01-09 20:58:00+0000,546.8,546.86,535.35,536.53,1038900,536.52
2014-01-09 20:59:00+0000,546.8,546.86,535.35,536.53,1039000,536.52
2014-01-09 21:00:00+0000,546.8,546.86,535.35,536.53,1039000,536.52
2014-01-10 00:00:00+0000,539.83,540.8,531.11,532.95,1000000,532.94
2014-01-10 14:30:00+0000,539.83,540.8,531.11,532.95,1000100,532.94
2014-01-10 14:31:00+0000,539.83,540.8,531.11,532.95,1000200,532.94
2014-01-10 14:32:00+0000,539.83,540.8,531.11,532.95,1000300,532.94
2014-01-10 14:33:00+0000,539.83,540.8,531.11,532.95,1000400,532.94
.....
2014-01-10 20:57:00+0000,539.83,540.8,531.11,532.95,1038800,532.94
2014-01-10 20:58:00+0000,539.83,540.8,531.11,532.95,1038900,532.94
2014-01-10 20:59:00+0000,539.83,540.8,531.11,532.95,1039000,532.94
2014-01-10 21:00:00+0000,539.83,540.8,531.11,532.95,1039100,532.94

Thanks,
Andy

Andy,

You are correct that you need both of those parameters set to 'minute' for the behavior you desire.
I believe what you need, then, is to have the trading environment use a benchmark that is updated every minute.
These two steps should work (apologies for not verifying first; I really should get a minutely CSV rig going as well, but don't have one handy at the moment):
- Create a minute data source in the same fashion you did the AAPL source; "^GSPC" is one possible candidate, or "SPY".
- Add the following before the "run" call of your algorithm:

benchmark_data = load_from_yahoo(stocks=['^GSPC'], indexes={}, start=start, end=end)
simple_algo.benchmark_return_source = benchmark_data
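Pieced together with your script above, the full setup might look like this (untested, same caveat; a sketch against the same 0.6-era API):

# Both knobs on, plus a benchmark source, before calling run()
sim_params.data_frequency = 'minute'
sim_params.emission_rate = 'minute'
simple_algo = BuyApple(sim_params=sim_params)

benchmark_data = load_from_yahoo(stocks=['^GSPC'], indexes={}, start=start, end=end)
simple_algo.benchmark_return_source = benchmark_data
results = simple_algo.run(data)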

Apologies again if that doesn't work out of the box, but it should hopefully get you closer.

- Eddie

Hi Eddie,

Do you mean that a local minute file located in ..zipline\cache will not work? I'm sorry, I must be missing the point of your last answer. My understanding is that if the requested data is already present in a CSV file located in "cache", there is no additional download. So why do you reference "load_from_yahoo(stocks=['^GSPC'], indexes={}, start=start, end=end)"? Please explain further.
Thanks,
Andy

Andy,

As I understand it (repeating back a little of what you just said), you're doing a clever hack: putting a CSV with minute data into the cache directory so that it is used instead of the daily data pulled from Yahoo!.
Now that I look at it again, the extra call to load_from_yahoo('^GSPC') and the setting of simple_algo.benchmark_return_source would be unnecessary.
I.e., you should be able to replace "~/.zipline/data/^GSPC_benchmark.csv" with a CSV that uses minute data and get something going.
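(A sketch of fabricating such a file, assuming the benchmark cache uses the same Yahoo-style columns as the AAPL file above; do check the schema your zipline version's benchmark loader actually expects:)

import pandas as pd

# Minute-stamped rows mirroring the AAPL file's schema (an assumption, not verified)
minutes = pd.date_range('2014-01-09 14:31', '2014-01-09 21:00', freq='T', tz='UTC')
bench = pd.DataFrame({'Open': 1830.0, 'High': 1840.0, 'Low': 1820.0,
                      'Close': 1835.0, 'Volume': 1000000, 'Adj Close': 1835.0},
                     index=minutes)
bench.index.name = 'Date'
bench.to_csv('GSPC_benchmark.csv')  # then swap in as ~/.zipline/data/^GSPC_benchmark.csv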

Hi Eddie,

I'm so sorry if I appear dense... but we seem to have gone full circle. What I did in my initial post appears to be what you are suggesting, with the results shown in that post. I'm a little confused :( I agree that the problem appears to be that the benchmark has only daily data, while the hacked AAPL CSV file contains minute data. It is as if the events created make the benchmark attempt minute updates while it only has access to daily data.

BTW, my objective has been specifically: a) minute, on-the-fly buy/sell/hold for AAPL (i.e. the supplied CSV file), and b) only daily updates of the benchmark. I would not mind minute updates for the benchmark, but that is NOT my primary concern. I hope you can help me understand what I am missing from your last post!
Thanks,
Andy

Hi Eddie,
I am having no success attempting your suggestion. Please explain further.
Thanks,
Andy

Andy,

Again, apologies for the runaround and confusion.

Good news: I tried your algorithm and CSV locally using the current master of Zipline, and with only the AAPL cache file overwritten, the backtest runs to completion.

Bad news: I can't recreate your error, and I'm not sure what difference between our environments leads to a crash on one and success on the other.

I just posted this thread to the Zipline Google Group, https://groups.google.com/forum/#!topic/zipline/HQcvnD3_Irg

(Also, attached to that post are the minute CSV and the algorithm file I used to try to recreate the crash.)

Hopefully, someone can debug this better than I can.