Data Bundle Error - help

Hi Everyone,

I am trying to build a new data bundle for the Thai stock market to use in backtests on Zipline.
I registered my stocks in .zipline/extension.py.

Then I created a new file in zipline/data/bundles/ to support the new data bundle, which is called 'csv'. It reads local CSV files in OHLC format, like Yahoo data.

Finally, I ran 'zipline ingest -b csv', but I got the error shown below.

Could you please guide me on how to fix the issue? Is it related to the number of lines in the CSV files for each stock?

Thank you for your help.

[4833 rows x 6 columns])]

Now calling daily_bar_writer
Traceback (most recent call last):
File "/usr/local/bin/zipline", line 11, in <module>
sys.exit(main())
File "/usr/local/lib/python2.7/dist-packages/click/core.py", line 722, in __call__
return self.main(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/click/core.py", line 697, in main
rv = self.invoke(ctx)
File "/usr/local/lib/python2.7/dist-packages/click/core.py", line 1066, in invoke
return process_result(sub_ctx.command.invoke(sub_ctx))
File "/usr/local/lib/python2.7/dist-packages/click/core.py", line 895, in invoke
return ctx.invoke(self.callback, *ctx.params)
File "/usr/local/lib/python2.7/dist-packages/click/core.py", line 535, in invoke
return callback(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/zipline/__main__.py", line 312, in ingest
show_progress,
File "/usr/local/lib/python2.7/dist-packages/zipline/data/bundles/core.py", line 451, in ingest
pth.data_path([name, timestr], environ=environ),
File "/usr/local/lib/python2.7/dist-packages/zipline/data/bundles/viacsv.py", line 106, in ingest
daily_bar_writer.write(liData, show_progress=False)
File "/usr/local/lib/python2.7/dist-packages/zipline/data/us_equity_pricing.py", line 257, in write
return self._write_internal(it, assets)
File "/usr/local/lib/python2.7/dist-packages/zipline/data/us_equity_pricing.py", line 378, in _write_internal
).difference(asset_sessions).tolist(),
AssertionError: Got 3068 rows for daily bars table with first day=2002-08-09, last day=2017-07-07, expected 3754 rows.
Missing sessions: [Timestamp('2002-08-12 00:00:00+0000', tz='UTC'), Timestamp('2002-08-26 00:00:00+0000', tz='UTC'), Timestamp('2002-10-23 00:00:00+0000', tz='UTC'), Timestamp('2002-12-05 00:00:00+0000', tz='UTC'), Timestamp('2002-12-10 00:00:00+0000', tz='UTC'), Timestamp('2002-12-30 00:00:00+0000', tz='UTC'), Timestamp('2002-12-31 00:00:00+0000', tz='UTC'), Timestamp('2003-04-07 00:00:00+0000', tz='UTC'), Timestamp('2003-04-14 00:00:00+0000', tz='UTC'), Timestamp('2003-04-15 00:00:00+0000', tz='UTC'), Timestamp('2003-05-01 00:00:00+0000', tz='UTC'), Timestamp('2003-05-05 00:00:00+0000', tz='UTC'), Timestamp('2003-05-09 00:00:00+0000', tz='UTC'), Timestamp('2003-05-15 00:00:00+0000', tz='UTC'), Timestamp('2003-07-01 00:00:00+0000', tz='UTC'), Timestamp('2003-07-14 00:00:00+0000', tz='UTC'), Timestamp('2003-08-12 00:00:00+0000', tz='UTC'), Timestamp('2003-10-23 00:00:00+0000', tz='UTC'), Timestamp('2003-12-05 00:00:00+0000', tz='UTC'), Timestamp('2003-12-10 00:00:00+0000', tz='UTC'), Timestamp('2003-12-18 00:00:00+0000', tz='UTC'), Timestamp('2004-01-02 00:00:00+0000', tz='UTC'), Timestamp('2004-01-28 00:00:00+0000', tz='UTC'), Timestamp(................

7 responses

hi, did you solve your problem?

Not yet.

Thanks to Freddie and Delaney from Quantopian for helping me out.

@Stephane, I hope you can follow the guides provided and sort your issues out.

Freddie suggested the following:

Zipline only supports US equities (with broad support for futures). I'm not sure whether the trading calendar associated with the Thai stock market will have all of the dates that Zipline expects from the NYSE calendar.

There is currently a Pull Request open that implements a bundle that loads data from CSV files into Zipline. You may want to use the branch its author made, or compare your bundle's implementation to theirs.

If there are dates missing, in that your stock data has dates that the Zipline NYSE calendar does not have, then the best option I can see is that you backfill on those dates. There is another Pull Request that is not currently being worked on, but is open, that allows users to switch to different calendars. If there were a calendar you needed that was not there, you'd have to implement it using the Trading Calendar API. With regard to fixing the calendar issue, it looks like someone figured out a workaround for this issue in Zipline.
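A minimal sketch of how the mismatch behind the AssertionError could be diagnosed, comparing the dates in one stock's data against the sessions a calendar expects. The weekday range here is only a stand-in for a real exchange calendar (an assumption for illustration), and the asset dates are hypothetical:

```python
import pandas as pd

# Stand-in for the calendar's sessions: plain weekdays (assumption; a real
# exchange calendar would also exclude holidays).
expected_sessions = pd.bdate_range("2002-08-09", "2002-08-16")

# Hypothetical dates present in one stock's CSV file; 2002-08-12 is absent.
asset_sessions = pd.DatetimeIndex(
    ["2002-08-09", "2002-08-13", "2002-08-14", "2002-08-15", "2002-08-16"]
)

# Sessions the calendar expects but the file lacks.
missing = expected_sessions.difference(asset_sessions)
# Dates in the file that the calendar does not treat as sessions.
extra = asset_sessions.difference(expected_sessions)

print("missing:", [d.strftime("%Y-%m-%d") for d in missing])
print("extra:", [d.strftime("%Y-%m-%d") for d in extra])
```

A non-empty `missing` list corresponds to the "Missing sessions" part of the error message. A non-empty `extra` list would suggest the data follows a different holiday schedule than the calendar.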

Were you able to solve it using those pull requests? I looked, but it didn't seem they could solve it.

Hi Toro, I had the same AssertionError you got 3 years ago, in my case for an IBEX35 stock, with the difference that I passed the corresponding trading calendar (XMAD), which is now available, to register. The error prevents me from ingesting the bundle in Zipline. Did you find a convenient solution (other than manually backfilling on those dates)? Thanks for your help, much appreciated.

Hi Isabel,
I have had a similar problem using bundles. As stated above, it's due to missing data on those dates.
If you already have a calendar, then you can just sync your data to that calendar, and NA out the prices for the dates you don't have a price for (reindex does this automatically).
This works fine for Zipline, as it assumes an NA on a date means there was no trading for that security on that date, which is probably why you don't have a price in the first place.
I download data as CSV files into a folder, then use the script below to process all the files in the folder before putting them into the bundle.
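A minimal sketch of the reindex step on a toy frame, assuming a weekday range stands in for the calendar's sessions (an illustration only; the prices and dates are hypothetical):

```python
import pandas as pd

# Hypothetical daily bars with one session (2020-07-08) absent.
df = pd.DataFrame(
    {"close": [10.0, 10.5, 11.0], "volume": [100, 150, 120]},
    index=pd.to_datetime(["2020-07-06", "2020-07-07", "2020-07-09"], utc=True),
)

# Stand-in session list: weekdays in UTC (assumption; a real calendar
# would supply its own sessions).
sessions = pd.date_range("2020-07-06", "2020-07-09", freq="B", tz="UTC")

# Rows for sessions absent from df are created and filled with NaN.
aligned = df.reindex(sessions)
print(aligned)
```

The result has one row per session, with NaN in every column on 2020-07-08. Note that integer columns such as volume become float once NaN values appear.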
cheers,
Bruce

from zipline.utils.calendars import get_calendar
import pandas as pd
import glob
import os

from_file_path = "C:/Users/bvanston/Zipline/RawData/*.CSV"
to_file_path = "C:/Users/bvanston/Zipline/ASX200/daily/"
start_date = "2000-04-01"
end_date = "2020-07-13"

# Get all expected trading sessions in this range (once, outside the loop).
# Use the same calendar name that the bundle is registered with.
sessions = get_calendar('NYSE').sessions_in_range(start_date, end_date)

for fname in glob.glob(from_file_path):
    destination_file = to_file_path + os.path.basename(fname)
    print("Adding missing dates to {} and copying to {}".format(os.path.basename(fname), destination_file))
    # The first column holds the dates; parse it and use it as the index.
    df = pd.read_csv(fname, index_col=0, parse_dates=True)
    # Ensure the df is indexed by UTC timestamps
    df.index = pd.to_datetime(df.index, utc=True)
    # Reindex to the expected sessions; sessions with no data become NaN rows.
    df = df.reindex(sessions)
    # Keep the 'date' header, which reindexing would otherwise drop.
    df.to_csv(destination_file, index_label="date")

print("Finished")

Thanks Bruce, this has been a great help with the missing values.
Yet I have another problem: "TypeError: Expected data of type float64 for column 'ratio', but got 'object'."
First, there is no 'ratio' column in my CSV file, only the usual columns for Zipline ingestion (OHLC + volume + dividend + split).
Second, according to df.dtypes, my columns hold floats but their dtype is object. I don't know whether this is related to the error above. I ran df.apply(pd.to_numeric), but nothing changed.
Do you have a clue about that? Sorry to disturb you one more time...
Thanks again,
Cheers, Isabel
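
One detail that may explain why df.apply(pd.to_numeric) appeared to do nothing: apply returns a new DataFrame rather than modifying the original, so the result has to be assigned back. Whether this resolves the 'ratio' TypeError is not confirmed in this thread. A minimal sketch with hypothetical column values:

```python
import pandas as pd

# Hypothetical frame in which the values were read as strings, one of them
# not parseable as a number.
df = pd.DataFrame({"close": ["10.0", "10.5", "n/a"], "split": ["1", "1", "1"]})

# Without assignment the conversion result is discarded and df is unchanged.
df.apply(pd.to_numeric, errors="coerce")

# Assign the result back; errors="coerce" turns unparseable entries into NaN.
df = df.apply(pd.to_numeric, errors="coerce")
print(df.dtypes)
```

After the assignment, df.dtypes reports numeric dtypes, and any entry that could not be parsed shows up as NaN, which can help locate a stray string in the source file.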