load_bars_from_yahoo values differ from Quantopian values

When I run tests on zipline, outside of Quantopian, I get these results:

SIDData({'high': 143.25821922497602,  
               'open': 142.23879706969737,  
               'price': 143.17,  
               'datetime': Timestamp('2013-01-02 00:00:00+0000', tz='UTC'),  
               'volume': 192059000,  
               'low': 141.8663158975763,  
               'sid': 'SPY',  
               'source_id': 'DataPanelSource-7249b6b52a70adba5af10f6ec6ae7a34',  
               'close': 143.16999999999999,  
               'dt': Timestamp('2013-01-02 00:00:00+0000', tz='UTC'), 'type': 4})  

However, when I run on Quantopian, I get:

SIDData({'high': 146.11,  
               'price': 146.09,  
               'datetime': datetime.datetime(2013, 1, 2, 0, 0, tzinfo=),  
               'volume': 152841319,  
               'open_price': 145.11,  
               'low': 144.73,  
               'sid': Security(8554, symbol=u'SPY',  
                                      security_name=u'SPDR S&P 500 ETF TRUST',  
                                      exchange=u'NYSE ARCA EXCHANGE',  
                                      start_date=datetime.datetime(1993, 1, 29, 5, 0, tzinfo=),  
                                      end_date=datetime.datetime(2014, 1, 21, 5, 0, tzinfo=),  
                                      first_traded=None),  
               'source_id': 'AdjustedTradeSource320db307eabdf0e8f3f8a13e7bde3675',  
               'close_price': 146.09,  
               'dt': datetime.datetime(2013, 1, 2, 0, 0, tzinfo=), 'type': 4})  

Just the values:

         Quantopian    zipline
high     146.11        143.25821922497602
open     145.11        142.23879706969737
price    146.09        143.17
low      144.73        141.8663158975763
close    146.09        143.16999999999999

If Quantopian uses zipline as the backend, where does this difference come from?

How can I correct for this? I am using zipline to train a system that shows promising results. However, when I move it over to Quantopian, the results no longer hold, because the system was trained on a different set of input values.

6 responses

When I run load_from_yahoo with adjusted=False, I get the output below, which is closer but still not exact. The volume is different as well, which I really don't understand.

SIDData({'high': 146.15000000000001,  
         'open': 145.11000000000001,  
         'price': 143.17,  
         'datetime': Timestamp('2013-01-02 00:00:00+0000', tz='UTC'),  
         'volume': 192059000,  
         'low': 144.72999999999999,  
         'sid': 'SPY',  
         'source_id': 'DataPanelSource-a47292cd016a9d0125214e40e7765e0c',  
         'close': 146.06,  
         'dt': Timestamp('2013-01-02 00:00:00+0000', tz='UTC'), 'type': 4})  

The differences you're seeing are likely due to the different data sources. If you test on zipline, you have to import your own data (from Yahoo! or another public source). On Quantopian.com, we use a private data vendor to provide our data. Our platform uses adjusted close prices, which may also contribute to the differences. https://www.quantopian.com/help#overview-datasources
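
To make the effect of price adjustment concrete, here is a minimal sketch in Python. It scales an unadjusted bar by the ratio of adjusted close to raw close. The input numbers are copied from the two zipline outputs posted in this thread (SPY, 2013-01-02); the helper name `adjust_bar` is hypothetical, and this is an illustration of the arithmetic only, not a statement of how zipline or Quantopian's data vendor implements adjustments.

```python
# Unadjusted OHLC values from the adjusted=False output posted above.
unadjusted = {"open": 145.11, "high": 146.15, "low": 144.73, "close": 146.06}

# The 'price' field in both zipline outputs (assumed here to be the adjusted close).
adjusted_close = 143.17


def adjust_bar(bar, adj_close):
    """Scale every field of a bar by adj_close / raw close (hypothetical helper)."""
    ratio = adj_close / bar["close"]
    return {field: value * ratio for field, value in bar.items()}


adjusted = adjust_bar(unadjusted, adjusted_close)
for field in ("open", "high", "low", "close"):
    print(field, adjusted[field])
```

With these inputs, the scaled open, high, and low closely reproduce the adjusted zipline values in the original post, which suggests that most of the gap between the first zipline output and Quantopian's prices comes from this scaling. The remaining small differences and the volume gap would then come from the data sources themselves.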

For more information on Quantopian price data, see this thread: https://www.quantopian.com/posts/quantopian-price-data-actual-close-or-adjusted-close

Thanks. I knew I read about this here once but I couldn't find the thread.

While I can see why there might be different "adjustments" for the various data sources, doesn't it seem strange that the volumes are different?

Given that I need to train my system on the same data that I will eventually be getting from Quantopian, is there a way to access that data? Or a way to write a file from a Quantopian build?

Ahh, in the above thread...

The volume we use is volume aggregated from the several exchanges; the
close price we use is the last price across the exchanges, etc.

I feel like there needs to be a way to replicate, through zipline, the actual data that will be used on Quantopian.

To further clarify, here are some additional details:

The historical pricing data in Quantopian is aggregated minute-bar trade data, which we further aggregate to create daily bars for the daily backtest mode. Intraday trade data will NOT sum to give you the 'official end of day' volume that you can find on Bloomberg, Yahoo, etc. There are a few reasons for this:

1) Intraday minute-bar data does NOT include the opening/closing auction volume.
2) Our data provider covers the following exchanges: NYSE, NASDAQ, AMEX and all Consolidated Tape Association (CTA) participants. The official EOD volume number covers the exhaustive list of exchanges.

When you compare the 'first traded' and 'last traded' prices you get on Quantopian to the official market open and close prices, you should typically find a pretty close match. When you compare volumes (as you and others have noted), you will see wider discrepancies. This does not mean that the volume you get on Quantopian is 'wrong'. It is the sum of traded volume (across the exchanges listed above) for minutes 9:31 AM through 3:59:59 PM, which is also how our live trading currently works. So we are confident that our historical pricing and volume data lines up well with the data fed to live trading algos.
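
As a rough illustration of the aggregation described above, the sketch below collapses minute bars into one daily bar: first open, last close, highest high, lowest low, and summed volume. Everything in it is an assumption made for illustration: the function name `daily_bar_from_minutes`, the dict layout of a bar, the made-up sample values, and the choice to label the session's minute bars 9:31 through 16:00. It is not Quantopian's actual pipeline.

```python
from datetime import datetime, time


def daily_bar_from_minutes(minute_bars, session_start=time(9, 31), session_end=time(16, 0)):
    """Aggregate minute bars stamped inside the session window into one daily bar."""
    in_session = sorted(
        (b for b in minute_bars if session_start <= b["dt"].time() <= session_end),
        key=lambda b: b["dt"],
    )
    if not in_session:
        return None
    return {
        "open": in_session[0]["open"],              # first traded price
        "high": max(b["high"] for b in in_session),
        "low": min(b["low"] for b in in_session),
        "close": in_session[-1]["close"],           # last traded price
        "volume": sum(b["volume"] for b in in_session),
    }


# Made-up sample data; the 9:30 bar falls outside the window and is ignored.
minute_bars = [
    {"dt": datetime(2013, 1, 2, 9, 30), "open": 145.00, "high": 145.05, "low": 144.95, "close": 145.00, "volume": 500000},
    {"dt": datetime(2013, 1, 2, 9, 31), "open": 145.11, "high": 145.30, "low": 145.05, "close": 145.20, "volume": 1200},
    {"dt": datetime(2013, 1, 2, 12, 0), "open": 145.80, "high": 146.11, "low": 144.73, "close": 145.90, "volume": 800},
    {"dt": datetime(2013, 1, 2, 16, 0), "open": 146.00, "high": 146.10, "low": 145.95, "close": 146.09, "volume": 1500},
]

daily = daily_bar_from_minutes(minute_bars)
print(daily)
```

Because volume outside the window never enters the sum, a daily volume built this way will come in below an official end-of-day figure that includes it, while the open and close stay near the first and last traded prices.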

Hope that helps!