Zipline/Quantopian: Major discrepancies using cross-compatible code

Hi all,
My initial objective was to write code that could be copied and pasted between Zipline and Quantopian, so that I can write and debug it in a desktop Python IDE (Spyder on Linux). The provided code meets that objective.

Then I compared the backtest results. I expected some differences, but I was surprised by how large they are (see the logs below for an example).

So I'm wondering: am I doing something wrong, or is this the previously discussed difference between the Quantopian data feed and Yahoo? And if so, what should I make of it?

I've been using Yahoo data successfully for periodic strategies (tactical allocation with rebalancing periods longer than one week). I'm looking for convincing, rational information that would make me as confident with Q, but for now my experience with Yahoo biases my thinking.

Any thoughts?
Also, is there a better way to display results in Zipline than the few lines of code I copied from https://www.quantopian.com/users/5369480afece9e06440000f6?

quantopian:
2004-01-02 PRINT Date 2004-01-02 00:00:00+00:00 Switch Nb: 1
2004-09-01 PRINT Date 2004-09-01 00:00:00+00:00 Switch Nb: 2
2004-12-01 PRINT Date 2004-12-01 00:00:00+00:00 Switch Nb: 3
2004-12-01 PRINT Date 2004-12-01 00:00:00+00:00 CAGR = 0.0130861133
2005-04-01 PRINT Date 2005-04-01 00:00:00+00:00 Switch Nb: 4
2005-08-01 PRINT Date 2005-08-01 00:00:00+00:00 Switch Nb: 5
2005-12-01 PRINT Date 2005-12-01 00:00:00+00:00 CAGR = 0.0372192950866

zipline:
Date 2004-01-02 00:00:00+00:00 Switch Nb: 1
Date 2004-05-03 00:00:00+00:00 Switch Nb: 2
Date 2004-09-01 00:00:00+00:00 Switch Nb: 3
Date 2004-12-01 00:00:00+00:00 Switch Nb: 4
Date 2004-12-01 00:00:00+00:00 CAGR = 0.015788
Date 2005-02-01 00:00:00+00:00 Switch Nb: 5
Date 2005-03-01 00:00:00+00:00 Switch Nb: 6
Date 2005-04-01 00:00:00+00:00 Switch Nb: 7
Date 2005-08-01 00:00:00+00:00 Switch Nb: 8
Date 2005-09-01 00:00:00+00:00 Switch Nb: 9
Date 2005-10-03 00:00:00+00:00 Switch Nb: 10
Date 2005-12-01 00:00:00+00:00 CAGR = 0.00133460940886

8 responses

I've also been looking into Zipline vs. Quantopian, and I also observed some differences in the data between the two. I haven't read anywhere why the difference is so large, so I don't have a good answer for you. By the way, you can use the get_environment function to figure out whether your algorithm is running in Zipline or Quantopian.

The difference you're seeing is common and comes from the differences between data sources. From our FAQ:

Quantopian uses the last traded price as the close price for the security. Depending on the data source, others may use end-of-day (EOD) prices. For example, Yahoo is an EOD datasource. Yahoo and other EOD data providers get their price and volume data from the official exchange record. Quantopian's data is generated by the actual trades, regardless of what exchange the trade was made on. The EOD sources rarely exactly match data derived from intraday data. For instance, the official close for a NYSE stock is the last trade of the day for the stock on NYSE. But if the stock also trades on Chicago, Pacific or another regional exchange, the last trade on one of those exchanges could be our close.

Also, Quantopian's data is adjusted for splits and mergers, but does not use adjusted close-prices for dividends. Hope that helps to explain the differences!

Alisa, you should look into using the primary exchange's official open/close for daily bars, or maybe offer it as a parameter. Liquidity on the other exchanges is typically much lower than on the primary. Take BAC on 12/8 as an example: NYSE had the largest closing volume, with 128k shares traded at 17.66, whereas the last print at 16:00 was an Arca trade of only 200 shares at 17.67. The platform seems flexible about bringing in other data, but the big appeal of your ecosystem is that you provide the data, which lightens the load for newish programmers like myself.
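To make the two definitions of "close" concrete, here is a minimal sketch using the BAC numbers above. The `seq` ordering field and the helper names are assumptions made for illustration; only the prices and sizes come from the example, and this is not how either data vendor actually builds its bars.

```python
# Prints near the close. Prices and sizes are from the BAC example above;
# the "seq" field (order of arrival) is invented so that the Arca print is last.
trades = [
    {"exchange": "NYSE", "seq": 1, "price": 17.66, "size": 128000},
    {"exchange": "ARCA", "seq": 2, "price": 17.67, "size": 200},
]


def last_trade_close(trades):
    """Close = price of the last trade on any exchange (consolidated view)."""
    return max(trades, key=lambda t: t["seq"])["price"]


def primary_close(trades, primary="NYSE"):
    """Close = price of the last trade on the primary exchange only."""
    on_primary = [t for t in trades if t["exchange"] == primary]
    return max(on_primary, key=lambda t: t["seq"])["price"]


print(last_trade_close(trades))  # consolidated last-trade close
print(primary_close(trades))     # primary-exchange close
```

A one-cent gap like this is small on any single day, but a strategy that switches on a threshold can flip on exactly such a difference.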

Alisa, thanks for the reply and explanation.

If I were to use Yahoo price data, making the most of what they provide [close, adjusted close (accounting for splits and dividends), and dividends], and re-adjust the close prices ONLY for splits, I should get closer to Quantopian's results. Is that right? This would be a great way for me to gain confidence in Zipline and Q, as I'm getting closer to live trading some of my strategies, and I do believe designing strategies offline is more efficient (specifically when using multiple files/modules).

It also appears that the Google Finance close is adjusted for splits only. Could Quantopian provide access to Google Finance data in addition to Yahoo Finance data?

It's nice to be able to cross-validate results from an OSS framework (Z) against a commercial one (Q), and I believe this would only strengthen Q.
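The split-only re-adjustment described above could be sketched roughly as follows. This assumes you already have raw (unadjusted) closes and a table of split ex-dates with their ratios; the function name, the input layout, and the sample prices are all hypothetical, and obtaining reliable split ratios from a given data source is left out.

```python
from datetime import date


def split_adjust(closes, splits):
    """Adjust raw closes for splits only (dividends are left untouched).

    closes: list of (day, raw_close), sorted by day ascending.
    splits: dict {ex_date: ratio}, e.g. 2.0 for a 2-for-1 split.
    Prices before each ex-date are divided by the ratio of every later split.
    """
    pending = sorted(splits.items(), reverse=True)  # most recent split first
    adjusted = []
    factor = 1.0
    i = 0
    # Walk backward in time, accumulating the ratios of splits that happened
    # after the bar being adjusted.
    for day, close in reversed(closes):
        while i < len(pending) and day < pending[i][0]:
            factor *= pending[i][1]
            i += 1
        adjusted.append((day, close / factor))
    adjusted.reverse()
    return adjusted


# Made-up prices around a hypothetical 2-for-1 split.
raw = [
    (date(2005, 1, 3), 100.0),
    (date(2005, 1, 4), 102.0),
    (date(2005, 1, 5), 51.0),  # first day trading post-split
]
print(split_adjust(raw, {date(2005, 1, 5): 2.0}))
```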

Florent, the approach you mentioned would likely get you closer to the Quantopian data. But I'd warn that there may still be some differences. As mentioned, our backtesting data is the aggregated trade data, whereas Yahoo's may include data from pre- and post-market auction pools. All data sources are slightly different (even between Google, Yahoo, and Bloomberg), and we get our data from a private vendor, which we then monitor, clean, and stream into the IDE. We'd love to make our data available offline for use in Zipline, for the reason you mentioned, but we can't redistribute the data per our agreement with the vendor.

When you're ready to port your strategy to Quantopian, I'd suggest using the get_environment method. This will make it easier to move your code over to the IDE.
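As a rough sketch of how get_environment could be used for this, the snippet below branches on the platform name. It assumes get_environment accepts 'platform' and returns 'zipline' or 'quantopian'; the helper names and the settings dictionary are hypothetical, and a stand-in function replaces the real call so the snippet runs outside a backtest.

```python
def resolve_platform(get_env=None):
    """Return the platform name reported by get_environment, or 'unknown'.

    get_env is the platform's get_environment function. Passing it in,
    instead of importing it, keeps this helper usable outside a backtest.
    """
    if get_env is None:
        return "unknown"
    return get_env("platform")


def choose_settings(platform):
    # Hypothetical per-platform settings, for illustration only.
    if platform == "quantopian":
        return {"data_source": "quantopian", "plot_locally": False}
    return {"data_source": "yahoo", "plot_locally": True}


def fake_get_environment(field="platform"):
    # Stand-in for the real get_environment (assumed behaviour).
    return "zipline"


settings = choose_settings(resolve_platform(fake_get_environment))
print(settings)
```

Inside an algorithm you would pass the real get_environment instead of the stand-in, so the same file can run unchanged on both sides.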

Thanks for taking the time to follow up.
The strategy is ready; it is indeed posted above. I design in Zipline while making sure the code stays compatible with Quantopian. Anything that doesn't run in Zipline I would never use in Quantopian, as Quantopian lacks the benefits of my desktop IDE (Linux/Spyder) for file management and debugging.

Fundamentally, I design everything in Z (1 strategy = multiple files, the usual approach for maintaining a clean and robust environment) and use a script to assemble everything into a single file for Quantopian. That's the best approach I have found to keep the code clean across different strategies and to prevent copy/paste! Right now my aggregating code is not very robust, but eventually I'll improve it and share it. I'm surprised nothing like this is part of Zipline core.

On the data/performance comparison, I'll report back and share results with different data in the next few days/weeks. I believe instruments with no splits and no dividends should show some level of similarity, and even instruments with splits only should be in good agreement for strategies that analyze long periods (like the one I've posted above).
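For readers wondering what such an assembling script might look like, here is a minimal sketch. It is not the script referred to above: the function name and inputs are hypothetical, modules must be listed in dependency order by hand, and only `from module import name` lines for local modules are stripped (plain `import module` usage would still need rewriting).

```python
import re


def assemble(sources, local_modules):
    """Concatenate module sources into a single file body.

    sources: list of (module_name, source_text), in dependency order.
    local_modules: names of local modules whose 'from X import ...' lines
    are dropped, because their code is inlined in the output.
    """
    names = "|".join(re.escape(m) for m in local_modules)
    local_import = re.compile(r"^\s*from\s+(?:%s)\s+import\b" % names)
    parts = []
    for name, text in sources:
        kept = [ln for ln in text.splitlines() if not local_import.match(ln)]
        parts.append("# ---- %s ----\n%s" % (name, "\n".join(kept)))
    return "\n\n".join(parts) + "\n"


# Two tiny made-up modules standing in for a multi-file strategy.
sources = [
    ("signals", "def momentum(x):\n    return x\n"),
    ("algo", "from signals import momentum\n\ndef initialize(context):\n    pass\n"),
]
print(assemble(sources, ["signals"]))
```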

Hi Florent,

Your research on "Zipline/Quantopian: Major discrepancies using cross-compatible code" is very interesting.
Do you have any early findings that you could share with us?

Hi NT,
Nothing new. I'm waiting on this to be merged: https://github.com/quantopian/zipline/pull/398. That said, I still think the best way would be to work up the Yahoo data so that it keeps the splits but removes the dividends, and to provide the dividends as cash inflows on the dividend dates (as done in Q, I believe), while also making full stock return values available as an option inside a Zipline algorithm. Some algos require this information (signals based on full return), but only for processing; the simulated price feed should always exclude dividends, as in everyday live trading.
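As an illustration of the idea, here is a minimal sketch with made-up numbers: the simulation is fed prices without dividend adjustment and credits dividends as cash, while a separate total-return index is computed for signal processing only. The function names and the data are hypothetical, and this is not how Zipline or Quantopian implement dividends internally.

```python
def run_with_cash_dividends(prices, dividends, shares, cash=0.0):
    """Portfolio value over time when dividends arrive as cash.

    prices: list of (day, close), not adjusted for dividends.
    dividends: dict {day: dividend_per_share}.
    """
    values = []
    for day, close in prices:
        cash += shares * dividends.get(day, 0.0)  # cash inflow on dividend date
        values.append((day, shares * close + cash))
    return values


def total_return_index(prices, dividends):
    """Total-return index (starts at 1.0), for signal processing only."""
    index = [1.0]
    for (_, prev), (day, close) in zip(prices, prices[1:]):
        index.append(index[-1] * (close + dividends.get(day, 0.0)) / prev)
    return index


# Made-up data: the price drops by the 0.50 dividend on day 2.
prices = [(1, 10.0), (2, 9.5), (3, 9.75)]
dividends = {2: 0.5}

print(run_with_cash_dividends(prices, dividends, shares=100))
print(total_return_index(prices, dividends))
```

The price series shows a drop on the dividend date, while the total-return index stays flat that day, which is why a signal computed on one can switch when the other does not.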