Why is backtest data off from other data sources?

I'm doing some research offline, and when I simulate an order on Quantopian, I see that the price at the given time is wildly different from the price provided by AMEX or BATS.

Case in point: SPY @ 09:51 on 03-28-2014

The AMEX data source gives the following: o=185.92 h=185.92 l=185.84 c=185.87

Quantopian gives the following: o=186.24 h=186.25 l=186.18 c=186.24

That's a difference of about 0.2% ($0.37 per share at the close).
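
To put a number on the gap for each field of the bar, a quick sketch like the following can help. It uses only the two bars quoted above; the helper name `bar_diff` is made up for illustration.

```python
# Bars quoted in this post for SPY at 09:51 on 2014-03-28.
amex = {"open": 185.92, "high": 185.92, "low": 185.84, "close": 185.87}
quantopian = {"open": 186.24, "high": 186.25, "low": 186.18, "close": 186.24}


def bar_diff(reference, other):
    """Return {field: (absolute diff, percent diff)} of `other` relative to `reference`."""
    result = {}
    for field, ref_price in reference.items():
        abs_diff = other[field] - ref_price
        result[field] = (abs_diff, 100.0 * abs_diff / ref_price)
    return result


diffs = bar_diff(amex, quantopian)
for field, (abs_diff, pct_diff) in diffs.items():
    print(f"{field}: {abs_diff:+.2f} USD ({pct_diff:+.3f}%)")
```

Comparing all four fields rather than just the close shows whether the whole bar is shifted by a similar amount or only one price is off.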

Why the difference?

4 responses

Hello Jason,

I don't know the answer to your question, but I'm curious where you got the AMEX and BATS data?

Per https://www.quantopian.com/faq#data:

We currently provide minute-level bar data of all US stocks from January 2002 through the previous trading day for backtesting...

For paper trading and real-money trading, we get a realtime feed of trades from Nanex's NxCore product. Those trades are bundled into one-minute bars and fed to the trading algorithms. Paper trading data is provided on a 15-minute delay. Real-money trading is processed without delay.
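
For anyone unsure what "bundled into one-minute bars" means in practice, here is a rough sketch of that kind of aggregation. It is not Quantopian's or Nanex's actual code: the trades are made-up sample values, `trades_to_minute_bars` is a hypothetical helper, and keying each bar by the start of its minute is an assumption (a vendor could just as well label bars by the end of the minute, which is worth checking when comparing sources).

```python
from datetime import datetime


def trades_to_minute_bars(trades):
    """Group (timestamp, price, size) trades into one-minute OHLCV bars.

    Trades must already be sorted by timestamp. Each bar is keyed by the
    start of its minute (an assumed labeling convention).
    """
    bars = {}
    for ts, price, size in trades:
        minute = ts.replace(second=0, microsecond=0)
        bar = bars.get(minute)
        if bar is None:
            # First trade of the minute sets every price field.
            bars[minute] = {"open": price, "high": price, "low": price,
                            "close": price, "volume": size}
        else:
            bar["high"] = max(bar["high"], price)
            bar["low"] = min(bar["low"], price)
            bar["close"] = price  # last trade seen so far in this minute
            bar["volume"] += size
    return bars


# Made-up trades, for illustration only.
trades = [
    (datetime(2014, 3, 28, 9, 51, 2), 185.92, 100),
    (datetime(2014, 3, 28, 9, 51, 30), 185.84, 200),
    (datetime(2014, 3, 28, 9, 51, 58), 185.87, 50),
    (datetime(2014, 3, 28, 9, 52, 5), 185.90, 300),
]
bars = trades_to_minute_bars(trades)
for minute, bar in bars.items():
    print(minute.strftime("%H:%M"), bar)
```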

Also, are you using the Quantopian historical database (for backtesting) or the Nanex feed for your comparison? To my knowledge, Quantopian has never revealed their source for the historical data, so it is not clear that the backtest data will match (exactly) the Nanex-derived bar data.

You might consider whether a discrepancy of this size is "in the noise" in the context of Quantopian's system, which, if I understand correctly, attempts to minimize latency but does not guarantee that your order will be received at IB within a fixed sub-minute time frame (relative to the time at which the algorithm processes the Nanex-derived bar data, which has its own latency).

Grant

Hi Grant,

I'm using AMEX data from tradingview.com. I'm using that site to help with my manual research, and wanted to test out some of my theoretical orders manually on Quantopian before I try coding them up. (I still have a lot of work to do on my GitHub intraday framework before I could use it.)

I was using the Quantopian data, not anything from Nanex. So yeah, maybe their backtest data is off. I suppose it should still work for an algo, but it doesn't seem accurate to the market.

Hi Jason,

Do you know how their data is derived? Is the price adjusted for dividends and splits or is it the naked price?

Quantopian data uses adjusted close prices. It is adjusted for all stock splits and merger activities, but does not incorporate dividend data. During backtests, your portfolio receives dividends as cash payments.
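
As a rough illustration of what that means (a sketch under stated assumptions, not Quantopian's actual adjustment code): prices before a split are divided by the split ratio so the series stays continuous, while a dividend leaves the price series alone and shows up as cash. The function names, dates, and prices below are invented.

```python
def split_adjust(prices, splits):
    """Back-adjust prices for splits.

    prices: list of (iso_date, price), sorted by date.
    splits: list of (iso_date, ratio), where ratio is new shares per old
            share and the split takes effect on that date.
    ISO date strings compare correctly as plain strings.
    """
    adjusted = []
    for day, price in prices:
        factor = 1.0
        for split_day, ratio in splits:
            if day < split_day:  # only prices before the split are rescaled
                factor *= ratio
        adjusted.append((day, price / factor))
    return adjusted


def dividend_cash(shares, dividend_per_share):
    """Cash credited to the portfolio; the price series is left untouched."""
    return shares * dividend_per_share


# Hypothetical 2-for-1 split effective 2020-01-06.
prices = [("2020-01-02", 100.0), ("2020-01-03", 102.0), ("2020-01-06", 51.5)]
splits = [("2020-01-06", 2.0)]
adjusted = split_adjust(prices, splits)
print(adjusted)
print(dividend_cash(10, 0.5))
```

If another source applies dividend adjustments to its prices, its historical series will differ from a split-only adjusted one, so it helps to know which convention each source uses before comparing them.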

A follow-up on this, Alisa: I still believe this is an issue. As I mentioned earlier, I'm using intraday data, and during the course of a single day (May 23rd in my example) the Quantopian historical data varies significantly from the data provided by other vendors.

Of course, it's an issue with the backtest data being inaccurate, which isn't a bug in your system per se, but maybe you could do some analysis on your backend to determine the extent of the discrepancy.
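
One way to gauge the extent of such a discrepancy over a full day is to line up minute closes from two sources and summarize the gaps. This is only a sketch: `discrepancy_summary` is a hypothetical helper, and the sample values are made up rather than taken from either data source.

```python
from statistics import mean


def discrepancy_summary(reference, other):
    """Summarize percent differences between two {minute: close} mappings.

    Only minutes present in both mappings are compared.
    """
    common = sorted(set(reference) & set(other))
    if not common:
        raise ValueError("no overlapping minutes to compare")
    abs_pct = {
        m: abs(100.0 * (other[m] - reference[m]) / reference[m]) for m in common
    }
    worst = max(common, key=lambda m: abs_pct[m])
    return {
        "minutes_compared": len(common),
        "mean_abs_pct": mean(abs_pct.values()),
        "max_abs_pct": abs_pct[worst],
        "worst_minute": worst,
    }


# Made-up closes keyed by "HH:MM", for illustration only.
vendor_a = {"09:50": 100.0, "09:51": 100.0, "09:52": 100.0}
vendor_b = {"09:51": 100.2, "09:52": 99.9, "09:53": 101.0}
summary = discrepancy_summary(vendor_a, vendor_b)
print(summary)
```

A gap that stays roughly constant all day points to something systematic (for example, a different adjustment or timestamp convention), while scattered gaps point to individual bad or missing bars.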