Why does the daily data in Quantopian not match Yahoo or Google Finance?

Just looking at the last 4 days of price data for SPY and comparing it to Google Finance, Yahoo Finance and FreeStockCharts, I find several errors or differences in Quantopian's data.

The open_price and the low match the other sources. But the close_price differs from them by anywhere from 3 to 15 cents on all 4 days, and the high price on 10-15-14 is reported by Quantopian as 186.88, whereas the other sources show 187.69.
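To make a comparison like this repeatable, here is a minimal sketch that flags any bar field differing between two sources by more than a tolerance. The function name `bar_differences`, the dictionary layout, and the one-cent tolerance are my own assumptions for illustration, not anything from Quantopian's API. Only the two high prices for 10-15-14 come from the post.

```python
from decimal import Decimal

def bar_differences(ours, theirs, tolerance=Decimal("0.01")):
    """Return the fields whose absolute difference exceeds the tolerance.

    Hypothetical helper: both arguments are plain dicts of field -> Decimal price.
    """
    return {
        field: abs(ours[field] - theirs[field])
        for field in ours
        if field in theirs and abs(ours[field] - theirs[field]) > tolerance
    }

# The two highs quoted in the post for 10-15-14; no other values are known here.
quantopian_bar = {"high": Decimal("186.88")}
other_bar = {"high": Decimal("187.69")}

diffs = bar_differences(quantopian_bar, other_bar)
print(diffs)
```

Decimal is used instead of float so that cent-level differences are not blurred by binary rounding.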

What is going on here?

2 responses

I understand now why the close_price is different, but how come, in a sample of just the last 4 days, one of the high prices is off by such a significant margin?

For the close prices, this is from the FAQ:
Why is your close price different from other data sources?
Quantopian uses the last traded price as the close price for the security. Depending on the data source, others may use end-of-day (EOD) prices. For example, Yahoo is an EOD datasource. Yahoo and other EOD data providers get their price and volume data from the official exchange record. Quantopian's data is generated by the actual trades, regardless of what exchange the trade was made on. The EOD sources rarely exactly match data derived from intraday data. For instance, the official close for a NYSE stock is the last trade of the day for the stock on NYSE. But if the stock also trades on Chicago, Pacific or another regional exchange, the last trade on one of those exchanges could be our close.
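The distinction the FAQ draws can be sketched in a few lines. This is only an illustration of the idea, not how Quantopian or any EOD vendor actually builds its bars. The trades, the exchange codes, and the helper `last_trade_price` are all invented for the example.

```python
from datetime import datetime

# Hypothetical trades, invented purely for illustration: (timestamp, exchange, price).
trades = [
    (datetime(2014, 10, 15, 15, 59, 58), "NYSE", 100.00),
    (datetime(2014, 10, 15, 15, 59, 59), "NYSE", 100.02),
    (datetime(2014, 10, 15, 15, 59, 59, 500000), "CHX", 100.07),
]

def last_trade_price(trades, exchange=None):
    """Price of the latest trade, optionally restricted to a single exchange."""
    candidates = [t for t in trades if exchange is None or t[1] == exchange]
    return max(candidates, key=lambda t: t[0])[2]

# "Last traded price" style close: the final trade on any exchange.
consolidated_close = last_trade_price(trades)

# EOD style close as the FAQ describes it: the final trade on the primary exchange only.
primary_close = last_trade_price(trades, exchange="NYSE")

print(consolidated_close, primary_close)
```

With these made-up trades, the two closes differ by 5 cents simply because the last trade of the day happened on a regional exchange.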

From my experience, I do remember SPY trading after hours, so I suppose the EOD sources included that after-hours high as the actual high of the day, whereas Quantopian gets its data differently, as mentioned above.
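If that guess is right, the effect would look something like the sketch below, where the daily high depends on whether trades outside the regular session are counted. This is only my assumption about the mechanism. The trades are invented, and the 9:30 to 16:00 window is the usual US regular session, not something taken from either data source's documentation.

```python
from datetime import datetime, time

# Assumed regular US trading session (exchange local time).
REGULAR_OPEN = time(9, 30)
REGULAR_CLOSE = time(16, 0)

# Hypothetical trades, invented for illustration: (timestamp, price).
trades = [
    (datetime(2014, 10, 15, 9, 45), 100.10),
    (datetime(2014, 10, 15, 13, 0), 100.60),
    (datetime(2014, 10, 15, 16, 30), 101.20),  # after-hours trade
]

def day_high(trades, regular_session_only):
    """Highest trade price of the day, optionally ignoring extended-hours trades."""
    prices = [
        price
        for ts, price in trades
        if not regular_session_only or REGULAR_OPEN <= ts.time() < REGULAR_CLOSE
    ]
    return max(prices)

regular_high = day_high(trades, regular_session_only=True)
extended_high = day_high(trades, regular_session_only=False)
print(regular_high, extended_high)
```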

However, it still seems like these differences in data can make a significant difference in the result of an algorithm, especially given how frequently they seem to occur.

Which data source is more correct for daily bars, then, when building an algorithm? Obviously Quantopian made its choice, but have you ever considered how things would be different for your algo if you used EOD data, and why shouldn't we be using EOD data?