I have found a troubling number of data discrepancies at the 1-minute level. These are SIGNIFICANT differences, not ones that are off by a few pennies.
A few examples I have are:
March 24th, 2:34pm - SPY - low is 202.56, Quantopian low is 202.36
March 24th, 3:08pm - SPY - high is 202.76, Quantopian high is 203.22
March 24th, 3:42pm - SPY - high is 203.08, Quantopian high is 203.22
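To put numbers on those gaps, here is a quick sketch that just subtracts the reference value from the Quantopian value for each of the three bars above. The `discrepancies` helper and the tuple layout are only for illustration, not anything from the Quantopian API.

```python
# The three SPY bars listed above: (time, field, reference value, Quantopian value).
bars = [
    ("2:34pm", "low", 202.56, 202.36),
    ("3:08pm", "high", 202.76, 203.22),
    ("3:42pm", "high", 203.08, 203.22),
]


def discrepancies(rows):
    """Return (time, field, Quantopian minus reference) rounded to the cent."""
    out = []
    for time, field, reference, quantopian in rows:
        out.append((time, field, round(quantopian - reference, 2)))
    return out


for time, field, diff in discrepancies(bars):
    print(f"{time} {field}: Quantopian differs by {diff:+.2f}")
```

That works out to gaps of 20, 46 and 14 cents on a roughly $200 stock.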
These differences are not acceptable, and as you can see they are quite significant. I am extremely disappointed, as this renders the Quantopian backtester essentially useless for anyone trying to read prices intra-day.
The data I am referencing is from freestockcharts.com, but if you compare charts you can see roughly the same picture from multiple sources. The Quantopian data has massive differences on certain 1-minute bars, which show huge spikes away from the actual market only to return to it on the very next bar.
My guess is that these "bad" 1-minute bars are the result of a single "bad" print. Data providers often filter these "bad" prints, as they are block prints which occur away from the current market and are then printed to the tape at a later time. However, by including these prices in the 1-minute OHLC, Quantopian is creating an intra-day picture of the market that is wildly inaccurate.
For example, at 3:08 on March 24th you would think the market spiked up almost 50 cents and then returned to its range, when in reality all that happened was that one block print was printed to the tape at 3:08. By including these prints in the 1-minute OHLC, you may lead people to believe they are getting fills that don't exist.
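Here is a minimal sketch of what I suspect is going on, assuming the bar is built from every print on the tape. The `Trade` class, the `late_block` flag and the `ohlc` function are all hypothetical names I made up, and the trades are invented for illustration. Only the 202.76 and 203.22 highs come from the 3:08pm example. I have no idea how Quantopian or its vendor actually builds bars.

```python
from dataclasses import dataclass


@dataclass
class Trade:
    price: float
    size: int
    late_block: bool  # hypothetical flag: block print reported late, away from the market


def ohlc(trades, skip_late_blocks=False):
    """Build one OHLC bar from trades in tape order, optionally dropping late block prints."""
    kept = [t for t in trades if not (skip_late_blocks and t.late_block)]
    if not kept:
        return None
    prices = [t.price for t in kept]
    return {
        "open": prices[0],
        "high": max(prices),
        "low": min(prices),
        "close": prices[-1],
    }


# Invented trades for a single minute (not real tape data).
minute = [
    Trade(202.70, 300, False),
    Trade(202.76, 500, False),
    Trade(203.22, 50000, True),  # the one "bad" print
    Trade(202.72, 200, False),
]

raw = ohlc(minute)
filtered = ohlc(minute, skip_late_blocks=True)
print("high with block print:   ", raw["high"])
print("high without block print:", filtered["high"])
```

With the block print included the bar's high is 203.22. With it filtered out the high is 202.76, which is what the other charts show.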
I am surprised I don't see more people talking about this. These are huge differences in price.
Has anyone else noticed this? Has anyone found a better data feed for the backtester?
Quantopian does not seem to share many details about its data vendor for backtesting. Are they any closer to using their live data feed provider in the backtester?
I would love to use this platform, but the backtesting data provided is utterly worthless from an intra-day perspective. I hope you guys can shed some light on this issue and please fix it or point out why I am wrong.
Thank you