Significant 1-Minute Backtester Data Issue

I have found a troubling number of data discrepancies at the 1-minute level. These are SIGNIFICANT differences, not a few pennies off.

A few examples I have are:

March 24th, 2:34pm - SPY - low is 202.56, Quantopian low is 202.36
March 24th, 3:08pm - SPY - high is 202.76, Quantopian high is 203.22
March 24th, 3:42pm - SPY - high is 203.08, Quantopian high is 203.22

These differences are not acceptable, and as you can see are quite significant. I am extremely disappointed as this renders the Quantopian backtester essentially useless for anyone trying to read prices intra-day.

The data I am referencing is from freestockcharts.com, but you can see roughly the same picture from multiple sources. On certain 1-minute bars the Quantopian data shows huge spikes away from the actual market that reverse on the very next bar.

My guess is that each of these "bad" 1-minute bars is the result of a single "bad" print. Data providers often filter these prints out because they are block prints that occur away from the current market and are printed to the tape at a later time. However, by including these prices in the 1-minute OHLC, Quantopian is creating an intra-day picture of the market that is wildly inaccurate.

For example, at 3:08pm on March 24th you would think the market spiked up 50 cents and then returned to its range, when in reality all that happened was that one block print hit the tape at 3:08pm. By including these prints in the 1-minute OHLC you may lead people to believe they are getting fills that don't exist.

I am surprised I don't see more people talking about this. These are huge differences in price.

Has anyone else noticed this? Has anyone found a better data feed for the backtester?

Quantopian does not seem to share many details about its data vendor for backtesting. Is it any closer to using its live data feed provider in the backtester?

I would love to use this platform, but the backtesting data provided is utterly worthless from an intra-day perspective. I hope you guys can shed some light on this issue and please fix it or point out why I am wrong.

Thank you

10 responses

There can definitely be some subtlety in knowing which trades to include in bar building, one doesn't want to include late-reported trades or perhaps even derivatively priced trades.

However, there are often bad trades which are subsequently cancelled, and whereas some historical vendors will back-correct bars to remove cancelled trades, that is something one generally does not want to do for strategy backtesting, since those trades really happened to the best of anyone's knowledge at the time.

Hard to say from the outside and/or without a tick data feed precisely which problem caused your bar discrepancies!

Thank you for your reply Simon.

These are not my bar discrepancies, this is Quantopian's problem.

Simply put the data vendor Quantopian is using for their free backtesting platform is providing an incredibly inaccurate picture of the market.

The size and frequency of the data discrepancies I have noticed just these past few market days make 1 minute backtests using their free data simply unreliable. I don't understand why this isn't seen as a bigger issue.

The problem with these "bad trades" is that they are not fills that someone could get. They are large blocks of shares that have been negotiated and trade away from the NBBO price. I see these prints all the time, and any good charting platform doesn't plot the price when it is a block print away from the market. These prices are not attainable by traders or investors, and the trades don't actually occur at the moment they are printed to the tape. Still less do trades execute at the prices between the NBBO and the block print, as Quantopian's 1-minute data could lead you to believe.

For example, a block print 50 cents away from the market, like the one on March 24th at 3:08pm, would, if not filtered out, lead you to believe the 1-minute candle actually traded all the way up to 203.33, when in reality only one print went off there and the market never really traded above 202.76 in that minute. Anyone looking for mean reversion fills or buys above that high would mistakenly think they are getting filled at prices that simply do not exist for them in reality.

Can someone from Quantopian please shed some light on the data they are using for their free backtesting? (outside of what the FAQ says)
Can I use a different data feed for the backtester?

This platform is pretty great and I fully endorse what Quantopian is trying to do, but this is a significant issue that needs to be addressed and can be fixed by simply switching out the backtest data provider. I saw someone from Quantopian posted I think 6 months ago that they will eventually switch to their live data vendor as the backtest data provider, any idea when that will occur?

Here is another example.

March 28th, 2:15pm in SPY.

Quantopian reports a low of 203.44 vs. a low of 203.75 elsewhere.

The lows of the 1-minute bars before, at, and after 2:15pm are as follows:

203.73, 203.73, 203.73, 203.44, 203.76, 203.73, 203.73, 203.70

This is a significant issue that needs to be addressed. The entire SPY did not spike down roughly 0.1% and spike right back up in the same 1-minute interval. By reporting data this way you are painting the picture as if it did. The backtest data vendor needs to be replaced or needs to filter these block prints out of the 1-minute OHLCV bars it creates. It's that simple.
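A spike like this can be picked out mechanically by comparing each bar's low with the lows around it. The sketch below is only an illustration using the eight SPY lows quoted above; the function name `flag_isolated_lows`, the 3-bar window, and the 0.1% threshold are assumptions chosen for this example, not anything Quantopian provides.

```python
from statistics import median

# 1-minute SPY lows around 2:15pm on March 28th, as quoted in the post above.
lows = [203.73, 203.73, 203.73, 203.44, 203.76, 203.73, 203.73, 203.70]


def flag_isolated_lows(lows, window=3, threshold_pct=0.1):
    """Return (index, drop_pct) for lows that sit more than threshold_pct
    percent below the median low of up to `window` bars on either side.

    `window` and `threshold_pct` are illustrative defaults, not tuned values.
    """
    flagged = []
    for i, low in enumerate(lows):
        # Neighbouring bars only; the bar under test is excluded.
        neighbors = lows[max(0, i - window):i] + lows[i + 1:i + 1 + window]
        if not neighbors:
            continue
        ref = median(neighbors)
        drop_pct = (ref - low) / ref * 100
        if drop_pct > threshold_pct:
            flagged.append((i, round(drop_pct, 3)))
    return flagged


flagged = flag_isolated_lows(lows)
print(flagged)  # only the 203.44 bar (index 3) exceeds the threshold
```

Using the median of the neighbours rather than the mean keeps the reference level from being dragged around by a second bad bar nearby. A check like this cannot tell a bad print from a genuine one-minute spike, so it is only a way to shortlist bars for a closer look.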

I appreciate what Quantopian is doing and I really want this platform to be what it is trying to be, but this is a huge issue that needs attention. If the right decisions are made it can quickly be corrected. Plenty of data providers know about and correct for these kinds of prints.

March 29th, 12:15pm in AAPL

Quantopian gives a low of 106.58. The low was 106.80.

Here are the 1-minute lows into and after 12:15pm.

106.80, 106.80, 106.82, 106.58, 106.78, 106.76, 106.81

Yes stocks can spike in a direction and quickly return. But yesterday at 12:15pm the market for AAPL did not spike down 0.2% and quickly return. This was a bad print that skews these 1 minute candles.

I am a bit surprised nobody wishes to acknowledge this issue. I have begun working on a way to filter these types of bars or at least alert me to potential bad data. But in reality since no one using this platform seems to care about this issue I will most likely have to use Zipline with my own dataset.

Yeah, it is a bit disappointing that they are not filtering out late-reported trades, if that is indeed what those points are, but by the same token, the high and low of a minute bar are pretty skeevy data for a trading system to rely on.

Simon I appreciate the response again but your opinion of what makes a good system is irrelevant to this discussion and I shouldn't have to elaborate to a community of "quants" why having the correct data is important.

I have found a solution that compares the wicks of 1-minute bars to the average bar length, to at least alert me when these potentially bad bars are present. It is far from ideal, but I don't see any other way to protect yourself from Quantopian's bad data.
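The post does not include the code for this check, so the following is only one possible reading of it. It assumes "bar length" means the high-low range, averages that range over a few preceding bars, and flags a bar when either wick is several times longer than that average. The function name, the `lookback` and `multiple` parameters, and the sample bars are all made up for illustration.

```python
# Hypothetical (open, high, low, close) 1-minute bars. The fourth bar carries
# a long upper wick of the kind described in this thread.
bars = [
    (202.70, 202.74, 202.66, 202.72),
    (202.72, 202.76, 202.68, 202.70),
    (202.70, 202.75, 202.67, 202.73),
    (202.73, 203.22, 202.69, 202.74),
    (202.74, 202.78, 202.70, 202.75),
]


def flag_long_wicks(bars, lookback=3, multiple=3.0):
    """Return indices of bars whose upper or lower wick exceeds `multiple`
    times the average high-low range of the preceding `lookback` bars."""
    flagged = []
    for i in range(lookback, len(bars)):
        prior = bars[i - lookback:i]
        avg_range = sum(h - l for _, h, l, _ in prior) / lookback
        o, h, l, c = bars[i]
        upper_wick = h - max(o, c)  # distance from the body top to the high
        lower_wick = min(o, c) - l  # distance from the body bottom to the low
        if avg_range > 0 and max(upper_wick, lower_wick) > multiple * avg_range:
            flagged.append(i)
    return flagged


suspects = flag_long_wicks(bars)
print(suspects)
```

One weakness of this version is that a flagged bar still feeds into the average for the bars that follow it, which temporarily raises the threshold. Dropping flagged bars from the lookback, or using a median, would be an obvious refinement.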

That being said, I am probably going to move away from Quantopian altogether and implement a custom solution.

Of course correct data is important; I was merely pointing out to other "quants" including yourself that if a system is dreadfully impacted by the presence or absence of a single tick, it might not be all that robust a system. The nice thing about making a custom solution is that you can build better bars from the tick data, exactly how you want them.
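As a rough illustration of building bars from tick data with your own rules, the sketch below drops any trade priced well outside the prevailing quote before computing the OHLCV. The tick layout `(price, size, bid, ask)`, the `build_bar` name, and the 5-cent tolerance are assumptions for this example. A real feed would normally also carry sale-condition flags, which are not modelled here.

```python
def build_bar(ticks, tolerance=0.05):
    """Build one OHLCV bar from time-ordered (price, size, bid, ask) ticks,
    skipping trades priced more than `tolerance` outside the quote.

    Returns None if no trade survives the filter.
    """
    kept = [
        (price, size)
        for price, size, bid, ask in ticks
        if bid - tolerance <= price <= ask + tolerance
    ]
    if not kept:
        return None
    prices = [price for price, _ in kept]
    return {
        "open": prices[0],
        "high": max(prices),
        "low": min(prices),
        "close": prices[-1],
        "volume": sum(size for _, size in kept),
    }


# Hypothetical ticks for one minute; the third is a large print far above the quote.
ticks = [
    (202.70, 100, 202.69, 202.71),
    (202.76, 200, 202.75, 202.77),
    (203.22, 50000, 202.74, 202.76),
    (202.74, 300, 202.73, 202.75),
]

bar = build_bar(ticks)
print(bar)
```

Here the excluded print is left out of the volume as well as the high and low. Whether its volume should still count is a judgment call that depends on what the bars are used for.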

Ahh, I see your point now. The problem is that these bad prints occur several times per day, and since the backtester uses 1-minute data to simulate fills, this can lead to a huge difference between the backtester and what is actually happening in the market.

So one bad tick several times a day, every day or every few days, and now we have a 1-year backtest that is utterly worthless.

I just really don't understand why no one from Quantopian has chimed in and no one else here has noticed this or seems to care.

Also since the 1 minute data is simply OHLCV, one bad tick now takes on a lot more significance because it can drastically alter the H or L of that 1 minute bar which the backtester or trader will infer contains trading between the bad tick and the actual market. In reality the market traded at that bad tick once in a block trade, at a price level that is not attainable by anyone participating in the open market.

However once you build a minute bar with that tick, a trader or backtester would likely assume trading activity at various price levels all the way to that block print, when in reality it never happened.

Since these occur every day, and I have noticed several just these past few days without even looking for them, this is something that needs to be addressed.

One of the changes that came in Quantopian 2 was that we changed data sources for the last few years of data. I'm sure that we fixed dozens of data errors with that change. I'm sure that there are still some data errors waiting to be found and fixed - data cleaning is never done.
