Oddities in Price Examples

Hey,

I've been looking at the equity data provided by Quantopian and noticed huge price jumps I can't explain. I am curious whether these jumps are due to how the data has been adjusted. I attached a notebook demonstrating some of the "spike" oddities in the daily open and close prices.

I've tried smoothing, but as you can see these jumps are just way too big to smooth over.

Does anyone have advice on how to handle these huge price jumps? What do they mean? Are these jumps artifacts from the adjustment method used?

4 responses

Hi @Rafael,
@Ernesto, FYI

I'm fairly new here at Quantopian, but have been trading for a long time using a completely different platform, language & database. I have seen quite a lot of examples of exactly this sort of problem, and it can affect any of the {O,H,L,C} data fields, especially in some markets such as Hong Kong, but also very occasionally with US, Canadian & Australian stocks. When it occurs, it is often seen on quite a lot of stocks on the same day. This is a clear indication that something is wrong with the database entries for that day, and even other entries that look OK might also be suspect.

Checking the individual fields in the database using an editor, I often see that the problem is generally one of the following:

a) Anomalous price data on a no-volume day. i.e. the stock did not actually trade at all on that day, so any price data there is nonsense. Solution is to check if volume = 0 and then replace all price data for that day with the previous day's close.

b) Incorrect decimal point placement in one data field (e.g. 123.45 instead of correct value 12.345 as per surrounding data).

c) Apparent typo in one digit in one data field (e.g. O = 12.34, H = 12.75, L = 92.07, C = 12.40).

Presumably b) & c) are some sort of typo errors that occurred when the owner of the database was inputting the data.
If you can see these errors in a text editor while looking at your own local version of the source database, then the required correction is usually fairly obvious.

d) Other kinds of completely anomalous values in one data field on one day only. Presumably some sort of corruption of the database occurred. This can be harder to spot, but some fairly obvious data cleaning QC checks help: e.g. O & C should be >= L and <= H. Also H must be >= L, etc. Some decision is then required about exactly how best to correct this manually.
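The checks in a) to d) can be sketched in plain Python roughly as follows. This is only an illustration: the `Bar` container, the function names and the 5% tolerance are my own assumptions, not anything provided by Quantopian.

```python
from typing import List, NamedTuple


class Bar(NamedTuple):
    """One daily bar. Hypothetical container, not a Quantopian type."""
    open: float
    high: float
    low: float
    close: float
    volume: int


def bar_issues(bar: Bar) -> List[str]:
    """Return the basic QC checks that this bar fails (empty list if none)."""
    issues = []
    if bar.volume == 0:
        issues.append("zero volume")  # case a): the stock did not trade
    if bar.high < bar.low:
        issues.append("high < low")
    if not (bar.low <= bar.open <= bar.high):
        issues.append("open outside [low, high]")
    if not (bar.low <= bar.close <= bar.high):
        issues.append("close outside [low, high]")
    return issues


def looks_like_decimal_shift(value: float, reference: float,
                             tolerance: float = 0.05) -> bool:
    """Case b): True if value is close to reference scaled by 10 or 100.

    The tolerance is an arbitrary illustrative choice.
    """
    if value <= 0 or reference <= 0:
        return False
    ratio = value / reference
    return any(abs(ratio / factor - 1.0) <= tolerance
               for factor in (0.01, 0.1, 10.0, 100.0))


# The typo example from c): the low is far above the high.
typo_bar = Bar(open=12.34, high=12.75, low=92.07, close=12.40, volume=1000)
print(bar_issues(typo_bar))

# The decimal point example from b).
print(looks_like_decimal_shift(123.45, 12.345))
```

A bar that passes all of these can still be wrong, so the checks only catch the internally inconsistent cases.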

If in doubt about what to do, the safest thing is probably to replace the anomalous data with that of the previous day, while also checking for consistency between O, H, L & C. Definitely NOT a good idea to smooth, average or try to actually do any calculations at all with what is evidently bad data. Also don't interpolate between adjacent data points from the day before & after, as this leads to a look-ahead bias effect.
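As a rough sketch of the "replace with the previous day" approach: the code below walks the bars in date order and overwrites a suspect bar with a flat bar at the last trusted close, so it only ever uses past data. The `Bar` type, the `is_suspect` rule and the choice to keep the reported volume are assumptions made for the example.

```python
from typing import List, NamedTuple, Optional


class Bar(NamedTuple):
    """One daily bar. Hypothetical container, not a Quantopian type."""
    open: float
    high: float
    low: float
    close: float
    volume: int


def is_suspect(bar: Bar) -> bool:
    """Zero volume, or O/H/L/C inconsistent with each other."""
    return (bar.volume == 0
            or bar.high < bar.low
            or not (bar.low <= bar.open <= bar.high)
            or not (bar.low <= bar.close <= bar.high))


def repair_causally(bars: List[Bar]) -> List[Bar]:
    """Replace suspect bars with a flat bar at the last trusted close.

    Only earlier bars are used, so no look-ahead is introduced. A suspect
    bar with no trusted bar before it is left unchanged.
    """
    repaired = []
    last_close: Optional[float] = None
    for bar in bars:
        if is_suspect(bar):
            if last_close is not None:
                # Assumption: keep the reported volume, overwrite prices only.
                bar = Bar(last_close, last_close, last_close, last_close,
                          bar.volume)
        else:
            last_close = bar.close
        repaired.append(bar)
    return repaired


bars = [
    Bar(12.30, 12.50, 12.20, 12.40, 1000),
    Bar(12.34, 12.75, 92.07, 12.40, 900),   # typo in the low
    Bar(50.00, 50.00, 50.00, 50.00, 0),     # price on a no-volume day
    Bar(12.45, 12.60, 12.35, 12.50, 1100),
]
for bar in repair_causally(bars):
    print(bar)
```

Note that this deliberately does not average or interpolate anything; it just carries the last trusted close forward.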

Awesome! Thanks for your help.

Does Quantopian fix these errors, and how do we know that the data we are given on the platform is correct? Are there comparisons to other datasets that would indicate I should trust the data? Otherwise, things just get really crazy fast if I can't trust the data.

Hi @Rafael, unfortunately bad data is generally more common than most people realize. Usually it is obvious within a day or so of when it occurs, but after that it can sometimes be very hard to spot and can play havoc with some algos. There are a few ways to trap these errors within your algos and make corrections to calculations, although I don't think I have the Python skills yet to implement this sort of thing in my algos here in Quantopian. Obviously a better solution is to get the data corrected within Quantopian's databases or the source databases from which Q obtains the data.

@Dan, @Ernesto, passing this on to you for your info & comment.
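One possible way to trap such errors inside an algorithm is a trailing-window spike check. This is a minimal sketch under my own assumptions (the function name, the 5-day window and the 50% threshold are arbitrary), not a method described in this thread: each close is compared with the median of the last few accepted closes, using past data only.

```python
from statistics import median
from typing import List


def flag_spikes(closes: List[float], window: int = 5,
                max_rel_dev: float = 0.5) -> List[bool]:
    """Flag closes that deviate too far from the trailing median.

    A price is flagged when it differs from the median of the last
    `window` accepted prices by more than `max_rel_dev` (as a fraction).
    Flagged prices are not added to the accepted history, so one bad
    print does not distort the reference for the following days.
    """
    flags = []
    accepted: List[float] = []
    for price in closes:
        is_spike = False
        if len(accepted) >= window:
            reference = median(accepted[-window:])
            is_spike = abs(price - reference) / reference > max_rel_dev
        flags.append(is_spike)
        if not is_spike:
            accepted.append(price)
    return flags


closes = [12.30, 12.40, 12.35, 12.45, 12.50, 123.45, 12.55]
print(flag_spikes(closes))
```

A flag here only means "look at this bar". A genuine large move would be flagged too, and after a real, lasting change in price level every later bar would keep being flagged, so a real implementation would need some way to reset the history.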

We have a love/hate relationship with these errors.

On one hand, the errors really happen in the real world - every one of these examples is a time where our data vendor passed us data that was crazy. We are always improving our data quality, but the bug rate is never going to get to zero. So what does the algorithm do when it gets that error as an input? It's a good thing when an algorithm has some robustness built into it for situations like this.

On the other hand, some data errors really interfere with research. If we missed Apple's 7:1 stock split, for instance, and never fixed it, it would be an obnoxious failure for lots of reasons.

For these particular failures: close prices get a lot more scrutiny than open prices. That BRK_A close is already in our bug database. CMF was a new one, and I wrote a bug. Not too surprisingly, you see the error in other data sources like Yahoo. These things tend to be widespread when they happen.

So can you trust Quantopian's data? Is it good? Yes, I think it's quite good. Is it perfect? Absolutely not! As Tony said, financial data is dirty. Every data source is imperfect. The good news is that Quantopian has 160,000 people looking at the data, and that makes for some serious crowd-driven improvement.
