Hi @Rafael,
@Ernesto, FYI
I'm fairly new here at Quantopian, but I have been trading for a long time using a completely different platform, language & database. I have seen quite a lot of examples of exactly this sort of problem. It can affect any of the {O,H,L,C} data fields, especially in some markets such as Hong Kong, but also very occasionally in US, Canadian & Australian stocks. When it occurs, it is often seen on quite a lot of stocks on the same day. This is a clear indication that something is wrong with the database entries for that day, and even other entries that look OK might also be suspect.
Checking the individual fields in the database using an editor, I often see that when the problem occurs it is generally one of the following:
a) Anomalous price data on a no-volume day, i.e. the stock did not actually trade at all on that day, so any price data there is nonsense. The solution is to check whether volume = 0 and, if so, replace all price data for that day with the previous day's close.
b) Incorrect decimal point placement in one data field (e.g. 123.45 instead of the correct value 12.345, as per the surrounding data).
c) An apparent typo in one digit in one data field (e.g. O = 12.34, H = 12.75, L = 92.07, C = 12.40).
Presumably b) & c) are typos that occurred when the owner of the database was inputting the data.
If you can see these errors in a text editor while looking at your own local version of the source database, then the required correction is usually fairly obvious.
d) Other kinds of completely anomalous value in one data field on one day only. Presumably some sort of corruption of the database occurred. This can be harder to spot, but some fairly obvious data-cleaning QC checks help: e.g. O & C should be >= L and <= H. Also H must be >= L, etc. Some decision is then required about exactly how best to correct this manually.
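To make those QC checks concrete, here is a minimal sketch in plain Python. The dict layout (keys "open", "high", "low", "close") and the function name ohlc_violations are my own assumptions for illustration, not anything from Quantopian's API. The non-positive price check is also my addition, as one possible reading of the "etc." above.

```python
def ohlc_violations(bar):
    """Return a list of the consistency checks that this daily bar fails.

    `bar` is assumed to be a dict with "open", "high", "low", "close" keys
    (a hypothetical layout chosen for this example).
    """
    o, h, l, c = bar["open"], bar["high"], bar["low"], bar["close"]
    problems = []
    if h < l:
        problems.append("high < low")
    if not (l <= o <= h):
        problems.append("open outside [low, high]")
    if not (l <= c <= h):
        problems.append("close outside [low, high]")
    if min(o, h, l, c) <= 0:
        # Extra check, not from the list above: prices should be positive.
        problems.append("non-positive price")
    return problems


# The bar from example c): the low of 92.07 is clearly a typo.
bad_bar = {"open": 12.34, "high": 12.75, "low": 92.07, "close": 12.40}
good_bar = {"open": 12.34, "high": 12.75, "low": 12.07, "close": 12.40}

bad_report = ohlc_violations(bad_bar)    # three failed checks
good_report = ohlc_violations(good_bar)  # empty list
```

Note that a check like this only flags a bar. It can't tell you which field is the wrong one, so the correction itself still needs a human decision.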
If in doubt about what to do, the safest thing is probably to replace the anomalous data with that of the previous day, while also checking for consistency between O, H, L & C. It is definitely NOT a good idea to smooth, average or do any calculations at all with what is evidently bad data. Also, don't interpolate between adjacent data points from the day before & after, as this introduces look-ahead bias.
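A rough sketch of that "carry the previous day forward" repair, again in plain Python with a made-up bar layout (a list of dicts, oldest first, with "open", "high", "low", "close", "volume" keys). The name repair_bars and the choice to copy all four price fields from the previous day are my assumptions. Treat it as one possible way to do it rather than the definitive fix.

```python
PRICE_FIELDS = ("open", "high", "low", "close")


def repair_bars(bars):
    """Return a cleaned copy of `bars` (oldest first), using only past data.

    - Zero-volume day: all prices are set to the previous day's close.
    - Inconsistent O/H/L/C: prices are copied from the previous (cleaned) day.
    Nothing from later days is ever used, so no look-ahead is introduced.
    """
    cleaned = []
    for bar in bars:
        fixed = dict(bar)  # copy, so the input data is left untouched
        o, h, l, c = (fixed[k] for k in PRICE_FIELDS)
        consistent = l <= o <= h and l <= c <= h
        if cleaned:
            prev = cleaned[-1]
            if fixed["volume"] == 0:
                # No trades that day: flat bar at the previous close.
                for k in PRICE_FIELDS:
                    fixed[k] = prev["close"]
            elif not consistent:
                # Bad prices: fall back to the previous day's values.
                for k in PRICE_FIELDS:
                    fixed[k] = prev[k]
        # The very first bar has no previous day, so it is kept as-is.
        cleaned.append(fixed)
    return cleaned


bars = [
    {"open": 12.30, "high": 12.60, "low": 12.10, "close": 12.40, "volume": 1000},
    {"open": 55.00, "high": 1.00, "low": 99.00, "close": 3.00, "volume": 0},
    {"open": 12.34, "high": 12.75, "low": 92.07, "close": 12.40, "volume": 800},
]
cleaned = repair_bars(bars)
```

Because each repaired bar depends only on the bar before it, the same logic can be applied day by day in a live or backtest setting without peeking at future data.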