Problems with data feed's prices

Hi All -

Below is a backtest that simply logs the price of the sector SPDR XLY. The log reads

2002-01-04 handle_data:39 DEBUG Price of XLY: 29.15
2002-01-07 handle_data:39 DEBUG Price of XLY: 29.13
2002-01-08 handle_data:39 DEBUG Price of XLY: 29.0
2002-01-09 handle_data:39 DEBUG Price of XLY: 28.6
2002-01-10 handle_data:39 DEBUG Price of XLY: 28.75

However, these prices do not match the data I have been using from another data feed (IQ Feed), and they do not match Yahoo Finance either. Here is what I get from Yahoo Finance (and from IQ Feed).

1/4/02 29.07
1/7/02 28.9
1/8/02 29
1/9/02 28.6
1/10/02 28.8

Anyone know what's up with this?

EDIT: I noticed this with a few tickers. :-/

10 responses

There are several sources of price differences that are "normal." Then there are data errors, and then there are bugs. I'm sure we have some from each category in Quantopian.

Normal:

  1. Yahoo, among others, adjusts prices using a different methodology. In particular they treat dividends differently. It's not that they are wrong; their method is perfectly acceptable. It's not well suited to backtesting, though, and that's what we've been optimizing for.
  2. Quantopian's data source is constructed from the as-traded, intraday trade feed. Yahoo and others use what's called an end-of-day source. I wrote about it recently: "Yahoo is an 'end-of-day' (EOD) datasource. Yahoo and other EOD data providers get their price and volume data from the official exchange record. Quantopian's data is generated by the actual trades, regardless of what exchange the trade was made on. The EOD sources rarely exactly match data derived from intraday data. For instance, the official close for a NYSE stock is the last trade of the day for the stock on NYSE. But if the stock also trades on Chicago, Pacific or another regional exchange, the last trade on one of those exchanges could be our close."
  3. Especially with some older OHLCV data, there is no good record! It's amazing the data that these companies threw out back in the day. There's a guy who built a business on the fact that he saved every CD that his data provider sent him every month. They deleted it all, he kept it. And then he sold his collection back to them! So depending on the source, some of the older data has different sources.
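The difference described in item 2 can be sketched in a few lines. This is only an illustration under stated assumptions: the trade prints, exchange codes, and the helper `last_price` are invented for the example and are not Quantopian's or Yahoo's actual data or code.

```python
from datetime import datetime

# Hypothetical trade prints for one symbol on one day:
# (timestamp, exchange code, price). All values are made up.
trades = [
    (datetime(2002, 1, 4, 15, 59, 50), "PRIMARY", 10.01),
    (datetime(2002, 1, 4, 15, 59, 55), "PRIMARY", 10.00),
    (datetime(2002, 1, 4, 15, 59, 58), "REGIONAL", 10.03),
]


def last_price(trades, exchange=None):
    """Price of the latest trade, optionally restricted to one exchange."""
    eligible = [t for t in trades if exchange is None or t[1] == exchange]
    return max(eligible, key=lambda t: t[0])[2]


# "Official" style close: last trade on the listing exchange only.
primary_close = last_price(trades, exchange="PRIMARY")
# "As-traded" style close: last trade on any exchange.
consolidated_close = last_price(trades)

print(primary_close, consolidated_close)
```

With these made-up prints the two closes differ by three cents, even though both are derived from the same set of trades.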

It's hard to know what scenario you're running into without studying it. On the first one, you'll see older prices being very different, and the difference narrows as you get closer to the present. On the second one, the prices will be off by a few pennies each day, more so with thinly traded/big spread stocks. On the third one, it's hard to identify because the source of truth is obscured.

So, which flavor are you running into here? I don't know off the top of my head. Based on the 5 data points there I'd say it's probably door #2. The price differences look like .08, .23, 0, 0, and -.05. That's "not much" of a discrepancy, and doesn't have a pattern to it.
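For anyone who wants to repeat the comparison on more days or tickers, here is a minimal sketch of the arithmetic. The prices are the five data points quoted in the original post; the dictionary names are just labels chosen for this example.

```python
# Closing prices quoted in the original post, keyed by date.
quantopian = {
    "2002-01-04": 29.15,
    "2002-01-07": 29.13,
    "2002-01-08": 29.0,
    "2002-01-09": 28.6,
    "2002-01-10": 28.75,
}
yahoo = {
    "2002-01-04": 29.07,
    "2002-01-07": 28.9,
    "2002-01-08": 29.0,
    "2002-01-09": 28.6,
    "2002-01-10": 28.8,
}

# Difference per day, rounded to whole cents to hide float noise.
diffs = {day: round(quantopian[day] - yahoo[day], 2) for day in quantopian}

for day, diff in diffs.items():
    print(day, f"{diff:+.2f}")
```

A steady drift in one direction would point toward an adjustment difference, while small differences with no consistent sign look more like the intraday-versus-EOD case.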

I'm always on the lookout for data problems. Obviously, the bigger they are the more time we can invest in tracking them down. At the moment, this one feels like it's in the noise.

Disclaimer

The material on this website is provided for informational purposes only and does not constitute an offer to sell, a solicitation to buy, or a recommendation or endorsement for any security or strategy, nor does it constitute an offer to provide investment advisory services by Quantopian. In addition, the material offers no opinion with respect to the suitability of any security or specific investment. No information contained herein should be regarded as a suggestion to engage in or refrain from any investment-related course of action as none of Quantopian nor any of its affiliates is undertaking to provide investment advice, act as an adviser to any plan or entity subject to the Employee Retirement Income Security Act of 1974, as amended, individual retirement account or individual retirement annuity, or give advice in a fiduciary capacity with respect to the materials presented herein. If you are an individual retirement or other investor, contact your financial advisor or other fiduciary unrelated to Quantopian about whether any given investment idea, strategy, product or service described herein may be appropriate for your circumstances. All investments involve risk, including loss of principal. Quantopian makes no guarantees as to the accuracy or completeness of the views expressed in the website. The views are subject to change, and may have become unreliable for various reasons, including changes in market conditions or economic circumstances.

Hi Dan -

  1. The prices I quoted from Yahoo are not the dividend-adjusted prices - they are closing prices. As far as which is better for backtesting, I would strongly argue that you need dividend-adjusted data to run through the algo and non-adjusted data to simulate the trade. Alternatively, you can handle dividends paid within the algorithm. Either way, changes to share price resulting from dividends cannot currently be included in Quantopian, and that is the biggest hurdle I see right now.

  2. At the risk of revealing some level of ignorance, I thought all sector SPDRs were traded exclusively on the AMEX (or, as it's now known, the NYSE MKT LLC). I could be mistaken, and if so I apologize. I'll be digging deeper into this.

  3. Understood that "old" records are unreliable but I'm fairly certain that accurate records exist for trades in 2002.

I understand that this seems to be a minor discrepancy. Let me give a little more information. Before stumbling onto Quantopian, I was coding my own platform for algorithmic trading from the ground up. I essentially had the equivalent of zipline running (my variable optimization was buggy, but the backtester worked great). I had an algorithm that showed some promise and I started to study it in more detail. My program output was all text, so I had to write programs to read the output files and analyze the data! I abandoned my efforts there, recoded my algorithm in Quantopian and ran a backtest. I saw REALLY statistically significant results... but the problem was that the results did not match my backtester's results (which I scrutinized and vetted thoroughly)! I was happy that the Quantopian backtester indicated my algorithm was even better than I originally thought. But as any good programmer knows, when there is a discrepancy you must sort it out before moving on.

So now I'm at a point where I'm trying to step through the two programs to find the discrepancy. My data feed is IQ Feed, which seems to concur with Yahoo. If you guys are confident in your data, so be it. I will follow the two sets of data to see if the price differences account for the vastly different results and report back.

Thanks for answering so quickly.

Matching the official closing prices of the exchanges should be a goal of the system. The last time I dealt with SIAC data from NYSE/AMEX/etc. (which was admittedly over ten years ago), it was nontrivial, but not difficult, to get systems to match. If they do not match, then that (to me) indicates there is a bug in the feed handling code somewhere.

That said, if you say that you are including all trades in your daily bars (even trades that are not supposed to set the "Last" price), that raises the question -- how do you close your daily bars? Which trade is selected as the final trade of the day? Does this include after-hours trades, or what used to be called Form-T trades (they are probably called something else these days)?

And as a followup question -- how are trade corrections handled in the system? I am not even sure if you would want to handle them in a backtesting platform, since live trading would have been whacked by the bad trades, and so everyone (or quantopian) will have to handle them in situ, which might be glossed over if they were applied to historical records...

Daniel, sorry, didn't mean to imply that you had dividends in your example there. I was trying for a comprehensive answer, even though it didn't apply to you. I have pretty strong faith in our data source, but as I said, I'm sure it does have errors. They all do.

Simon, you've been in the business a lot longer than I have. But I don't think that I agree that matching the official closing prices of the exchanges should be a goal. My goal is to get a backtester that is as effective at predicting actual algo behavior as reasonably possible. It's not clear to me that the price records from the exchange-of-record for an equity are any better than the as-traded prices compiled from multiple exchanges. I'm always open to further education!

The daily trade bars are closed at 4PM - there is no after-hours trading in our bars.
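As a rough illustration of what closing a daily bar at 4PM can mean in practice, here is a sketch that builds an OHLCV bar from trade prints and drops anything outside regular hours. It is not Quantopian's bar-building code: the 9:30 open, the inclusive treatment of a print stamped exactly 16:00:00, the function `daily_bar`, and the sample trades are all assumptions made for the example.

```python
from datetime import datetime, time

MARKET_OPEN = time(9, 30)   # assumed regular-session open
MARKET_CLOSE = time(16, 0)  # the 4PM cutoff; inclusive here by assumption


def daily_bar(trades):
    """Build an OHLCV bar from (timestamp, price, size) tuples.

    Trades outside the regular session are ignored.
    """
    regular = sorted(
        (t for t in trades if MARKET_OPEN <= t[0].time() <= MARKET_CLOSE),
        key=lambda t: t[0],
    )
    if not regular:
        return None
    prices = [price for _, price, _ in regular]
    return {
        "open": prices[0],
        "high": max(prices),
        "low": min(prices),
        "close": prices[-1],
        "volume": sum(size for _, _, size in regular),
    }


# Made-up prints, including one after-hours trade that should be dropped.
trades = [
    (datetime(2002, 1, 4, 9, 30, 5), 10.00, 100),
    (datetime(2002, 1, 4, 12, 0, 0), 10.20, 200),
    (datetime(2002, 1, 4, 15, 59, 59), 10.10, 300),
    (datetime(2002, 1, 4, 16, 5, 0), 10.50, 400),
]
bar = daily_bar(trades)
print(bar)
```

Moving the cutoff by even a few seconds, or filtering on trade conditions, would change which print becomes the close, which is why two feeds can disagree by pennies.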

On the other questions: We have two different sources of data. One is used in backtesting. The other is used in papertrading and livetrading. The data source that is used in backtesting has been "cleaned" and bad trades are removed. The data source that is used in papertrading and livetrading is not cleaned - it's exactly as it comes over the wire. (Yes, I agree, we need to be able to backtest against the "dirty" data source. It's on the list!)

Well, I guess it's your prerogative to close your daily bars however you like, but you're giving up a rare opportunity to QA your data feed infrastructure against verifiable external data. Your daily bar close might be the price of the last valid trade prior to your chosen cutoff time, or it might be the price of the most recent odd-lot trade prior to your chosen cutoff time, or it might be the price of a cancel message, or it might be the price of the last trade two seconds prior to your chosen cutoff time, or it might be the price of an early after-hours trade, or it might be the price of a late-reported block trade from an hour prior...

Hi guys,

I did a quick skim through http://en.wikipedia.org/wiki/Dividend and found:

It is relatively common for a stock's price to decrease on the
ex-dividend date by an amount roughly equal to the dividend paid. This
reflects the decrease in the company's assets resulting from the
declaration of the dividend. The company does not take any explicit
action to adjust its stock price; in an efficient market, buyers and
sellers will automatically price this in.

So, the market determines the change in share price due to a dividend payment; there is no automatic adjustment (e.g. per a formula), correct?

Also, couldn't the dividend events be fed in with fetcher?

On a separate note, I'm kinda confused about the utility of a "clean" data source, if when going to paper/live trading, an algorithm will need to be tweaked/re-written to account for "dirty" data. Switching data sources between backtesting and paper/live trading seems kinda sketchy.

Grant

So, the market determines the change in share price due to a dividend payment; there is no automatic adjustment (e.g. per a formula), correct?

The dividend payment should be discounted back from the date it is paid and the past prices adjusted such that the price difference reflects the TOTAL return: price return plus dividend payment. There are formulas to handle this, but they are flawed. If you are asking whether someone adjusts the price of the security within the market, the answer is no. Securities trade on a free market at prevailing prices. No one has the power to adjust the price of a security by force. But consider this - if stock X pays a $2.00 dividend IF you are holding it by close today, and you value the share at $12 (including the dividend), then wouldn't you value it at approximately $2.00 less tomorrow, when the dividend no longer comes with it?
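One common convention for this kind of adjustment is to scale every close before an ex-dividend date by (1 - dividend / previous close). The sketch below assumes that convention; it is not necessarily what Yahoo, IQ Feed, or Quantopian do, and the function `back_adjust` and the sample data are made up for illustration.

```python
def back_adjust(closes, dividends):
    """Return dividend-adjusted closes.

    closes: list of (date, close) in chronological order.
    dividends: dict mapping ex-dividend date -> cash dividend per share.
    """
    adjusted = [price for _, price in closes]
    for i, (day, _) in enumerate(closes):
        cash = dividends.get(day)
        if cash is None or i == 0:
            continue
        # Factor is based on the raw close of the day before the ex-date.
        prev_close = closes[i - 1][1]
        factor = 1.0 - cash / prev_close
        # Scale every close that precedes the ex-dividend date.
        for j in range(i):
            adjusted[j] *= factor
    return [(day, round(p, 4)) for (day, _), p in zip(closes, adjusted)]


# A $12 stock that pays $2 and trades at $10 from day3 onward.
closes = [("day1", 12.0), ("day2", 12.0), ("day3", 10.0), ("day4", 10.0)]
adjusted = back_adjust(closes, {"day3": 2.0})
print(adjusted)
```

In this toy series the adjusted closes come out flat, which matches the idea that the holder's total return across the dividend was zero even though the raw price fell by $2.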

Also, couldn't the dividend events be fed in with fetcher?

Absolutely! But as far as I can see, each individual ticker of interest requires a separate spreadsheet and web link to be fed into fetcher (unless I'm doing it wrong) and one would need to collect such a database.

On a separate note, I'm kinda confused about the utility of a "clean"
data source, if when going to paper/live trading, an algorithm will
need to be tweaked/re-written to account for "dirty" data. Switching
data sources between backtesting and paper/live trading seems kinda
sketchy.

The database cleaning is probably less of an issue. I'd probably be a little more comfortable if the official exchange closing price was used (so that I could verify the data feed with another source) but, all in all I think the backtester is fairly accurate. As you may have read from some of my other posts - I'm currently struggling to resolve some discrepancies between results obtained from Quantopian and another backtester (one I coded in-house). I'll be posting results of my analysis today or later this week, most likely.

Thanks Daniel,

I wonder what a dividend "event" looks like as it unfolds in the market? It can't be instantaneous, right? Everyone knows that the dividend has been paid (or will be paid on a certain date), but the price adjustment still has to play out in the market over time, correct?

This might be one reason not to apply corrections to the fundamental trade data to adjust for dividends.

Grant

@Grant - It is surprisingly quick (nearly instantaneous). Think of it like this - the purchase of a $12 stock comes with a $2 rebate as long as you buy before the ex-dividend date. So on the last day before the ex-dividend date, you value the stock (post dividend) at $10, so you buy it for $12 and get your $2 rebate (dividend). When the exchange opens on the ex-dividend date, the dividend no longer comes with the purchase. You still value the stock at $10 and will now only pay $10. The stock should trade about $2 lower as soon as the exchange opens on the ex-dividend date.
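The $12/$10 example can be written out as a quick calculation. This is only the arithmetic from the post, assuming nothing else moves the price; the variable names are invented for the sketch.

```python
price_with_dividend = 12.0  # last price at which a buyer still receives the dividend
dividend = 2.0
# Expected price once the dividend no longer comes with the share, all else equal.
price_without_dividend = price_with_dividend - dividend

# What an unadjusted price series shows across the drop.
price_return = price_without_dividend / price_with_dividend - 1
# What the holder actually experienced: share value plus cash received.
total_return = (price_without_dividend + dividend) / price_with_dividend - 1

print(f"price return: {price_return:.2%}, total return: {total_return:.2%}")
```

An algorithm that reads only raw prices sees a drop of roughly one sixth here, while the holder's wealth is unchanged, which is the gap that dividend handling in a backtester has to account for.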

That said, I'm becoming more and more convinced that treating dividends the way Quantopian does is the correct way to do it. There are caveats, however, some of which I'm beginning to uncover and post to the site.