Contest results do not match backtest results?



EDIT: A possible explanation (from Dan) is that, because there are two different data sources (one for the live contest entry and one for the backtest), a data error in one of them could produce two different results.

The results for my latest contest entry (ranked #1, but running for only 2 days so far) do not match a backtest that uses the exact same code, date range, and settings.

The contest entry results state that it has a P/L of $346.39 but the backtest results state a P/L of $316.72. Shouldn't these be the same!?

Since I use order_target_percent in my algos, I depend on the total portfolio value each day to size new proportional trades. If the cash position differs, the algo will make different trades accordingly. That is exactly what happened on the second day: the trades from the contest entry no longer match the trades from the backtest, even though they should have been identical on both days.
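To see why a small cash difference changes later trades, here is a minimal sketch of the sizing arithmetic behind a target-percent order. The function target_percent_shares is hypothetical and only approximates the idea (target dollars divided by price, truncated to whole shares); it is not Quantopian's implementation. The $100,000 starting capital, 20% weight, and $3.00 price are assumptions; only the two P/L figures come from the post.

```python
def target_percent_shares(portfolio_value: float, target_pct: float,
                          price: float, current_shares: int = 0) -> int:
    """Hypothetical sketch of the sizing step behind a target-percent order.

    Converts a target portfolio weight into a whole-share order. This is an
    illustration, not Quantopian's actual implementation.
    """
    target_value = portfolio_value * target_pct   # dollars we want in this stock
    target_shares = int(target_value / price)     # truncate to whole shares
    return target_shares - current_shares         # shares to buy (+) or sell (-)


# Hypothetical $100,000 starting capital; the day-1 P/L values are from the post.
live_value = 100_000 + 346.39
backtest_value = 100_000 + 316.72

# Same 20% target weight, same $3.00 price, different portfolio value.
live_order = target_percent_shares(live_value, 0.20, 3.00)
backtest_order = target_percent_shares(backtest_value, 0.20, 3.00)
print(live_order, backtest_order)  # 6689 6687
```

The $29.67 gap in P/L is enough to shift the order by two shares here, and every subsequent target-percent order then starts from a different position.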

Is anyone else seeing this? A quick and easy way to test is to take any contest entry from this month and run a backtest with the exact same code, date range, and settings; then look at the "Dollar P/L" from the contest header and the total "Gains" (from the backtest Daily Positions & Gains summary) and compare.

I am trading a basket of several stocks each day, many with potential dividends. Perhaps something is amiss with the cash calculations there? Are the commission and slippage calculations different between the contest and the backtest? What else could cause this discrepancy? I'd be curious to hear from anyone who is trading several stocks (that may pay dividends) each day with order_target_percent.


Dan Dunn

The commission model and the slippage model are the same in backtest and live trading. What is different, though, is the price/volume data source. So the actual calculated slippage could be different for a given trade.

The live trading algo uses Nanex's NxCore product, and we construct minute bars on the fly from their as-traded data feed. The backtest uses a different data source that comes in nightly. For a variety of reasons (errors that are present in one feed and corrected in the other, for instance) they differ somewhat. For many trades the differences are essentially zero; for instance, if you're buying a few shares of Apple, it just won't matter. But if you're buying a few thousand shares of a thinly traded stock, and the minutely volume is averaging around 500 shares, then the impact of your order can be much more significant.
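A minimal sketch of why the same slippage model can produce different fills from different feeds. It is modeled loosely on a volume-share approach (fill capped at a fraction of the bar's volume, price impact growing with the square of that fraction); the function name and the defaults volume_limit=0.25 and price_impact=0.1 are assumptions for illustration, and the prices and volumes are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Fill:
    shares: int
    price: float


def volume_share_fill(order_shares: int, bar_close: float, bar_volume: int,
                      volume_limit: float = 0.25,
                      price_impact: float = 0.1) -> Fill:
    """Simplified volume-share slippage for a buy order within one minute bar.

    Assumptions: the fill is capped at volume_limit * bar_volume shares, and the
    price is pushed up by price_impact * (volume share)**2 * close.
    """
    if bar_volume <= 0:
        return Fill(0, bar_close)                  # nothing traded, nothing filled
    max_shares = int(volume_limit * bar_volume)    # cap on this bar's participation
    filled = min(order_shares, max_shares)
    volume_share = filled / bar_volume
    impact = price_impact * volume_share ** 2 * bar_close
    return Fill(filled, bar_close + impact)


# Thinly traded stock: the two feeds report different minute volumes.
feed_a = volume_share_fill(2_000, 10.00, 500)
feed_b = volume_share_fill(2_000, 10.00, 650)

# Liquid stock: a small order barely registers either way.
liquid_a = volume_share_fill(100, 125.00, 480_000)
liquid_b = volume_share_fill(100, 125.00, 500_000)
```

With the thin stock, a 500 vs. 650 share minute volume changes the fill from 125 to 162 shares; with the liquid stock, the price difference is below a millionth of a dollar.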

When you look at your early trades, are the fill prices different?

If you want us to check in more detail, send links to the backtest and the live algorithm to [email protected]

Disclaimer

The material on this website is provided for informational purposes only and does not constitute an offer to sell, a solicitation to buy, or a recommendation or endorsement for any security or strategy, nor does it constitute an offer to provide investment advisory services by Quantopian. In addition, the material offers no opinion with respect to the suitability of any security or specific investment. No information contained herein should be regarded as a suggestion to engage in or refrain from any investment-related course of action as none of Quantopian nor any of its affiliates is undertaking to provide investment advice, act as an adviser to any plan or entity subject to the Employee Retirement Income Security Act of 1974, as amended, individual retirement account or individual retirement annuity, or give advice in a fiduciary capacity with respect to the materials presented herein. If you are an individual retirement or other investor, contact your financial advisor or other fiduciary unrelated to Quantopian about whether any given investment idea, strategy, product or service described herein may be appropriate for your circumstances. All investments involve risk, including loss of principal. Quantopian makes no guarantees as to the accuracy or completeness of the views expressed in the website. The views are subject to change, and may have become unreliable for various reasons, including changes in market conditions or economic circumstances.

Rob "Plum Rhinoceros" Robinson

Thanks Dan, I'll check out the fills from the trades (on day 1) and report back!

Rob "Plum Rhinoceros" Robinson

OK, I thought it was caused by different execution times (from an earlier entry I had removed), but that was not the issue. After looking at several of the fills, many are the same, but some prices are slightly different and the quantities differ slightly as well (I'm sure as a result of order_target_percent working from different prior fills). So now the question is: which fill caused all the other fills to go haywire? I don't have enough log data on the live entry to figure this out after the fact, so I may not be able to find out. Your explanation of the two different data sources accounts for how this could happen. The thing is, all of the trades were in a universe built with set_universe and the top 5% DailyVolumeUniverse, so I doubt they were thinly traded securities. It was probably just a data error in one of the data sources. Sorry for the alarm...
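Since quantity differences cascade from order_target_percent, the first fill that differs is the natural suspect. A minimal sketch for locating it, assuming both runs' fills have been exported into the same chronological order; the FillRecord layout, the 0.5-cent price tolerance, and the tickers and prices below are all hypothetical.

```python
from typing import NamedTuple, Optional, Sequence


class FillRecord(NamedTuple):
    day: str
    symbol: str
    shares: int
    price: float


def first_divergence(live: Sequence[FillRecord], backtest: Sequence[FillRecord],
                     price_tol: float = 0.005) -> Optional[int]:
    """Index of the first fill that differs between the two runs, or None.

    Day, symbol, and share count must match exactly; prices may differ by up
    to price_tol dollars. Assumes both lists are in the same chronological order.
    """
    for i, (a, b) in enumerate(zip(live, backtest)):
        same_order = (a.day, a.symbol, a.shares) == (b.day, b.symbol, b.shares)
        if not same_order or abs(a.price - b.price) > price_tol:
            return i
    if len(live) != len(backtest):
        return min(len(live), len(backtest))       # one run has extra fills
    return None


live = [FillRecord("day1", "AAA", 300, 20.10), FillRecord("day1", "BBB", 150, 41.37),
        FillRecord("day2", "AAA", 12, 20.45)]
backtest = [FillRecord("day1", "AAA", 300, 20.10), FillRecord("day1", "BBB", 150, 41.29),
            FillRecord("day2", "AAA", 15, 20.44)]
print(first_divergence(live, backtest))  # 1
```

In this made-up data the day-2 quantity mismatch is a downstream effect; the day-1 price gap on BBB is where the runs first split.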

So here is a question: do you compare the nightly backtest data with the live Nanex data after the fact? Would this help clean up the backtest data (after determining whether Nanex or the nightly source was correct, I assume)?

Dan Dunn

The short answer is, we expect the nightly data used in the backtester to be "more correct" than the data ingested by the live trading servers.

We've compared the nightly data to the live data on a sample basis, but we don't do it as a regular practice. It opens a whole can of worms: when you find a data problem, you have to figure out which source is right (or whether neither is), and so on. It's a Sisyphean task. We rely on our backtest data vendor to do that for us, which is why we use that cleaner data source for backtesting.

This, of course, raises the question: should we add a feature that lets you run a backtest on the dirtier "live" data, so you can make your algorithm more robust to bad prints and other live-data problems? I think we should. It's not a high priority right now, but in the long run I want you to be able to test against "dirty" data too, so you have more insight into your algorithm's performance.
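Until such a feature exists, one rough way to approximate "dirty" data is to corrupt clean data on purpose and check whether an algorithm's decisions change. This is a hypothetical helper, not a Quantopian feature: it replaces a random fraction of prices with values off by a factor of 10 (mimicking a misplaced decimal), and the rate, scale, and seed are assumptions.

```python
import random
from typing import List, Sequence


def inject_bad_prints(prices: Sequence[float], rate: float = 0.01,
                      scale: float = 10.0, seed: int = 0) -> List[float]:
    """Return a copy of prices in which roughly `rate` of the values are replaced
    by spurious prints (multiplied or divided by `scale`).

    Hypothetical stress-test helper; seeded so a given run is reproducible.
    """
    rng = random.Random(seed)
    dirty = list(prices)
    for i, price in enumerate(prices):
        if rng.random() < rate:
            # Mimic a misplaced decimal point in either direction.
            dirty[i] = price * scale if rng.random() < 0.5 else price / scale
    return dirty
```

Running the same signal logic on the clean and corrupted series and diffing the resulting orders gives a crude measure of how sensitive a strategy is to bad prints.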


Grant Kiehne

Hi Dan,

Under Q/IB paper and real-money trading, I understand that the OHLCV bar feed comes from Nanex NxCore (technically, perhaps they give you all the trades, and your proprietary "ingestor" creates the bars?). What happens when history() is called, though? Does it pull from the backtest data set (from an undisclosed vendor, which is kind of sketchy in the first place), or do the history() data also come from Nanex?

Is there any code in your ingestor (or is that Nanex-proprietary?) to catch bad prints? I noticed Alisa's tip #13 on https://www.quantopian.com/posts/tips-for-writing-robust-algorithms-for-the-hedge-fund:

Protect yourself against bad data prints. Our data vendor, like all data vendors, sometimes passes us bad data. Those bad prints might cause an algorithm to place a trade that it shouldn't, or skip a trade that it would otherwise have made. Check if a price is outside of an interval - for example 10 standard deviations - before acting on the signal.
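The tip's check can be sketched in a few lines. This assumes a window of recent prices is already available to the algorithm; the function name is hypothetical, k=10 follows the tip's example, and floor_frac is an added assumption so that a perfectly flat window does not flag every tiny move.

```python
import statistics
from typing import Sequence


def is_bad_print(recent_prices: Sequence[float], price: float,
                 k: float = 10.0, floor_frac: float = 0.001) -> bool:
    """True if `price` lies more than k standard deviations from the mean of
    recent_prices.

    floor_frac is an assumed minimum volatility (as a fraction of the mean) so a
    perfectly flat window does not flag every tick.
    """
    if len(recent_prices) < 2:
        return False                               # not enough history to judge
    mu = statistics.mean(recent_prices)
    sigma = max(statistics.stdev(recent_prices), floor_frac * abs(mu))
    return abs(price - mu) > k * sigma


window = [10.0, 10.1, 9.9, 10.05, 9.95]
print(is_bad_print(window, 10.3))   # False
print(is_bad_print(window, 100.0))  # True
```

An algorithm would skip acting on a signal for that bar whenever this returns True.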

My thought would be: why not filter at the ingestor, where you have access to the full stream of trades? There must be some common industry practices in this area. Would it be feasible? Or maybe you are already doing it, but some bad data still leaks through? As a side note, you could consider open-sourcing your ingestor architecture and code (assuming that you own it). Then the community could see what's going on and potentially contribute to a solution.
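As one illustration of filtering at ingestion, here is a sketch that drops trades deviating too far from a rolling median of recently accepted prices before any bars are built. Nothing here reflects how Quantopian's or Nanex's systems actually work; the window of 20 trades, the 5% threshold, and the (timestamp, price, size) tuple layout are assumptions.

```python
from collections import deque
from statistics import median


def filter_trades(trades, window=20, max_dev=0.05):
    """Drop trades whose price deviates from the rolling median of recently
    accepted prices by more than max_dev (a fraction, e.g. 0.05 = 5%).

    trades: iterable of (timestamp, price, size) tuples in time order.
    """
    recent = deque(maxlen=window)
    accepted = []
    for ts, price, size in trades:
        if len(recent) >= 3:                       # need a little history first
            ref = median(recent)
            if abs(price - ref) > max_dev * ref:
                continue  # treat as a bad print; it does not update the median
        recent.append(price)
        accepted.append((ts, price, size))
    return accepted


trades = [(0, 10.00, 100), (1, 10.02, 100), (2, 9.99, 100),
          (3, 100.0, 5), (4, 10.01, 100)]
print([t[0] for t in filter_trades(trades)])  # [0, 1, 2, 4]
```

The weak spot of this simple design is a genuine price jump (news, a halt reopening): since rejected trades never update the median, a real move would be filtered indefinitely, so a production filter would need some rule for accepting a persistent new level.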

Another consideration: if I understand correctly, the ingestor creates minutely OHLCV bars by capturing every trade, with no smoothing, so all of the wildness going on at high frequency ends up in the bars. At QuantCon, Ernie Chan gave a talk discussing this problem (see https://www.quantopian.com/posts/quantcon-2015-replay-videos-and-presentations-available). So, if an algo does no smoothing of OHLC prices, there are bound to be some differences, since the underlying high-frequency data are very noisy (volumes are a different matter, since they are cumulative over the minute). Once you factor in the noisiness of the sampled high-frequency data stream (and that the reported timing of individual trades could have slop, too), there is fundamentally no right answer when comparing under-sampled data from different sources (even if the high-frequency data all come from the same source). Differences are to be expected, and sometimes they could be large, depending on the level of noise in the high-frequency data.
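A small sketch of the timing-slop point: minute OHLCV bars built from raw trades with no smoothing, fed two versions of the same trade stream that differ only in one timestamp (59.6 s vs. 60.2 s). The function and the trade values are hypothetical; timestamps are seconds from an arbitrary origin.

```python
def minute_bars(trades):
    """Aggregate (ts_seconds, price, size) trades into minute OHLCV bars with no
    smoothing. Assumes trades are sorted by timestamp.

    Returns {minute_index: (open, high, low, close, volume)}.
    """
    bars = {}
    for ts, price, size in trades:
        minute = int(ts // 60)
        bar = bars.get(minute)
        if bar is None:
            bars[minute] = [price, price, price, price, size]
        else:
            bar[1] = max(bar[1], price)   # high
            bar[2] = min(bar[2], price)   # low
            bar[3] = price                # close is simply the last trade
            bar[4] += size                # volume accumulates
    return {m: tuple(b) for m, b in bars.items()}


# Same three trades; feed B stamps the 10.40 print 0.6 s later, across the boundary.
feed_a = minute_bars([(30, 10.00, 100), (59.6, 10.40, 50), (75, 10.05, 100)])
feed_b = minute_bars([(30, 10.00, 100), (60.2, 10.40, 50), (75, 10.05, 100)])
print(feed_a[0], feed_b[0])  # (10.0, 10.4, 10.0, 10.4, 150) (10.0, 10.0, 10.0, 10.0, 100)
```

Minute 0 closes at 10.40 in one feed and 10.00 in the other, even though every trade is identical apart from that sub-second stamp; the total volume across the two minutes is unchanged.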

Grant

Market Tech

One will find that the only way to produce identical results between a backtest and the same trades done in real time is to record all the ticks and perform a wall-clock-accurate replay of them through an exact copy of your broker's execution engine. Theoretically, one should be able to achieve identical results using limit orders of small quantities, so that their impact on the market being traded is nonexistent. Otherwise, you must accept that there will be differences. What matters most is studying your real-time trades to ensure they are behaving as expected. Backtest trades should always be viewed as an approximation (unless you've done what was just outlined).
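A toy version of the replay idea: recorded ticks are replayed in timestamp order against a small limit buy, so the same recording always yields the same fill. This is far simpler than a broker's execution engine. The fill rule (fill in full at the limit on the first print at or below it with enough size) and the tick values are assumptions, and queue position is ignored.

```python
from typing import Iterable, Optional, Tuple

Tick = Tuple[float, float, int]  # (timestamp_seconds, trade_price, trade_size)


def replay_limit_buy(ticks: Iterable[Tick], submit_ts: float, limit: float,
                     shares: int) -> Optional[Tuple[float, float]]:
    """Replay recorded ticks in time order and return (fill_ts, fill_price) for a
    small limit buy, or None if it never fills.

    Assumed rule: the order fills in full at its limit price on the first print
    after submission that trades at or below the limit with size >= shares.
    Queue position and partial fills are not modeled.
    """
    for ts, price, size in sorted(ticks):
        if ts < submit_ts:
            continue                               # order not yet live
        if price <= limit and size >= shares:
            return ts, limit
    return None


ticks = [(0.0, 10.05, 200), (1.5, 10.02, 50), (2.0, 9.99, 300), (3.0, 9.95, 100)]
print(replay_limit_buy(ticks, submit_ts=1.0, limit=10.00, shares=100))  # (2.0, 10.0)
```

Because the replay is a pure function of the recorded ticks, rerunning it always reproduces the same fill; live results would match only to the extent the model matches the broker's actual behavior.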
