Backtesting Thoughts

Have been lurking for a while and was thinking about backtesting in general.

As I understand it, Quantopian does backtesting against actual historical data. But for many trading strategies that could be quite misleading. Think about the space of all trading strategies for regular stocks (no options, futures, etc.). The strategies that are going to backtest well are the ones that pick the winners from the past. As others have noted in the community, a strategy that simply bought Apple would backtest extremely well. For this simple example the bias is easy to spot, but in complex trading strategies this kind of bias can be hidden. The problem is that those strategies might fail completely on future data if they can't pick the new winners.

I am wondering if it is possible to eliminate this bias by backtesting in a different manner. What if you used the historical data to build a statistical model of the market, or part of it? Such a model could include the risk-free rate, historical volatilities, correlations between stocks, and so on. You could then run a Monte Carlo simulation, backtesting against multiple statistically generated histories, and use that to develop statistics about how the strategy works in a broader sense.
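The simplest version of this idea can be sketched with a single-asset geometric Brownian motion: simulate many synthetic price histories, run the strategy against each, and look at the distribution of outcomes rather than one number. The drift, volatility, and path count below are hypothetical parameters, not calibrated to any real market:

```python
import numpy as np

def simulate_gbm_paths(s0, mu, sigma, days, n_paths, seed=0):
    """Generate n_paths synthetic daily price series under a geometric
    Brownian motion with drift mu and volatility sigma (both annualized,
    assuming 252 trading days per year)."""
    rng = np.random.default_rng(seed)
    dt = 1.0 / 252.0
    # Daily log-returns: (mu - sigma^2/2) dt + sigma sqrt(dt) Z
    z = rng.standard_normal((n_paths, days))
    log_returns = (mu - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * z
    return s0 * np.exp(np.cumsum(log_returns, axis=1))

# Backtest the same strategy against each synthetic history, then
# study the distribution of results instead of a single backtest.
paths = simulate_gbm_paths(s0=100.0, mu=0.05, sigma=0.2, days=252, n_paths=1000)
final_prices = paths[:, -1]
```

A richer model would add the cross-stock correlations and regime switches discussed below; this is only the skeleton.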

I don't think this backtesting approach would work well for all trading strategies. I can also imagine that it is equivalent to forward testing against statistically generated future data, so it might not be that useful after all. But somehow there has to be a way to backtest in a manner that eliminates these sources of bias. Thoughts?

13 responses

@Brian, this is one of my favorite topics in backtesting. One of the things I like best about your idea is the underlying perspective on backtesting. In the industry, I always hear "I've never seen a backtest I didn't like". There's an inherent danger of overfitting when you rely on historical backtests. I still see historical backtests as immensely useful - especially as a thorough and functional test of your algorithm. If you run 10 years of data through an algorithm, it is very likely you will discover edge cases in your code. Historical tests are also great for evaluating algorithms in extreme circumstances, like October of 2008.

But, as much as I see value in historical backtesting, I agree it isn't a sure way to predict performance, nor is it a complete exploration of an algorithm's behavior. Statistically simulated data is another popular approach. What I've heard about is more like scenario analysis, where you see how your algorithm behaves under (usually distressful) situations. This is similar to historical backtesting through periods like the summer of 2007 and October of 2008, but you can specify the type of disaster you want your algo to endure -- perhaps a sudden change in correlations, or a broad market correction. Mixing this with your idea, you could run a chunk of data that approximates the statistical behavior of a real historical period, and then suddenly switch to a new regime.

Zipline can do this!

Practically speaking, Zipline datasources are implemented as generators (Zipline will be OSS in time for PyData). A trivial example is a sawtooth datasource that @Thomas wrote for our parameter optimization development. The only requirement is to yield dictionaries with the right properties (datetime, open, high, low, close, price, volume) at the right frequency (daily or minute). We designed Zipline expecting that new datasources would emerge once we started shipping the software - specifically synthetic sources like the one you propose.
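A synthetic source along those lines might look like the following sketch. The dictionary fields follow the list above, but the function itself is illustrative - it is not Zipline's actual sawtooth implementation:

```python
from datetime import datetime, timedelta

def sawtooth_source(start, days, low=10.0, high=20.0, period=5):
    """Yield daily bar events whose price ramps from `low` to `high`
    over `period` days and then resets -- a sawtooth wave."""
    for i in range(days):
        price = low + (high - low) * (i % period) / (period - 1)
        yield {
            'datetime': start + timedelta(days=i),
            'open': price, 'high': price, 'low': price,
            'close': price, 'price': price,
            'volume': 1000,
        }

events = list(sawtooth_source(datetime(2012, 1, 2), days=10))
```

A deterministic source like this is handy for testing algorithm logic; a statistical source would just replace the sawtooth formula with draws from a fitted model.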

Maybe there is a collaboration for us in creating this statistical source?

Disclaimer

The material on this website is provided for informational purposes only and does not constitute an offer to sell, a solicitation to buy, or a recommendation or endorsement for any security or strategy, nor does it constitute an offer to provide investment advisory services by Quantopian. In addition, the material offers no opinion with respect to the suitability of any security or specific investment. No information contained herein should be regarded as a suggestion to engage in or refrain from any investment-related course of action as none of Quantopian nor any of its affiliates is undertaking to provide investment advice, act as an adviser to any plan or entity subject to the Employee Retirement Income Security Act of 1974, as amended, individual retirement account or individual retirement annuity, or give advice in a fiduciary capacity with respect to the materials presented herein. If you are an individual retirement or other investor, contact your financial advisor or other fiduciary unrelated to Quantopian about whether any given investment idea, strategy, product or service described herein may be appropriate for your circumstances. All investments involve risk, including loss of principal. Quantopian makes no guarantees as to the accuracy or completeness of the views expressed in the website. The views are subject to change, and may have become unreliable for various reasons, including changes in market conditions or economic circumstances.

@John this is great news that you have thought about these things already, even in the design of Zipline. That is fantastic. There are two related issues then:

  1. Creating a statistical source to use in the backtesting.
  2. Running a backtest with that source repeatedly to generate statistics.

Do you envision that Quantopian itself would design the statistical sources, or do you think that you would allow users to do that (or possibly both)? And would you consider allowing multiple backtest runs that are aggregated statistically? I could imagine that if Q were to design really good stat sources, it could be a huge competitive advantage and drive people to your site. I could also imagine that it would be nice to allow advanced users to tweak them.

One question: is your current backtesting against individual trades/orders (at possibly irregular times) or batched into 1-minute intervals?

I don't think it would be too difficult to create some basic statistical sources...

@Brian, thanks for the reply and the questions, you've made me think quite a lot!

I think we would want to provide statistical sources to be competitive, but ultimately I think people will want to provide their own too. I would want the Quantopian stat sources to be part of the open-source Zipline project. Like the slippage models, the sources are an area that would benefit from broad adoption and scrutiny.

We recently added configurable slippage and commission models, and one of the very first requests was for users to be able to supply their own implementation. I expect that will be a trend - people want all the key components to be pluggable.

For sources, I think something like this would be nice:

@data_source
def my_data_source(universe):
    # universe is the current universe of securities
    # ... create trade events ...
    for event in trade_events:
        yield event

def initialize(context):  
    set_source(my_data_source)  

We would absolutely allow for multiple concurrent backtests. As you can imagine, we do that a lot now; it is just done across several users, whereas you would want to trigger multiple concurrent backtests as a single user. It reminds me of the parameter optimization work Thomas is doing.

We use minute bars and daily bars, rather than individual trades. However, there isn't anything in the zipline structure that would prevent you from simulating tick level data, or even quote data. Events have types, so you can distinguish between trades and quotes, or news sentiment, etc. In many ways, statistically generating higher frequency data is the biggest payoff - the simulation time would be as good as possible since no data IO is needed.

Yes, this issue is definitely related to parameter optimization. If you optimize only against historical data, you are optimizing for the past - which is overfitting. Being able to optimize against statistical data sources would hopefully soften that.

I think the simplest type of statistical source would use the historical data for a set of SIDs to calculate the statistical properties required for a correlated log-normal simulation. I don't know anything about how to simulate trading volumes, though. Does anyone?

This is a fascinating thread.

@Brian: I agree that overfitting is a huge problem (although walk-forward optimization helps with that somewhat). But in general, I think backtesting on historical data is only one way to increase your confidence in a trading strategy. Testing on simulated sources is another good way (if the simulation is close enough to reality). One could, for example, imagine a whole battery of different market types (bear, bull, etc.) that an algorithm is tested against, to see where there is likely room for improvement. There seems to be a good amount of literature on how to simulate stock prices (like the lognormal model you mentioned; see here for a good introduction). The Ornstein-Uhlenbeck process also seems relevant, especially in the context of pair trading.
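For reference, here is a minimal Euler discretization of the Ornstein-Uhlenbeck process, which could model a mean-reverting spread in a pair trade. All parameters below are illustrative, not fitted to any real pair:

```python
import numpy as np

def simulate_ou(theta, mu, sigma, x0, days, seed=0):
    """Euler discretization of dX = theta*(mu - X) dt + sigma dW,
    the Ornstein-Uhlenbeck process, with dt = one trading day."""
    rng = np.random.default_rng(seed)
    x = np.empty(days)
    x[0] = x0
    for t in range(1, days):
        # Pull toward the long-run mean, plus a Gaussian shock.
        x[t] = x[t - 1] + theta * (mu - x[t - 1]) + sigma * rng.standard_normal()
    return x

# A spread that starts dislocated at 5 and reverts toward 0.
spread = simulate_ou(theta=0.1, mu=0.0, sigma=0.5, x0=5.0, days=252)
```

A pair-trading algorithm run against many such spreads would reveal how sensitive it is to the speed of mean reversion (`theta`) and the noise level (`sigma`).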

I don't know about simulating trading volumes. But couldn't one exploit the relationship between price and volume? In essence, if one has a good price model and a good handle on how this correlates with volume, one should be able to generate associated volume information.
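One hypothetical way to exploit that relationship: scale a base volume by the magnitude of each day's return and add multiplicative noise. The functional form and every parameter here are assumptions for illustration, not an established volume model:

```python
import numpy as np

def simulate_volume(prices, base_volume=1_000_000, sensitivity=50.0,
                    noise=0.1, seed=0):
    """Hypothetical volume model: volume rises with the magnitude of
    the day's log-return, with multiplicative lognormal noise."""
    rng = np.random.default_rng(seed)
    returns = np.diff(np.log(prices))
    scale = 1.0 + sensitivity * np.abs(returns)   # big moves -> big volume
    shock = np.exp(noise * rng.standard_normal(returns.size))
    return (base_volume * scale * shock).astype(int)

# A toy random-walk price series to drive the volume model.
prices = 100 * np.exp(np.cumsum(0.01 * np.random.default_rng(2).standard_normal(253)))
volumes = simulate_volume(prices)
```

With a better-calibrated price model, the same idea would produce volumes whose correlation with price moves matches what is observed historically.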


It seems to me that forward testing using Monte Carlo simulations of data sources built around the historical parameters of the source (volatility, etc.) is the way to go here. Distilling the relevant parameters in a useful way is difficult, I think, but once done it should slide into the simulator pretty easily, no?

Just to flesh out my comment a little further: the problem I see with John's suggestion above is that you can't really build a data source around measurements of historical performance, since you don't have access to those measurements without computing them all yourself each time you run the simulation (as far as I can tell; remember, I'm brand new at what you have here!).

I'm not sure whether Monte Carlo would help or not. Either way we are making some sort of assumption: using historical data means agreeing that past observation is the best measure, while using Monte Carlo means assuming the stock market follows some particular probability distribution or stochastic dynamics.

To me it is really hard to say which way is better.

@Jay, I agree. The idea in my mind is to gather multiple perspectives, rather than rely on one or the other as "correct" (or for that matter to dismiss one as irrelevant).

Perhaps you could give users the ability to compute average volatility between t0 and tn (of their own choosing) during a trading day over some time period - like a moving average of volatility - and then add an error term composed of a Gaussian white-noise process to simulate volatility. You could probably even empirically define bounds for the white-noise error term.
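A rough sketch of that suggestion: a rolling-window volatility estimate plus a Gaussian white-noise error term, clipped to the empirically observed range. The window length and noise scale are arbitrary choices, and the helper name is hypothetical:

```python
import numpy as np

def noisy_rolling_vol(prices, window=20, noise_scale=0.1, seed=0):
    """Rolling-window volatility of daily log-returns, with an additive
    Gaussian white-noise error term, clipped to the empirical bounds."""
    rng = np.random.default_rng(seed)
    rets = np.diff(np.log(prices))
    vol = np.array([rets[i - window:i].std()
                    for i in range(window, rets.size + 1)])
    noisy = vol + rng.normal(0.0, noise_scale * vol.mean(), size=vol.size)
    # Empirically defined bounds: never leave the observed vol range.
    return np.clip(noisy, vol.min(), vol.max())

prices = 100 * np.exp(np.cumsum(0.01 * np.random.default_rng(3).standard_normal(253)))
vols = noisy_rolling_vol(prices)
```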

Disclaimer: I normally trade FX and have never traded equities, but this site is making me take a look at them because of the access to the IDE and data. :)

@fawce - perhaps one way for the system itself to dampen the effects of overfitting would be for it to automatically hold out, say, the last 6 months of tick data from what is available to users as they build and optimize their strategies. When a user is ready to commit their final strategy, it would be run not only against the historical data set of their selection (as it is currently in the system), but also separately against the out-of-sample data that has been held back, with results for that reported separately. Just a thought...
  (It wouldn't, of course, stop people from optimizing against the held-out data if they truly wished, by iteratively submitting 'final' versions...)
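The held-out split itself is straightforward to express. A sketch using pandas, with a hypothetical `holdout_split` helper and a 6-month tail as suggested:

```python
import pandas as pd

def holdout_split(bars, holdout_months=6):
    """Split a datetime-indexed DataFrame of bars into an in-sample
    set for development and a held-out tail for final validation."""
    cutoff = bars.index.max() - pd.DateOffset(months=holdout_months)
    return bars[bars.index <= cutoff], bars[bars.index > cutoff]

# Toy daily bar history to demonstrate the split.
idx = pd.date_range('2010-01-01', '2012-10-01', freq='D')
bars = pd.DataFrame({'close': range(len(idx))}, index=idx)
in_sample, out_of_sample = holdout_split(bars)
```

The harder part, as noted below, is the workflow around the split: guiding users to develop only against `in_sample` and flagging any run that touches the held-out tail.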

@Casey - sounds promising, and fun to build. You might like to look at extending the DataFrameSource: https://github.com/quantopian/zipline/blob/master/zipline/sources.py#L209

@Tim - Yes, we see the need to provide a clean in-sample/out-of-sample split on the historical trade data. Our current thought is that we should guide, but not force, users to take advantage of this split. On some level, an individual Quantopian user just has to be disciplined about not overfitting, but right now we don't provide any conveniences to define the split, honor it at development time, and give you a heads-up when you are using the out-of-sample data.

You can also get historical data from Interactive Brokers using the IB Data Downloader tool:
http://www.tradinggeeks.net/downloads/ib-data-downloader/