Backtester Changes Coming, Sharpe Ratio Calculation Change

We have some improvements and bug fixes coming to the backtester that are worthy of note. I expect these changes will come out later this week. When they do, existing backtests will get a visual flag that says that the algorithm should be re-run using the latest version of the backtester.

Several of these changes will have the effect of changing the results of backtests. This is something we take very seriously. We want our backtester to be trusted, and it's hard to trust a moving target. That said, when there is a significant improvement in accuracy, it's better to be right than consistent. I'm sure that this is not the last change we'll make to the backtester, but I want you to understand that we work very hard to minimize the changes.

The biggest change is to the way we calculate the Sharpe ratio. This change was very much driven by the community. You all saw something that you didn’t like, and we agreed with you! The calculation needed to be changed.

We're going to standardize our Sharpe calculation on daily returns that are annualized (with a few exceptions that will be obvious, like intra-day calculations on live trading). Today, our Sharpe is calculated using the absolute returns over the period specified. As an example, if we look at the Sharpe ratio for July, we're looking at the returns on July 31 minus the returns on July 1; if we're looking at the Sharpe for the last year, we subtract the result of 21-Aug-12 from 21-Aug-13. This method works well in a lot of situations, but it makes it difficult to compare the lifetime Sharpe ratio for algorithms if one algo has been running for longer than the other. When we roll out this change, we're going to fix that problem by taking the returns of each individual day over the period specified and then calculating an annualized Sharpe ratio.

If the period in question is a day, the returns over that day will be taken and annualized over a 252-day year. Similarly, the Sharpe over July will be calculated starting with the sum of the returns of 22 individual trading days, and then annualized. The Sharpe for a year will be calculated with the sum of the returns of the 252 different trading days in the year.

For the mathematically inclined, some more detail: yes, we are calculating the expected value of the daily returns divided by the square root of the daily variance of the returns, as described in the formula on Wikipedia. To annualize that daily ratio, the result is multiplied by the square root of 252.
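As a rough illustration of the calculation described above, here is a minimal Python sketch. It is not the Zipline implementation: the function name `annualized_sharpe` is hypothetical, the population standard deviation is an assumption (the post does not say whether a sample or population variance is used), and no risk-free rate is subtracted here.

```python
import math
import statistics

TRADING_DAYS_PER_YEAR = 252  # trading-day year used in the post


def annualized_sharpe(daily_returns):
    """Mean daily return over daily standard deviation, scaled by sqrt(252).

    Hypothetical helper for illustration only.
    Assumption: population standard deviation (statistics.pstdev).
    """
    mean_daily = statistics.fmean(daily_returns)
    std_daily = statistics.pstdev(daily_returns)
    if std_daily == 0:
        # The ratio is undefined when returns never vary.
        raise ValueError("daily returns have zero variance")
    # Annualize the daily ratio by multiplying by sqrt(252).
    return mean_daily / std_daily * math.sqrt(TRADING_DAYS_PER_YEAR)


# Example: four made-up daily returns.
example_returns = [0.01, -0.01, 0.02, 0.0]
example_sharpe = annualized_sharpe(example_returns)
```

Because the scaling factor is the same regardless of how many days are in the sample, two algorithms with different lifetimes end up on the same annualized scale.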

One of the lessons we learned from this process is that we need to be more explicit about how we calculate our ratios and risk metrics. The calculation is performed in Zipline, so anyone can see the calculation there, but not everyone can understand the Zipline code. So, we started sharing this Excel file. This Excel file is the "answer key" that we test Zipline against. We think that more people will be able to understand the Excel file. If you see anything in our answer key you think isn't correctly calculated, we'd like to hear about it so that we can find and fix it. Our Sharpe ratio calculation was following our answer key, but the community prompted us to review our calculation and revise the answer key.

The remaining changes have a smaller impact than the Sharpe change.

In mid-June, we improved the data quality in the feed for live and paper trading algorithms. We are considering re-running against this cleaner data. In a few cases that might cause the paper trading results to change for the days before the change rolled out in June.

In June and July we fixed bugs in our live and paper trading system. For some algorithms that were running before July 8, there are some weird data artifacts. In particular, the benchmark returns after the July 4th holiday were clearly incorrect; the fix will correct the returns and any downstream risk metrics for the affected period. If your paper trading results are problematic and you'd like us to re-run your algo, please let us know.

In July, we fixed a bug in our data loader and expanded the set of stocks we had for backtesting (see https://www.quantopian.com/posts/berkshire-hathaway-not-available). With the additional data we rebuilt the DollarVolumeUniverse, which might result in slightly different outcomes in algorithms using set_universe.

Disclaimer

The material on this website is provided for informational purposes only and does not constitute an offer to sell, a solicitation to buy, or a recommendation or endorsement for any security or strategy, nor does it constitute an offer to provide investment advisory services by Quantopian. In addition, the material offers no opinion with respect to the suitability of any security or specific investment. No information contained herein should be regarded as a suggestion to engage in or refrain from any investment-related course of action as none of Quantopian nor any of its affiliates is undertaking to provide investment advice, act as an adviser to any plan or entity subject to the Employee Retirement Income Security Act of 1974, as amended, individual retirement account or individual retirement annuity, or give advice in a fiduciary capacity with respect to the materials presented herein. If you are an individual retirement or other investor, contact your financial advisor or other fiduciary unrelated to Quantopian about whether any given investment idea, strategy, product or service described herein may be appropriate for your circumstances. All investments involve risk, including loss of principal. Quantopian makes no guarantees as to the accuracy or completeness of the views expressed in the website. The views are subject to change, and may have become unreliable for various reasons, including changes in market conditions or economic circumstances.

9 responses

Alex Heiden

Has Zipline been updated with all of these changes? Do you know if the Python package available through pip has been updated?

Dan Dunn

When we're ready to pull the trigger, we'll merge the change into Zipline. The branch with the Sharpe changes is here.

Simon Thornington

I'm a little unclear why you are subtracting the sum of daily treasury numbers. The treasury figure of 4.xx looks like an annualized yield-to-maturity recorded that day, so it looks like you are subtracting around 4% from every day's return to come up with your "sharpe mean difference".

I only looked into this briefly, but the spreadsheet at least looks incorrect.

Simon Thornington

If I were you, I would subtract the normalized treasury yield (i.e., 4%) from the annualized returns (after accumulation/mean and annualization), then divide by the annualized standard deviation of raw returns.

Whether these should be arithmetic, geometric or logarithmic returns is a more subtle question; the above looks like a gross calculation mistake.
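One way to read the suggestion above, sketched in Python. This is an illustration under stated assumptions, not the spreadsheet's or Zipline's calculation: the function name `sharpe_with_annual_risk_free` is hypothetical, returns are treated as arithmetic (annualized as mean times 252), the standard deviation is the population one, and the 4% default simply mirrors the yield mentioned in the thread.

```python
import math
import statistics

TRADING_DAYS_PER_YEAR = 252


def sharpe_with_annual_risk_free(daily_returns, annual_risk_free=0.04):
    """Annualized excess return over annualized volatility.

    Hypothetical helper for illustration only.
    Assumptions: arithmetic returns, population standard deviation.
    """
    # Annualize the mean daily return first (arithmetic assumption).
    annual_return = statistics.fmean(daily_returns) * TRADING_DAYS_PER_YEAR
    # Annualize the volatility of the raw (not excess) daily returns.
    annual_vol = statistics.pstdev(daily_returns) * math.sqrt(TRADING_DAYS_PER_YEAR)
    if annual_vol == 0:
        raise ValueError("daily returns have zero variance")
    # Subtract the annualized yield once, from the annualized return,
    # rather than from every individual daily return.
    return (annual_return - annual_risk_free) / annual_vol
```

The point of the sketch is the units: the annualized yield is only ever compared against an annualized return. Subtracting the same 4% from each daily return instead would swamp typical daily moves and push the ratio far into negative territory.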

Dan Dunn

Thanks Simon, thanks for looking. We will give it another round.

Mole, those stats are on the 2nd tab of the backtest widget. Unfortunately we can't put everything on the first tab.

Simon Thornington

http://www.edge-fund.com/Lo02.pdf

This is a good paper on the issues with the Sharpe ratio. I think I have posted it before, but I forget. FYI.

Dan Dunn

Mole, I'm sure it's not your fault - we're not making it clear enough.

In this screenshot, I'm talking about the middle arrow: http://screencast.com/t/xjrhhESoEVu

Does that have what you're looking for?

Eddie Hebert

Simon, I did miss the units of annualized vs. daily on the risk-free rate we are using from the treasury. I had assumed the treasury data was daily returns, but clearly it is an annualized number. Mea culpa. Thank you for spotting that.

Simon, et al., if you have a chance, could you review the changes to the spreadsheet available at this link?
http://s3.amazonaws.com/zipline-test-data/risk/18542c61891edc12829b730cd2d3e3b1/risk-answer-key.xlsx

The changes are in the sheet "Sim Cumulative", columns M-R.

Granted, the results that use only a few data points, e.g. cell R7, are noisy, but we included them to show how the number stabilizes with more dates.

Python implementation to follow, but wanted to get some eyes on these equations ASAP.

Thanks!

Simon Thornington

Took a look. The new columns seem okay, though the old ones are still there, which is a bit confusing. You should validate your Sharpe against other people's, so you can get a handle on how different people calculate these things.
