We have some improvements and bug fixes coming to the backtester that are worth noting. I expect these changes to come out later this week. When they do, existing backtests will be visually flagged to indicate that the algorithm should be re-run using the latest version of the backtester.
Several of these changes will have the effect of changing the results of backtests. This is something we take very seriously. We want our backtester to be trusted, and it's hard to trust a moving target. That said, when there is a significant improvement in accuracy, it's better to be right than consistent. I'm sure that this is not the last change we'll make to the backtester, but I want you to understand that we work very hard to minimize the changes.
The biggest change is to the way we calculate the Sharpe ratio. This change was very much driven by the community. You all saw something that you didn’t like, and we agreed with you! The calculation needed to be changed.
We're going to standardize on an annualized Sharpe ratio computed from daily returns (with a few exceptions that will be obvious, like intraday calculations in live trading). Today, our Sharpe is calculated using the absolute returns over the specified period. For example, the Sharpe ratio for July uses the cumulative return on July 31 minus the cumulative return on July 1; for the last year, we subtract the cumulative return on 21-Aug-12 from the cumulative return on 21-Aug-13. This method works well in many situations, but it makes it difficult to compare lifetime Sharpe ratios when one algo has been running longer than another. When we roll out this change, we'll fix that problem by taking the return of each individual day in the specified period and then calculating an annualized Sharpe ratio.
If the period in question is a day, the return over that day will be taken and annualized over a 252-day year. Similarly, the Sharpe for July will be calculated from the returns of its 22 individual trading days and then annualized, and the Sharpe for a year will be calculated from the returns of the 252 trading days in that year.
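Written out for a window W of N trading days with daily returns r_1, ..., r_N (N = 22 for July, 252 for a full year), the per-period calculation looks roughly like the following. This is a sketch, not the exact Zipline code: it assumes the sample variance (N - 1 denominator) and omits any risk-free or benchmark adjustment, and the real implementation may differ on both points. Note that estimating the volatility needs at least two daily returns.

```latex
\bar{r}_W = \frac{1}{N}\sum_{t=1}^{N} r_t, \qquad
\sigma_W = \sqrt{\frac{1}{N-1}\sum_{t=1}^{N}\left(r_t - \bar{r}_W\right)^2}, \qquad
S_W = \sqrt{252}\,\frac{\bar{r}_W}{\sigma_W}
```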
For the mathematically inclined, some more detail: yes, we are calculating the expected value of the daily returns divided by the square root of their daily variance (that is, their standard deviation), following the formula on Wikipedia. That means that to annualize the daily Sharpe ratio, the result is multiplied by the square root of 252.
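A minimal Python sketch of that calculation, assuming a plain list of daily returns, the sample standard deviation, and no risk-free adjustment. These are assumptions for illustration, not necessarily what Zipline does, and the function name annualized_sharpe is hypothetical.

```python
import math
import statistics

TRADING_DAYS_PER_YEAR = 252  # annualization factor used in the post


def annualized_sharpe(daily_returns):
    """Annualized Sharpe ratio from a sequence of daily returns.

    Assumptions: no risk-free/benchmark adjustment, and the sample
    standard deviation (N - 1 denominator) as the volatility estimate.
    """
    if len(daily_returns) < 2:
        raise ValueError("need at least two daily returns to estimate volatility")
    mean = statistics.mean(daily_returns)      # expected daily return
    stdev = statistics.stdev(daily_returns)    # sample std of daily returns
    if stdev == 0:
        raise ValueError("returns have zero volatility; Sharpe is undefined")
    daily_sharpe = mean / stdev
    # Scale the daily ratio to a 252-trading-day year.
    return daily_sharpe * math.sqrt(TRADING_DAYS_PER_YEAR)


if __name__ == "__main__":
    # Hypothetical daily returns, for illustration only.
    example = [0.01, -0.005, 0.002, 0.007]
    print(annualized_sharpe(example))
```

To get the July figure you would pass July's 22 daily returns; for a yearly figure, that year's daily returns. Because the input is always daily returns and the result is always scaled by the square root of 252, Sharpe ratios for algorithms with different track-record lengths end up on the same annualized scale.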
One of the lessons we learned from this process is that we need to be more explicit about how we calculate our ratios and risk metrics. The calculation is performed in Zipline, so anyone can see the calculation there, but not everyone can understand the Zipline code. So, we started sharing this Excel file. This Excel file is the "answer key" that we test Zipline against. We think that more people will be able to understand the Excel file. If you see anything in our answer key you think isn't correctly calculated, we'd like to hear about it so that we can find and fix it. Our Sharpe ratio calculation was following our answer key, but the community prompted us to review our calculation and revise the answer key.
The remaining changes have a smaller impact than the Sharpe change.
In mid-June, we improved the quality of the data feed for live and paper trading algorithms. We are considering re-running paper trading algorithms against this cleaner data; in a few cases, that might change the paper trading results for the days before the improvement rolled out in June.
In June and July we fixed bugs in our live and paper trading system. Some algorithms that were running before July 8 show odd data artifacts. In particular, the benchmark returns after the July 4th holiday were clearly incorrect; the fix will correct those returns and any downstream risk metrics for the affected period. If your paper trading results are problematic and you'd like us to re-run your algo, please let us know.
In July, we fixed a bug in our data loader and expanded the set of stocks available for backtesting (see https://www.quantopian.com/posts/berkshire-hathaway-not-available). With the additional data, we rebuilt the DollarVolumeUniverse, which might produce slightly different results for algorithms that use set_universe.