Quantopian's Sharpe ratio does not match MorningStar/Yahoo

The Sharpe ratio for SPY at Yahoo Finance and MorningStar over the past three years is 1.31, but Quantopian is coming up with 2.10.

http://finance.yahoo.com/q/rk?s=SPY
http://performance.morningstar.com/funds/etf/ratings-risk.action?t=SPY&region=USA

Also, MorningStar has 2.51 for Sortino vs. Quantopian's 27.99.

15 responses

Looks like they're using monthly returns:
http://corporate.morningstar.com/bf/documents/MethodologyDocuments/MethodologyPapers/StandardDeviationSharpeRatio_Definition.pdf

Part of the difference may be that Quantopian is, I think, using daily returns.

@Ben - using daily vs. monthly returns will definitely generate a different value. Sharpe is better used as a means of comparing portfolios than as a source of absolute information about a portfolio. In other words, provided the Sharpe ratio is calculated in the same way for all portfolios, you can rank them by the computed value.

There may be a case where portfolios rank differently with daily vs. monthly data. In this case, you should think of the highest Sharpe ratio with daily returns as the portfolio with the best risk-adjusted daily returns and the highest Sharpe ratio with monthly returns as the portfolio with the best risk-adjusted monthly returns. I understand this is a little confusing. Similar to the Sharpe ratio is Roy's safety-first criterion. If you examine the two equations you will see that the portfolio with the highest Sharpe ratio is the one with the highest probability of earning a premium over the risk-free rate. This could be on either a daily or monthly basis.

EDIT: I'm more concerned about the divergence between the algorithm and benchmark over time. Not sure why that happened.

@Daniel, yes I agree with everything you've said. I think it would be good for Quantopian to document that their Sharpe is calculated on a daily basis. Even better would be to provide both the daily and monthly Sharpe (since they're equal only if returns are i.i.d.). That would make it easier to compare with a Sharpe ratio published elsewhere that is calculated on a monthly basis. (Also, some people have been known to calculate both and use the one that is better to promote their trading strategy ;-)
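To see how the two conventions can give different numbers on the same data, here is a minimal sketch. The helper names `annualized_sharpe` and `compound` are made up for illustration, the returns are synthetic rather than SPY data, and the 21-trading-day month, 252-day year and zero risk-free rate are assumptions.

```python
import math
import random
import statistics


def annualized_sharpe(returns, periods_per_year, rf_per_period=0.0):
    """Mean excess return over its sample std, scaled by sqrt(periods per year)."""
    excess = [r - rf_per_period for r in returns]
    return statistics.mean(excess) / statistics.stdev(excess) * math.sqrt(periods_per_year)


def compound(returns, block):
    """Compound consecutive simple returns into non-overlapping blocks."""
    out = []
    for i in range(0, len(returns) - block + 1, block):
        growth = 1.0
        for r in returns[i:i + block]:
            growth *= 1.0 + r
        out.append(growth - 1.0)
    return out


random.seed(0)
# Synthetic daily returns (hypothetical parameters, roughly three years).
daily = [random.gauss(0.0006, 0.01) for _ in range(252 * 3)]
monthly = compound(daily, 21)  # assumption: one "month" = 21 trading days

sharpe_daily = annualized_sharpe(daily, 252)
sharpe_monthly = annualized_sharpe(monthly, 12)
print(round(sharpe_daily, 2), round(sharpe_monthly, 2))
```

Even with i.i.d. draws the two figures will generally not match exactly in a finite sample; with autocorrelated returns the gap can be larger.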

@Ben - Agreed that transparency never hurts. I'm still playing around with the website overall, but asking about the Sharpe ratio was on my to-do list, because I've run into this issue before: Sharpe is calculated differently depending on who has published the data.

My recommendation would be to calculate the Sharpe ratio yourself if you are going to compare them. Note that some people use the term "monthly" to mean 21 trading days, whereas others actually look at calendar dates (the 1st of the month, the 15th of the month, etc.). The other variable is the definition of the risk-free rate of return. Do you use the 10-year Treasury? LIBOR? To be sure the values are truly comparable, you're probably not going to be able to get out of calculating them yourself.

One thing I'll eventually get to is a Sharpe function that computes Sharpe based on whatever I choose for the risk-free rate (either TLT, a fixed rate, imported list, etc.) and uses the time frame that I select.
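A function along those lines might look like the sketch below. The name `sharpe_ratio` and its parameters are hypothetical, not part of Quantopian or zipline; it assumes simple per-period returns and accepts either a fixed annual rate or a per-period risk-free series (for example, one imported from a file).

```python
import math
import statistics


def sharpe_ratio(returns, risk_free=0.0, periods_per_year=252):
    """Annualized Sharpe ratio of per-period simple returns.

    risk_free is either a fixed annual rate (float) or a sequence of
    per-period risk-free returns with the same length as `returns`.
    """
    if isinstance(risk_free, (int, float)):
        # Convert the annual rate to an equivalent per-period rate.
        per_period = (1.0 + risk_free) ** (1.0 / periods_per_year) - 1.0
        rf = [per_period] * len(returns)
    else:
        rf = list(risk_free)
        if len(rf) != len(returns):
            raise ValueError("risk_free series must match returns in length")
    excess = [r - f for r, f in zip(returns, rf)]
    sd = statistics.stdev(excess)
    if sd == 0.0:
        return float("nan")
    return statistics.mean(excess) / sd * math.sqrt(periods_per_year)


# Hypothetical daily returns, first with a fixed 2% annual rate, then with a series.
sample = [0.004, -0.002, 0.006, 0.001, -0.003]
print(sharpe_ratio(sample, risk_free=0.02))
print(sharpe_ratio(sample, risk_free=[0.0001] * len(sample)))
```

Passing `periods_per_year=12` with monthly returns gives the monthly-based figure, so both conventions can be reported side by side.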

There is something I really don't understand.
If the total return over 3Y is approx 63%, the yield (annualised return) is approx 17.7%. So if Quantopian measures a vol of 29%, the Sharpe will be way below 1.

Even the vol number is strange: I compute 17.1% myself (since July 1st, 2010), so (with rates = 0) the Sharpe is close to 1.

How can you get 2.10?
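The arithmetic behind those figures can be checked in a few lines, taking the 63% total return, the three-year window, the two volatility numbers and a zero risk-free rate as given in the post:

```python
total_return = 0.63   # total return over the period, from the post
years = 3             # "3Y" in the post
risk_free = 0.0       # rates assumed to be zero, as in the post

# Annualised (geometric) return implied by the total return.
annualized = (1.0 + total_return) ** (1.0 / years) - 1.0

# Sharpe under the two volatility figures quoted: 29% and 17.1%.
sharpes = {vol: (annualized - risk_free) / vol for vol in (0.29, 0.171)}

print(round(annualized, 3))            # about 0.177
for vol, s in sharpes.items():
    print(vol, round(s, 2))            # about 0.61 and 1.03
```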

@Stef - Assuming that Quantopian is using daily returns, the annualized return of 17.7% would not matter. You would have to look at the expectation value for the daily return and the daily volatility. If you are looking at the "returns" and "volatility" printed out by Quantopian, you are only seeing monthly data.

EDIT: It would be interesting if you could find a way to extract the daily return data, do the calculation by hand (or rather in Excel), and see whether the calculated Sharpe matches the printed value. You would need to extract the risk-free return as well.

The annualised return does matter: the definition of the Sharpe ratio is (yield - riskFreeRate) / vol, with everything annualised.
There is no expected value involved since it is a measure, like return or vol; it looks backward.

I DID extract daily data from Bloomberg, and I gave the numbers in my post (yield and vol both 17ish, so yield/vol is around 1, and the Sharpe is about 1 rather than 2.10).
I am not looking at the total return and vol quoted by Quantopian (although I claim that the vol is wrong); my point is just that I extracted the data, computed yield, vol and Sharpe on my side, and there is a problem with the 2.10 number.

Even if you forget stats and definitions, over a meaningful period of time, a Sharpe >1 for a risk asset is usually surprising.

This is a timely discussion, as one of our current focuses is improving/correcting the metrics mentioned in this thread.

(To follow along with the risk metrics, the calculations are done here, https://github.com/quantopian/zipline/blob/master/zipline/finance/risk.py)

One of the main corrections needed is that our risk calculations are tailored towards the 1/3/6/12 month reports. This means that our volatility currently has a denominator that is suited towards those time frames, but is not normalized to 252 days for the headline number, which is probably why the numbers are not as you expect @Stef42.
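As a rough illustration of that normalization (not zipline's actual code), annualizing means scaling the per-period standard deviation by the square root of the number of periods in a year. The function name and sample returns below are hypothetical; 252 trading days per year is the convention mentioned above.

```python
import math
import statistics


def annualized_volatility(returns, periods_per_year=252):
    """Sample std of per-period returns, scaled to a one-year horizon."""
    return statistics.stdev(returns) * math.sqrt(periods_per_year)


# Hypothetical daily returns for a short window.
window = [0.004, -0.010, 0.007, 0.002, -0.003, 0.009, -0.006]

raw_std = statistics.stdev(window)          # per-day figure, not comparable across windows
annual_vol = annualized_volatility(window)  # same data on a 252-day basis
print(round(raw_std, 4), round(annual_vol, 4))
```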

Likewise, the Sharpe ratio does not have a good calculation of the expected value against the risk-free rates: it should be using the mean of the difference and not just the difference between the two. (And as @Ben McCann has astutely pointed out, we should convert the risk-free rate to 10-year periods.)
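The distinction between the two numerators can be sketched as follows. This is an illustration under assumptions, not a copy of zipline's risk.py: the function names and sample data are made up, and the first variant divides by the standard deviation of the algorithm's returns as one plausible reading of the current behaviour.

```python
import statistics


def cumulative_return(returns):
    """Compound a series of simple returns into one cumulative return."""
    growth = 1.0
    for r in returns:
        growth *= 1.0 + r
    return growth - 1.0


def sharpe_from_cumulative(algo, risk_free):
    """Numerator is the difference of the two cumulative returns."""
    diff = cumulative_return(algo) - cumulative_return(risk_free)
    return diff / statistics.stdev(algo)


def sharpe_from_mean_excess(algo, risk_free):
    """Numerator is the mean of the per-period differences."""
    excess = [a - f for a, f in zip(algo, risk_free)]
    return statistics.mean(excess) / statistics.stdev(excess)


# Hypothetical per-period returns; the longer window repeats the same pattern.
algo_short = [0.01, -0.005, 0.02, 0.0, 0.015]
rf_short = [0.0001] * len(algo_short)
algo_long, rf_long = algo_short * 2, rf_short * 2

cum_short = sharpe_from_cumulative(algo_short, rf_short)
cum_long = sharpe_from_cumulative(algo_long, rf_long)
mean_short = sharpe_from_mean_excess(algo_short, rf_short)
mean_long = sharpe_from_mean_excess(algo_long, rf_long)
print(round(cum_short, 2), round(cum_long, 2))
print(round(mean_short, 2), round(mean_long, 2))
```

In this toy data the cumulative-difference version grows with the length of the window, while the mean-excess version stays roughly stable, which is one reason a headline number computed the first way can look too large.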

We've also been working on getting an Excel spreadsheet, which we've used to corroborate the Zipline risk calculations, more integrated into the testing suite, as well as exposing said spreadsheet to the public for verification/scrutiny. We're hoping that having the two different implementations, and being able to verify each independently, will help tent-pole the correctness of the risk calculations.

Here is a link to the current version, https://s3.amazonaws.com/zipline-test-data/risk/3ac0773c4be4e9e5bacd9c6fa0e03e15+/risk-answer-key.xls

We're hoping to have some annotations for that Excel spreadsheet in the form of an IPython notebook, soon. The answer key is now being used to power the unit tests found here, https://github.com/quantopian/zipline/blob/master/tests/risk/test_risk.py. As we improve our risk calculations, we will be updating the spreadsheet accordingly.

Transparency of the backtester is a big goal of ours, including exposing how it works and its known deficiencies, but sometimes issues end up solely on the internal issue tracker. I'll double check to make sure these issues are on the external issue tracker, and if not, copy tickets about Sharpe, normalization, etc. over to https://github.com/quantopian/zipline/issues as well, and link to them from here.

(Also, @Daniel Sandberg, your concern about the difference between the benchmark and the algo returns is addressed here: https://quantopian.com/posts/algorithm-at-a-disadvantage. The TL;DR is that currently, for backtesting, the benchmark on the graph is the S&P, whereas the algo buys into the SPY ETF.)

@Eddie - yes, the post you referenced was actually started by me, so I am familiar with that issue, but did SPY really fail to track the S&P 500 by that much? The post you referenced (the one I started) was something of a fluke in that, on the day the backtest started, the opening prices of SPY and the S&P 500 differed by about 4%, and that 4% discrepancy persisted throughout the simulation. In the backtest above, the algo and benchmark actually diverge with time.

@Eddie that's great! I just saw a lot of the changes you were referring to make it into zipline. I also have my own Java-based platform that I wrote a while back, as well as NinjaTrader. I'm hoping to switch over to zipline at some point, so I've been comparing zipline/Quantopian to my code and NinjaTrader to make sure there isn't anything that might be unexpectedly different if I switch. I've been raising most issues in zipline, but the risk stuff is tricky there, which is why I bring it up here. The main issue for me right now is that the period of time I can backtest over is limited until zipline supports the 10-year bond data. The second issue is that it's way slower than my Java code or NinjaTrader (being able to turn off the risk stuff, or just compute it at the end instead of for every day, might help?). On the plus side, and the reason I'm sticking with it so far, Python is a really good language for this stuff, with scikit-learn and so on. It's also much better than NinjaTrader at looking at the performance of the entire system. It's great for such a young project to already be competitive with what's out there!

@Eddie Hebert
I don't really understand your normalisation point: if you normalise to a monthly stdDev, you are talking about standard deviations, not volatility. The academic and industry practice/definition is to use a time unit of 1 = 1 year.
In your spreadsheet, in sheet Sim, the volatility numbers cannot be compared since they are actually standard deviations, and the returns are not yields; since returns and standard deviations don't scale the same way with time, you can't compare Sharpe numbers across time frames.
And if you use non-standard definitions, you might be internally consistent, but that's really confusing since you don't use dimensionless quantities.
It's like confusing time and speed. You might be consistent in saying that you use different definitions to measure the speed of going to school and the speed of going to the beach, but how will you compare them? OK, you might measure that Joe is quicker to get to school than Bill, but you know nothing else.
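The scaling argument can be written out under simplifying assumptions: i.i.d. per-period log returns with mean \mu and standard deviation \sigma, a zero risk-free rate, and R_n denoting the return over n periods. These symbols are introduced here only for illustration.

```latex
\begin{aligned}
\mathbb{E}[R_n] &= n\,\mu, \\
\operatorname{sd}(R_n) &= \sqrt{n}\,\sigma, \\
S_n = \frac{\mathbb{E}[R_n]}{\operatorname{sd}(R_n)} &= \sqrt{n}\,\frac{\mu}{\sigma} = \sqrt{n}\,S_1 .
\end{aligned}
```

So a Sharpe ratio built from un-annualised monthly figures (n of about 21 trading days) would be roughly \sqrt{21}, or about 4.6, times the daily one, and the two only become comparable once both are expressed in the same time unit.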

In the end it's just a question of definition, but I think that using non-standard definitions and non-dimensionless quantities is confusing and error-prone.
That's why we use implied vol and not option prices, yields and not bond prices, hit rate and not number of successful trades, ....

BTW, in your spreadsheet, you use actuarial returns to compute vol/stdDev, not log returns.
It's just a question of definition but ...
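For a sense of how much the choice matters, the sketch below computes annualised volatility from both simple and log returns on a made-up price series. The prices are hypothetical and the 252-day scaling is an assumption.

```python
import math
import statistics

# Hypothetical daily closing prices.
prices = [100.0, 102.0, 99.0, 101.0, 104.0, 103.0]

# Simple (arithmetic) returns: p1 / p0 - 1.
simple_returns = [p1 / p0 - 1.0 for p0, p1 in zip(prices, prices[1:])]
# Log returns: ln(p1 / p0); these add up across periods.
log_returns = [math.log(p1 / p0) for p0, p1 in zip(prices, prices[1:])]

vol_simple = statistics.stdev(simple_returns) * math.sqrt(252)
vol_log = statistics.stdev(log_returns) * math.sqrt(252)
print(round(vol_simple, 4), round(vol_log, 4))
```

For daily moves of a few percent the two volatilities are close; the gap tends to widen with larger moves or longer periods.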

The difference between the SPY and benchmark returns is the total return effect.
The 63% return for SPY for this period is a total return performance (using price adjusted data, which is fine).
However the 56% for the benchmark is actually the price return for the index, not the total return, hence the difference.

The difference is just the div yield.
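As a quick sanity check on that explanation, the annualised gap implied by the two figures can be backed out, assuming both cover the same roughly three-year window discussed in the thread:

```python
total_return = 0.63   # SPY total return quoted above
price_return = 0.56   # benchmark price return quoted above
years = 3             # assumption: both figures span about three years

# Annualised gap between total-return growth and price-only growth.
annual_gap = ((1.0 + total_return) / (1.0 + price_return)) ** (1.0 / years) - 1.0
print(round(annual_gap * 100, 2))  # about 1.47 (percent per year)
```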

See this for SPY/SPX basis
http://tinypic.com/view.php?pic=wlac6p&s=5

I've added a bug to the zipline issue tracker to track the fact that the Sharpe ratio calculation is using the difference of the cumulative returns and not the mean of the difference between the portfolio return and benchmark return as it should be.
https://github.com/quantopian/zipline/issues/202