
When Can Sharpe Ratio and Cumulative Returns Have Different Signs?

In this forum post, Quantopian user SARAVANAKUMAR MUTHUSAMY posted an algorithm that has a positive Sharpe Ratio despite having negative overall returns.

In a subsequent forum post, Quantopian user Vladimir posted a simplified version of the algorithm that exhibits the same behavior.

In this notebook, we investigate when and why an algorithm might have a positive Sharpe and negative cumulative returns.

In [1]:
import math
import numpy as np
import empyrical
In [2]:
# This is the backtest from Vladimir's post.
backtest = get_backtest('59dd2f7dedf30053b71dc873')
100% Time: 0:00:00|###########################################################|

Calculated Risk Metrics

The BacktestResult class provides access to the cumulative risk and performance metrics computed by the backtester.

In [3]:
# risk is a dataframe of rolling risk metrics calculated for each backtest day, 
# as is cumulative_performance. We just want the last value of each.
backtest_sharpe = backtest.risk.sharpe.iloc[-1]
backtest_cumulative_return = backtest.cumulative_performance.returns.iloc[-1]

print "Sharpe:", backtest_sharpe
print "Cumulative Return:", backtest_cumulative_return
Sharpe: 0.314560288977
Cumulative Return: -0.0686423917799

How is Cumulative Return Calculated?

Cumulative return should be the percentage change of the total value of the algorithm's holdings over the lifetime of the backtest.

BacktestResult.capital_base tells us the initial capital of the strategy.

In [4]:
backtest.capital_base
Out[4]:
10000.0

BacktestResult.daily_performance.ending_portfolio_value tells us the total value of the portfolio at the end of each trading day.

In [5]:
backtest.daily_performance.ending_portfolio_value.head()
Out[5]:
2013-09-03 00:00:00+00:00    10009.915987
2013-09-04 00:00:00+00:00     9947.915987
2013-09-05 00:00:00+00:00     9967.915987
2013-09-06 00:00:00+00:00     9925.915987
2013-09-09 00:00:00+00:00     9812.515982
Name: ending_portfolio_value, dtype: float64

The total cumulative return is the percent change in the portfolio value between the start and end of the backtest.

In [6]:
starting_value = backtest.capital_base
ending_value = backtest.daily_performance.ending_portfolio_value.iloc[-1]

calculated_cumulative_return = (ending_value - starting_value) / starting_value
calculated_cumulative_return
Out[6]:
-0.068642391779903125

Our calculation matches the cumulative return calculated by the backtester.

(There's a tiny difference on the order of $10^{-15}$ due to floating point rounding error.)

In [7]:
calculated_cumulative_return - backtest_cumulative_return
Out[7]:
-1.8318679906315083e-15

An alternative way to derive the same result is to take the cumulative product of (1 + daily returns).

In [8]:
daily_returns = backtest.daily_performance.returns
daily_returns.head()
Out[8]:
2013-09-03 00:00:00+00:00    0.000992
2013-09-04 00:00:00+00:00   -0.006194
2013-09-05 00:00:00+00:00    0.002010
2013-09-06 00:00:00+00:00   -0.004214
2013-09-09 00:00:00+00:00   -0.011425
Name: returns, dtype: float64
In [9]:
(1 + daily_returns).product() - 1
Out[9]:
-0.068642391779904943
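
The two derivations agree because the product of (1 + daily returns) telescopes: each day's ending value cancels against the next day's starting value, leaving only the final value divided by the initial capital. The sketch below illustrates this on made-up numbers. The portfolio values are hypothetical, and it assumes no cash is added to or withdrawn from the portfolio during the period.

```python
import numpy as np

# Hypothetical values for illustration only; not taken from the backtest.
capital_base = 10000.0
ending_values = np.array([10100.0, 9898.0, 10095.96])

# Each day's return is measured against the previous day's ending value
# (the first day is measured against the initial capital).
previous_values = np.concatenate([[capital_base], ending_values[:-1]])
daily_returns = ending_values / previous_values - 1

# Compounding the daily returns...
compounded = np.prod(1 + daily_returns) - 1
# ...matches the direct start-to-end percent change.
direct = (ending_values[-1] - capital_base) / capital_base

print("compounded: {:.6f}".format(compounded))
print("direct:     {:.6f}".format(direct))
```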

How is Sharpe Ratio Calculated?

The Sharpe Ratio shown by the backtester is defined in empyrical.

It is the calculation described as "Ex Post Sharpe Ratio" in The Sharpe Ratio, annualized for 252 periods (the number of trading days in a year) and assuming a zero risk-free rate.

For daily returns, the formula is:

$$\sqrt{252} \cdot \frac{\operatorname{mean}(R)}{\operatorname{std}(R)}$$

In [10]:
daily_returns = backtest.daily_performance.returns
daily_returns.head()
Out[10]:
2013-09-03 00:00:00+00:00    0.000992
2013-09-04 00:00:00+00:00   -0.006194
2013-09-05 00:00:00+00:00    0.002010
2013-09-06 00:00:00+00:00   -0.004214
2013-09-09 00:00:00+00:00   -0.011425
Name: returns, dtype: float64
In [11]:
calculated_sharpe = math.sqrt(252) * daily_returns.mean() / daily_returns.std()
calculated_sharpe
Out[11]:
0.31456028897743227

We again see agreement up to very small floating point error.

In [12]:
empyrical_sharpe = empyrical.sharpe_ratio(daily_returns)
empyrical_sharpe
Out[12]:
0.31456028897743255
In [13]:
backtest_sharpe
Out[13]:
0.31456028897743255
In [14]:
np.allclose([empyrical_sharpe, backtest_sharpe], calculated_sharpe)
Out[14]:
True
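
One detail that is easy to miss when re-deriving Sharpe by hand is which standard deviation is used. The pandas `Series.std()` call in the manual calculation above defaults to the sample standard deviation (`ddof=1`), whereas `numpy.std` defaults to the population standard deviation (`ddof=0`). The agreement above suggests the backtester's figure is also based on the sample standard deviation, though that is an inference from the matching numbers rather than something this notebook verifies. The sketch below uses made-up returns to show how the two conventions differ.

```python
import numpy as np

# Hypothetical daily returns for illustration only.
returns = np.array([0.01, -0.02, 0.015, 0.005, -0.004])
n = len(returns)

sample_std = returns.std(ddof=1)      # same convention as pandas Series.std()
population_std = returns.std(ddof=0)  # numpy's default convention

sharpe_sample = np.sqrt(252) * returns.mean() / sample_std
sharpe_population = np.sqrt(252) * returns.mean() / population_std

# The two estimates differ by a factor of sqrt(n / (n - 1)),
# which shrinks toward 1 as the number of observations grows.
print("sample:     {:.6f}".format(sharpe_sample))
print("population: {:.6f}".format(sharpe_population))
print("ratio:      {:.6f}".format(sharpe_population / sharpe_sample))
```

For a multi-year daily backtest the difference is tiny, but it can be enough to break an exact comparison against a reference implementation.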

Why is Sharpe Ratio Positive When Cumulative Returns are Negative?

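Since the standard deviation is never negative (and is strictly positive whenever returns vary at all), the sign of the Sharpe Ratio is always the sign of the mean daily return, which is the sign of the sum of daily returns. The sign of Sharpe will therefore differ from the sign of cumulative returns whenever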

$$\sum\limits_{t}{R_t}$$

and

$$-1 + \prod\limits_{t}{(1 + R_t)}$$

have different signs.

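A little analysis shows that these calculations have different signs when the arithmetic mean of (1 + returns) is greater than 1 but the geometric mean of (1 + returns) is less than 1, which is the case for the returns of this algorithm. The reverse cannot happen as long as no single day loses more than 100%: by the AM-GM inequality the geometric mean never exceeds the arithmetic mean, so an arithmetic mean below 1 forces the geometric mean below 1 as well.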

In [15]:
from scipy.stats import gmean
print("Arithmetic Mean:", (1 + daily_returns).mean())
print("Geometric Mean:", gmean((1 + daily_returns).values))
('Arithmetic Mean:', 1.0008993842870626)
('Geometric Mean:', 0.99992788093385609)
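
A two-day toy example makes the effect easy to check by hand. The returns below are hypothetical and deliberately extreme: a 50% gain followed by a 40% loss averages out to a positive daily return, yet the portfolio ends up below where it started.

```python
import numpy as np

# Hypothetical returns for illustration: +50% one day, -40% the next.
toy_returns = np.array([0.50, -0.40])

mean_return = toy_returns.mean()                  # arithmetic mean of returns
cumulative_return = np.prod(1 + toy_returns) - 1  # compounded return
toy_sharpe = np.sqrt(252) * mean_return / toy_returns.std(ddof=1)

print("mean daily return: {:.4f}".format(mean_return))
print("cumulative return: {:.4f}".format(cumulative_return))
print("Sharpe sign: {}".format("positive" if toy_sharpe > 0 else "not positive"))
```

The mean is positive because (0.50 - 0.40) / 2 > 0, while the compounded value is 1.5 * 0.6 = 0.9, a 10% loss.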

In general, for a fixed arithmetic mean, the geometric mean of a set of data points is lower when those points are more spread out, so the sign of cumulative returns is more likely to diverge from the sign of the Sharpe Ratio the more dispersed the returns are.

In [16]:
hypothetical_returns = np.tile([0.01, -0.01], 126)
print "Arithmetic Mean:", (1 + hypothetical_returns).mean()
print "Geometric Mean:", gmean(1 + hypothetical_returns)
Arithmetic Mean: 1.0
Geometric Mean: 0.99994999875

The geometric mean decreases as we increase the spread of the up and down periods, even though the arithmetic mean stays flat.

In [17]:
more_dispersed_hypothetical_returns = np.tile([0.25, -0.25], 126)
print "Arithmetic Mean:", (1 + more_dispersed_hypothetical_returns).mean()
print "Geometric Mean:", gmean(1 + more_dispersed_hypothetical_returns)
Arithmetic Mean: 1.0
Geometric Mean: 0.968245836552

Looking at the plot of this strategy's daily returns, we can see that there are a few large outliers.

In [18]:
daily_returns.plot(figsize=(14, 6));
In [19]:
(daily_returns + 1).cumprod().plot(figsize=(14, 6));

What Does This Tell Us About Sharpe Ratio?

The purpose of any statistic is to provide a summary of a dataset that captures important features of that dataset. The Sharpe Ratio attempts to summarize the attractiveness of an investment by measuring the ratio between the investment's expected return and the investment's expected volatility.

The standard method of calculating Sharpe Ratio tries to capture expected return by taking the mean of a series of observed returns, and it tries to capture expected volatility by taking the standard deviation of the same observed returns.

The explicit assumptions of these calculations are that:

  1. The mean and variance of an investment's returns distribution are sufficient statistics to evaluate the attractiveness of that investment.
  2. The historical mean and variance of the investment observed at some frequency are sufficient measures of the expected mean and variance of that investment.
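
To make assumption (1) concrete, the sketch below builds two hypothetical return series that share exactly the same mean and sample standard deviation, and therefore the same Sharpe Ratio, but have very different shapes: one alternates small gains and losses, the other earns a steady small gain and then suffers a single crash. The `rescale` and `sharpe` helpers and all of the numbers are made up for illustration.

```python
import numpy as np

def rescale(x, target_mean, target_std):
    """Shift and scale x to the requested mean and sample standard deviation."""
    x = np.asarray(x, dtype=float)
    return target_mean + target_std * (x - x.mean()) / x.std(ddof=1)

def sharpe(returns):
    """Annualized Sharpe Ratio for daily returns, zero risk-free rate."""
    return np.sqrt(252) * returns.mean() / returns.std(ddof=1)

# Two different shapes over 252 days: alternating up/down vs. one large crash.
symmetric_shape = np.tile([1.0, -1.0], 126)
crash_shape = np.append(np.ones(251), -251.0)

# Give both the same mean (0.05% per day) and the same volatility (1% per day).
steady = rescale(symmetric_shape, 0.0005, 0.01)
crashy = rescale(crash_shape, 0.0005, 0.01)

for name, r in [("steady", steady), ("crashy", crashy)]:
    print("{}: Sharpe={:.4f}, worst day={:.4f}".format(name, sharpe(r), r.min()))
```

Both series report the same Sharpe Ratio, but the second one contains a single-day loss of more than 15%, which the first two moments alone do not reveal.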

Sharpe's paper, The Sharpe Ratio, discusses these limitations explicitly:

On point (1):

Throughout, we build on Markowitz' mean-variance paradigm, which assumes that the mean and standard deviation of the distribution of one-period return are sufficient statistics for evaluating the prospects of an investment portfolio. Clearly, comparisons based on the first two moments of a distribution do not take into account possible differences among portfolios in other moments or in distributions of outcomes across states of nature that may be associated with different levels of investor utility.

When such considerations are especially important, return mean and variance may not suffice, requiring the use of additional or substitute measures. Such situations are, however, beyond the scope of this article. Our goal is simply to examine the situations in which two measures (mean and variance) can usefully be summarized with one (the Sharpe Ratio).

On point (2):

Most performance measures are computed using historic data but justified on the basis of predicted relationships. Practical implementations use ex post results while theoretical discussions focus on ex ante values. Implicitly or explicitly, it is assumed that historic results have at least some predictive ability.

In conclusion, on both points:

Clearly, any measure that attempts to summarize even an unbiased prediction of performance with a single number requires a substantial set of assumptions for justification. In practice, such assumptions are, at best, likely to hold only approximately. Certainly, the use of unadjusted historic (ex post) Sharpe Ratios as surrogates for unbiased predictions of ex ante ratios is subject to serious question. Despite such caveats, there is much to recommend a measure that at least takes into account both risk and expected return over any alternative that focuses only on the latter.

Sharpe Ratio can be a useful way of summarizing the attractiveness of an investment but, as with any statistic, it's important to understand its limitations. Having a positive Sharpe Ratio often, but not always, corresponds to having positive cumulative returns.

Review and Conclusions

In this notebook we:

  • Used get_backtest and the BacktestResult class to load and examine the cumulative performance and risk metrics for a backtest.
  • Learned how to re-derive the calculations that the backtester displays for cumulative return and Sharpe Ratio.
  • Learned that the sign of cumulative return and Sharpe Ratio will differ when the arithmetic mean of an algorithm's (1 + daily_returns) is greater than 1, but the geometric mean of (1 + daily_returns) is less than 1.
  • Discussed assumptions and limitations of the Sharpe Ratio as a measure of the attractiveness of an investment.