In this forum post, Quantopian user SARAVANAKUMAR MUTHUSAMY posted an algorithm that has a positive Sharpe Ratio despite having negative overall returns.
In a subsequent forum post, Quantopian user Vladimir posted a simplified version of the algorithm that exhibits the same behavior.
In this notebook, we investigate when and why an algorithm might have a positive Sharpe Ratio but negative cumulative returns.
import math
import numpy as np
import empyrical
# This is the backtest from Vladimir's post.
backtest = get_backtest('59dd2f7dedf30053b71dc873')
The BacktestResult class provides access to the cumulative risk and performance metrics computed by the backtester.
# risk is a dataframe of rolling risk metrics calculated for each backtest day,
# as is cumulative_performance. We just want the last value of each.
backtest_sharpe = backtest.risk.sharpe.iloc[-1]
backtest_cumulative_return = backtest.cumulative_performance.returns.iloc[-1]
print "Sharpe:", backtest_sharpe
print "Cumulative Return:", backtest_cumulative_return
Cumulative return should be the percentage change in the total value of the algorithm's portfolio over the lifetime of the backtest.
BacktestResult.capital_base tells us the initial capital of the strategy.
backtest.capital_base
BacktestResult.daily_performance.ending_portfolio_value tells us the total value of the portfolio at the end of each trading day.
backtest.daily_performance.ending_portfolio_value.head()
The total cumulative return is the percent change in the portfolio value between the start and end of the backtest.
starting_value = backtest.capital_base
ending_value = backtest.daily_performance.ending_portfolio_value.iloc[-1]
calculated_cumulative_return = (ending_value - starting_value) / starting_value
calculated_cumulative_return
Our calculation matches the cumulative return calculated by the backtester.
(There's a tiny difference on the order of $10^{-15}$ due to floating point rounding error.)
calculated_cumulative_return - backtest_cumulative_return
An alternative way to derive the same result is to take the product of (1 + daily returns) and subtract 1.
daily_returns = backtest.daily_performance.returns
daily_returns.head()
(1 + daily_returns).product() - 1
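Why do the two methods agree? Suppose each day's return is measured against the previous day's ending value, with the starting capital serving as the day-zero value. Then the product of (1 + daily returns) telescopes to the ending value divided by the starting value. Below is a minimal sketch on made-up portfolio values, using numpy only. The names toy_capital_base and toy_ending_values are hypothetical and deliberately avoid the notebook's own variables.

```python
import numpy as np

# Hypothetical starting capital and end-of-day portfolio values (made up).
toy_capital_base = 10000.0
toy_ending_values = np.array([10100.0, 9950.0, 10200.0, 9800.0])

# Prepend the starting capital so day one's return is measured against it.
toy_values = np.concatenate(([toy_capital_base], toy_ending_values))
toy_daily_returns = toy_values[1:] / toy_values[:-1] - 1

# Method 1: percent change between starting capital and final portfolio value.
toy_direct = (toy_ending_values[-1] - toy_capital_base) / toy_capital_base

# Method 2: compound the daily returns. The product telescopes to
# toy_ending_values[-1] / toy_capital_base, so both methods agree.
toy_compounded = np.prod(1 + toy_daily_returns) - 1

print("direct: %.6f  compounded: %.6f" % (toy_direct, toy_compounded))
```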
The Sharpe Ratio shown by the backtester is defined in empyrical.
It is the calculation described as "Ex Post Sharpe Ratio" in Sharpe's paper The Sharpe Ratio, annualized using 252 periods per year (the number of trading days in a typical year) and assuming a zero risk-free rate.
For a series of daily returns R, the formula is:
$$\sqrt{252} \, \frac{\operatorname{mean}(R)}{\operatorname{std}(R)}$$
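As a sketch, the formula can be written as a small standalone function. The name annualized_sharpe and the sample returns are hypothetical. The function assumes a zero risk-free rate and uses the sample standard deviation (ddof=1), which is also the default for pandas' Series.std().

```python
import math
import numpy as np

def annualized_sharpe(returns, periods_per_year=252):
    """Annualized ex post Sharpe Ratio, assuming a zero risk-free rate.

    Hypothetical helper: uses the sample standard deviation (ddof=1,
    dividing by n - 1), which is also the pandas Series.std() default.
    """
    returns = np.asarray(returns, dtype=float)
    return math.sqrt(periods_per_year) * returns.mean() / returns.std(ddof=1)

# Hypothetical daily returns, for illustration only.
toy_returns = np.array([0.01, -0.005, 0.002, 0.007, -0.003])
print("Sharpe: %.4f" % annualized_sharpe(toy_returns))
```

Because the mean and standard deviation scale together, multiplying every return by a positive constant leaves the ratio unchanged. The sign of the result always follows the sign of the mean.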
calculated_sharpe = math.sqrt(252) * daily_returns.mean() / daily_returns.std()
calculated_sharpe
We again see agreement up to very small floating point error.
empyrical_sharpe = empyrical.sharpe_ratio(daily_returns)
empyrical_sharpe
backtest_sharpe
np.allclose([empyrical_sharpe, backtest_sharpe], calculated_sharpe)
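Since the standard deviation is never negative (and is strictly positive unless every daily return is identical), the sign of the Sharpe Ratio is always the sign of the mean daily return. The mean has the same sign as the sum of the daily returns, so the sign of the Sharpe Ratio will differ from the sign of cumulative returns whenever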
$$\sum\limits_{t}{R_t}$$
and
$$-1 + \prod\limits_{t}{(1 + R_t)}$$
have different signs.
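A two-day hypothetical example makes the mismatch concrete. Consider a 50% gain followed by a 40% loss. The mean return is positive, so the Sharpe Ratio is positive, but the two days compound to 1.5 * 0.6 = 0.9, a 10% loss. The series is made up for illustration.

```python
import math
import numpy as np

# Hypothetical two-day return series: a 50% gain, then a 40% loss.
toy_returns = np.array([0.50, -0.40])

# The mean daily return (0.05) is positive, so the Sharpe Ratio is positive...
toy_sharpe = math.sqrt(252) * toy_returns.mean() / toy_returns.std(ddof=1)

# ...but compounding gives 1.5 * 0.6 = 0.9, i.e. a 10% cumulative loss.
toy_cumulative = np.prod(1 + toy_returns) - 1

print("Sharpe: %.4f  Cumulative return: %.4f" % (toy_sharpe, toy_cumulative))
```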
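A little analysis shows that these two quantities have different signs exactly when the arithmetic mean of (1 + returns) is greater than 1 but the geometric mean of (1 + returns) is less than 1. The reverse combination cannot occur, because the geometric mean of positive numbers never exceeds their arithmetic mean. As a result, a negative Sharpe Ratio always comes with negative cumulative returns. The first combination is the case for the returns of this algorithm.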
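The analysis takes two steps. Write n for the number of trading days, and assume every 1 + R_t is positive (no single day loses 100% or more). Then the sign of each quantity above is set by one of the two means of (1 + R_t):

$$\frac{1}{n}\sum\limits_{t=1}^{n}{(1 + R_t)} = 1 + \frac{1}{n}\sum\limits_{t=1}^{n}{R_t} > 1 \iff \sum\limits_{t=1}^{n}{R_t} > 0$$

$$\left(\prod\limits_{t=1}^{n}{(1 + R_t)}\right)^{1/n} > 1 \iff \prod\limits_{t=1}^{n}{(1 + R_t)} > 1 \iff -1 + \prod\limits_{t=1}^{n}{(1 + R_t)} > 0$$

The second line uses the fact that taking the n-th root preserves order for positive numbers. The same equivalences hold with every > replaced by <. So the sign of the Sharpe Ratio tracks the arithmetic mean of (1 + R_t), and the sign of cumulative returns tracks the geometric mean.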
from scipy.stats import gmean
print "Arithmetic Mean:", (1 + daily_returns).mean()
print "Geometric Mean:", gmean((1 + daily_returns).values)
In general, the geometric mean of a set of positive numbers falls further below their arithmetic mean the more spread out the numbers are (the two are equal only when all the numbers are identical). So the more dispersed the returns are, the more likely the sign of cumulative returns is to diverge from the sign of the Sharpe Ratio.
hypothetical_returns = np.tile([0.01, -0.01], 126)
print "Arithmetic Mean:", (1 + hypothetical_returns).mean()
print "Geometric Mean:", gmean(1 + hypothetical_returns)
The geometric mean decreases as we increase the spread of the up and down periods, even though the arithmetic mean stays flat.
more_dispersed_hypothetical_returns = np.tile([0.25, -0.25], 126)
print "Arithmetic Mean:", (1 + more_dispersed_hypothetical_returns).mean()
print "Geometric Mean:", gmean(1 + more_dispersed_hypothetical_returns)
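Consider alternating returns of +s and -s. Each pair of days compounds to (1 + s)(1 - s) = 1 - s², so the geometric mean of (1 + returns) is exactly sqrt(1 - s²), while the arithmetic mean stays at 1. A short sweep over a few arbitrarily chosen spreads shows the geometric mean falling as s grows. This sketch sticks to numpy and computes the geometric mean as exp(mean(log(x))), which is mathematically the same quantity scipy's gmean returns. The helper name is hypothetical.

```python
import numpy as np

def arithmetic_and_geometric_means(returns):
    """Return the arithmetic and geometric means of (1 + returns)."""
    growth = 1 + np.asarray(returns, dtype=float)
    arithmetic = growth.mean()
    # exp(mean(log(x))) equals prod(x) ** (1 / n) for positive x.
    geometric = np.exp(np.log(growth).mean())
    return arithmetic, geometric

for spread in [0.01, 0.05, 0.10, 0.25, 0.50]:
    toy_returns = np.tile([spread, -spread], 126)
    am, gm = arithmetic_and_geometric_means(toy_returns)
    # Closed form for alternating +spread / -spread days: sqrt(1 - spread**2).
    print("spread=%.2f  arithmetic=%.6f  geometric=%.6f  closed_form=%.6f"
          % (spread, am, gm, np.sqrt(1 - spread ** 2)))
```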
Looking at the plots of this strategy's daily and cumulative returns, we can see a few large outliers among its daily returns.
daily_returns.plot(figsize=(14, 6));
(daily_returns + 1).cumprod().plot(figsize=(14, 6));
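To illustrate how a handful of outliers can produce the mismatch, here is a hypothetical series (not the backtest's data). It has 250 days of small, steady gains, plus one large gain and one large loss. The helpers mean_and_cumulative and drop_largest_moves are illustrative names, not part of empyrical or the Quantopian API. In this toy series, removing just the two outliers brings the sign of cumulative returns back in line with the sign of the mean.

```python
import numpy as np

def mean_and_cumulative(returns):
    """Mean daily return (which sets the Sharpe sign) and compounded cumulative return."""
    returns = np.asarray(returns, dtype=float)
    return returns.mean(), np.prod(1 + returns) - 1

def drop_largest_moves(returns, k):
    """Return returns without its k largest absolute moves, preserving order."""
    returns = np.asarray(returns, dtype=float)
    keep = np.ones(len(returns), dtype=bool)
    keep[np.argsort(np.abs(returns))[len(returns) - k:]] = False
    return returns[keep]

# Hypothetical series: 250 days of +0.1%, one +80% day, and one -60% day.
toy_returns = np.concatenate([np.full(250, 0.001), [0.80, -0.60]])

print(mean_and_cumulative(toy_returns))                         # positive mean, negative cumulative
print(mean_and_cumulative(drop_largest_moves(toy_returns, 2)))  # both positive
```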
The purpose of any statistic is to provide a summary of a dataset that captures important features of that dataset. The Sharpe Ratio attempts to summarize the attractiveness of an investment by measuring the ratio between the investment's expected return and the investment's expected volatility.
The standard method of calculating Sharpe Ratio tries to capture expected return by taking the mean of a series of observed returns, and it tries to capture expected volatility by taking the standard deviation of the same observed returns.
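The explicit assumptions of these calculations are that:
1. the mean and standard deviation of observed returns are sufficient statistics for the distribution of returns, and
2. historical (ex post) returns have predictive value for future (ex ante) returns.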
Sharpe's original paper discusses these limitations explicitly:
On point (1):
Throughout, we build on Markowitz' mean-variance paradigm, which assumes that the mean and standard deviation of the distribution of one-period return are sufficient statistics for evaluating the prospects of an investment portfolio. Clearly, comparisons based on the first two moments of a distribution do not take into account possible differences among portfolios in other moments or in distributions of outcomes across states of nature that may be associated with different levels of investor utility.
When such considerations are especially important, return mean and variance may not suffice, requiring the use of additional or substitute measures. Such situations are, however, beyond the scope of this article. Our goal is simply to examine the situations in which two measures (mean and variance) can usefully be summarized with one (the Sharpe Ratio).
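Point (1) connects directly to this notebook's puzzle. Two return series can share a mean and standard deviation, and therefore a Sharpe Ratio, yet compound to cumulative returns of opposite sign. This happens because compounding also depends on higher moments such as skewness. The two six-day series below are hypothetical; drift and spread were picked by hand so that the variances match and the signs split.

```python
import math
import numpy as np

def sample_sharpe(returns):
    """Annualized ex post Sharpe Ratio: zero risk-free rate, sample std (ddof=1)."""
    return math.sqrt(252) * returns.mean() / returns.std(ddof=1)

def compounded_return(returns):
    """Cumulative return from compounding daily returns."""
    return np.prod(1 + returns) - 1

drift = 0.0099                # shared mean daily return
spread = 0.1 * math.sqrt(2)   # makes series A's variance match series B's

# Series A: symmetric swings of +/- spread around the drift.
series_a = drift + np.tile([spread, -spread], 3)
# Series B: one large gain and two smaller losses per three-day block (positive skew).
series_b = drift + np.tile([0.2, -0.1, -0.1], 2)

print("Sharpe A: %.4f  Sharpe B: %.4f" % (sample_sharpe(series_a), sample_sharpe(series_b)))
print("Cumulative A: %.6f  Cumulative B: %.6f"
      % (compounded_return(series_a), compounded_return(series_b)))
```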
On point (2):
Most performance measures are computed using historic data but justified on the basis of predicted relationships. Practical implementations use ex post results while theoretical discussions focus on ex ante values. Implicitly or explicitly, it is assumed that historic results have at least some predictive ability.
In conclusion, on both points:
Clearly, any measure that attempts to summarize even an unbiased prediction of performance with a single number requires a substantial set of assumptions for justification. In practice, such assumptions are, at best, likely to hold only approximately. Certainly, the use of unadjusted historic (ex post) Sharpe Ratios as surrogates for unbiased predictions of ex ante ratios is subject to serious question. Despite such caveats, there is much to recommend a measure that at least takes into account both risk and expected return over any alternative that focuses only on the latter.
Sharpe Ratio can be a useful way of summarizing the attractiveness of an investment, but like any statistic it's important to understand its limitations. Having a positive Sharpe Ratio often, but not always, corresponds to having positive cumulative returns.
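In this notebook we:
- used get_backtest and the BacktestResult class to load and examine the cumulative performance and risk metrics for a backtest, and
- saw that an algorithm's Sharpe Ratio and cumulative returns have different signs when the arithmetic mean of (1 + daily_returns) is greater than 1, but the geometric mean of (1 + daily_returns) is less than 1.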