Hi, a basic question here. I'm using Zipline to run algorithms with custom data.
It seems several of the values returned in the results DataFrame are mislabelled, namely 'benchmark_period_return', 'algorithm_period_return', and 'returns'. I found a comment saying much the same thing somewhere in the bowels of the code, in cumulative.py. Specifically:
- 'benchmark_period_return' actually contains the benchmark's cumulative returns, NOT its period returns.
- 'algorithm_period_return' likewise contains the algorithm's cumulative returns, NOT its period returns.
- 'returns' contains the period percentage returns, unlike what the QT help says for the portfolio object: "cumulative percentage returns for the entire portfolio up to this point".
Is this documented anywhere? It seems like a pretty important issue that needs to be addressed. More generally, is there any documentation on the variables Zipline returns and how they are calculated?
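For reference, period returns and cumulative returns are related by compounding. Below is a minimal plain-Python sketch, assuming simple (not log) returns; the function names are my own, not Zipline's or empyrical's. As far as I can tell, empyrical.cum_returns with its default starting_value=0 computes the same thing as cumulative_returns here.

```python
def period_returns(values):
    """Simple per-period returns: values[t] / values[t - 1] - 1."""
    return [curr / prev - 1.0 for prev, curr in zip(values, values[1:])]


def cumulative_returns(returns):
    """Compounded return from the start up to and including each period."""
    result = []
    growth = 1.0  # value of 1 unit invested at the start
    for r in returns:
        growth *= 1.0 + r
        result.append(growth - 1.0)
    return result


# Example: a price path 100 -> 110 -> 99
prices = [100.0, 110.0, 99.0]
rets = period_returns(prices)   # roughly +10%, then -10%
cum = cumulative_returns(rets)  # roughly +10%, then -1%
print(rets, cum)
```

A +10% period followed by a -10% period leaves you about 1% down overall, which is why the two kinds of columns cannot be used interchangeably.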
# These are the adjusted close values for ^GSPC
In [122]: gsp_adj_close = pd.Series([1218.890015, 1204.420044, 1173.969971, 1165.23999,
     ...:                            1198.619995, 1185.900024, 1154.22998, 1162.27002])
In [123]: gsp_adj_close
Out[123]:
0 1218.890015
1 1204.420044
2 1173.969971
3 1165.239990
4 1198.619995
5 1185.900024
6 1154.229980
7 1162.270020
dtype: float64
# Calculate the percent gains
In [124]: benchmark_pct_gains = gsp_adj_close.pct_change(1)[1:]
In [125]: benchmark_pct_gains
Out[125]:
1 -0.011871
2 -0.025282
3 -0.007436
4 0.028646
5 -0.010612
6 -0.026705
7 0.006966
dtype: float64
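Because compounding telescopes, the cumulative benchmark return after each period should simply be close[t] / close[0] - 1. A minimal plain-Python check using the ^GSPC closes above, assuming the cumulative figure is anchored to the first close in the series (which is what the cum_returns comparison in this session suggests):

```python
import math
import operator
from itertools import accumulate

# ^GSPC adjusted closes from In [122]
closes = [1218.890015, 1204.420044, 1173.969971, 1165.23999,
          1198.619995, 1185.900024, 1154.22998, 1162.27002]

# Per-period gains, same formula as gsp_adj_close.pct_change(1)[1:]
gains = [curr / prev - 1.0 for prev, curr in zip(closes, closes[1:])]

# Cumulative returns by compounding: running product of (1 + gain), minus 1
compounded = [g - 1.0 for g in accumulate((1.0 + r for r in gains), operator.mul)]

# Telescoped form: close[t] / close[0] - 1 for every close after the first
telescoped = [c / closes[0] - 1.0 for c in closes[1:]]

print(all(math.isclose(a, b, rel_tol=0.0, abs_tol=1e-12)
          for a, b in zip(compounded, telescoped)))
```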
# Calculate the benchmark cumulative returns. They are identical to benchmark_period_return. Bad nomenclature!
In [126]: empyrical.cum_returns(benchmark_pct_gains) - perf.benchmark_period_return.values
Out[126]:
1 0
2 0
3 0
4 0
5 0
6 0
7 0
dtype: float64
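Since benchmark_period_return is cumulative, the actual per-period benchmark returns can be recovered by inverting the compounding: r[t] = (1 + C[t]) / (1 + C[t-1]) - 1, with r[0] = C[0]. A minimal sketch, assuming the cumulative series is measured from a starting value of 0; the helper name uncompound is hypothetical.

```python
def uncompound(cumulative):
    """Invert compounding: turn cumulative returns back into per-period returns.

    Assumes the cumulative series starts from 0, so the first per-period
    return equals the first cumulative value.
    """
    period = []
    prev_growth = 1.0  # 1 + cumulative return before the first period
    for c in cumulative:
        growth = 1.0 + c
        period.append(growth / prev_growth - 1.0)
        prev_growth = growth
    return period


# Cumulative returns of +10% then -1% should give roughly [0.10, -0.10]
print(uncompound([0.10, -0.01]))
```

In pandas the same idea is roughly (1 + perf.benchmark_period_return).pct_change(), with the first row taken from benchmark_period_return itself, since pct_change leaves it as NaN.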
# Calculate the percent change of the returned portfolio values and compare it with the 'returns' column.
# They are identical (up to floating-point noise), whereas according to the QT help, 'returns' should
# contain cumulative returns, NOT period returns
In [127]: perf.portfolio_value.values
Out[127]:
array([ 1000000. , 999727.67509375, 996169.87153552,
980632.62051827, 966563.23558839, 951299.16259529,
946291.48249317])
In [131]: perf.portfolio_value.pct_change().values - perf.returns.values
Out[131]:
array([ nan, 4.18502039e-17, -1.08420217e-17,
-3.29597460e-17, 4.16333634e-17, 5.20417043e-17,
2.68882139e-17])
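Note that Out[131] is not exactly zero: the differences are on the order of 1e-17, i.e. floating-point rounding, plus a leading NaN because pct_change has no previous value for the first row. A tolerant comparison makes that check explicit. This is a minimal plain-Python sketch; the helper name returns_match and the 1e-12 tolerance are my own choices, not anything from Zipline.

```python
import math


def returns_match(a, b, abs_tol=1e-12):
    """True if two return series agree element-wise, skipping positions where either is NaN."""
    if len(a) != len(b):
        return False
    return all(
        math.isclose(x, y, rel_tol=0.0, abs_tol=abs_tol)
        for x, y in zip(a, b)
        if not (math.isnan(x) or math.isnan(y))
    )


# Portfolio values from Out[127]
values = [1000000.0, 999727.67509375, 996169.87153552, 980632.62051827,
          966563.23558839, 951299.16259529, 946291.48249317]

# Leading NaN mirrors pct_change(); the rest are simple returns
pct = [math.nan] + [v1 / v0 - 1.0 for v0, v1 in zip(values, values[1:])]

# Equivalent formulation (v1 - v0) / v0; its first entry is a placeholder,
# skipped by returns_match because pct[0] is NaN
alt = [0.0] + [(v1 - v0) / v0 for v0, v1 in zip(values, values[1:])]

print(returns_match(pct, alt))
```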
# algorithm_period_return should be period returns, but is actually cumulative returns
In [139]: empyrical.cum_returns(perf.returns).values - perf.algorithm_period_return.values
Out[139]: array([ 0., 0., 0., 0., 0., 0., 0.])