Tearsheet Analysis of Algo Performance in our Research Environment

Here is the algo/backtest that the above notebook references so that you can have access to it when cloning the notebook.

-Justin

Disclaimer

Simon Thornington

I wonder if it would be helpful to augment the in/out-of-sample distribution comparisons with a Kolmogorov-Smirnov test? I don't know much about it, but I recall seeing a presentation wherein it seemed useful.

@Simon: Thanks for proposing the KS-Test. I've just done a little digging around to learn about it, and it certainly feels like it could be a useful addition to our testing procedure for analyzing backtest vs. out of sample performance.
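For reference, here is a minimal sketch of the two-sample KS statistic applied to backtest vs. out-of-sample daily returns. `ks_statistic` is a hypothetical helper written with NumPy only, and the return series are synthetic placeholders rather than names from the notebook. In practice, `scipy.stats.ks_2samp(a, b)` returns the same statistic together with a p-value.

```python
import numpy as np

def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic D = max |F_a(x) - F_b(x)|.

    F_a and F_b are the empirical CDFs. Both are right-continuous step
    functions, so the maximum gap occurs at one of the pooled observations
    and it is enough to evaluate the CDFs there.
    """
    a = np.sort(np.asarray(sample_a, dtype=float))
    b = np.sort(np.asarray(sample_b, dtype=float))
    pooled = np.concatenate([a, b])
    # Fraction of each sample that is <= x, for every pooled x
    cdf_a = np.searchsorted(a, pooled, side='right') / float(len(a))
    cdf_b = np.searchsorted(b, pooled, side='right') / float(len(b))
    return float(np.max(np.abs(cdf_a - cdf_b)))

# Hypothetical daily returns: backtest period vs. out-of-sample period
rng = np.random.RandomState(0)
backtest_rets = rng.normal(0.0005, 0.01, size=500)
oos_rets = rng.normal(0.0, 0.02, size=120)
print(ks_statistic(backtest_rets, oos_rets))  # larger D = less similar distributions
```

A small D (and a large p-value from `ks_2samp`) only means the test finds no evidence that the two distributions differ. With a short out-of-sample window, the test has limited power to detect a difference.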


It's a fantastic research notebook!
I wonder whether the notebook could analyze a Zipline algorithm's backtest results.
Do you have any suggestions for how to do that on the Zipline platform?

I cloned the notebook and modified the single-stock tearsheet to use 'EEM'. Why do I get the error below?

You can also easily run a tearsheet on a single stock's or ETF's daily returns timeseries

stock = securities_panel.loc['price']['EEM'].dropna()
stock_rets = stock.pct_change().dropna()

analyze_single_algo( df_rets=stock_rets, algo_live_date='2014-1-1', cone_std=1.0 )
Entire data start date: 2003-04-14 00:00:00+00:00
Entire data end date: 2015-06-11 00:00:00+00:00
Out-of-Sample Months: 17
Backtest Months: 128
                    Backtest  Out_of_Sample  All_History
max_drawdown           -0.67          -0.18        -0.67
calmar_ratio            0.26           0.07         0.23
annual_return           0.18          -0.01         0.15
stability               0.57           0.00         0.53
sharpe_ratio            0.54          -0.08         0.50
annual_volatility       0.33           0.16         0.31
alpha                   0.05          -0.11         0.02
beta                    1.46           0.95         1.44

82% :Similarity between Backtest vs. Out-of-Sample (daily returns distribution)

TypeError Traceback (most recent call last)
in ()
----> 1 analyze_single_algo( df_rets=stock_rets, algo_live_date='2014-1-1', cone_std=1.0 )

in analyze_single_algo(df_rets, algo_live_date, cone_std)
138 plt.title("Daily Returns Similarity = " + str(consistency_pct) + "%" )
139
--> 140 dd_table = gen_drawdown_table(df_rets,top=10)
141
142 dd_table['peak date'] = map( extract_date, dd_table['peak date'])

in gen_drawdown_table(df_rets, top)
48 def gen_drawdown_table(df_rets, top=10):
49 df_cum = cum_returns(df_rets, 1.0)
---> 50 drawdown_periods = get_top_draw_downs(df_rets, top=top)
51 df_drawdowns = pd.DataFrame(index=range(top), columns=['net drawdown in %',
52 'peak date',

in get_top_draw_downs(df_rets, top)
11 #if not np.isnan(recovery):
12 underwater = pd.concat(
---> 13 [underwater.loc[:peak].iloc[:-1], underwater.loc[recovery:].iloc[1:]])
14 #else:
15 # drawdown has not ended yet

/usr/local/lib/python2.7/dist-packages/pandas/core/indexing.pyc in __getitem__(self, key)
1178 return self._getitem_tuple(key)
1179 else:
-> 1180 return self._getitem_axis(key, axis=0)
1181
1182 def _getitem_axis(self, key, axis=0):

/usr/local/lib/python2.7/dist-packages/pandas/core/indexing.pyc in _getitem_axis(self, key, axis)
1293 if isinstance(key, slice):
1294 self._has_valid_type(key, axis)
-> 1295 return self._get_slice_axis(key, axis=axis)
1296 elif is_bool_indexer(key):
1297 return self._getbool_axis(key, axis=axis)

/usr/local/lib/python2.7/dist-packages/pandas/core/indexing.pyc in _get_slice_axis(self, slice_obj, axis)
1200 labels = obj._get_axis(axis)
1201 indexer = labels.slice_indexer(slice_obj.start, slice_obj.stop,
-> 1202 slice_obj.step)
1203
1204 if isinstance(indexer, slice):

/usr/local/lib/python2.7/dist-packages/pandas/tseries/index.pyc in slice_indexer(self, start, end, step, kind)
1325
1326 try:
-> 1327 return Index.slice_indexer(self, start, end, step)
1328 except KeyError:
1329 # For historical reasons DatetimeIndex by default supports

/usr/local/lib/python2.7/dist-packages/pandas/core/index.pyc in slice_indexer(self, start, end, step, kind)
2344 This function assumes that the data is sorted, so use at your own peril
2345 """
-> 2346 start_slice, end_slice = self.slice_locs(start, end, step=step, kind=kind)
2347
2348 # return a slice

/usr/local/lib/python2.7/dist-packages/pandas/core/index.pyc in slice_locs(self, start, end, step, kind)
2488 start_slice = None
2489 if start is not None:
-> 2490 start_slice = self.get_slice_bound(start, 'left', kind)
2491 if start_slice is None:
2492 start_slice = 0

/usr/local/lib/python2.7/dist-packages/pandas/core/index.pyc in get_slice_bound(self, label, side, kind)
2426 # For datetime indices label may be a string that has to be converted
2427 # to datetime boundary according to its resolution.
-> 2428 label = self._maybe_cast_slice_bound(label, side, kind)
2429
2430 # we need to look up the label

/usr/local/lib/python2.7/dist-packages/pandas/tseries/index.pyc in _maybe_cast_slice_bound(self, label, side, kind)
1280 """
1281 if is_float(label) or isinstance(label, time) or is_integer(label):
-> 1282 self._invalid_indexer('slice', label)
1283
1284 if isinstance(label, compat.string_types):

/usr/local/lib/python2.7/dist-packages/pandas/core/index.pyc in _invalid_indexer(self, form, key)
933 klass=type(self),
934 key=key,
--> 935 kind=type(key)))
936
937 def get_duplicates(self):

TypeError: cannot do slice indexing on with these indexers [nan] of <type 'float'>

Hi Novice TAI,
It looks like I accidentally left an old function definition for get_top_draw_downs() at the bottom of the notebook when I was testing. It's in the cell directly above the "APPENDIX." The correct version of get_top_draw_downs() is at the beginning of the notebook, which is probably why the notebook worked up until that point. Can you try deleting that single cell with this function definition at the bottom, then "Run All..." the notebook to see if this fixes it?
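For context, the traceback above fails on `underwater.loc[recovery:]` because `recovery` is NaN, i.e. a drawdown that has not recovered yet. The stale cell has its `np.isnan(recovery)` guard commented out. Below is a minimal sketch of finding the deepest drawdown and returning `NaT` when it is still open. `largest_drawdown` is a hypothetical helper, not the notebook's `get_top_draw_downs()`.

```python
import pandas as pd

def largest_drawdown(returns):
    """Peak, valley and recovery dates of the deepest drawdown.

    `returns` is a Series of daily simple returns with a DatetimeIndex.
    `recovery` is NaT when the drawdown has not ended yet.
    """
    cum = (1 + returns).cumprod()
    underwater = cum / cum.cummax() - 1   # 0 at a new high, negative while under water
    valley = underwater.idxmin()
    before = underwater.loc[:valley]
    peak = before[before == 0].index[-1]  # last new high before the valley
    after = underwater.loc[valley:]
    recovered = after[after == 0]
    recovery = recovered.index[0] if len(recovered) else pd.NaT
    return peak, valley, recovery

rets = pd.Series([0.10, -0.20, 0.05], index=pd.date_range('2015-01-01', periods=3))
peak, valley, recovery = largest_drawdown(rets)
if pd.isnull(recovery):
    # Drawdown still open: never slice starting from `recovery`
    remaining = rets.loc[:peak].iloc[:-1]
else:
    remaining = pd.concat([rets.loc[:peak].iloc[:-1], rets.loc[recovery:].iloc[1:]])
```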


Hi Justin,

Yes, following your instructions, it works now. Thanks for your kind help.

Hi Justin,

Based on your point:

The notebook contains all of the functions to compute all performance
statistics, so it is very self contained, and thus also means a
tearsheet can be computed for any timeseries you pass to it. So, you
can upload a CSV of the daily returns of your favorite mutual fund and
see how it looks, or simply pass in the timeseries of a few stocks. An
example of how to accomplish this using a single stock is also
included at the end of this notebook.

I wonder whether I could upload a CSV of a Zipline backtest result to the notebook to see how it looks?
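Assuming the returns have been exported to a CSV with a date column followed by a daily-returns column, here is a minimal pandas sketch of turning that file into the Series the notebook expects. `load_returns_csv` and the column layout are hypothetical. The UTC localization is only there because the notebook prints its dates with a `+00:00` offset.

```python
import io
import pandas as pd

def load_returns_csv(path_or_buffer, column=None):
    """Read a CSV of daily returns into a UTC-indexed pandas Series.

    Assumes the first column holds dates and the values are simple daily
    returns as fractions (0.01 for +1%), not percentages.
    """
    df = pd.read_csv(path_or_buffer, index_col=0, parse_dates=True)
    rets = df[column] if column is not None else df.iloc[:, 0]
    rets = rets.sort_index().dropna()
    if rets.index.tz is None:
        rets.index = rets.index.tz_localize('UTC')
    return rets

# Stand-in for a real exported file
csv_text = "date,returns\n2015-01-02,0.010\n2015-01-05,-0.004\n2015-01-06,0.002\n"
rets = load_returns_csv(io.StringIO(csv_text))
# The resulting Series can then be passed in like the EEM example:
# analyze_single_algo(df_rets=rets, algo_live_date='2015-01-05', cone_std=1.0)
```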

Hi Novice TAI,
It's very easy to see the Quantopian/Zipline backtest performance statistics simply by scrolling up this thread to where I attached the backtest (it's available directly after the original post). Just click on the "Risk Metrics" tab of the backtest and you can view the Zipline results.


joe lee

Justin, I played with it, and this is just amazing work that helps a lot. If you could soon improve it so that analyze_single_algo can also accept parameters to feed our backtests, this could be a real game changer for our algo optimization.

Hi Justin ,

Sorry for my poor English.
What I mean is: is there a way to upload the strategy backtest results from the https://github.com/quantopian/zipline project's output into the "Tearsheet Analysis" notebook?

from https://www.quantopian.com/posts/bug-in-consistency-score

Hi Novice TAI,

If you're using a zipline object in the Quantopian Research environment, the TradingAlgorithm object has a value called "daily_stats" as well as a "returns" attribute. This "returns" field is just a pandas Series that you can pass to the analyze_single_algo() function that makes the tearsheet, in the same way you previously ran the example tearsheet using the stock EEM. Go to the bottom of the tearsheet notebook I shared, where I show how to pass in the daily returns of a single stock; I save the daily returns of the stock to the variable 'stock_rets'. If you want to use your Zipline algo's daily returns, you could just change that to:

stock_rets = TradingAlgorithm.returns


Anh Nguyen

hi Justin,

Please see my suggestion on the consistency calculation. I posted in another thread, but perhaps this is a better place. I will continue here:

I understand nothing is perfect and there will be trial and error. Using the daily return distribution is a start. However:
- Practically, consistency should also be about the magnitude of the total return. The 0.957778084 consistency score for 4.5% vs. 100.8% is a good example of why magnitude matters. The distributions may match, but if the magnitudes are way off, it is not a consistent algo.
- Why daily? Why not hourly or weekly? If an algo only rebalances weekly or monthly, daily returns might not reflect its fundamental characteristics.

Plus, with a limited number of out-of-sample data points, which is the situation we face now, a different approach might be more appropriate than comparing return distributions.

How about ranking the percentage difference between backtest and out-of-sample metrics (return, drawdown, Sharpe, ...) and averaging the rankings, just like the overall algo scoring process? Maybe give different weights to metrics that are better or worse out of sample? I think this would work better with a smaller set of data.
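A minimal sketch of the ranking idea, assuming backtest and out-of-sample metrics for several algos are already in two DataFrames (one row per algo, one column per metric). Everything here (`consistency_rank_score`, the column names, the optional per-metric weights) is hypothetical. It uses a single weight per metric and does not yet treat "better out of sample" differently from "worse".

```python
import pandas as pd

def consistency_rank_score(backtest, out_of_sample, weights=None):
    """Weighted average rank of |OOS - backtest| / |backtest| across metrics.

    Lower score = out-of-sample metrics closer to the backtest. A metric that
    is exactly 0 in the backtest gives an infinite deviation and ranks last.
    """
    pct_diff = ((out_of_sample - backtest) / backtest.abs()).abs()
    ranks = pct_diff.rank(axis=0)  # per metric: 1 = smallest deviation
    if weights is None:
        weights = pd.Series(1.0, index=ranks.columns)
    weights = weights.reindex(ranks.columns)
    return (ranks * weights).sum(axis=1) / weights.sum()

backtest = pd.DataFrame({'annual_return': [0.20, 0.20], 'sharpe_ratio': [1.5, 1.5]},
                        index=['A', 'B'])
oos = pd.DataFrame({'annual_return': [0.18, 0.02], 'sharpe_ratio': [1.4, 0.2]},
                   index=['A', 'B'])
score = consistency_rank_score(backtest, oos)
print(score.sort_values())
```

Because only ranks are averaged, the score is insensitive to how large each metric's deviation is in absolute terms. That is arguably what makes it robust with few out-of-sample points, but it also discards information about magnitude.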

Jamie Lunn

How do I find my backtest ID?

Michael Van Kleeck

When you're on the backtest results page, the ID is the last part of the URL (everything after the last slash -- 24 hexadecimal digits).
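A small sketch of pulling that ID out of a copied URL. The helper name is hypothetical, and the URL in the usage line is a placeholder (`ALGO_ID` is not a real path segment). Only the "last segment, 24 hex digits" rule comes from the answer above.

```python
import re
from urllib.parse import urlparse

# 24 hexadecimal characters, e.g. '58526ebeb57b354818da6b17'
_HEX24 = re.compile(r'[0-9a-fA-F]{24}')

def backtest_id_from_url(url):
    """Return the last path segment of a backtest URL if it is a 24-hex-digit ID."""
    segment = urlparse(url).path.rstrip('/').split('/')[-1]
    if not _HEX24.fullmatch(segment):
        raise ValueError('not a 24-digit hex backtest ID: %r' % segment)
    return segment

url = 'https://www.quantopian.com/algorithms/ALGO_ID/58526ebeb57b354818da6b17'
print(backtest_id_from_url(url))  # 58526ebeb57b354818da6b17
```

The returned string is what goes into `get_backtest('...')` in the research environment.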

Jamie Lunn

Thank you, MVK!

Madona Syombua

I was just genuinely wondering why you decided to use Python. I can tell the language is Python (correct me if I'm wrong). Or was it just your preferred language? Thanks.

Madona: There are many reasons why we chose Python. I won't list them all, but here are a few key ones:

it is completely open and free (unlike e.g. matlab);
it is a scripting language (unlike C# or Java);
(related to the above point) it is relatively easy to learn (unlike C++);
and, there is a large and healthy ecosystem of support libraries for numerical computing and data analysis.


Madona Syombua

Thomas: thanks for the answer. But what if I want to do it in C or C++, since I know those languages? Am I limited, or should I just learn Python? I want to get into this, and I enjoy C++, C and Java, but I don't know Python. I know this might not be the best place to ask, but I would appreciate your feedback. Is Python that fast? Thanks, Thomas.

I don't think you will regret learning Python, and it's very easy to get going. There are many good tutorials out there. Here's one: http://learnpythonthehardway.org/


Charlie Hacker

Wow, I'm really glad I tried this; it took less than 10 minutes start to finish, which was great. I really wish there were a better way to track dividend distributions. I've got one distribution in 2012 that is especially difficult to understand.

I'm wondering about the compounded annual returns, shouldn't

return pow((1 + ts.mean()), 252) - 1

actually be

return pow((ts.iloc[-1]/ts.iloc[0]), (252 / len(ts))) - 1

The variable 'ts' contains daily percent returns, not portfolio values. This is why the compounding formula is implemented as: return pow((1 + ts.mean()), 252) - 1
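The price-ratio formula can also be written purely in terms of returns, because compounding every daily return reproduces the ratio of the last price to the first. Below is a minimal sketch (function names hypothetical) showing that the two forms agree. It counts periods as the number of returns, `len(prices) - 1`, rather than `len(prices)`.

```python
import numpy as np
import pandas as pd

def cagr_from_prices(prices, periods_per_year=252):
    """CAGR from a price series: (last / first) ** (1 / years) - 1."""
    n_returns = len(prices) - 1                 # daily returns between first and last price
    years = n_returns / float(periods_per_year)
    return (prices.iloc[-1] / prices.iloc[0]) ** (1 / years) - 1

def cagr_from_returns(rets, periods_per_year=252):
    """Same quantity computed from daily returns only."""
    growth = (1 + rets).prod()                  # equals last price / first price
    years = len(rets) / float(periods_per_year)
    return growth ** (1 / years) - 1

prices = pd.Series([100.0, 101.0, 99.5, 102.0, 103.5])
rets = prices.pct_change().dropna()
print(cagr_from_prices(prices), cagr_from_returns(rets))  # equal up to rounding
```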


Okay, I see. I just don't understand why I get different results from the two methods. Your calculation is probably right, but this is driving me nuts and I'd be happy to understand what's going on :/

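from __future__ import division  # Python 2: make 252 / len(stock) a float division

symbol_list = ['SPY']

# get_pricing is provided by the Quantopian research environment
securities_panel = get_pricing(symbol_list, fields=['price'],
                               start_date='2000-01-01', end_date='2015-06-11')
securities_panel.minor_axis = map(lambda x: x.symbol, securities_panel.minor_axis)

stock = securities_panel.loc['price']['SPY'].dropna()
stock_rets = stock.pct_change().dropna()

# v1: compound the mean daily return over 252 trading days
v1 = pow((1 + stock_rets.mean()), 252) - 1

# v2: annualized price ratio (CAGR)
v2 = pow((stock.iloc[-1] / stock.iloc[0]), (252 / len(stock))) - 1

assert v1 == v2, '{} != {}'.format(v1, v2)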

I get AssertionError: 0.0664323133186 != 0.0460561004688, which is a big difference.

Marco,

The difference seems to be the frequency at which the compounding occurs. v1 compounds returns daily (i.e. 252 times), whereas v2 seems to compound annually: e.g. if len(stock) = 252, then the base of the expression (i.e. the % change between day 1 and day 252) is only raised to the power of 1.

Both approaches are mathematically sound; one just incorporates daily compounded growth, which can add up significantly when an asset is increasing over time.


Hi Justin,

thanks a lot!

Okay but if v2 calculates the compounded returns, shouldn't:

price = stock.iloc[0]

# Compound the first price daily at v1 / 252, once per daily return
for _ in stock_rets:
    price *= (1 + (v1 / 252))
print(price)

return 211.59 instead of 281.96?

Unfortunately, I think this is an example of where there are many different mathematical approaches to a financial calculation. You can calculate returns arithmetically, geometrically (compounded), or in log-returns space. If you take the geometric approach, you have to choose a compounding frequency, or use continuous compounding (e.g. future value = present value * e^(rt)). Each approach yields a different result when computing something like annual return. The important thing is to choose one approach and use it consistently if you intend to compare two different trading strategies, two stocks, etc. I've chosen one method. If you feel more comfortable with a different approach, please feel free to swap it in as the body of the function in the tearsheet.
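To make this concrete, here is a minimal sketch that annualizes the same daily returns in the ways listed above. The function name and the synthetic returns are hypothetical. The compounded-mean figure is the notebook's `pow((1 + ts.mean()), 252) - 1`. The log approach is reported both as a continuous rate and as its effective annual equivalent, which equals the price-ratio CAGR. Because the arithmetic mean of (1 + r) is never below its geometric mean, the compounded-mean figure is never below the CAGR, and the gap roughly widens as daily returns become more volatile.

```python
import numpy as np

def annualize_three_ways(daily_rets, periods_per_year=252):
    r = np.asarray(daily_rets, dtype=float)
    arithmetic = r.mean() * periods_per_year                  # simple scaling, no compounding
    compounded_mean = (1 + r.mean()) ** periods_per_year - 1  # compound the average day
    log_rate = np.log1p(r).mean() * periods_per_year          # continuously compounded annual rate
    geometric = np.expm1(log_rate)                            # e^rate - 1, i.e. the CAGR
    return {'arithmetic': arithmetic, 'compounded_mean': compounded_mean,
            'log_rate': log_rate, 'geometric': geometric}

rng = np.random.RandomState(42)
daily = rng.normal(0.0004, 0.012, size=252 * 5)  # five years of synthetic daily returns
for name, value in sorted(annualize_three_ways(daily).items()):
    print('%-16s %.4f' % (name, value))
```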


Ok, thanks Justin, that helps!

One little improvement I'd like to suggest:

If you use fmt=".1f" in the heat map settings you get one decimal shown in the monthly results heat map. Otherwise it gets rounded to full integers (sometimes) which can be misleading.

The complete line would be:

sns.heatmap(monthly_ret_table.fillna(0), annot=True, fmt=".1f",annot_kws={"size": 12}, alpha=1.0, center=0.0, cbar=False, cmap='RdYlGn')
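This explains why the rounding only happens "sometimes". Seaborn's `heatmap` annotates with `fmt='.2g'` by default, which shows two significant digits. Single-digit values therefore keep a decimal, while anything of 10 or more loses it. Here is a quick check with Python's own format specs (the values are made-up monthly returns in %):

```python
values = [0.84, 3.27, 12.46, -15.03]   # hypothetical monthly returns in %

default_fmt = [format(v, '.2g') for v in values]   # seaborn's default annotation format
one_decimal = [format(v, '.1f') for v in values]   # the suggested fmt=".1f"

print(default_fmt)   # ['0.84', '3.3', '12', '-15']
print(one_decimal)   # ['0.8', '3.3', '12.5', '-15.0']
```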

http://ycharts.com/glossary/terms/annualized_returns

Thanks Marco for that formatting tip in sns.heatmap(). Seems much cleaner for sure. We definitely plan on continuing to release new/updated/increased-functionality tearsheet notebooks in the future, so if you continue to have ideas/suggestions feel free to keep posting them in this thread.


Vladimir

Justin,

The annualized return performance metric should not be calculated in different ways depending on the mathematical approach to compounding.

Yahoo Finance defines

Annualized Return=(Period Ending Price/Period Beginning Price)^(1/t) - 1

In the financial industry it is known as the compound annual return.

Compound annual return = (Ending Value / Beginning Value)^(1 / n) - 1, where n is the number of years

http://financial-dictionary.thefreedictionary.com/CAGR

This is the only way we should calculate Annualized Returns.

Vladimir: That's what I thought too, and it's the formula I'm using now. This way it's comparable to results outside of Quantopian, as are the other metrics that depend on it, such as the Sharpe ratio.

Karen Rubin

The work that Justin originally shared here has been launched as an open-source project: Pyfolio. It has also been incorporated into the research environment for easy backtest analysis.

Take a look at the attached notebook to get the details.


Jamie Lunn

Hello,

Has anyone uploaded a csv to analyze mutual fund returns? I'm trying to figure out how to do this--via local_csv, I assume.

Thanks,
Jamie
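One possible approach that does not depend on `local_csv` (whose exact arguments are not covered in this thread): read the fund's NAV history with plain pandas and convert it to daily returns with `pct_change()`, the same way the single-stock example does. The function name and column names are hypothetical. Returns computed from raw NAVs ignore distributions unless the NAV series has already been adjusted for them.

```python
import io
import pandas as pd

def returns_from_nav_csv(path_or_buffer, date_col='date', nav_col='nav'):
    """Daily simple returns from a CSV of mutual fund NAVs (hypothetical layout)."""
    nav = pd.read_csv(path_or_buffer, parse_dates=[date_col], index_col=date_col)[nav_col]
    nav = nav.sort_index().dropna()
    return nav.pct_change().dropna()

# Stand-in for a downloaded NAV history
csv_text = "date,nav\n2015-06-08,20.00\n2015-06-09,20.20\n2015-06-10,19.99\n"
fund_rets = returns_from_nav_csv(io.StringIO(csv_text))
print(fund_rets)
```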

John Jay Buchtel

For Marco and Vladimir:

@Marco: Thank you for your work on the CAGR and the Heat Maps in Notebook. Solid.

I notice that they still have not incorporated CAGR or decimals in Heat Maps in the Notebook Stats. If you have a version that does this would you mind sharing it? This generosity would be greatly appreciated...!

Also, I notice the Sharpe ratio produced in Notebooks differs quite a lot from the one in the Backtest Result Summary. Do you know why this is the case? Please advise.

@Vladimir: You are correct to point out that all investment professionals use CAGR in their overview analysis of competing investments (a convention whose utility is perhaps simply its ubiquity) and the use of this stat in the Notebook and the Backtest Result Summary should not be considered optional but an important feature.

Also, we should lobby for inclusion of Bayesian T-Sharpes as well...! (video: http://blog.quantopian.com/probabilistic-programming-for-non-statisticians/)

Cheers gentlemen!

Hi John,

I'm using my own backtesting environment that I've built for myself, since I'm mostly trading futures. I've just taken the parts of pyfolio that I need, so I can't give you complete code, but you should be able to figure it out on your own; it's fairly easy. If not, let me know.

For the heat map it's fairly simple; just add
fmt=".1f"
to the sns.heatmap() parameters. My version looks like this:

sns.heatmap(monthly.fillna(0), annot=True, fmt=".1f",annot_kws={"size": 12}, alpha=1.0, center=0.0, cbar=False, cmap='RdYlGn')

Regarding CAGR, this works for me (in the end you just need to stick with something to be able to compare your models). I measure everything relative to volatility rather than in USD anyway, but here's the USD version:

# Calc annualized % results (CAGR)  
annualized_result_pct_usd = ((((end_net_balance_usd/start_net_balance_usd) ** (1 / years)) - 1) * 100)

To figure out the years I use

years = (equity.index[-1] - equity.index[0]).total_seconds() / 31557600

So you will need to figure out what these variables are called in the world of PyFolio. My guess is that there's an equity series somewhere, and you'd simply take its first value (unless the first one may already be involved in a trade; be careful here) and its last one. I personally have two of these: one uses the gross balance, considering only closed trades, without margin/open positions, and the other is a net balance that includes everything.

I don't know about the Sharpe ratio; I'm using the "simple" version of it:

annual_volatility_pct_usd = net_balance_usd.pct_change().dropna().std() * np.sqrt(252) * 100
sharpe_pct_usd = annualized_result_pct_usd / annual_volatility_pct_usd

which does the job for me.
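Here is a minimal sketch that puts Marco's pieces together (all names hypothetical). `equity` stands for whatever account-value series a backtester produces, indexed by timestamps, and results are fractions rather than percentages. For comparison, it also computes the common daily formulation: mean daily return divided by its standard deviation, times sqrt(252), with a zero risk-free rate assumed. The two definitions generally give different numbers, which is one possible source of mismatches between tools.

```python
import numpy as np
import pandas as pd

SECONDS_PER_YEAR = 365.25 * 24 * 3600  # 31,557,600

def cagr_calendar(equity):
    """CAGR using elapsed calendar time between the first and last timestamps."""
    years = (equity.index[-1] - equity.index[0]).total_seconds() / SECONDS_PER_YEAR
    return (equity.iloc[-1] / equity.iloc[0]) ** (1 / years) - 1

def annual_volatility(equity, periods_per_year=252):
    """Standard deviation of daily returns, scaled by sqrt(periods_per_year)."""
    return equity.pct_change().dropna().std() * np.sqrt(periods_per_year)

def simple_sharpe(equity):
    """Marco's "simple" version: CAGR divided by annualized volatility."""
    return cagr_calendar(equity) / annual_volatility(equity)

def daily_sharpe(equity, periods_per_year=252):
    """Mean / std of daily returns, annualized; zero risk-free rate assumed."""
    rets = equity.pct_change().dropna()
    return rets.mean() / rets.std() * np.sqrt(periods_per_year)

idx = pd.bdate_range('2014-01-01', periods=500)
rets = pd.Series(np.tile([0.01, -0.005], 250), index=idx)  # synthetic daily returns
equity = 100 * (1 + rets).cumprod()
print(cagr_calendar(equity), simple_sharpe(equity), daily_sharpe(equity))
```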

I also have a RAR% version of it, google around and you'll find out what it is.

Hope this helps you somehow!

Marco

@John Jay,
I know we exchanged emails a bit earlier today, but now just noticing you posted your questions here in this thread as well. I just wanted to make sure you are using the most up to date version of pyfolio that is now available fully integrated in our research environment without having to re-use the notebook I shared as the original message in this thread.

To this reply I have attached the simple version of generating a pyfolio tearsheet (always using the most recent stable version of pyfolio) with just 2 lines of code in the notebook. Hope this helps! I think it addresses some of your questions, as well as some of the color/formatting items you mentioned (e.g. heatmap decimal places).

Thanks,
Justin


PaulB

Justin,
this is an awesome tool, and even better now.

I now get several errors running this (I didn't use to). I just tried the latest version above but still get them.
Paul

/usr/local/lib/python2.7/dist-packages/pyfolio/utils.py:157: UserWarning: Could not update cache /usr/local/lib/python2.7/dist-packages/pyfolio/data/factors.csv.Exception: [Errno 13] Permission denied: '/usr/local/lib/python2.7/dist-packages/pyfolio/data/factors.csv'
  UserWarning)
/usr/local/lib/python2.7/dist-packages/matplotlib/cbook.py:133: MatplotlibDeprecationWarning: The "loc" positional argument to legend is deprecated. Please use the "loc" keyword instead.
  warnings.warn(message, mplDeprecation, stacklevel=1)
/usr/local/lib/python2.7/dist-packages/pyfolio/tears.py:459: UserWarning: Unable to generate turnover plot.
  warnings.warn('Unable to generate turnover plot.', UserWarning)

Vladimir

Justin ,

Are there any updates on CAGR as a main feature in pyfolio and the backtester?

PaulB: Those are harmless warnings.


@Vladimir, yes we'll move to making CAGR the default. It's already in pyfolio master as an option but I agree that we should move this to default behaviour that is propagated to all the plots.


Are there any updates on CAGR as a main feature in pyfolio and the backtester?

It was added here: https://github.com/quantopian/empyrical/pull/19 and will be on Quantopian when we next ship updates.


When will the next ship date be?

Josh Payne

I think it will be wrapped up in the upgrade to pandas 0.18: https://www.quantopian.com/posts/soon-upgrade-to-pandas-0-dot-18


Volodymyr Vovchak

I can't get a tearsheet for my algo:

bt = get_backtest('58526ebeb57b354818da6b17')

# Create all tear sheets
bt.create_full_tear_sheet()

11% ETA:  0:15:34|######                                                     |

It loads only part of the backtest and then hangs. I waited for hours and nothing happened. Maybe someone knows a workaround? The backtest runs from 1.1.2010 to 30.11.2016; it is long, but it should still load, right?