
Quantpedia Trading Strategy Series: An Analysis on Cross-Sectional Mean Reversion Strategies

by Matthew Lee

Introduction

Background

The existence of short-term return reversals in equities has captured the attention of many quantitative researchers.

In 1990, Bruce Lehmann found that, over the period 1962 - 1986, stocks with the highest returns in the prior week typically had negative returns in the following week. In his study, he found that contrarian strategies (buying past losers and selling past winners) generated abnormal returns of over 2% each month. In the same year, Jegadeesh found that short-term reversals also exist over a one-month horizon. These one-month reversals are why many academic researchers use a 2-12 momentum measurement (returns over the past 12 months, excluding the most recent one) when examining momentum.
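The 2-12 measurement can be sketched in a few lines. This is a minimal illustration that assumes a plain list of month-end prices ordered oldest first; the function name `momentum_2_12` and the sample prices are hypothetical and do not come from the notebook.

```python
def momentum_2_12(month_end_prices):
    """Return over the past 12 months, skipping the most recent month.

    month_end_prices: list of month-end prices, oldest first. At least
    13 prices are needed, since 12 monthly returns span 13 prices.
    """
    if len(month_end_prices) < 13:
        raise ValueError("need at least 13 month-end prices")
    price_12_months_ago = month_end_prices[-13]
    price_1_month_ago = month_end_prices[-2]
    return price_1_month_ago / price_12_months_ago - 1.0


# Hypothetical prices: a steady climb to 120, then a drop in the last month.
prices = [100, 101, 103, 104, 106, 108, 110, 112, 114, 116, 118, 120, 90]
signal = momentum_2_12(prices)  # uses 120 / 100 and ignores the final 90
```

Skipping the last month keeps the short-term reversal out of the momentum signal, which is why the final drop to 90 has no effect here.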

Researchers have put forth a number of theories to explain these short-term reversals. Lehmann attributed the phenomenon to cognitive biases leading to market inefficiency, while another series of studies cited market-microstructure frictions (such as bid-ask bounce) as the cause.

Our Study

This notebook analyzes the findings on cross-sectional mean reversion strategies covered in various papers over an out-of-sample period from 01-01-2011 to 12-01-2016. The study is done in two parts.

This part reviews the general contrarian strategies highlighted by Lehmann and Jegadeesh. The second part will cover more advanced and more recently discovered contrarian strategies provided to us by Quantpedia.

Our universe is defined as the stocks in the Q1500. I use the Q1500 as a proxy for the market because most of its stocks are liquid and have a high market cap.

Results Overview

Overall, I found that short-term reversal strategies using lookbacks of one week and one month performed significantly below the market over the period 2011 - 2016.

In my notebook, I find that decile rank based on a 13 day returns lookback is related to average returns over the following 13 days: the lowest decile performs slightly above, and the highest decile slightly below, our SPY and Q1500 market benchmarks. Summed over the 25 quarters in the sample, the lowest-minus-highest spread is about 0.16, or roughly 0.64% per quarter on average.

The actual implementation of short term reversal strategies suffers due to the high trading activity required to rebalance the portfolio. Due to this, the findings in many research papers fail to reflect the true profitability of short-term reversal strategies.
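To make the cost argument concrete, the sketch below nets assumed trading costs out of a gross long-short spread. Everything here is hypothetical: the function `net_spread`, the cost model (a flat one-way cost per unit of traded value) and the input numbers are illustrative assumptions, not figures measured in this notebook.

```python
def net_spread(gross_spread, turnover, one_way_cost):
    """Gross long-short spread minus assumed trading costs for one rebalance.

    turnover: fraction of each leg that is replaced at the rebalance (0 to 1).
    one_way_cost: assumed cost of a single trade, as a fraction of traded value.
    Every replaced position is closed and a new one opened, in both the long
    and the short leg, so the cost is charged 4 times per unit of turnover.
    """
    return gross_spread - 4 * turnover * one_way_cost


# Hypothetical inputs: 0.6% gross spread, 90% turnover, 10 bps one-way cost.
net = net_spread(gross_spread=0.006, turnover=0.9, one_way_cost=0.001)
```

With these made-up inputs more than half of the gross spread goes to costs, which illustrates how quickly frequent rebalancing can erode a small edge.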

Table of Contents

Our notebook is structured in the following way. I recommend following the notebook sequentially to fully understand the process and results which follow.

  1. Data Generation
  2. Short Term Reversal Results
  3. Considerations
  4. Conclusion

Data Generation

We begin by collecting data about stocks in the Q1500 from 01-01-2011 to 12-01-2016. To do this, we use Quantopian's Pipeline API.

For this study we are interested in keeping track of certain variables pertaining to prior and future returns performance for each stock. For each stock, we keep track of the following variables, where X represents the returns lookback window:

- lb_X : The quantile this stock falls into, based on its returns over the past X days (e.g. lb_3 = 0 means that this stock was in the lowest-performing decile of stocks over the past 3 days)
- lb_X_s : lb_X, but based on returns measured relative to the average of this stock's sector
- lb_X_v : lb_X, but ranked only among stocks which are in the lowest volatility group
- lag_lb_X : lb_X, skipping the past day's worth of data
- lag_lb_X_s: lb_X_s, skipping the past day's worth of data
- lag_lb_X_v: lb_X_v, skipping the past day's worth of data
- T Y : The return of this stock Y days from today, defined as (price at today + Y - price today) / (price today)
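As a rough illustration of what `lb_X` and `T Y` encode, the sketch below ranks a handful of stocks by trailing return and computes one forward return. It uses plain Python containers rather than Pipeline objects; `assign_quantiles`, `forward_return`, the tickers and all numbers are hypothetical. Pipeline's own `quantiles` method may place bucket edges and break ties differently.

```python
def assign_quantiles(trailing_returns, n_quantiles):
    """Map each ticker to a bucket 0..n_quantiles-1 by trailing return.

    Bucket 0 holds the worst performers, mirroring lb_X = 0.
    """
    ranked = sorted(trailing_returns, key=trailing_returns.get)
    n = len(ranked)
    return {ticker: rank * n_quantiles // n for rank, ticker in enumerate(ranked)}


def forward_return(prices, today, days_ahead):
    """(price at today + days_ahead - price today) / price today."""
    return (prices[today + days_ahead] - prices[today]) / prices[today]


# Hypothetical 5-day trailing returns for ten made-up tickers S0..S9.
trailing = {"S{}".format(i): r for i, r in enumerate(
    [-0.08, 0.03, 0.01, -0.02, 0.06, 0.00, -0.05, 0.04, 0.02, -0.01])}
buckets = assign_quantiles(trailing, 10)

# Hypothetical daily open prices for one stock; index 0 is "today".
opens = [50.0, 50.5, 49.0, 51.0]
t_3 = forward_return(opens, 0, 3)  # plays the role of the "T 3" column
```

Here `S0`, with the worst trailing return, lands in bucket 0 and `S4`, with the best, lands in bucket 9.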

In short, we are looking to construct portfolios based on our various lookback windows, and gauge the performance of our portfolio by looking at the forward returns tracked in T.

I've built this study so that it can be easily modified by changing parameters. Below I define the global variables that set which returns lookback windows we consider, how far ahead we generate forward returns, how many quantiles we split the stocks into based on returns, and the start and end dates of the study.

Note that part 1 only covers a simple contrarian strategy, so we only consider the lb_X and lag_lb_X lookback windows in this notebook.

In order to keep things uniform, I'll utilize the same pipeline between parts 1 and 2 of this study.

In [3]:
LOOKBACK_WINDOWS = [5, 10, 13, 20]
HOLD_TIMES = ["T {}".format(x) for x in range(0, 31)]
RETURNS_QUANTILES = 10
START = pd.Timestamp("1-01-2011")
END = pd.Timestamp("12-01-2016")

Below is the pipeline which generates our data for us.

In [4]:
def make_pipeline():
    # Filters
    q500 = Q500US() 
    q1500 = Q1500US()
    mask = q500 | q1500
    
    ### Factors ###
    # General Factors
    columns = {}
    sector = Sector(mask=mask)
    adv = AverageDollarVolume(mask=mask, window_length=30)
    columns['Sector'] = sector
    columns['ADV'] = adv
    columns['Q500'] = q500
    columns['Q1500'] = q1500
    
    # Volatility Factor
    vol_20 = Volatility(window_length=20, mask=mask)
    
    # Returns Lookback Factors
    window_lengths = LOOKBACK_WINDOWS
    reg_returns = {x : Returns(window_length=x) for x in window_lengths}
    delayed_returns = {x : Returns(window_length=x+5) for x in window_lengths}
    sector_returns = {x : reg_returns[x].demean(groupby=sector) for x in window_lengths}
    delayed_sector_returns = {x: delayed_returns[x].demean(groupby=sector) for x in window_lengths}
    
    ### Classifiers ###
    # Volatility
    vol_20_q = vol_20.quantiles(5, mask=mask)
    lowest_vol = vol_20_q.eq(0)
    
    # Deciles for each type of returns lookback
    for x in window_lengths:
        columns['lb_{}'.format(x)] = reg_returns[x].quantiles(RETURNS_QUANTILES, mask=mask) 
        columns['lb_{}_s'.format(x)] = sector_returns[x].quantiles(RETURNS_QUANTILES, mask=mask)
        columns['lb_{}_v'.format(x)] = reg_returns[x].quantiles(RETURNS_QUANTILES, mask=mask&lowest_vol)
        columns['lag_lb_{}'.format(x)] = delayed_returns[x].quantiles(RETURNS_QUANTILES, mask=mask)
        columns['lag_lb_{}_s'.format(x)] = delayed_sector_returns[x].quantiles(RETURNS_QUANTILES, mask=mask)
        columns['lag_lb_{}_v'.format(x)] = delayed_returns[x].quantiles(RETURNS_QUANTILES, mask=mask&lowest_vol)
        
    pipe = Pipeline(
        screen=mask,
        columns=columns
    )
    return pipe

Now I can generate the data using this pipeline. For the sake of brevity, I've combined the entire process of generating the pipeline data and calculating the forward returns in a single function called create_data. If you are interested in the specifics of the data creation, check out the function at the bottom of the notebook. Now - let's create the data and take a peek at it.

In [7]:
data = create_data()
print data.shape
data.head(5)
Generated all pipeline data
Generated all pricing data
(105748, 61)
Out[7]:
Day Sid ADV Sector lag_lb_10 lag_lb_10_s lag_lb_10_v lag_lb_13 lag_lb_13_s lag_lb_13_v ... T 23 T 24 T 25 T 26 T 27 T 28 T 29 T 30 Quarter Month
0 2010-12-31 00:00:00+00:00 Equity(2 [ARNC]) 2.602924e+08 101 9 8 -1 8 7 -1 ... 0.130263 0.140789 0.140132 0.145395 0.132237 0.118421 0.126974 0.144408 Q42010 12-2010
1 2010-12-31 00:00:00+00:00 Equity(24 [AAPL]) 3.699962e+09 311 3 4 2 4 6 2 ... 0.064547 0.064048 0.077400 0.095151 0.099811 0.106639 0.098468 0.104775 Q42010 12-2010
2 2010-12-31 00:00:00+00:00 Equity(62 [ABT]) 3.420363e+08 206 1 0 -1 3 1 -1 ... -0.035420 -0.029825 -0.027006 -0.038989 -0.041764 -0.041940 -0.043835 -0.036257 Q42010 12-2010
3 2010-12-31 00:00:00+00:00 Equity(67 [ADSK]) 8.882755e+07 311 3 4 -1 2 4 -1 ... 0.115216 0.131491 0.126841 0.121416 0.102557 0.084991 0.101007 0.099199 Q42010 12-2010
4 2010-12-31 00:00:00+00:00 Equity(76 [TAP]) 3.794620e+07 205 2 2 -1 5 4 -1 ... -0.055600 -0.043288 -0.054011 -0.045075 -0.043090 -0.064138 -0.102264 -0.104051 Q42010 12-2010

5 rows × 61 columns

To make relevant comparisons, I generate the benchmark SPY returns.

In [9]:
benchmark_returns = create_benchmark_returns()

As the sample of the data above shows, we generate these variables once a month for every stock in the Q1500.
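The monthly sampling can be pictured as one pipeline run per business month end. The sketch below lists such dates with the standard library only. It treats every weekday as a business day, which is an assumption (exchange holidays are ignored), and `last_weekday_of_month` is a hypothetical helper rather than the notebook's `run_pipeline_freq`.

```python
import calendar
from datetime import date, timedelta


def last_weekday_of_month(year, month):
    """Last Monday-to-Friday date of the month (exchange holidays ignored)."""
    last_day = calendar.monthrange(year, month)[1]
    d = date(year, month, last_day)
    while d.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        d -= timedelta(days=1)
    return d


# Sampling dates for December 2010 through February 2011.
sample_dates = [last_weekday_of_month(y, m)
                for y, m in [(2010, 12), (2011, 1), (2011, 2)]]
```

The first date, 2010-12-31, matches the `Day` value of the first rows in the data sample.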

Short Term Reversals with different lookback and hold windows

First, let's look at mean reversions in general. We are interested in looking at the average cumulative returns for mean reversions with different lookback windows. In other words, over which lookback window will returns most likely reverse, and for how long will the reversal trend continue?

Thus, there are three variables to consider:

1. The length of the returns lookback window
2. The quantile which the stock's prior returns fall in 
3. How long we hold the stock for

Each month in my pipeline, I track which quantile each stock falls into for the different returns lookbacks. I have also added the returns up to T + 30 for each stock. So, to get a general overview of mean reversion trends, we can generate a heatmap where the X axis represents the returns lookback window, the Y axis represents the hold time (T days after the purchase date), and each cell shows the average returns for that combination.

I define average returns as (price @ T X - price @ T 0) / (price @ T 0), averaged across all monthly periods in the sample, and across all stocks in a certain decile.
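This definition can be written out as a small function. The sketch pools every stock-month observation in a decile with equal weight; the name `average_cumulative_return`, the record layout and the prices are hypothetical and only meant to mirror the formula above.

```python
def average_cumulative_return(observations, decile, hold_days):
    """Mean of (price @ T hold_days - price @ T 0) / (price @ T 0).

    observations: list of dicts with keys 'decile' and 'prices', where
    'prices' lists the prices from T 0 onward for one stock in one month.
    """
    returns = [
        (obs["prices"][hold_days] - obs["prices"][0]) / obs["prices"][0]
        for obs in observations
        if obs["decile"] == decile
    ]
    return sum(returns) / len(returns)


# Hypothetical observations: two stock-months in decile 0, one in decile 9.
observations = [
    {"decile": 0, "prices": [10.0, 10.2, 10.5]},
    {"decile": 0, "prices": [20.0, 19.8, 20.2]},
    {"decile": 9, "prices": [40.0, 40.4, 39.2]},
]
low_avg = average_cumulative_return(observations, decile=0, hold_days=2)
high_avg = average_cumulative_return(observations, decile=9, hold_days=2)
```

In this made-up data the lowest decile averages a positive return and the highest decile a negative one, which is the sign pattern a reversal effect would produce.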

Note that we are interested in returns in both directions. So we need to construct 2 heatmaps - one where the X axis represents the lowest decile of returns windows, and the other where the X axis represents the highest decile of returns windows.

If our cross sectional mean reversion generally holds, we would expect to see positive average returns for the lowest decile of returns, and negative average returns for the highest decile of returns.

Below I plot the two heatmaps for the lowest decile and the highest decile of prior returns, respectively.

In [10]:
ax = sns.heatmap(create_spread_matrix(data, 0), annot=True)
ax.set(title= "Lowest Decile Average Returns by Lookback Window and Hold Time",
       xlabel = 'Returns Window Lookback',
       ylabel = 'Hold Time')
ax.plot()
Out[10]:
[]
In [11]:
ax = sns.heatmap(create_spread_matrix(data, 9), annot=True)
ax.set(title= "Highest Decile Average Returns by Lookback Window and Hold Time",
       xlabel = 'Returns Window Lookback',
       ylabel = 'Hold Time')
ax.plot()
Out[11]:
[]

What do these heatmaps mean? On the far right column, we see the average T 0 - T 30 performance for the SPY ETF. Each cell represents the average cumulative returns for stocks with a lookback window x and hold time y.

Overall, we notice a pattern when considering returns 13 days from the purchase date for both the lowest decile and the highest decile, for lookback windows of 13 and 20 days.

For these cells, we notice that the lowest decile has noticeably higher returns than the benchmark, while the highest decile has significantly lower returns. This is indicative of a correlation between the lowest decile and higher returns, and between the highest decile and lower returns.

We can get a more detailed look at returns performance across all deciles of prior returns. The graphs below show the average cumulative returns from T 0 to T 30, where each line represents a different decile. The average cumulative returns for the benchmark SPY ETF and the Q1500 are marked in black and magenta, respectively.

In [12]:
plot_cum_rets(data, 'lb_5')
plot_cum_rets(data, 'lb_10')
plot_cum_rets(data, 'lb_13')
plot_cum_rets(data, 'lb_20')

Overall, we see that each curve tracks the benchmark SPY/Q1500. At T = 13 on the 13 day lookback graph, a decile's rank and its performance relative to the benchmark are negatively related: the lower the decile, the higher the return. The gap between the dark blue and light blue lines represents the potential return our portfolio could make when selling on that day.
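Reading the gap off the chart amounts to subtracting one curve from the other at each hold day. A minimal sketch, assuming two lists of average cumulative returns indexed by hold day; `widest_gap` and the sample curves are hypothetical and are not values from the plots.

```python
def widest_gap(low_curve, high_curve):
    """Return (hold_day, gap) where low minus high cumulative return peaks."""
    gaps = [low - high for low, high in zip(low_curve, high_curve)]
    best_day = max(range(len(gaps)), key=lambda day: gaps[day])
    return best_day, gaps[best_day]


# Hypothetical average cumulative returns for T 0 .. T 5 (illustrative only).
lowest_decile = [0.000, 0.002, 0.005, 0.009, 0.008, 0.007]
highest_decile = [0.000, 0.001, 0.001, 0.002, 0.004, 0.006]
day, gap = widest_gap(lowest_decile, highest_decile)
```

Applied to the real decile curves, the same subtraction would show on which hold day the lowest-minus-highest spread is largest.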

Diving deeper into this correlation, let's look at the performance of each decile at the day T 13, with a 13 day lookback window.

In [13]:
dec_plot(data, "lb_13", "T 13")

We can see that 2 week performance is correlated with the decile of 2 week prior returns each stock falls into.
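One way to put a number on this relationship is a Spearman rank correlation between the decile index and the mean forward return of each decile. The sketch below is a plain Python version that assumes no tied values; the per-decile returns are made up for illustration and are not results from this notebook.

```python
def spearman_rank_correlation(xs, ys):
    """Spearman correlation for two equally long sequences without ties."""
    def ranks(values):
        order = sorted(range(len(values)), key=lambda i: values[i])
        result = [0] * len(values)
        for rank, i in enumerate(order):
            result[i] = rank
        return result

    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    d_squared = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1.0 - 6.0 * d_squared / (n * (n * n - 1))


deciles = list(range(10))
# Hypothetical mean forward returns per decile (illustrative only).
mean_returns = [0.021, 0.019, 0.016, 0.017, 0.014,
                0.012, 0.013, 0.010, 0.008, 0.006]
rho = spearman_rank_correlation(deciles, mean_returns)
```

A value close to -1 would indicate that higher prior-return deciles are followed by lower returns, which is the pattern a reversal effect predicts.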

From these preliminary results, it appears that the reversals found by Jegadeesh and Lehmann no longer show up with lookback/hold windows of a week and a month, but rather on a roughly twice-monthly (13 trading day) basis.

To illustrate the performance of deciles under the parameters of their prior studies, we can look at the same graph with the respective lookback and hold windows used by Lehmann (one week) and Jegadeesh (one month).

In [14]:
dec_plot(data, "lb_5", "T 5")
dec_plot(data, "lb_20", "T 20")

Interestingly enough, when considering the average performance using a monthly lookback and hold window, we see that the highest performers consistently fall in deciles at either extreme. Perhaps this is because simple contrarian strategies using these parameters are widely known, and have been overused to the point where they net significantly below-market returns.

Let's return to our investigation of the performance of our portfolio picking stocks based on a 13 day lookback.

We can see our quarterly and yearly performance by plotting the spread (lowest decile stocks - highest decile stocks) for each quarter and each year in our sample.

In [20]:
plot_spread_quarter(data, "lb_13", "T 13")
plot_spread_year(data, "lb_13", "T 13")
Amount of quarters negative = 11
Amount of quarters positive = 14
Average Spread = 0.160645266034

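Throughout our sample, the stocks picked by our simple contrarian strategy produce a positive spread in 14 of the 25 quarters. The printed spread of 0.16 is the sum of the 25 quarterly spreads (plot_spread_quarter prints sum(spread)), which works out to about 0.64% per quarter on average.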
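Because a sum and a mean of quarterly spreads give very different headline numbers, it helps to compute both side by side. The sketch below is a plain Python stand-in for the quarterly spread calculation; `quarterly_spreads` and the rows are hypothetical, and the numbers are illustrative rather than notebook results.

```python
def quarterly_spreads(rows):
    """Lowest-decile mean return minus highest-decile mean return, per quarter.

    rows: list of (quarter, decile, forward_return) tuples. Every quarter is
    assumed to contain at least one decile 0 and one decile 9 observation.
    """
    by_quarter = {}
    for quarter, decile, ret in rows:
        if decile in (0, 9):
            by_quarter.setdefault(quarter, {0: [], 9: []})[decile].append(ret)
    spreads = {}
    for quarter, groups in by_quarter.items():
        low = sum(groups[0]) / len(groups[0])
        high = sum(groups[9]) / len(groups[9])
        spreads[quarter] = low - high
    return spreads


# Hypothetical rows (illustrative only).
rows = [
    ("Q12011", 0, 0.03), ("Q12011", 0, 0.01), ("Q12011", 9, -0.01),
    ("Q22011", 0, -0.01), ("Q22011", 9, 0.01), ("Q22011", 5, 0.04),
]
spreads = quarterly_spreads(rows)
total_spread = sum(spreads.values())            # sum over quarters
mean_spread = total_spread / len(spreads)       # average per quarter
positive_quarters = sum(1 for s in spreads.values() if s > 0)
```

The total grows with the number of quarters in the sample, so the per-quarter mean is the figure that compares cleanly across samples of different length.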

Some Considerations

Several researchers have pointed out that short-term reversals are heavily affected by market microstructure biases such as the bid-ask bounce.
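The bid-ask bounce can be illustrated with a small simulation in which the fundamental value never moves, yet recorded trade prices still appear to reverse. This is a minimal sketch under simplifying assumptions (a constant true price, and each trade printing at the bid or the ask with equal probability); the names and parameters are hypothetical.

```python
import random


def first_order_autocorrelation(values):
    """Sample lag-1 autocorrelation of a sequence."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values)
    cov = sum((values[i] - mean) * (values[i - 1] - mean) for i in range(1, n))
    return cov / var


random.seed(0)
true_price = 100.0   # assumed constant fundamental value
half_spread = 0.05   # assumed half of the bid-ask spread

# Each trade prints at the bid or at the ask with equal probability.
trade_prices = [true_price + half_spread * random.choice([-1, 1])
                for _ in range(5000)]
price_changes = [b - a for a, b in zip(trade_prices, trade_prices[1:])]
rho_bounce = first_order_autocorrelation(price_changes)
```

Under these assumptions successive price changes are negatively autocorrelated (the theoretical value is -0.5), even though nothing about the stock has changed. A contrarian signal built on such prices would pick up this artifact.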

To explore short-term reversals without these biases, we can ignore the most recent day's worth of returns when constructing our portfolio. In other words, we use our lag_lb_x variable, which groups stocks into deciles based on returns from T - x - 1 to T - 1, where T is the purchase date of the stock.
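A lookback measured from T - x - 1 to T - 1 can be sketched next to the plain lookback as follows. The example assumes a list of daily closes ordered oldest first, with the last entry being day T; both function names and the prices are hypothetical.

```python
def lookback_return(prices, window):
    """Return over the last `window` days: from T - window to T."""
    return prices[-1] / prices[-1 - window] - 1.0


def lagged_lookback_return(prices, window, lag=1):
    """Return from T - window - lag to T - lag, skipping the last `lag` days."""
    return prices[-1 - lag] / prices[-1 - lag - window] - 1.0


# Hypothetical daily closes, oldest first; the last entry is day T.
closes = [100.0, 102.0, 101.0, 104.0, 110.0]
plain = lookback_return(closes, window=3)          # 110 / 102 - 1
lagged = lagged_lookback_return(closes, window=3)  # 104 / 100 - 1
```

The jump on the final day moves the plain lookback but leaves the lagged one untouched, which is the point of the lag: the most recent price, where bid-ask bounce matters most, never enters the ranking.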

We can rerun our previous graphs with the new variable, and see whether it significantly impacts our strategy.

In [16]:
dec_plot(data, "lag_lb_13", "T 13")
plot_spread_quarter(data, "lag_lb_13", "T 13")
plot_spread_year(data, "lag_lb_13", "T 13")
Amount of quarters negative = 9
Amount of quarters positive = 16
Average Spread = 0.130458803733

We see that accounting for microstructure biases reduces the summed spread (0.13 versus 0.16), although the number of positive quarters rises from 14 to 16. Overall, our trend seems to hold.

Conclusion

Simple contrarian strategies using conventional returns lookback and hold windows seem to have decayed in effectiveness over the past 5 years. I find that utilizing a new returns lookback and hold window yields a substantially more promising strategy.

This notebook serves as a quick glance at the performance of simple contrarian strategies in recent years. I encourage deeper exploration into why the phenomena noted in this notebook occur.

Using the data generated in this notebook as a starting point, I look forward to further investigations into specific mean reversion trends during sub-periods, incorporating seasonality and other factors into the mix.

In [1]:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import math
import seaborn as sns

from datetime import timedelta

from quantopian.pipeline import CustomFactor, Pipeline
from quantopian.research import run_pipeline
from quantopian.pipeline.data.builtin import USEquityPricing
from quantopian.pipeline.factors import AverageDollarVolume, Returns
from quantopian.pipeline.classifiers.morningstar import Sector
from quantopian.pipeline.filters.morningstar import Q500US, Q1500US
from zipline.utils.calendars import get_calendar

def run_pipeline_freq(pipeline, start, end):
    # Run the pipeline once per business month end between start and end.
    dates = pd.date_range(start, end, freq='BM').tolist()
    if start not in dates:
        # Prepend an extra sample date, offset back from `start`.
        start = start - pd.tseries.offsets.MonthOffset()
        dates.insert(0, start)

    # Each call covers a single day; stack the daily results into one frame.
    return pd.concat([run_pipeline(pipeline, d, d) for d in dates])

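class Volatility(CustomFactor):
    # Negated standard deviation of closing prices over the window, excluding
    # the latest close. Because of the sign flip, the lowest quantile of this
    # factor holds the stocks with the largest standard deviation.
    inputs = [USEquityPricing.close]
    def compute(self, today, assets, out, close):
        out[:] = -np.nanstd(close[:-1], axis=0)

class DelayedReturns(CustomFactor):
    # Price ratio from the first close in the window to the second-to-last
    # close, i.e. a return (plus 1) that skips the most recent day.
    # Not referenced by make_pipeline below, which uses Returns with a longer
    # window instead; such a window still ends at the latest close.
    inputs = [USEquityPricing.close]
    def compute(self, today, assets, out, close):
        out[:] = close[-2] / close[0]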
        
def make_pipeline():
    # Filters
    q500 = Q500US() 
    q1500 = Q1500US()
    mask = q500 | q1500
    
    ### Factors ###
    # General Factors
    columns = {}
    sector = Sector(mask=mask)
    adv = AverageDollarVolume(mask=mask, window_length=30)
    columns['Sector'] = sector
    columns['ADV'] = adv
    
    # Volatility Factor
    vol_20 = Volatility(window_length=20, mask=mask)
    
    # Returns Lookback Factors
    window_lengths = LOOKBACK_WINDOWS
    reg_returns = {x : Returns(window_length=x) for x in window_lengths}
    delayed_returns = {x : Returns(window_length=x+1) for x in window_lengths}
    sector_returns = {x : reg_returns[x].demean(groupby=sector, mask=mask) for x in window_lengths}
    delayed_sector_returns = {x: delayed_returns[x].demean(groupby=sector, mask=mask) for x in window_lengths}
    
    ### Classifiers ###
    # Volatility
    vol_20_q = vol_20.quantiles(5, mask=mask)
    lowest_vol = vol_20_q.eq(0)
    
    # Deciles for each type of returns lookback
    for x in window_lengths:
        columns['lb_{}'.format(x)] = reg_returns[x].quantiles(RETURNS_QUANTILES, mask=mask) 
        columns['lb_{}_s'.format(x)] = sector_returns[x].quantiles(RETURNS_QUANTILES, mask=mask)
        columns['lb_{}_v'.format(x)] = reg_returns[x].quantiles(RETURNS_QUANTILES, mask=mask&lowest_vol)
        columns['lag_lb_{}'.format(x)] = delayed_returns[x].quantiles(RETURNS_QUANTILES, mask=mask)
        columns['lag_lb_{}_s'.format(x)] = delayed_sector_returns[x].quantiles(RETURNS_QUANTILES, mask=mask)
        columns['lag_lb_{}_v'.format(x)] = delayed_returns[x].quantiles(RETURNS_QUANTILES, mask=mask&lowest_vol)
        
    pipe = Pipeline(
        screen=mask,
        columns=columns
    )
    return pipe

def get_quarter(row):
    date = pd.Timestamp(row["Day"])
    return "Q{}{}".format(str(date.quarter), str(date.year))

def get_month(row):
    date = pd.Timestamp(row["Day"])
    return "{}-{}".format(str(date.month), str(date.year))

def create_data():
    stock_data = run_pipeline_freq(make_pipeline(), START, END)
    print "Generated all pipeline data"
    stock_data.reset_index(inplace=True)
    stock_data.rename(columns={"level_1": "Sid", "level_0": "Day"}, inplace=True)
    price_data = get_pricing(stock_data['Sid'].unique(), START - pd.tseries.offsets.MonthBegin(), END + pd.Timedelta('70d'))
    print 'Generated all pricing data'
    price_data = price_data['open_price']
    stock_data = compute_returns(stock_data, price_data)
    stock_data['Quarter'] = stock_data.apply(lambda row: get_quarter(row), axis = 1)
    stock_data['Month'] = stock_data.apply(lambda row: get_month(row), axis = 1)
    return stock_data

                                   
def get_returns_window(price_data, sid, date, days_before, days_after):
    """
    Calculates cumulative returns for a stock for a given window 
    
    Parameters
    ----------
    price_data : pd.DataFrame
        Pricing history DataFrame obtained from `get_pricing`. Index should
        be the datetime index and sids should be columns.
    sid : int or zipline.assets._assets.Equity object
        Security that returns are being calculated for.
    date : datetime object
        Date that will be used as t=0 for cumulative return calculations. All
        returns will be calculated around this date.
    days_before, days_after : int
        Days before/after to be used to calculate returns for.
    
    Returns
    -------
    sid_returns : pd.Series
        Cumulative returns time series from days_before ~ days_after from date
    """
    date = pd.Timestamp(date)
    try:
        date_index = price_data.index.get_loc(date)
        base_price = price_data.iloc[date_index][sid]
    except KeyError:
        # date is not in the price index, or sid has no price column
        return None
    
    end_index = date_index + days_after + 1
    start_index = date_index - days_before

    if end_index >= len(price_data.index) or start_index <0:
        return None
    
    prices = price_data.iloc[start_index:end_index,:].loc[:,[sid]]
    cumulative_returns = (prices[sid] - base_price) / base_price 
    cumulative_returns.index = range(-days_before, days_after + 1)
    return cumulative_returns

def compute_returns(stock_data, price_data):
    # Compute cumulative returns window around day 
    for idx, row in stock_data.iterrows():
        t = row["Day"]
        sid = row["Sid"]
        returns_window = get_returns_window(price_data, sid, t, 0, 30)
        if returns_window is not None:
            for index, ret in returns_window.iteritems():
                stock_data.set_value(idx, ('T {}').format(index), ret)

    stock_data.dropna(inplace=True)
    return stock_data

def plot_cum_rets(stock_data, decile_col):
    y_values = []
    x_axis = range(len(HOLD_TIMES))
    for name, group in stock_data.groupby(decile_col):
        if name == -1:
            continue
        y_values.append([group[i].mean() for i in HOLD_TIMES])
        
    for idx, y_axis in enumerate(y_values):
        if idx % 2 != 0 and idx != 9:
            continue
        plt.plot(x_axis, y_axis, label="Decile {}".format(idx))
        
    plt.plot(x_axis, [benchmark_returns[i].mean() for i in HOLD_TIMES], label="SPY Benchmark", color="black")
    plt.plot(x_axis, [stock_data[i].mean() for i in HOLD_TIMES], label="Q1500", color="magenta")

    plt.xticks(range(len(x_axis)), HOLD_TIMES)
    plt.xlabel("Days after T")
    plt.ylabel("Avg Cumulative Returns")
    plt.title("Avg Cumulative Returns by {} Decile, Days after T".format(decile_col))
    plt.legend(loc="best")
    plt.show()
    
def create_benchmark_returns():
    benchmark_pricing = get_pricing(symbols(['SPY', 'AAPL']),START - pd.tseries.offsets.MonthBegin(), END + pd.Timedelta('70d'))["open_price"]
    benchmark_returns = pd.DataFrame(columns=["Day"] + ["T {}".format(x) for x in range(31)])
    for i, date in enumerate(data["Day"].unique()):
        benchmark_returns.set_value(i, "Day", date)
        rets = get_returns_window(benchmark_pricing, symbols("SPY"), date, 0, 30)
        for idx, ret in rets.iteritems():
            benchmark_returns.set_value(i, ('T {}').format(idx), ret)
    return benchmark_returns

def plot_spread_quarter(stock_data, decile_col, returns_col):
    quarters = stock_data['Quarter'].unique()
    lowest_returns = [stock_data[(stock_data[decile_col] == 0) & (stock_data['Quarter'] == x)][returns_col].mean()
                        for x in quarters]
    highest_returns = [stock_data[(stock_data[decile_col] == 9) & (stock_data['Quarter'] == x)][returns_col].mean()
                        for x in quarters]
    
    spread =[low - high for low, high in zip(lowest_returns, highest_returns)]
    
    num_negative = sum([1 for i in spread if i < 0])
    print "Amount of quarters negative = {}".format(num_negative)
    print "Amount of quarters positive = {}".format(len(spread) - num_negative)
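    # Note: despite the label, this prints the sum of the quarterly spreads;
    # divide by len(spread) for the per-quarter mean.
    print "Average Spread = {}".format(sum(spread))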
    plt.bar(range(len(quarters)), spread, align='center')
    plt.xticks(range(len(quarters)), quarters, rotation='vertical')
    plt.xlabel('Quarters')
    plt.ylabel('Spread between lowest and highest {} deciles'.format(decile_col))
    plt.title('Spread between lowest and highest {} deciles per Quarter'.format(decile_col))
    plt.show()

def create_spread_matrix(returns_df=None, decile=None):
    lookback_range = LOOKBACK_WINDOWS
    hold_range = HOLD_TIMES
    spread_matrix = pd.DataFrame(columns=lookback_range, index=hold_range, dtype=float)
    for days_to_hold in hold_range:
        spread_matrix.loc[days_to_hold, "SPY Benchmark"] = benchmark_returns[days_to_hold].mean()
        for lookback in lookback_range:
            spread_matrix.loc[days_to_hold, lookback] = returns_df[returns_df["lb_{}".format(lookback)]==decile][days_to_hold].mean()
    return spread_matrix


def plot_spread_year(stock_data, decile_col, returns_col):
    years=[]
    lowest_returns = []
    highest_returns = []
    
    for name, group in stock_data.groupby(stock_data['Day'].map(lambda x: x.year)):
        years.append(name)
        lowest_returns.append(group[group[decile_col] == 0][returns_col].mean())
        highest_returns.append(group[group[decile_col] == 9][returns_col].mean())
    
    spread =[low - high for low, high in zip(lowest_returns, highest_returns)]

    plt.bar(range(len(years)), spread, align='center')
    plt.xticks(range(len(years)), years, rotation='vertical')
    plt.xlabel('Years')
    plt.ylabel('Spread between lowest and highest {} decile'.format(decile_col))
    plt.title('Spread between lowest and highest {} decile per Year'.format(decile_col))
    plt.show()
    
    
def dec_plot(stock_data, decile_col, returns_col):
    decile_avg_returns = {}
    for dec in range(0,10):
        decile_avg_returns[dec] = stock_data[stock_data[decile_col] == dec][returns_col].mean()

    # Plot deciles in ascending order, with one tick label per bar.
    deciles = sorted(decile_avg_returns.keys())
    plt.bar(range(len(deciles)), [decile_avg_returns[d] for d in deciles], align='center')
    plt.xticks(range(len(deciles)), deciles)
    plt.xlabel("{} decile".format(decile_col))
    plt.ylabel(('Average Returns after {}').format(returns_col))
    plt.title('Average Returns by {} Decile'.format(decile_col))
    plt.show()