Companion notebook for Alphalens tutorial lesson 2

Creating Tear Sheets With Alphalens

In the previous lesson, you learned what Alphalens is. In this lesson, you will learn a four-step process for using it:

  1. Express an alpha factor and define a trading universe by creating and running a Pipeline over a certain time period.
  2. Query pricing data for the assets in our universe during that same time period with get_pricing().
  3. Align the alpha factor data with the pricing data with get_clean_factor_and_forward_returns().
  4. Visualize how well our alpha factor predicts future price movements with create_full_tear_sheet().

Build And Run A Pipeline

Execute the following code to express an alpha factor based on asset growth, then run it with run_pipeline().

In [1]:
from quantopian.pipeline import Pipeline
from quantopian.pipeline.data import factset
from quantopian.research import run_pipeline
from quantopian.pipeline.filters import QTradableStocksUS

def make_pipeline():
    
    # Measures a company's asset growth rate.
    asset_growth = factset.Fundamentals.assets_gr_qf.latest 
    
    return Pipeline(
        columns = {'Asset Growth': asset_growth},
        screen = QTradableStocksUS() & asset_growth.notnull()
    )

pipeline_output = run_pipeline(pipeline=make_pipeline(), start_date='2014-1-1', end_date='2016-1-1')

# Show the first 5 rows of pipeline_output
pipeline_output.head(5) 

Pipeline Execution Time: 1.92 Seconds
Out[1]:
                                              Asset Growth
2014-01-02 00:00:00+00:00  Equity(2 [HWM])       -4.881690
                           Equity(24 [AAPL])     17.570883
                           Equity(31 [ABAX])     -0.222350
                           Equity(39 [DDC])      33.137298
                           Equity(41 [ARCB])     -3.880170

Query Pricing Data

Now that we have factor data, let's get pricing data for the same time period. get_pricing() returns pricing data for a list of assets over a specified time period. It requires four arguments:

  • A list of assets for which we want pricing.
  • A start date.
  • An end date.
  • Whether to use open, high, low or close pricing.

Execute the following cell to get pricing data.

In [2]:
pricing_data = get_pricing(
    symbols=pipeline_output.index.levels[1], # Finds all assets that appear at least once in "pipeline_output"
    start_date='2014-1-1',
    end_date='2016-2-1', # Must be after run_pipeline()'s end date. Explained more in lesson 4
    fields='open_price' # Generally, you should use open pricing.
)

# Show the first 5 rows of pricing_data
pricing_data.head(5)
Out[2]:
Equity(2 [HWM]) Equity(21 [AAME]) Equity(24 [AAPL]) Equity(25 [HWM_PR]) Equity(31 [ABAX]) Equity(39 [DDC]) Equity(41 [ARCB]) Equity(52 [ABM]) Equity(53 [ABMD]) Equity(62 [ABT]) ... Equity(49682 [DYLS]) Equity(49683 [IMOM]) Equity(49684 [MCX]) Equity(49685 [NOK_WI]) Equity(49686 [RIV]) Equity(49687 [RNVA_W]) Equity(49688 [UDBI]) Equity(49689 [LVHD]) Equity(49690 [EDBI]) Equity(49691 [DDBI])
2014-01-02 00:00:00+00:00 10.334 4.027 76.446 75.073 39.199 13.945 33.178 27.221 26.66 36.253 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2014-01-03 00:00:00+00:00 10.344 3.968 76.058 74.173 39.455 13.925 33.475 27.011 26.84 36.520 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2014-01-06 00:00:00+00:00 10.432 3.987 73.938 NaN 39.613 13.704 33.475 27.298 27.23 37.300 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2014-01-07 00:00:00+00:00 10.353 3.948 74.883 NaN 40.687 13.483 33.257 27.173 27.36 37.338 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2014-01-08 00:00:00+00:00 10.304 3.987 74.125 72.279 41.791 13.406 33.494 27.183 27.54 36.977 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN

5 rows × 10341 columns
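
The symbols argument above relies on the fact that the pipeline output is indexed by (date, asset) pairs. The following is a minimal sketch of that idea using plain pandas; the tickers, values, and the names toy_output and positive are made up for illustration and are not part of the Quantopian API. It also shows one thing to watch for: after filtering rows, index.levels[1] can still list assets that no longer appear, while get_level_values(1).unique() reflects only the rows that remain.

```python
import pandas as pd

# Hypothetical miniature stand-in for pipeline_output: a (date, asset)
# MultiIndex with a single factor column. All values are made up.
dates = pd.to_datetime(["2014-01-02", "2014-01-03"])
index = pd.MultiIndex.from_tuples(
    [
        (dates[0], "AAPL"),
        (dates[0], "ABAX"),
        (dates[1], "AAPL"),
        (dates[1], "DDC"),
    ],
    names=["date", "asset"],
)
toy_output = pd.DataFrame({"Asset Growth": [17.6, -0.2, 17.9, 33.1]}, index=index)

# Level 1 of the index holds every asset that appears on at least one date.
assets = list(toy_output.index.levels[1])
print(assets)

# After filtering, the stored levels are not pruned automatically.
positive = toy_output[toy_output["Asset Growth"] > 0]
stale = list(positive.index.levels[1])
fresh = list(positive.index.get_level_values(1).unique())
print(stale)
print(fresh)
```

In the notebook itself the pipeline output is used unfiltered, so index.levels[1] is sufficient there.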

Align Data

get_clean_factor_and_forward_returns() aligns the factor data created by run_pipeline() with the pricing data created by get_pricing(), and returns an object suitable for analysis with Alphalens' charting functions. It requires two arguments:

  • The factor data we created with run_pipeline().
  • The pricing data we created with get_pricing().
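
To make "forward returns" concrete, here is a simplified sketch in plain pandas. It is not Alphalens' implementation, which handles many assets and more edge cases; the ticker XYZ, the prices, and the two-day period are made-up assumptions. The forward return on a given day is the percentage change between that day's price and the price a fixed number of trading days later.

```python
import pandas as pd

# Made-up open prices for one hypothetical asset over six trading days.
prices = pd.DataFrame(
    {"XYZ": [100.0, 102.0, 101.0, 105.0, 110.0, 108.0]},
    index=pd.bdate_range("2014-01-02", periods=6),
)

period = 2  # look two rows (trading days) ahead

# Forward return on day t = price[t + period] / price[t] - 1.
forward_returns = prices.shift(-period) / prices - 1
print(forward_returns)

# The last `period` rows have no future price to compare against.
missing = int(forward_returns["XYZ"].isna().sum())
print(missing)
```

The final rows come out as NaN because no later price exists for them. This suggests why the pricing query above ends after the pipeline's end date: the extra month of prices lets forward returns be computed for the last factor dates.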

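Execute the following cell to align the factor data with the pricing data. The optional periods argument sets the horizons over which forward returns are computed. Here it is set to 10 and 21 trading days, which produces the 10D and 21D columns in the output.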

In [8]:
from alphalens.utils import get_clean_factor_and_forward_returns

factor_data = get_clean_factor_and_forward_returns(
    factor=pipeline_output, 
    prices=pricing_data,
    periods=(10, 21)
)

# Show the first 5 rows of factor_data
factor_data.head(5) 
Dropped 0.7% entries from factor data: 0.7% in forward returns computation and 0.0% in binning phase (set max_loss=0 to see potentially suppressed Exceptions).
max_loss is 35.0%, not exceeded: OK!
Out[8]:
date                       asset                    10D        21D     factor  factor_quantile
2014-01-02 00:00:00+00:00  Equity(2 [HWM])     0.005709   0.093962  -4.881690                1
                           Equity(24 [AAPL])  -0.001400  -0.095505  17.570883                4
                           Equity(31 [ABAX])   0.107860  -0.045511  -0.222350                2
                           Equity(39 [DDC])   -0.006239   0.003442  33.137298                5
                           Equity(41 [ARCB])   0.035144   0.019079  -3.880170                1
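
The factor_quantile column in the output above ranks each asset's factor value into one of five buckets, with 1 holding the lowest values and 5 the highest. A minimal sketch of that kind of bucketing with pandas.qcut follows. The asset names and factor values are made up, and the sketch assumes five equal-sized buckets computed on a single date; on real data the bucketing is typically done separately for each date.

```python
import pandas as pd

# Made-up factor values for ten hypothetical assets on a single date.
factor = pd.Series(
    [-4.9, 17.6, -0.2, 33.1, -3.9, 2.5, 8.0, 12.3, 21.0, 5.1],
    index=[f"ASSET_{i}" for i in range(10)],
)

# qcut splits the values into 5 equal-sized buckets; labels=False returns
# bucket numbers starting at 0, so add 1 to get quantiles 1 through 5.
quantile = pd.qcut(factor, 5, labels=False) + 1

print(pd.DataFrame({"factor": factor, "factor_quantile": quantile}))
```

With ten assets and five buckets, each quantile receives two assets, which mirrors the roughly 20% share per quantile that a tear sheet reports.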

Visualize Results

Finally, execute the following cell to pass the output of get_clean_factor_and_forward_returns() to a function called create_full_tear_sheet(). This will create what's known as a tear sheet.

In [ ]:
from alphalens.tears import create_full_tear_sheet

create_full_tear_sheet(factor_data)
Quantiles Statistics
factor_quantile          min            max        mean          std   count    count %
1                -100.000000       0.081227  -10.376945    10.200943  216611  20.017669
2                  -4.623963       5.068143    0.691584     2.007197  216328  19.991516
3                   1.337744      11.023441    6.129305     1.961863  216306  19.989483
4                   7.670516      25.256888   14.784040     4.040048  216328  19.991516
5                  21.465361  153701.600000  135.126575  2043.811693  216526  20.009814
Returns Analysis
                                                   10D      21D
Ann. alpha                                      -0.145   -0.156
beta                                             0.080    0.154
Mean Period Wise Return Top Quantile (bps)      -3.690   -7.395
Mean Period Wise Return Bottom Quantile (bps)  -24.231  -23.717
Mean Period Wise Spread (bps)                   20.540   15.992
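
As a rough intuition for the spread row, the sketch below averages forward returns within the top and bottom quantiles of a made-up table and subtracts one from the other. This is a simplification under stated assumptions, not Alphalens' calculation: the library applies further adjustments, so its figures need not reconcile exactly with a plain subtraction (in the table above the 10D column does to within rounding, while the 21D column does not). The frame toy and all of its values are hypothetical.

```python
import pandas as pd

# Made-up rows shaped like factor_data: a quantile label and a forward return.
toy = pd.DataFrame(
    {
        "factor_quantile": [1, 1, 5, 5, 1, 5],
        "10D": [-0.004, -0.002, 0.001, 0.003, -0.001, 0.002],
    }
)

# Average forward return within each quantile.
mean_by_quantile = toy.groupby("factor_quantile")["10D"].mean()

# Spread = top quantile minus bottom quantile, expressed in basis points
# (1 basis point = 0.01% = 0.0001).
spread_bps = (mean_by_quantile.loc[5] - mean_by_quantile.loc[1]) * 10_000
print(mean_by_quantile)
print(round(spread_bps, 2))
```

A positive spread means the top quantile outperformed the bottom quantile on average over that horizon.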

That's It!

In the next lesson, we will show you how to interpret the charts produced by create_full_tear_sheet().