Fundamental factor models

By Evgenia "Jenny" Nitishinskaya, Delaney Granizo-Mackenzie, and Maxwell Margenot.

Part of the Quantopian Lecture Series.

Notebook released under the Creative Commons Attribution 4.0 License.

Fundamentals are data having to do with the asset issuer, like the sector, size, and expenses of the company. We can use this data to build a linear factor model, expressing returns on any asset as

$$R_t = a_t + b_{t1} F_1 + b_{t2} F_2 + \ldots + b_{tK} F_K + \epsilon_t$$

There are two different approaches to computing the factors $F_j$, which represent the returns associated with some fundamental characteristics, and the factor sensitivities $b_{tj}$.

Approach 1: Portfolio Construction

In the first, we start by representing each characteristic of interest by a portfolio: we sort all assets by that characteristic, then build the portfolio by going long the top quantile of assets and short the bottom quantile. The factor corresponding to this characteristic is the return on this portfolio. Then, the $b_{ij}$ are estimated for each asset $i$ by regressing over the historical values of $R_i$ and of the factors.
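
To make the mechanics concrete, here is a minimal sketch of this procedure on made-up data; the DataFrame asset_returns and the Series characteristic are hypothetical stand-ins, and the rest of this notebook carries the same steps out with real pipeline data.

import numpy as np
import pandas as pd
import statsmodels.api as sm

# Hypothetical data: 250 days of returns for 20 assets, and one characteristic value per asset.
np.random.seed(0)
asset_returns = pd.DataFrame(np.random.normal(0, 0.01, (250, 20)),
                             columns=['asset_%d' % i for i in range(20)])
characteristic = pd.Series(np.random.rand(20), index=asset_returns.columns)

# Long the top quantile and short the bottom quantile of the characteristic.
top = characteristic.nlargest(5).index
bottom = characteristic.nsmallest(5).index

# The factor return series is the return spread between the long and short portfolios.
factor = asset_returns[top].mean(axis=1) - asset_returns[bottom].mean(axis=1)
factor.name = 'factor'

# Estimate one asset's sensitivity by regressing its returns on the factor.
sensitivity = sm.OLS(asset_returns['asset_0'], sm.add_constant(factor)).fit().params['factor']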

We'll use the canonical Fama-French factors for this example, which are the returns of portfolios constructed based on fundamental factors.

We start by getting the fundamentals data for all assets and constructing the portfolios for each characteristic:

Import some libraries.

In [1]:
import numpy as np
import pandas as pd
from quantopian.pipeline.data import morningstar
import statsmodels.api as sm
from statsmodels import regression
import matplotlib.pyplot as plt
import scipy.stats

Set the date range for which we want data.

In [2]:
start_date = '2011-1-1'
end_date = '2012-1-1'

Using the Pipeline API to Fetch Data

The pipeline API is a very useful tool for factor analysis. We use it here to get the data for our analysis: specifically, the daily book to price ratio and market cap of every security. We also perform several other useful filtering steps, which are detailed in the code comments.

In [3]:
import numpy as np
from quantopian.pipeline import Pipeline
from quantopian.pipeline.data import morningstar
from quantopian.pipeline.data.builtin import USEquityPricing
from quantopian.pipeline.factors import CustomFactor, Returns

# Here's the raw data we need, everything else is derivative.

class MarketCap(CustomFactor):
    # Here's the data we need for this factor
    inputs = [morningstar.valuation.shares_outstanding, USEquityPricing.close]
    # Only need the most recent values for both series
    window_length = 1
    
    def compute(self, today, assets, out, shares, close_price):
        # Shares * price/share = total price = market cap
        out[:] = shares * close_price
        
        
class BookToPrice(CustomFactor):
    # pb = price to book, we'll need to take the reciprocal later
    inputs = [morningstar.valuation_ratios.pb_ratio]
    window_length = 1
    
    def compute(self, today, assets, out, pb):
        out[:] = 1 / pb
        
def make_pipeline():
    """
    Create and return our pipeline.
    
    We break this piece of logic out into its own function to make it easier to
    test and modify in isolation.
    
    In particular, this function can be copy/pasted into research and run by itself.
    """
    pipe = Pipeline()

    # Add our factors to the pipeline
    market_cap = MarketCap()
    # Raw market cap and book to price data gets fed in here
    pipe.add(market_cap, "market_cap")
    book_to_price = BookToPrice()
    pipe.add(book_to_price, "book_to_price")
    
    # We also get daily returns
    returns = Returns(inputs=[USEquityPricing.close], window_length=2)
    pipe.add(returns, "returns")
    
    # We compute a daily rank of both factors, this is used in the next step,
    # which is computing portfolio membership.
    market_cap_rank = market_cap.rank()
    pipe.add(market_cap_rank, 'market_cap_rank')
    
    book_to_price_rank = book_to_price.rank()
    pipe.add(book_to_price_rank, 'book_to_price_rank')

    # Build Filters representing the top and bottom 1000 stocks by each of our rankings.
    biggest = market_cap_rank.top(1000)
    smallest = market_cap_rank.bottom(1000)
    
    highpb = book_to_price_rank.top(1000)
    lowpb = book_to_price_rank.bottom(1000)
    
    # Don't return anything not in this set, as we don't need it.
    pipe.set_screen(biggest | smallest | highpb | lowpb)
    
    # Add the boolean flags we computed to the output data
    pipe.add(biggest, 'biggest')
    pipe.add(smallest, 'smallest')
    
    pipe.add(highpb, 'highpb')
    pipe.add(lowpb, 'lowpb')
    
    return pipe
/build/src/ipython/IPython/kernel/__main__.py:11: NotAllowedInLiveWarning: The fundamentals attribute valuation.shares_outstanding is not yet allowed in broker-backed live trading

Now we initialize the pipeline.

In [4]:
pipe = make_pipeline()

We can visualize the dependency graph of our data computations here.

In [5]:
pipe.show_graph('png')
Out[5]:
(image output: the pipeline's dependency graph)

This import gives us the run_pipeline function, which we'll use to run our pipeline.

In [6]:
from quantopian.research import run_pipeline

Now let's actually run it and check out our results.

In [7]:
# This takes a few minutes.
results = run_pipeline(pipe, start_date, end_date)
results
Out[7]:
biggest book_to_price book_to_price_rank highpb lowpb market_cap market_cap_rank returns smallest
2011-01-03 00:00:00+00:00 Equity(2 [ARNC]) True 0.991867 3625.0 False False 1.573937e+10 4373.0 0.013083 False
Equity(21 [AAME]) False 2.052967 4526.0 True False 4.520648e+07 578.0 -0.009756 True
Equity(24 [AAPL]) True 0.167400 494.0 False True 2.957765e+11 4701.0 -0.003769 False
Equity(31 [ABAX]) False 0.257898 857.0 False True 6.002049e+08 2396.0 -0.020073 False
Equity(37 [ABCW]) False 0.277500 954.0 False True 2.601996e+07 370.0 0.000000 True
Equity(51 [ABL]) False 2.364066 4584.0 True False 2.415793e+05 2.0 -0.082353 True
Equity(53 [ABMD]) False 0.235200 760.0 False True 3.632957e+08 1985.0 -0.012346 False
Equity(58 [SERV]) False 1.666944 4405.0 True False 8.785592e+06 109.0 0.030172 True
Equity(62 [ABT]) True 0.297699 1040.0 False False 7.410661e+10 4626.0 0.007778 False
Equity(64 [ABX]) True 0.344994 1277.0 False False 5.297974e+10 4599.0 0.011217 False
Equity(67 [ADSK]) True 0.193900 599.0 False True 8.684483e+09 4138.0 -0.014952 False
Equity(76 [TAP]) True 0.857633 3270.0 False False 9.346545e+09 4175.0 -0.004362 False
Equity(88 [ACI]) True 0.460893 1821.0 False False 5.696619e+09 3993.0 -0.005954 False
Equity(100 [IEP]) False 1.077354 3838.0 True False 3.059538e+09 3653.0 0.006853 False
Equity(106 [ACU]) False 0.853315 3259.0 False False 2.914872e+07 403.0 -0.003141 True
Equity(107 [ACV]) True 0.360101 1338.0 False False 3.663793e+09 3768.0 -0.000270 False
Equity(112 [ACY]) False 1.571092 4353.0 True False 2.793301e+07 390.0 0.008357 True
Equity(114 [ADBE]) True 0.367202 1384.0 False False 1.565831e+10 4371.0 0.006869 False
Equity(117 [AEY]) False 0.949668 3506.0 False False 3.185216e+07 433.0 0.000000 True
Equity(122 [ADI]) True 0.301296 1058.0 False False 1.125325e+10 4246.0 -0.007638 False
Equity(128 [ADM]) True 0.829669 3196.0 False False 1.922429e+10 4421.0 0.007028 False
Equity(154 [AEM]) True 0.267001 904.0 False True 1.200374e+10 4266.0 0.002354 False
Equity(157 [AEG]) True 3.364738 4680.0 True False 1.065935e+10 4229.0 0.008210 False
Equity(161 [AEP]) True 0.798722 3103.0 False False 1.728033e+10 4399.0 -0.002495 False
Equity(166 [AES]) True 0.808669 3134.0 False False 9.602998e+09 4188.0 -0.002048 False
Equity(168 [AET]) True 0.843526 3232.0 False False 1.221105e+10 4277.0 0.002628 False
Equity(185 [AFL]) True 0.458695 1804.0 False False 2.660740e+10 4486.0 0.006238 False
Equity(197 [AGCO]) True 0.602011 2404.0 False False 4.714631e+09 3908.0 -0.002363 False
Equity(205 [AGN]) True 0.221602 705.0 False True 2.085580e+10 4439.0 0.000000 False
Equity(216 [HES]) True 0.683620 2718.0 False False 2.515630e+10 4474.0 0.001308 False
... ... ... ... ... ... ... ... ... ... ...
2012-01-03 00:00:00+00:00 Equity(41763 [TEA]) False 0.070500 147.0 False True 7.167431e+08 2593.0 0.020686 False
Equity(41765 [CHEF]) False 0.066000 138.0 False True 3.723197e+08 2050.0 0.043808 False
Equity(41766 [HZNP]) False 1.241773 3737.0 True False 7.811440e+07 927.0 -0.043062 True
Equity(41777 [LSG]) False 1.627869 4122.0 True False 5.038072e+08 2300.0 0.024390 False
Equity(41778 [ASM]) False 0.407498 1377.0 False False 3.821348e+07 558.0 0.021583 True
Equity(41789 [MNGL]) False 0.051900 117.0 False True 9.680125e+07 1066.0 0.011567 False
Equity(41820 [CARB]) False 0.109200 236.0 False True 2.789563e+08 1836.0 0.008174 False
Equity(41823 [PPP]) False 1.507613 4026.0 True False 2.823994e+08 1843.0 0.028939 False
Equity(41841 [PLMT]) False 1.228350 3723.0 True False 6.503190e+07 830.0 0.011881 True
Equity(41843 [ELLO]) False 1.403903 3939.0 True False 6.035624e+07 799.0 0.009009 True
Equity(41852 [OBT]) False 1.224290 3718.0 True False 1.118129e+07 145.0 -0.026385 True
Equity(41858 [NMAR]) False 0.090000 185.0 False True 5.748000e+07 770.0 NaN True
Equity(41872 [VER]) False 0.553710 1941.0 False False 7.535809e+07 912.0 0.000000 True
Equity(41886 [FNV]) True 0.459897 1589.0 False False 4.864339e+09 3897.0 -0.008592 False
Equity(41888 [CSRE]) False 1.857010 4234.0 True False 4.874792e+08 2262.0 0.049587 False
Equity(41893 [PBSK]) False 1.608234 4111.0 True False 3.682639e+07 534.0 0.011111 True
Equity(41915 [FSM]) False 0.272702 813.0 False True 6.749870e+08 2536.0 0.026316 False
Equity(42000 [ASBB]) False 1.042970 3397.0 False False 6.533924e+07 832.0 0.003258 True
Equity(42021 [XLS]) False 1.199328 3677.0 True False 1.675968e+09 3216.0 -0.008734 False
Equity(42023 [XYL]) True 0.442302 1516.0 False False 4.732375e+09 3888.0 -0.011565 False
Equity(42080 [BUR]) False 0.048100 109.0 False True NaN NaN NaN False
Equity(42112 [GEVA]) False 0.184101 470.0 False True 4.661304e+08 2230.0 -0.007092 False
Equity(42115 [TGD]) False 0.274100 819.0 False True 2.651027e+08 1806.0 0.010695 False
Equity(42118 [GRPN]) True NaN NaN False False 1.310047e+10 4256.0 -0.037488 False
Equity(42125 [VAC]) False 3.066544 4499.0 True False NaN NaN 0.002335 False
Equity(42151 [PACD]) False 1.252505 3756.0 True False 2.008800e+09 3344.0 0.036789 False
Equity(42165 [INVN]) False 0.092300 193.0 False True NaN NaN 0.001006 False
Equity(42166 [CLVS]) False 0.062900 134.0 False True NaN NaN -0.012614 False
Equity(42173 [DLPH]) False 0.219602 619.0 False True NaN NaN 0.005602 False
Equity(42184 [MFRM]) False 0.001900 11.0 False True NaN NaN 0.014880 False

767510 rows × 9 columns

Great, we have all the data. Now we need to compute the returns of our portfolios over time. We have the daily returns for each equity, plus whether or not that equity was included in any given portfolio on any given day. We can combine that information in the following way to yield daily portfolio returns.

Step 1: Subset our results into only data belonging to our 'biggest' portfolio.

In [8]:
results[results.biggest]
Out[8]:
biggest book_to_price book_to_price_rank highpb lowpb market_cap market_cap_rank returns smallest
2011-01-03 00:00:00+00:00 Equity(2 [ARNC]) True 0.991867 3625.0 False False 1.573937e+10 4373.0 0.013083 False
Equity(24 [AAPL]) True 0.167400 494.0 False True 2.957765e+11 4701.0 -0.003769 False
Equity(62 [ABT]) True 0.297699 1040.0 False False 7.410661e+10 4626.0 0.007778 False
Equity(64 [ABX]) True 0.344994 1277.0 False False 5.297974e+10 4599.0 0.011217 False
Equity(67 [ADSK]) True 0.193900 599.0 False True 8.684483e+09 4138.0 -0.014952 False
Equity(76 [TAP]) True 0.857633 3270.0 False False 9.346545e+09 4175.0 -0.004362 False
Equity(88 [ACI]) True 0.460893 1821.0 False False 5.696619e+09 3993.0 -0.005954 False
Equity(107 [ACV]) True 0.360101 1338.0 False False 3.663793e+09 3768.0 -0.000270 False
Equity(114 [ADBE]) True 0.367202 1384.0 False False 1.565831e+10 4371.0 0.006869 False
Equity(122 [ADI]) True 0.301296 1058.0 False False 1.125325e+10 4246.0 -0.007638 False
Equity(128 [ADM]) True 0.829669 3196.0 False False 1.922429e+10 4421.0 0.007028 False
Equity(154 [AEM]) True 0.267001 904.0 False True 1.200374e+10 4266.0 0.002354 False
Equity(157 [AEG]) True 3.364738 4680.0 True False 1.065935e+10 4229.0 0.008210 False
Equity(161 [AEP]) True 0.798722 3103.0 False False 1.728033e+10 4399.0 -0.002495 False
Equity(166 [AES]) True 0.808669 3134.0 False False 9.602998e+09 4188.0 -0.002048 False
Equity(168 [AET]) True 0.843526 3232.0 False False 1.221105e+10 4277.0 0.002628 False
Equity(185 [AFL]) True 0.458695 1804.0 False False 2.660740e+10 4486.0 0.006238 False
Equity(197 [AGCO]) True 0.602011 2404.0 False False 4.714631e+09 3908.0 -0.002363 False
Equity(205 [AGN]) True 0.221602 705.0 False True 2.085580e+10 4439.0 0.000000 False
Equity(216 [HES]) True 0.683620 2718.0 False False 2.515630e+10 4474.0 0.001308 False
Equity(239 [AIG]) True 1.587554 4368.0 True False 7.789643e+09 4100.0 0.001738 False
Equity(273 [ALU]) True 0.502790 2010.0 False False 6.666410e+09 4050.0 0.006826 False
Equity(300 [ALK]) True 0.530110 2129.0 False False 4.179994e+09 3845.0 -0.013041 False
Equity(328 [ALTR]) True 0.175399 527.0 False True 1.112724e+10 4241.0 -0.009460 False
Equity(337 [AMAT]) True 0.456809 1798.0 False False 1.864512e+10 4411.0 -0.006369 False
Equity(338 [BEAM]) True 0.614288 2448.0 False False 9.196057e+09 4161.0 -0.007411 False
Equity(351 [AMD]) True 0.123499 347.0 False True 5.582949e+09 3985.0 0.006020 False
Equity(353 [AME]) True 0.264201 888.0 False True 9.422944e+09 4179.0 -0.002034 False
Equity(357 [TWX]) True 1.008980 3676.0 False False 3.569727e+10 4537.0 0.003117 False
Equity(368 [AMGN]) True 0.483489 1907.0 False False 5.187034e+10 4597.0 -0.011345 False
... ... ... ... ... ... ... ... ... ... ...
2012-01-03 00:00:00+00:00 Equity(38949 [OIBR_C]) True 1.526485 4043.0 True False 3.714609e+09 3752.0 0.004894 False
Equity(38965 [FTNT]) True 0.085800 177.0 False True 3.359329e+09 3684.0 0.006925 False
Equity(39053 [CIT]) True 1.310616 3821.0 True False 6.994310e+09 4038.0 -0.007120 False
Equity(39073 [CIE]) True 0.507202 1764.0 False False 5.519195e+09 3958.0 -0.009591 False
Equity(39095 [CHTR]) True 0.151800 370.0 False True 6.260309e+09 4002.0 0.001408 False
Equity(39347 [ST]) True 0.184101 469.0 False True 4.627855e+09 3876.0 -0.006427 False
Equity(39495 [SDRL]) True 0.395507 1328.0 False False 1.469849e+10 4295.0 -0.001806 False
Equity(39499 [VIP]) True 0.789391 2720.0 False False 1.543534e+10 4306.0 0.002114 False
Equity(39546 [LYB]) True 0.743384 2578.0 False False 1.875532e+10 4355.0 -0.010360 False
Equity(39612 [SIX]) True 0.438404 1501.0 False False 4.537155e+09 3867.0 -0.003625 False
Equity(39778 [QEP]) True 0.566990 1986.0 False False 5.186463e+09 3935.0 0.002394 False
Equity(39994 [NXPI]) True 0.325098 1041.0 False False 3.794166e+09 3772.0 -0.002597 False
Equity(40338 [SMFG]) True 1.613424 4114.0 True False 3.888654e+10 4499.0 0.001821 False
Equity(40445 [LPLA]) True 0.391803 1313.0 False False 3.370944e+09 3685.0 0.002626 False
Equity(40573 [FRC]) True 0.657289 2318.0 False False 3.962769e+09 3801.0 -0.002603 False
Equity(40616 [MMI]) True 0.423603 1434.0 False False 1.161710e+10 4213.0 -0.000258 False
Equity(40755 [NLSN]) True 0.435294 1483.0 False False 1.062889e+10 4179.0 0.001350 False
Equity(40852 [KMI]) True 0.144699 351.0 False True 2.603239e+10 4425.0 0.019658 False
Equity(41047 [HCA]) True NaN NaN False False 9.621716e+09 4150.0 0.035228 False
Equity(41150 [APO]) True 0.089900 184.0 False True 4.505351e+09 3862.0 0.009764 False
Equity(41242 [ARCO]) True 0.140501 334.0 False True 4.301630e+09 3843.0 -0.005329 False
Equity(41416 [KOS]) True 0.183699 466.0 False True 4.787567e+09 3892.0 0.000815 False
Equity(41451 [LNKD]) True 0.063700 136.0 False True 6.066141e+09 3993.0 -0.011471 False
Equity(41462 [MOS]) True 0.467093 1621.0 False False 2.252842e+10 4393.0 0.003521 False
Equity(41484 [YNDX]) True 0.115400 258.0 False True 6.359126e+09 4005.0 0.009641 False
Equity(41491 [FSL]) True NaN NaN False False 3.108435e+09 3640.0 -0.023148 False
Equity(41636 [MPC]) True 0.844167 2881.0 False False 1.187205e+10 4221.0 0.001203 False
Equity(41886 [FNV]) True 0.459897 1589.0 False False 4.864339e+09 3897.0 -0.008592 False
Equity(42023 [XYL]) True 0.442302 1516.0 False False 4.732375e+09 3888.0 -0.011565 False
Equity(42118 [GRPN]) True NaN NaN False False 1.310047e+10 4256.0 -0.037488 False

253000 rows × 9 columns

Step 2: Get returns.

In [9]:
results[results.biggest]['returns']
Out[9]:
2011-01-03 00:00:00+00:00  Equity(2 [ARNC])          0.013083
                           Equity(24 [AAPL])        -0.003769
                           Equity(62 [ABT])          0.007778
                           Equity(64 [ABX])          0.011217
                           Equity(67 [ADSK])        -0.014952
                           Equity(76 [TAP])         -0.004362
                           Equity(88 [ACI])         -0.005954
                           Equity(107 [ACV])        -0.000270
                           Equity(114 [ADBE])        0.006869
                           Equity(122 [ADI])        -0.007638
                           Equity(128 [ADM])         0.007028
                           Equity(154 [AEM])         0.002354
                           Equity(157 [AEG])         0.008210
                           Equity(161 [AEP])        -0.002495
                           Equity(166 [AES])        -0.002048
                           Equity(168 [AET])         0.002628
                           Equity(185 [AFL])         0.006238
                           Equity(197 [AGCO])       -0.002363
                           Equity(205 [AGN])         0.000000
                           Equity(216 [HES])         0.001308
                           Equity(239 [AIG])         0.001738
                           Equity(273 [ALU])         0.006826
                           Equity(300 [ALK])        -0.013041
                           Equity(328 [ALTR])       -0.009460
                           Equity(337 [AMAT])       -0.006369
                           Equity(338 [BEAM])       -0.007411
                           Equity(351 [AMD])         0.006020
                           Equity(353 [AME])        -0.002034
                           Equity(357 [TWX])         0.003117
                           Equity(368 [AMGN])       -0.011345
                                                       ...   
2012-01-03 00:00:00+00:00  Equity(38949 [OIBR_C])    0.004894
                           Equity(38965 [FTNT])      0.006925
                           Equity(39053 [CIT])      -0.007120
                           Equity(39073 [CIE])      -0.009591
                           Equity(39095 [CHTR])      0.001408
                           Equity(39347 [ST])       -0.006427
                           Equity(39495 [SDRL])     -0.001806
                           Equity(39499 [VIP])       0.002114
                           Equity(39546 [LYB])      -0.010360
                           Equity(39612 [SIX])      -0.003625
                           Equity(39778 [QEP])       0.002394
                           Equity(39994 [NXPI])     -0.002597
                           Equity(40338 [SMFG])      0.001821
                           Equity(40445 [LPLA])      0.002626
                           Equity(40573 [FRC])      -0.002603
                           Equity(40616 [MMI])      -0.000258
                           Equity(40755 [NLSN])      0.001350
                           Equity(40852 [KMI])       0.019658
                           Equity(41047 [HCA])       0.035228
                           Equity(41150 [APO])       0.009764
                           Equity(41242 [ARCO])     -0.005329
                           Equity(41416 [KOS])       0.000815
                           Equity(41451 [LNKD])     -0.011471
                           Equity(41462 [MOS])       0.003521
                           Equity(41484 [YNDX])      0.009641
                           Equity(41491 [FSL])      -0.023148
                           Equity(41636 [MPC])       0.001203
                           Equity(41886 [FNV])      -0.008592
                           Equity(42023 [XYL])      -0.011565
                           Equity(42118 [GRPN])     -0.037488
Name: returns, dtype: float64

Step 3: Group by day and take the mean. This goes fairly deep into pandas, so if it isn't clear on a first pass, check the pandas documentation for the functions used, especially groupby, which is very useful. Keep in mind that the index in our results is a MultiIndex rather than a regular Index, which can complicate things.
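
If the MultiIndex grouping pattern is new to you, here is a tiny, made-up example of the same operation (the toy index and values are purely illustrative): grouping by level 0 of the index, which in our results is the date, and averaging within each group.

# Toy Series with a (date, asset) MultiIndex, mirroring the shape of our pipeline output.
idx = pd.MultiIndex.from_product([['2011-01-03', '2011-01-04'], ['A', 'B']],
                                 names=['date', 'asset'])
toy_returns = pd.Series([0.01, 0.03, -0.02, 0.00], index=idx)

# level=0 groups by the first index level (the date); mean() then averages across assets per day.
print toy_returns.groupby(level=0).mean()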

In [10]:
results[results.biggest]['returns'].groupby(level=0).mean()
Out[10]:
2011-01-03 00:00:00+00:00   -0.000173
2011-01-04 00:00:00+00:00    0.010982
2011-01-05 00:00:00+00:00   -0.005017
2011-01-06 00:00:00+00:00    0.004201
2011-01-07 00:00:00+00:00   -0.003210
2011-01-10 00:00:00+00:00   -0.001821
2011-01-11 00:00:00+00:00    0.000640
2011-01-12 00:00:00+00:00    0.006782
2011-01-13 00:00:00+00:00    0.009905
2011-01-14 00:00:00+00:00   -0.000804
2011-01-18 00:00:00+00:00    0.005200
2011-01-19 00:00:00+00:00    0.004477
2011-01-20 00:00:00+00:00   -0.012755
2011-01-21 00:00:00+00:00   -0.006107
2011-01-24 00:00:00+00:00    0.000013
2011-01-25 00:00:00+00:00    0.007606
2011-01-26 00:00:00+00:00   -0.001106
2011-01-27 00:00:00+00:00    0.009396
2011-01-28 00:00:00+00:00    0.003167
2011-01-31 00:00:00+00:00   -0.017886
2011-02-01 00:00:00+00:00    0.009216
2011-02-02 00:00:00+00:00    0.016059
2011-02-03 00:00:00+00:00   -0.001468
2011-02-04 00:00:00+00:00    0.003518
2011-02-07 00:00:00+00:00    0.002991
2011-02-08 00:00:00+00:00    0.005747
2011-02-09 00:00:00+00:00    0.004054
2011-02-10 00:00:00+00:00   -0.004357
2011-02-11 00:00:00+00:00    0.001720
2011-02-14 00:00:00+00:00    0.006792
                               ...   
2011-11-18 00:00:00+00:00   -0.019297
2011-11-21 00:00:00+00:00   -0.000848
2011-11-22 00:00:00+00:00   -0.018459
2011-11-23 00:00:00+00:00   -0.004299
2011-11-25 00:00:00+00:00   -0.025292
2011-11-28 00:00:00+00:00   -0.002621
2011-11-29 00:00:00+00:00    0.033148
2011-11-30 00:00:00+00:00    0.001860
2011-12-01 00:00:00+00:00    0.045930
2011-12-02 00:00:00+00:00   -0.001984
2011-12-05 00:00:00+00:00   -0.000097
2011-12-06 00:00:00+00:00    0.012671
2011-12-07 00:00:00+00:00   -0.001904
2011-12-08 00:00:00+00:00    0.000533
2011-12-09 00:00:00+00:00   -0.025552
2011-12-12 00:00:00+00:00    0.018515
2011-12-13 00:00:00+00:00   -0.017489
2011-12-14 00:00:00+00:00   -0.014710
2011-12-15 00:00:00+00:00   -0.013625
2011-12-16 00:00:00+00:00    0.004174
2011-12-19 00:00:00+00:00    0.005895
2011-12-20 00:00:00+00:00   -0.015466
2011-12-21 00:00:00+00:00    0.032779
2011-12-22 00:00:00+00:00    0.001803
2011-12-23 00:00:00+00:00    0.010930
2011-12-27 00:00:00+00:00    0.007011
2011-12-28 00:00:00+00:00   -0.000239
2011-12-29 00:00:00+00:00   -0.015062
2011-12-30 00:00:00+00:00    0.011663
2012-01-03 00:00:00+00:00   -0.001328
Name: returns, dtype: float64

Now run through this computation for each portfolio and get our final results.

In [11]:
R_biggest = results[results.biggest]['returns'].groupby(level=0).mean()
R_smallest = results[results.smallest]['returns'].groupby(level=0).mean()

R_highpb = results[results.highpb]['returns'].groupby(level=0).mean()
R_lowpb = results[results.lowpb]['returns'].groupby(level=0).mean()

SMB = R_smallest - R_biggest
HML = R_highpb - R_lowpb

What were the daily returns?

In [12]:
plt.plot(SMB.index, SMB.values)
plt.ylabel('Daily Percent Return')
plt.legend(['SMB Portfolio Returns']);
In [13]:
plt.plot(HML.index, HML.values)
plt.ylabel('Daily Percent Return')
plt.legend(['HML Portfolio Returns']);

And what would it look like to hold these portfolios over time?

In [14]:
plt.plot(SMB.index, np.cumprod(SMB.values+1))
plt.ylabel('Cumulative Return')
plt.legend(['SMB Portfolio Returns']);
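
The same cumulative-return plot for the HML portfolio (not part of the original run, but it follows directly from the series we already computed):

plt.plot(HML.index, np.cumprod(HML.values + 1))
plt.ylabel('Cumulative Return')
plt.legend(['HML Portfolio Returns']);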

The last data we need are the daily returns on the broad market.

In [15]:
M = get_pricing('SPY', start_date='2011-1-1', end_date='2012-1-1', fields='price').pct_change()[1:]
In [16]:
plt.plot(M.index, M.values)
plt.ylabel('Daily Percent Return')
plt.legend(['Market Portfolio Returns']);

Actually Running the Regression

Now that we have returns series representing our factors, we can compute the factor model for any return stream using a linear regression. Below, we compute the factor sensitivities for returns on a tech portfolio.

In [17]:
# Get returns data for our portfolio
portfolio = get_pricing(['MSFT', 'AAPL', 'YHOO', 'FB', 'TSLA'], 
                        fields='price', start_date=start_date, end_date=end_date).pct_change()[1:]
R = np.mean(portfolio, axis=1)

Put all the data into one dataframe for convenience.

In [18]:
# Define a constant to compute intercept
constant = pd.TimeSeries(np.ones(len(R.index)), index=R.index)

df = pd.DataFrame({'R': R,
              'M': M,
              'SMB': SMB,
              'HML': HML,
              'Constant': constant})
df = df.dropna()

Perform the regression. You'll notice that these are the sensitivities over an entire year. It can be valuable to look at the rolling sensitivities as well to determine how stable they are.

In [19]:
# Perform linear regression to get the coefficients in the model.
# Note that the Constant column defined above is not passed to OLS here,
# so this regression is fit without an intercept.
b1, b2, b3 = regression.linear_model.OLS(df['R'], df[['M', 'SMB', 'HML']]).fit().params

# Print the coefficients from the linear regression
print 'Historical Sensitivities of portfolio returns to factors:\nMarket: %f\nMarket cap: %f\nB/P: %f' %  (b1, b2, b3)
Historical Sensitivities of portfolio returns to factors:
Market: 0.962431
Market cap: -0.060328
B/P: -0.115476

Let's perform a rolling regression to look at how the estimated sensitivities change over time.

In [20]:
model = pd.stats.ols.MovingOLS(y = df['R'], x=df[['M', 'SMB', 'HML']], 
                             window_type='rolling', 
                             window=100)
rolling_parameter_estimates = model.beta
rolling_parameter_estimates.plot();
plt.title('Computed Betas');
plt.legend(['Market Beta', 'SMB Beta', 'HML Beta', 'Intercept']);
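
Note that pd.stats.ols.MovingOLS existed in the pandas version this notebook was written against but has since been removed from pandas. On a newer stack, a rolling regression can be estimated with statsmodels instead; here is a minimal sketch, assuming statsmodels 0.11 or later:

from statsmodels.regression.rolling import RollingOLS

rolling_model = RollingOLS(df['R'], sm.add_constant(df[['M', 'SMB', 'HML']]), window=100)
rolling_params = rolling_model.fit().params  # one row of estimates per window end date
rolling_params.plot();
plt.title('Computed Betas');
plt.legend(['Intercept', 'Market Beta', 'SMB Beta', 'HML Beta']);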

Approach 2: Factor Value Normalization

This is also known as cross-sectional factor analysis.

Another approach is to normalize factor values each bar and see how predictive of that bar's returns they were. We do this by computing a normalized factor value $b_{aj}$ for each asset $a$ in the following way.

$$b_{aj} = \frac{F_{aj} - \mu_{F_j}}{\sigma_{F_j}}$$

$F_{aj}$ is the value of factor $j$ for asset $a$ during this bar, $\mu_{F_j}$ is the mean factor value across all assets, and $\sigma_{F_j}$ is the standard deviation of factor values over all assets. Notice that we are just computing a z-score to make asset-specific factor values comparable across different factors.

The exceptions to this formula are indicator variables, which are set to 1 for true and 0 for false. One example is industry membership: the exposure is simply 1 if the asset belongs to the industry and 0 if it does not.

After we calculate all of the normalized scores during bar $t$, we can estimate factor $j$'s returns $F_{jt}$, using a cross-sectional regression (i.e. at each time step, we perform a regression using the equations for all of the assets). Specifically, once we have returns for each asset $R_{at}$, and normalized factor coefficients $b_{aj}$, we construct the following model and estimate the $F_j$s and $a_t$:

$$R_{at} = a_t + b_{a1}F_1 + b_{a2}F_2 + \dots + b_{aK}F_K$$

You can think of this as slicing the data in the other direction from the first analysis: now the factor returns are the unknowns to be solved for, whereas originally the coefficients were the unknowns. Another way to think about it is that you're determining how predictive of returns the factor was on that day, and therefore how much return you could have squeezed out of that factor.

Following this procedure, we'll get the cross-sectional returns on 2011-01-03, and compute the coefficients for all assets:

Getting the Data

We already have the results of the previous pipeline call, so we can grab book to price information for 2011-1-3 pretty easily.

In [21]:
BTP = results['book_to_price']['2011-1-3']
zscore = (BTP - np.mean(BTP)) / np.std(BTP)
zscore.dropna(inplace=True)

plt.hist(zscore)
plt.xlabel('Z-Score')
plt.ylabel('Frequency');

Problem: The Data is Weirdly Distributed

Notice how there are big outliers in the dataset that cause the z-scores to lose a lot of information. The presence of a few huge book to price values compresses the rest of the data into a relatively narrow range. We need to work around this issue with a data cleaning technique; here we use winsorization.

Winsorization

Winsorization takes the most extreme $n\%$ of a dataset in each tail and sets those values equal to the least extreme value remaining in that tail. For example, if your dataset ranged from 0 to 10, plus a few crazy outliers, those outliers would be set to 0 or 10 depending on their direction. Here is an example.

In [22]:
# Get some random data
X = np.random.normal(0, 1, 100)

# Put in some outliers
X[0] = 1000
X[1] = -1000

# Perform winsorization
print 'Before winsorization', np.min(X), np.max(X)
scipy.stats.mstats.winsorize(X, inplace=True, limits=0.01)
print 'After winsorization', np.min(X), np.max(X)
Before winsorization -1000.0 1000.0
After winsorization -3.38601416614 2.42976476762

This looks good. Let's see how our book to price data looks when winsorized.

In [23]:
BTP = results['book_to_price']['2011-1-3']
scipy.stats.mstats.winsorize(BTP, inplace=True, limits=0.01)
BTP_z = (BTP - np.mean(BTP)) / np.std(BTP)
BTP_z.dropna(inplace=True)

plt.hist(BTP_z)
plt.xlabel('Z-Score')
plt.ylabel('Frequency');

We need the returns for that day as well.

In [24]:
R_day = results['returns']['2011-1-3']

Now set up our data and estimate $F_j$ using linear regression.

In [25]:
constant = pd.TimeSeries(np.ones(len(R_day.index)), index=R_day.index)

df_day = pd.DataFrame({'R': R_day,
              'BTP_z': BTP_z,
              'Constant': constant})
df_day = df_day.dropna()

# Perform linear regression to get the coefficients in the model
F1 = regression.linear_model.OLS(df_day['R'], df_day['BTP_z']).fit().params
print F1
BTP_z    0.002036
dtype: float64

Finally, let's add another factor so you can see how the code changes.

In [26]:
MKT = results['market_cap']['2011-1-3']
scipy.stats.mstats.winsorize(MKT, inplace=True, limits=0.01)
MKT_z = (MKT - np.mean(MKT)) / np.std(MKT)

constant = pd.TimeSeries(np.ones(len(R_day.index)), index=R_day.index)

df_day = pd.DataFrame({'R': R_day,
              'BTP_z': BTP_z,
              'MKT_z': MKT_z,
              'Constant': constant})
df_day = df_day.dropna()

# Perform linear regression to get the coefficients in the model
F1, F2 = regression.linear_model.OLS(df_day['R'], df_day[['BTP_z', 'MKT_z']]).fit().params
print F1, F2
0.00202044998408 0.000131289947406

To expand this analysis, you would simply loop through days, running this every day and getting an estimated factor return.
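
As a rough sketch of that loop, reusing the results DataFrame and the winsorization and z-scoring steps from above (the function name estimate_factor_returns is just for illustration and was not part of the original run):

def estimate_factor_returns(results):
    """Estimate each day's cross-sectional factor returns for B/P and market cap."""
    factor_returns = {}
    for day in results.index.levels[0]:
        daily = results.loc[day]
        
        # Winsorize each factor cross-sectionally, as above.
        btp = daily['book_to_price'].copy()
        mkt = daily['market_cap'].copy()
        scipy.stats.mstats.winsorize(btp, inplace=True, limits=0.01)
        scipy.stats.mstats.winsorize(mkt, inplace=True, limits=0.01)
        
        # Z-score each factor and regress the day's returns on the normalized values.
        df_day = pd.DataFrame({
            'R': daily['returns'],
            'BTP_z': (btp - np.mean(btp)) / np.std(btp),
            'MKT_z': (mkt - np.mean(mkt)) / np.std(mkt),
        }).dropna()
        factor_returns[day] = regression.linear_model.OLS(
            df_day['R'], df_day[['BTP_z', 'MKT_z']]).fit().params
    return pd.DataFrame(factor_returns).T

estimated_factor_returns = estimate_factor_returns(results)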

Using Fundamental Factor Modeling

Returns Prediction

As discussed in the Arbitrage Pricing Theory lecture, factor modeling can be used to predict future returns based on current fundamental factors, or to determine when an asset may be mispriced. Modeling future returns is accomplished by offsetting the returns in the regression, so that rather than fitting to current returns, you are predicting future returns. Once you have a predictive model, the most canonical way to create a strategy is to attempt a long-short equity approach.
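
For instance, with the cross-sectional data above you could pair each day's factor exposures with the following day's returns by shifting the returns series; a minimal sketch of that offset:

# Shift the (date x asset) returns table up one row, so that each date's row holds
# the NEXT day's return. Using future_returns in place of the 'returns' column in
# the cross-sectional regressions above turns the model into a predictive one.
future_returns = results['returns'].unstack().shift(-1).stack()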

There is a full lecture describing long-short equity, but the general idea is that you rank equities based on their predicted future returns. You then go long the top $p\%$ and short the bottom $p\%$, remaining dollar neutral. If the assets at the top of the ranking on average tend to make $5\%$ more per year than the market, and assets at the bottom tend to make $5\%$ less, then you will make $(M + 0.05) - (M - 0.05) = 0.10$, or $10\%$, per year, where $M$ is the market return that gets canceled out.
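
As a sketch of how the ranking step might translate into positions (the predicted_returns Series below is a hypothetical stand-in for the output of a predictive model):

# Hypothetical predicted returns for a universe of 500 assets.
predicted_returns = pd.Series(np.random.normal(0, 0.01, 500),
                              index=['asset_%d' % i for i in range(500)])

p = 0.10  # long the top 10%, short the bottom 10%
n = int(len(predicted_returns) * p)

longs = predicted_returns.nlargest(n).index
shorts = predicted_returns.nsmallest(n).index

# Equal dollar weights, long minus short, so the net dollar exposure is zero.
weights = pd.Series(0.0, index=predicted_returns.index)
weights[longs] = 1.0 / n
weights[shorts] = -1.0 / n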

Hedging out Exposure

Once we've determined that we are exposed to a factor, we may want to avoid depending on the performance of that factor by taking out a hedge. This is discussed in the Beta Hedging lecture and also in the Risk Factor Exposure notebook.

This presentation is for informational purposes only and does not constitute an offer to sell, a solicitation to buy, or a recommendation for any security; nor does it constitute an offer to provide investment advisory or other services by Quantopian, Inc. ("Quantopian"). Nothing contained herein constitutes investment advice or offers any opinion with respect to the suitability of any security, and any views expressed herein should not be taken as advice to buy, sell, or hold any security or as an endorsement of any security or company. In preparing the information contained herein, Quantopian, Inc. has not taken into account the investment needs, objectives, and financial circumstances of any particular investor. Any views expressed and data illustrated herein were prepared based upon information, believed to be reliable, available to Quantopian, Inc. at the time of publication. Quantopian makes no guarantees as to their accuracy or completeness. All information is subject to change and may quickly become unreliable for various reasons, including changes in market conditions or economic circumstances.