Getting the linear regression terms including R Squared

One can easily get the linear regression terms in Quantopian. The terms are the same as those returned by the scipy.stats.linregress method. They are named a bit differently, though, to align with conventional investing terms:

  • 'alpha' (intercept)
  • 'beta' (slope)
  • 'r_value'
  • 'p_value'
  • 'stderr'

One can calculate the R Squared value simply as r_value ** 2. I like R Squared and feel it's underappreciated.

The R Squared value, or "Coefficient of Determination", is the proportion of the variance in a dependent variable that is predictable from an independent variable. It is a measure of how correlated two series of values are.
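To make that definition concrete, here is a minimal sketch in plain Python that runs without Quantopian. The helper simple_linregress and the toy return series are hypothetical, made up purely for illustration. It computes the least-squares intercept, slope, and correlation by hand, then checks that r_value ** 2 equals the share of variance explained by the fitted line, 1 - SS_res / SS_tot.

```python
import math

def simple_linregress(x, y):
    """Least-squares fit of y = alpha + beta * x (hypothetical helper)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squared deviations and cross-deviations from the means
    ss_xx = sum((xi - mean_x) ** 2 for xi in x)
    ss_yy = sum((yi - mean_y) ** 2 for yi in y)
    ss_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    beta = ss_xy / ss_xx                        # slope
    alpha = mean_y - beta * mean_x              # intercept
    r_value = ss_xy / math.sqrt(ss_xx * ss_yy)  # correlation coefficient
    return alpha, beta, r_value

# Made-up returns, for illustration only
market_returns = [0.010, -0.004, 0.007, 0.002, -0.011, 0.005]
stock_returns = [0.015, -0.002, 0.006, 0.004, -0.018, 0.011]

alpha, beta, r_value = simple_linregress(market_returns, stock_returns)
r_squared = r_value ** 2

# Share of the stock's variance explained by the fitted line
mean_stock = sum(stock_returns) / len(stock_returns)
ss_res = sum((s - (alpha + beta * m)) ** 2
             for m, s in zip(market_returns, stock_returns))
ss_tot = sum((s - mean_stock) ** 2 for s in stock_returns)
explained = 1 - ss_res / ss_tot

print(r_squared, explained)  # the two numbers agree up to rounding
```

The agreement between the two numbers holds for any simple regression that includes an intercept, which is why squaring r_value is enough.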

Let's see how to use this...

In [42]:
# First, we need to import the basic pipeline methods
from quantopian.pipeline import Pipeline, CustomFactor
from quantopian.research import run_pipeline

# Also get the built-in filters and/or factors to use
from quantopian.pipeline.filters import QTradableStocksUS, Q500US, StaticAssets
from quantopian.pipeline.factors import Returns

# Finally get any data we want to use
from quantopian.pipeline.data import EquityPricing
from quantopian.pipeline.domain import US_EQUITIES

# Import numpy and pandas because they rock
import numpy as np
import pandas as pd

One very common use of linear regression is to measure how strongly a given stock's returns correlate with the market's returns.

Let's find alpha, beta, and R Squared for the stocks in the Q500US universe, that is, how correlated each stock is with the market in general. We'll use SPY as a proxy for market returns. We'll also use the built-in factor method linear_regression (https://www.quantopian.com/docs/api-reference/pipeline-api-reference#zipline.pipeline.Factor.linear_regression).

In [45]:
def make_spy_correlation_pipeline():
    # Universe we wish to trade
    my_universe = Q500US()
    spy = symbols('SPY')
    
    # Ensure we include SPY in our universe
    total_universe = my_universe | StaticAssets([spy])

    # Create any needed factors.
    # Get the 5 day returns for each asset (including SPY)
    returns = Returns(window_length=5)
    
    # Now make a 'slice' of data representing just the returns of SPY
    spy_returns = returns[spy]
    
    # Use the 'linear_regression' method to get all the regression attributes
    # for each asset returns vs SPY returns.
    # Check the regression over the past quarter (about 63 trading days)
    # We don't really need to use a mask but it sometimes speeds things up
    regression = returns.linear_regression(target=spy_returns, regression_length=63, mask=total_universe)

    # Create any filters or signals based upon these factors
    pass

    # Create our pipeline
    # The regression factor has multiple outputs. Use dot notation to access each separately
    pipe = Pipeline(
        columns={
            'returns': returns,
            'alpha': regression.alpha,
            'beta': regression.beta,
            'r_value': regression.r_value,
            'p_value': regression.p_value,
            'stderr': regression.stderr,
            'r_squared': regression.r_value ** 2,
        },
        screen=total_universe,
    )
    
    return pipe 
In [46]:
start = '2019-10-28'
end = '2019-10-28'
results = run_pipeline(make_spy_correlation_pipeline(), start, end)

results.head()

Pipeline Execution Time: 1.01 Seconds
Out[46]:
                                                 alpha      beta       p_value  r_squared   r_value   returns    stderr
2019-10-28 00:00:00+00:00 Equity(2 [ARNC])    0.004761  1.567897  9.274394e-11   0.500099  0.707177  0.021820  0.200709
                          Equity(24 [AAPL])   0.010246  1.259906  2.195313e-14   0.618550  0.786480  0.025153  0.126679
                          Equity(53 [ABMD])  -0.024554  2.989382  6.551612e-10   0.467558  0.683782  0.036132  0.408446
                          Equity(62 [ABT])   -0.005026  0.892892  3.530957e-14   0.612618  0.782699  0.000000  0.090909
                          Equity(67 [ADSK])  -0.009767  1.524590  1.890722e-08   0.406768  0.637784  0.028693  0.235737
In [48]:
# To verify, let's check the values for SPY. Beta and R Squared should be 1 and alpha should be 0.
results.xs(symbols('SPY'), level=1)
Out[48]:
                           alpha  beta  p_value  r_squared  r_value   returns  stderr
2019-10-28 00:00:00+00:00    0.0   1.0      0.0        1.0      1.0  0.005299     0.0
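
The values expected for SPY follow directly from the regression formulas: regressing a series on itself gives a slope of 1, an intercept of 0, and a correlation of 1. A small stand-alone sketch with made-up numbers shows this. The helper regress is hypothetical and is not the Quantopian implementation.

```python
import math

def regress(x, y):
    """Return (alpha, beta, r_value) for the least-squares fit y = alpha + beta * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    dx = [xi - mean_x for xi in x]
    dy = [yi - mean_y for yi in y]
    # Covariance and variances up to a common 1/n factor, which cancels below
    cov_xy = sum(a * b for a, b in zip(dx, dy))
    var_x = sum(a * a for a in dx)
    var_y = sum(b * b for b in dy)
    beta = cov_xy / var_x
    alpha = mean_y - beta * mean_x
    r_value = cov_xy / math.sqrt(var_x * var_y)
    return alpha, beta, r_value

# Made-up returns standing in for SPY, for illustration only
spy_like_returns = [0.004, -0.012, 0.009, 0.001, -0.003, 0.007]

# Regress the series against itself
alpha, beta, r_value = regress(spy_like_returns, spy_like_returns)
print(alpha, beta, r_value ** 2)
```

Seeing anything other than alpha near 0 and beta and R Squared near 1 in the SPY row would suggest the target slice or the mask is not set up as intended.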