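One can easily get the linear regression terms in Quantopian. The terms are the same as those returned by the scipy.stats.linregress method.
They are named a bit differently, though, to align with conventional investing terms: the intercept is called alpha and the slope is called beta. The remaining outputs are r_value, p_value, and stderr.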
One can calculate the R Squared value as simply r_value ** 2. I like R Squared and feel it's underappreciated.
The R Squared value, or "Coefficient of Determination", is the proportion of the variance in a dependent variable that is predictable from an independent variable. It is a measure of how correlated two series of values are.
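To make that definition concrete, here is a minimal plain-Python sketch (not Quantopian code) that fits a line through two short series and checks that r_value ** 2 matches the share of variance explained. The helper simple_regression and the return numbers are made up for illustration; only the names alpha, beta, and r_value follow the naming used here.

```python
import math


def simple_regression(x, y):
    """Ordinary least-squares fit of y = alpha + beta * x.

    Returns (alpha, beta, r_value): intercept, slope, and Pearson correlation.
    This is a hypothetical helper for illustration, not a Quantopian API.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squared deviations and of cross products
    ss_xx = sum((xi - mean_x) ** 2 for xi in x)
    ss_yy = sum((yi - mean_y) ** 2 for yi in y)
    ss_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    beta = ss_xy / ss_xx
    alpha = mean_y - beta * mean_x
    r_value = ss_xy / math.sqrt(ss_xx * ss_yy)
    return alpha, beta, r_value


# Made-up returns: a "market" series and a stock that roughly tracks it
market = [0.010, -0.020, 0.015, 0.005, -0.010, 0.020]
stock = [0.012, -0.018, 0.020, 0.001, -0.015, 0.030]

alpha, beta, r_value = simple_regression(market, stock)
r_squared = r_value ** 2

# R Squared the long way: 1 - (unexplained variation / total variation)
mean_stock = sum(stock) / len(stock)
ss_res = sum((s - (alpha + beta * m)) ** 2 for m, s in zip(market, stock))
ss_tot = sum((s - mean_stock) ** 2 for s in stock)
r_squared_from_variance = 1 - ss_res / ss_tot

print(r_squared, r_squared_from_variance)
```

The two printed numbers should agree up to floating-point rounding, which is why squaring r_value is enough for a regression with a single independent variable.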
Let's see how to use this...
# First, we need to import the basic pipeline methods
from quantopian.pipeline import Pipeline, CustomFactor
from quantopian.research import run_pipeline
# Also get the built-in filters and/or factors to use
from quantopian.pipeline.filters import QTradableStocksUS, Q500US, StaticAssets
from quantopian.pipeline.factors import Returns
# Finally get any data we want to use
from quantopian.pipeline.data import EquityPricing
from quantopian.pipeline.domain import US_EQUITIES
# Import numpy and pandas because they rock
import numpy as np
import pandas as pd
One very common use of linear regression is to measure how strongly a given stock's returns are correlated with the market's returns.
Let's find Alpha, Beta, and R Squared for the stocks in the Q500US universe, in other words how correlated each one is with the market in general. We'll use SPY as a proxy for market returns. We'll also use the built-in factor method linear_regression
(https://www.quantopian.com/docs/api-reference/pipeline-api-reference#zipline.pipeline.Factor.linear_regression).
def make_spy_correlation_pipeline():
    # Universe we wish to trade
    my_universe = Q500US()
    spy = symbols('SPY')
    # Ensure we include SPY in our universe
    total_universe = my_universe | StaticAssets([spy])
    # Create any needed factors.
    # Get the 5 day returns for each asset (including SPY)
    returns = Returns(window_length=5)
    # Now make a 'slice' of data representing just the returns of SPY
    spy_returns = returns[spy]
    # Use the 'linear_regression' method to get all the regression attributes
    # for each asset's returns vs SPY returns.
    # Check the regression over the past quarter (about 63 trading days)
    # We don't really need to use a mask but it sometimes speeds things up
    regression = returns.linear_regression(target=spy_returns, regression_length=63, mask=total_universe)
    # Create any filters or signals based upon these factors
    pass
    # Create our pipeline
    # The regression factor has multiple outputs. Use dot notation to access each separately
    pipe = Pipeline(
        columns={
            'returns': returns,
            'alpha': regression.alpha,
            'beta': regression.beta,
            'r_value': regression.r_value,
            'p_value': regression.p_value,
            'stderr': regression.stderr,
            'r_squared': regression.r_value ** 2,
        },
        screen=total_universe,
    )
    return pipe
start = '2019-10-28'
end = '2019-10-28'
results = run_pipeline(make_spy_correlation_pipeline(), start, end)
results.head()
# To verify, let's check the values for SPY. Beta and R Squared should be 1 and alpha should be 0.
results.xs(symbols('SPY'), level=1)
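Why should SPY come out with beta 1, R Squared 1, and alpha 0? Regressing a series on itself gives a perfect fit. The plain-Python sketch below (again not Quantopian code, with a hypothetical helper and made-up returns) shows this, and also shows that a doubled copy of the series has beta 2 while R Squared stays at 1. Beta measures the size of the move, while R Squared measures how well the line fits.

```python
def alpha_beta_r_squared(x, y):
    """Least-squares fit of y on x; returns (alpha, beta, r_squared).

    Hypothetical helper for illustration only.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squared deviations and of cross products
    ss_xx = sum((xi - mean_x) ** 2 for xi in x)
    ss_yy = sum((yi - mean_y) ** 2 for yi in y)
    ss_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    beta = ss_xy / ss_xx
    alpha = mean_y - beta * mean_x
    r_squared = ss_xy ** 2 / (ss_xx * ss_yy)
    return alpha, beta, r_squared


# Made-up returns standing in for SPY
spy_like = [0.010, -0.020, 0.015, 0.005, -0.010, 0.020]

# The series regressed on itself: a perfect fit with slope 1
same = alpha_beta_r_squared(spy_like, spy_like)

# A doubled copy: the slope changes, the quality of the fit does not
doubled = alpha_beta_r_squared(spy_like, [2 * r for r in spy_like])

print(same)
print(doubled)
```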