The notebook from the "Fundamental Factor Models" lecture uses a hybrid approach to calculating regressions. Either way, it's really a two-step process. First, one calculates the series of values one wants to regress against (the independent variables). Next, one performs a regression against these values for each asset. The notebook uses pipeline to fetch the data, and also to define the universes or groups of assets used when calculating the independent series. The actual calculation of the independent series, as well as the regressions, is performed outside of pipeline. Pipeline is simply used to fetch a time series of all the data.
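To make the two steps concrete, here is a minimal sketch in plain Python, outside of pipeline entirely. The tickers, the return numbers, and the helper names `group_mean_series` and `ols_beta` are all made up for illustration; this is not the notebook's code.

```python
# Toy daily returns for three hypothetical assets (invented numbers).
returns = {
    "AAA": [0.010, -0.020, 0.015, 0.005, -0.010],
    "BBB": [0.020, -0.030, 0.025, 0.000, -0.015],
    "CCC": [-0.005, 0.010, 0.000, 0.005, 0.002],
}
group = ["AAA", "BBB"]  # assets that define the independent series


def group_mean_series(returns, group):
    """Step 1: cross-sectional mean return of the group for each day."""
    n_days = len(next(iter(returns.values())))
    return [sum(returns[a][t] for a in group) / len(group) for t in range(n_days)]


def ols_beta(y, x):
    """Step 2: slope of an ordinary least-squares fit of y on x."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    return cov_xy / var_x


# Step 1: build the independent series once.
factor_series = group_mean_series(returns, group)

# Step 2: regress every asset's returns against that series.
betas = {asset: ols_beta(series, factor_series) for asset, series in returns.items()}
```

The point is only the shape of the computation: one shared independent series, then one regression per asset.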
This is a very reasonable approach if one is doing research. Pipelines in notebooks output a multi-index dataframe with historical data. However, this approach doesn't translate well to an algo. Pipelines in an algo return a single-index dataframe with only the latest data. In order to perform calculations (e.g. regressions) using historical data, those calculations must be done inside the pipeline.
How can the regressions be moved into the pipeline definition? There happens to be a handy method just for this purpose: linear_regression (https://www.quantopian.com/docs/api-reference/pipeline-api-reference#zipline.pipeline.Factor.linear_regression). This method performs an ordinary least-squares regression and outputs 'alpha', 'beta', 'r_value', 'p_value', and 'stderr'. In this case we are looking for beta. As an example, to regress the returns of a group of stocks against the mean returns of the Q500US, it could be done like this:
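from quantopian.pipeline import CustomFactor
from quantopian.pipeline.factors import Returns
from quantopian.pipeline.filters import Q500US, Q1500US

# Define the asset used to 'slice' the factor
# (named slice_asset to avoid shadowing the built-in `slice`)
slice_asset = symbols('SPY')

# Define a universe
test_universe = Q1500US()

class Mean_For_Group(CustomFactor):
    # Set window safe to allow output to be used as input to other factors
    window_safe = True

    def compute(self, today, assets, out, input_factor, group):
        # Average the input over every (day, asset) cell in the window where
        # the group mask is True; every asset receives the same value
        out[:] = input_factor[group].mean()

returns = Returns(window_length=2).log1p()
q500_mask = Q500US()
q500_mean_returns = Mean_For_Group(inputs=[returns, q500_mask], window_length=90)

# Regress each asset's returns against the group mean series and keep the beta
q500_exposure = returns.linear_regression(
    target=q500_mean_returns[slice_asset],
    regression_length=252,
    mask=test_universe,
).beta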
Typically, one wants to see the correlation (actually the beta) between the returns of a group of stocks and the returns of a fixed group of stocks. This is similar to comparing one's returns to an 'index'. However, one could also look at the relationship to some fundamental value(s) using this same approach. In the lecture this is referred to as 'Factor Value Normalization'.
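Since beta and correlation are easy to mix up, here is a small plain-Python sketch of the difference. The return numbers and the helper name `beta_and_correlation` are invented for illustration; the formulas are the standard OLS slope and Pearson correlation, not anything specific to pipeline.

```python
import math


def beta_and_correlation(y, x):
    """Return (OLS slope of y on x, Pearson correlation of y and x)."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    var_y = sum((yi - mean_y) ** 2 for yi in y)
    beta = cov_xy / var_x                      # sensitivity of y to x
    corr = cov_xy / math.sqrt(var_x * var_y)   # strength of the linear link
    return beta, corr


# Hypothetical 'index' returns (invented numbers).
index_returns = [0.010, -0.020, 0.015, 0.005, -0.010]

# A hypothetical asset that moves exactly twice as much as the index.
asset_returns = [2 * r for r in index_returns]

beta, corr = beta_and_correlation(asset_returns, index_returns)
```

Here the correlation is 1 (up to floating-point rounding) because the asset tracks the index perfectly, while the beta is 2 because it moves twice as far. The regression reports the second number, which is why 'beta' is the more accurate word.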
A word of caution. The linear_regression method can be slow. It may time out when run over the entire QTradableStocksUS universe. When testing, it's best to set up a very small test universe of 2-3 stocks using StaticAssets or StaticSids to speed things up.