Trouble doing Fama MacBeth Regression in Algo

I am trying to build an algo that runs a Fama MacBeth regression, uses the regression parameters to calculate expected returns for the stock list, and then goes long the top half and short the bottom half. I took the Fama MacBeth formula from the "Fundamental Factor Models" lecture.

My problem is that the pipeline output in the IDE only returns a data frame for 1 day and does not have a multi-index, so this prevents the computation. How do I work around this? I want to perform the Fama MacBeth regression over the past 90 days of ROA, ROE, and pb_ratio numbers. Basically, how do you run a time series regression in the IDE using fundamentals as the independent variables?

I can't attach the IDE notebook since the backtest never worked, so I put the code into a normal notebook.

Thank you!

5 responses

Can someone please lend me their brain on this? I have tried everything. I've read through all of the new and deprecated documentation, gone through the lectures and tutorials, and searched the forums for a notebook where someone did something similar, and I haven't found anything. It seems like doing computations like this is just impossible within the IDE because of the way the pipeline is constructed.

The notebook from the "Fundamental Factor Models" lecture uses a hybrid approach to calculating regressions. In any case, it's really a two-step process. First, one calculates the series of values one wants to regress against (the independent variables). Next, one performs a regression against these values for each asset. The notebook uses pipeline to fetch the data and to define the universes or groups of assets used when calculating the independent series values. The actual calculation of the independent series, as well as the regressions, is performed outside of pipeline. Pipeline is simply used to fetch a time series of all our data.
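
To make the two steps concrete, here is a minimal sketch outside of pipeline, using plain NumPy on synthetic data. The function name `fama_macbeth`, the array shapes, and the noise-free toy data are assumptions made for illustration; this is not the lecture notebook's code.

```python
import numpy as np

def fama_macbeth(returns, factors):
    """Two-pass regression sketch.

    returns: (T, N) array of asset returns
    factors: (T, K) array of independent-variable series
    """
    T, N = returns.shape
    # Pass 1: time-series OLS per asset, giving each asset's factor loadings.
    X = np.column_stack([np.ones(T), factors])                # (T, K+1)
    coefs, *_ = np.linalg.lstsq(X, returns, rcond=None)       # (K+1, N)
    betas = coefs[1:].T                                       # (N, K)
    # Pass 2: cross-sectional OLS for every period, returns on loadings.
    Z = np.column_stack([np.ones(N), betas])                  # (N, K+1)
    gammas, *_ = np.linalg.lstsq(Z, returns.T, rcond=None)    # (K+1, T)
    # Average the per-period coefficients over time.
    premia = gammas.mean(axis=1)                              # (K+1,)
    return betas, premia

# Synthetic, noise-free example: 60 periods, 8 assets, 2 factors.
rng = np.random.default_rng(0)
factors = rng.normal(size=(60, 2))
true_betas = rng.normal(size=(8, 2))
returns = factors @ true_betas.T

betas, premia = fama_macbeth(returns, factors)
```

With noise-free data the first pass recovers the loadings exactly and the averaged second-pass coefficients equal the mean factor values. Real data will be noisier, and none of this runs inside an algo's pipeline.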

This is a very reasonable approach if one is doing research. Pipelines in notebooks output a multi-index dataframe of historical data, with one row per date and asset. However, this approach doesn't translate well to an algo. Pipelines in an algo return a single-index dataframe, indexed by asset, holding only the latest day's data. In order to perform calculations (e.g. regressions) using historical data, those calculations must be done inside the pipeline.
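
The difference in shape can be mimicked with plain pandas. This is only an illustration: the dates, tickers, and the `roe` column are made up, and the frames are built by hand rather than by a real pipeline.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2020-01-01", periods=3, freq="D")
assets = ["AAA", "BBB"]  # hypothetical tickers

# Research-style output: one row per (date, asset) pair.
index = pd.MultiIndex.from_product([dates, assets], names=["date", "asset"])
research_output = pd.DataFrame({"roe": np.arange(6.0)}, index=index)

# Algo-style output: only the latest date, indexed by asset alone.
algo_output = research_output.xs(dates[-1], level="date")
```

Any history needed for a regression is absent from the second frame, which is why the windowed computation has to live inside the pipeline itself.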

How does one move regressions into the pipeline definition? There happens to be a handy method just for this purpose: linear_regression (https://www.quantopian.com/docs/api-reference/pipeline-api-reference#zipline.pipeline.Factor.linear_regression). This method performs an ordinary least-squares regression and outputs 'alpha', 'beta', 'r_value', 'p_value', and 'stderr'. In this case we are looking for beta. As an example, if we want to regress the returns of a group of stocks against the mean returns of the Q500US, it could be done like this:

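    from quantopian.pipeline import CustomFactor
    from quantopian.pipeline.factors import Returns
    from quantopian.pipeline.filters import Q500US, Q1500US

    # Asset used to take a 1-D 'slice' of the target factor.
    # Slicing needs a single asset, so use symbol() in an algo (symbols() returns a list there).
    slice_asset = symbol('SPY')

    # Define the universe of stocks to regress
    test_universe = Q1500US()

    class Mean_For_Group(CustomFactor):
        # Set window safe to allow output to be used as input to other factors
        window_safe = True

        def compute(self, today, assets, out, input_factor, group):
            # group is a boolean array; average input_factor over the group's
            # members across the whole window and give every asset that value
            out[:] = input_factor[group].mean()

    returns = Returns(window_length=2).log1p()
    q500_mask = Q500US()
    q500_mean_returns = Mean_For_Group(inputs=[returns, q500_mask], window_length=90)

    # Every asset holds the same group mean, so one asset's slice is the target series
    q500_exposure = returns.linear_regression(
        target=q500_mean_returns[slice_asset],
        regression_length=252,
        mask=test_universe,
    ).beta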

Typically, one wants to see the correlation (actually beta) between the returns of a group of stocks and the returns of a fixed group of stocks. This would be similar to comparing one's returns to an 'index'. However, one could also look at the relationship with some fundamental value(s) using this same approach. In the lecture this is referred to as 'Factor Value Normalization'.
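
To illustrate the difference between beta and correlation, here is a small NumPy sketch on synthetic returns. The two series and the 1.5 loading are made up for illustration. Beta is the regression slope, cov(x, y) / var(x), while correlation also divides by the stock's own volatility.

```python
import numpy as np

rng = np.random.default_rng(0)
index_returns = rng.normal(0.0, 0.01, 252)
# Hypothetical stock: 1.5x the index plus idiosyncratic noise.
stock_returns = 1.5 * index_returns + rng.normal(0.0, 0.005, 252)

cov = np.cov(index_returns, stock_returns)            # 2x2 sample covariance
beta = cov[0, 1] / cov[0, 0]                          # OLS slope
corr = cov[0, 1] / np.sqrt(cov[0, 0] * cov[1, 1])     # Pearson correlation

# The same slope obtained from a degree-1 least-squares fit.
slope, intercept = np.polyfit(index_returns, stock_returns, 1)
```

The two are linked by beta = corr * std(stock) / std(index), so a stock can be highly correlated with the index and still have a small beta if it is much less volatile.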

A word of caution: the linear_regression method can be slow. It may time out when run over the entire QTradableStocksUS universe. When testing, it's best to set up a very small test universe of 2-3 stocks using StaticAssets or StaticSids to speed things up.

Thank you Dan! I will start playing around with this and see what I can come up with. Appreciate it!

Thank you for such an informative post!

@Dan, wondering if we can do linear regression with multiple explanatory variables? Thanks