Quantopian Lecture Series: Multiple Linear Regression

Multiple linear regression is probably the single most used technique in modern quantitative finance. To find out why, check out our lectures on factor modeling and arbitrage pricing theory.

Multiple linear regression is just like single linear regression, except you can use many variables to predict one outcome and measure the relative contributions of each. This lecture covers the basics and deals with some of the issues common to the analysis.

All lectures can be found at: https://www.quantopian.com/lectures

Please reply with any feedback or questions.

Disclaimer

The material on this website is provided for informational purposes only and does not constitute an offer to sell, a solicitation to buy, or a recommendation or endorsement for any security or strategy, nor does it constitute an offer to provide investment advisory services by Quantopian. In addition, the material offers no opinion with respect to the suitability of any security or specific investment. No information contained herein should be regarded as a suggestion to engage in or refrain from any investment-related course of action as none of Quantopian nor any of its affiliates is undertaking to provide investment advice, act as an adviser to any plan or entity subject to the Employee Retirement Income Security Act of 1974, as amended, individual retirement account or individual retirement annuity, or give advice in a fiduciary capacity with respect to the materials presented herein. If you are an individual retirement or other investor, contact your financial advisor or other fiduciary unrelated to Quantopian about whether any given investment idea, strategy, product or service described herein may be appropriate for your circumstances. All investments involve risk, including loss of principal. Quantopian makes no guarantees as to the accuracy or completeness of the views expressed in the website. The views are subject to change, and may have become unreliable for various reasons, including changes in market conditions or economic circumstances.

8 responses

Chris Das

Hi Delaney

In the MLR lecture, I can't work out why slr.params[1] returns 2.78 when results.summary() shows an X2 coef of 1.

Is the coef printed on the summary screen not reflecting the beta?

I noticed the same problem further on when running the MLR with a benchmark: results.summary() and mlr.summary() show different values for the X1 and X2 coefs.

Additionally, in the example at the end of the lecture using X1...X4, the betas do line up with the coefs printed by results.summary(). I'm not sure why I'm getting conflicting results, or whether I'm simply interpreting them incorrectly.

Code for SLR:
start = '2014-01-01'
end = '2015-01-01'
asset1 = get_pricing('AAPL', fields='price', start_date=start, end_date=end)
asset2 = get_pricing('FISV', fields='price', start_date=start, end_date=end)
benchmark = get_pricing('SPY', fields='price', start_date=start, end_date=end)

slr = regression.linear_model.OLS(asset1, sm.add_constant(asset2)).fit()
print 'SLR beta of asset2:', slr.params[1]
print results.summary()

Code for MLR:
mlr = regression.linear_model.OLS(asset1, sm.add_constant(np.column_stack((asset2, benchmark)))).fit()

prediction = mlr.params[0] + mlr.params[1]*asset2 + mlr.params[2]*benchmark
prediction.name = 'Prediction'

print 'MLR beta of asset2:', mlr.params[1], '\nMLR beta of S&P 500:', mlr.params[2]
print results.summary()
mlr.summary()

Maxwell Margenot

Hi Chris,

If I am interpreting your issue correctly, the problem is with the regression stored in the results variable, right? We define results as a regression like so:

X1 = np.arange(100)
X2 = np.array([i ** 2 for i in range(100)]) + X1
Y = X1 + X2

X = sm.add_constant(np.column_stack((X1, X2)))
results = regression.linear_model.OLS(Y, X).fit()

So the coefficient of 1 makes sense: this is a toy example in which Y is defined as exactly X1 + X2. The slr and mlr regression variables are fit on actual pricing data, so their summaries will naturally differ from that of results. Hope this helps!

Chris Das

Ohhh I see. If I understand, X1 and X2 in the results.summary printout actually represent the X1 and X2 dummy variables that we assigned earlier?

I had thought that they represented 'factor 1' and 'factor 2' in our mlr which would have been asset 2 and the benchmark respectively.

Maxwell Margenot

Precisely so, yes. If you want the factors from mlr, the multiple linear regression with actual data, those are represented by x1 and x2 in mlr.summary() and can be accessed more directly with mlr.params. The results variable only holds our dummy regression from before.

Chris Das

Ok perfect thank you

Dinesh Bacham

How can one regress one non-stationary price time series on another? The results won't make sense.

Deleted User

@Dinesh, absolutely. Unless they are cointegrated.

Delaney Mackenzie

That's right, the series must be stationary for the regression to give useful results. Note that this will not show in any of the statistics; in fact, in the presence of non-stationarity, confidence can be overestimated. We discuss this here: https://www.quantopian.com/lectures#Violations-of-Regression-Models
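
As a rough illustration of overestimated confidence, the sketch below regresses pairs of independent random walks on each other and counts how often the slope looks significant (|t| > 2), then repeats the exercise on their first differences. Everything here is an assumption for demonstration purposes: the data are simulated, the helper names slope_t_stat and rejection_rates are hypothetical, and the t-statistic is computed by hand with NumPy rather than taken from statsmodels.

```python
import numpy as np

def slope_t_stat(y, x):
    """OLS t-statistic for the slope b in y = a + b*x + e."""
    X = np.column_stack((np.ones(len(x)), x))
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X.dot(beta)
    dof = len(y) - X.shape[1]
    sigma2 = resid.dot(resid) / dof           # residual variance estimate
    cov = sigma2 * np.linalg.inv(X.T.dot(X))  # classical OLS covariance
    return beta[1] / np.sqrt(cov[1, 1])

def rejection_rates(n_trials=200, n_obs=500, seed=0):
    """Share of trials with |t| > 2, for levels and for first differences."""
    rng = np.random.RandomState(seed)
    levels = 0
    diffs = 0
    for _ in range(n_trials):
        dx = rng.normal(size=n_obs)          # independent increments
        dy = rng.normal(size=n_obs)
        x, y = np.cumsum(dx), np.cumsum(dy)  # unrelated random walks
        levels += int(abs(slope_t_stat(y, x)) > 2)
        diffs += int(abs(slope_t_stat(dy, dx)) > 2)
    return levels / float(n_trials), diffs / float(n_trials)

level_rate, diff_rate = rejection_rates()
print('levels: {:.2f}, differences: {:.2f}'.format(level_rate, diff_rate))
```

The two series are unrelated by construction, so a well-behaved test should flag only a small share of trials. The regression on levels flags far more than that, while the regression on differences stays close to the nominal rate, which is one reason to work with returns rather than raw prices unless the series are cointegrated.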
