PCA explained variance problem

posted Oct 1, 2018

Hi all!
I am fairly new to Quantopian. I am trying to use PCA to get statistical risk factors for the returns of the S&P 500. Attached is some very simple code that works, but the results puzzle me. Isn't 9% explained variance for the first PC far too little? CAPM generally has an R^2 of around 50-60%, which, as far as I understand, is essentially the share of variance explained. So shouldn't the first PC explain AT LEAST as much, given that by construction it is the linear combination of the underlying returns in the direction of greatest variance in the data?

Any feedback would be deeply appreciated!

1 response

Michael Matthews

Hi Gregor,

I am not an expert on PCA, but I was curious myself and looked into your question, so here are my thoughts. If you look at the formula for calculating explained variance in PCA, it is very similar to that of R-squared. However, the setup of the problem is a bit different. In linear regression, there is a single dependent variable whose variance we are trying to explain with the given input features. In PCA, there is no dependent variable. We simply have a bunch of features/variables (in your example, the variables are individual stock returns), and the explained variance is measured against the total variance of all of them together. PCA is a form of dimensionality reduction. In other words, it tries to reduce the number of variables we have while retaining as much "information" (aka variance) as possible. (You may already know this, so I apologize if I gave too much information.)
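To make the comparison between the two measures concrete, here is a minimal sketch on synthetic data (not the notebook from this thread). It assumes a hypothetical one-factor return model with made-up volatilities and betas, and uses only NumPy. It checks that the explained variance ratio of the first PC equals the variance-weighted average of the R-squared values you get from regressing each stock on the PC1 factor, so individual stocks can have a much higher R-squared than the headline PCA number.

```python
import numpy as np

rng = np.random.default_rng(0)
n_days, n_stocks = 500, 50

# Hypothetical data: one common "market" factor plus idiosyncratic noise.
market = rng.normal(0.0, 0.01, size=n_days)
betas = rng.uniform(0.5, 1.5, size=n_stocks)
noise = rng.normal(0.0, 0.02, size=(n_days, n_stocks))
returns = np.outer(market, betas) + noise

# PCA via the eigendecomposition of the sample covariance matrix.
X = returns - returns.mean(axis=0)
cov = np.cov(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
pc1 = eigvecs[:, -1]                    # direction of greatest variance
explained_ratio = eigvals[-1] / eigvals.sum()

# Factor returns: projection of the demeaned returns onto PC1.
factor = X @ pc1

# For a one-regressor OLS with intercept, R^2 is the squared correlation.
r2 = np.array([np.corrcoef(factor, X[:, i])[0, 1] ** 2
               for i in range(n_stocks)])

# Weight each stock's R^2 by that stock's own variance.
variances = X.var(axis=0, ddof=1)
weighted_avg_r2 = np.sum(variances * r2) / np.sum(variances)

print(f"PC1 explained variance ratio: {explained_ratio:.4f}")
print(f"Variance-weighted mean R^2:   {weighted_avg_r2:.4f}")
print(f"Highest single-stock R^2:     {r2.max():.4f}")
```

The first two printed numbers should agree up to floating-point error, while the highest single-stock R-squared can never be below them, since a weighted average cannot exceed its largest term.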

I tried experimenting by modifying your notebook a little bit. The first thing I noticed was that you were only using 63 days of data because you dropped any rows where stocks did not have returns data available. I added more stocks and more days while filling in NaNs with 0 values. (I don't know if this is the best way to deal with this, but it seemed to work for now).
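The effect of the two NaN treatments can be sketched on a small made-up returns matrix (hypothetical numbers, NumPy only, not the actual data from the notebook). Dropping every row that contains a NaN cuts the sample down to the days on which all stocks have data, while filling with zeros keeps every day but understates the variance of the stock with missing history.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical daily returns: 250 days, 5 stocks.
returns = rng.normal(0.0, 0.01, size=(250, 5))
returns[:200, 4] = np.nan  # assume the last stock only has 50 days of history

# Option 1: keep only the days on which every stock has a return.
complete_rows = ~np.isnan(returns).any(axis=1)
dropped = returns[complete_rows]

# Option 2: keep every day and replace missing returns with 0.
filled = np.nan_to_num(returns, nan=0.0)

print(dropped.shape, filled.shape)  # (50, 5) (250, 5)

# Zero-filling shrinks the measured variance of the short-history stock.
var_observed = np.nanvar(returns[:, 4], ddof=1)
var_filled = filled[:, 4].var(ddof=1)
print(var_filled < var_observed)  # True
```

Neither option is obviously right: the first throws away most of the sample, the second distorts the covariance matrix that PCA works from. Restricting the universe to stocks with a full history is another common choice.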

To begin with, I calculated the first principal component. The explained variance is a bit higher (21%) than the 9% in your example (probably just due to having more data). I then calculated factor returns using the first principal component. Next, I ran some regressions of a random stock's returns on the resulting factor returns (you can easily change the stock if you'd like). Note: the r-squared value of the regression depends on the stock you choose.

In my example, I used ticker symbol "GS". The r-squared using the PCA factor was 48.7%. The r-squared from regressing "GS" vs. "SPY" was about 54%. Also, if you regress SPY returns vs. the PCA factor returns, the r-squared was about 74% (i.e. a correlation of about 0.86 between SPY and the PCA factor). Therefore, it does appear that the PCA methodology is picking up on this "market factor".
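The steps above can be sketched roughly as follows. This is not the notebook from this thread: the data are synthetic, the first column stands in for "GS", an equal-weighted average of all columns stands in for "SPY", and the helper `r_squared` is a hypothetical name introduced for illustration. The numbers it prints will not match the ones quoted above.

```python
import numpy as np

def r_squared(y, x):
    """R^2 from an OLS regression of y on x with an intercept."""
    A = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return 1.0 - resid.var() / y.var()

rng = np.random.default_rng(2)
n_days, n_stocks = 500, 50

# Hypothetical one-factor return model (assumed, for illustration only).
market = rng.normal(0.0, 0.01, size=n_days)
betas = rng.uniform(0.5, 1.5, size=n_stocks)
returns = np.outer(market, betas) + rng.normal(0.0, 0.02, size=(n_days, n_stocks))

# First principal component via SVD of the demeaned returns.
X = returns - returns.mean(axis=0)
_, _, vt = np.linalg.svd(X, full_matrices=False)
pc1 = vt[0]          # the sign is arbitrary, which does not affect R^2
factor = X @ pc1     # PCA factor returns

stock = returns[:, 0]         # stand-in for a single stock such as "GS"
index = returns.mean(axis=1)  # equal-weighted stand-in for "SPY"

r2_stock_factor = r_squared(stock, factor)
r2_stock_index = r_squared(stock, index)
r2_index_factor = r_squared(index, factor)
corr_index_factor = np.corrcoef(index, factor)[0, 1]

print(f"stock vs PCA factor: R^2 = {r2_stock_factor:.3f}")
print(f"stock vs index:      R^2 = {r2_stock_index:.3f}")
print(f"index vs PCA factor: R^2 = {r2_index_factor:.3f}, "
      f"|corr| = {abs(corr_index_factor):.3f}")
```

With a single regressor, the R-squared is the square of the correlation, which is how an r-squared of about 74% corresponds to a correlation of about 0.86.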

That is just my 2 cents. I would be interested to hear from others more experienced in the matter.

The material on this website is provided for informational purposes only and does not constitute an offer to sell, a solicitation to buy, or a recommendation or endorsement for any security or strategy, nor does it constitute an offer to provide investment advisory services by Quantopian.

In addition, the material offers no opinion with respect to the suitability of any security or specific investment. No information contained herein should be regarded as a suggestion to engage in or refrain from any investment-related course of action as none of Quantopian nor any of its affiliates is undertaking to provide investment advice, act as an adviser to any plan or entity subject to the Employee Retirement Income Security Act of 1974, as amended, individual retirement account or individual retirement annuity, or give advice in a fiduciary capacity with respect to the materials presented herein. If you are an individual retirement or other investor, contact your financial advisor or other fiduciary unrelated to Quantopian about whether any given investment idea, strategy, product or service described herein may be appropriate for your circumstances. All investments involve risk, including loss of principal. Quantopian makes no guarantees as to the accuracy or completeness of the views expressed in the website. The views are subject to change, and may have become unreliable for various reasons, including changes in market conditions or economic circumstances.
