Stability of Principal Components

This algorithm performs a principal component analysis on five of the largest stocks by market cap in the S&P 500. We plot the elements of the first eigenvector of the covariance matrix of their returns over time, which gives us an idea of how stable the correlations between the assets are. Ideally, the values in the principal component stay relatively stable over time. The first principal component is highly correlated with the market and can give us an idea of how the assets in the universe vary, both relative to the market and to each other.

http://en.wikipedia.org/wiki/Principal_component_analysis
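
The algorithm's own code isn't shown in this thread, so here is a minimal sketch of the idea in plain NumPy rather than the Quantopian API. It assumes a `returns` array of daily returns (one column per stock) and a hypothetical 60-day trailing window; the data are synthetic (a shared market factor plus noise), not real prices. For each window it estimates the covariance matrix, takes the eigenvector with the largest eigenvalue, and fixes its sign, since an eigenvector is only defined up to a factor of -1.

```python
import numpy as np


def rolling_first_eigenvector(returns, window):
    """First principal component loadings for each trailing window.

    returns: array of shape (n_days, n_assets) of daily returns.
    window:  number of trailing days used for each covariance estimate.
    Output:  array of shape (n_days - window + 1, n_assets).
    """
    returns = np.asarray(returns, dtype=float)
    loadings = []
    for end in range(window, len(returns) + 1):
        cov = np.cov(returns[end - window:end], rowvar=False)  # assets are columns
        eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
        v = eigvecs[:, -1]  # eigenvector of the largest eigenvalue
        # An eigenvector is only defined up to sign; flip it so the loadings sum
        # to a positive number, otherwise the plotted lines can jump between windows.
        if v.sum() < 0:
            v = -v
        loadings.append(v)
    return np.array(loadings)


# Synthetic stand-in for five stocks: one shared "market" factor plus noise.
rng = np.random.default_rng(0)
market = rng.normal(0.0, 0.01, size=500)
returns = market[:, None] + rng.normal(0.0, 0.005, size=(500, 5))

pc1 = rolling_first_eigenvector(returns, window=60)
print(pc1.shape)  # (441, 5): one row of five loadings per window
```

Plotting each column of `pc1` against the window end date gives one line per stock, which is the kind of chart described above.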


Now see the same algo applied to the second principal component. While the first principal component is highly correlated with the market, the second is not. The first principal component of equity returns can represent systematic (non-diversifiable) risk, while the second principal component mostly reflects idiosyncratic risk. Notice that, along this component, stocks in the same sector tend to move in opposite directions (i.e., the values for GOOG and AAPL have opposite signs). Since Google and Apple are direct competitors, if one of them captures more of the market, the other loses some of it.
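
A minimal sketch of reading off the second component, using synthetic data rather than the original stocks: every column shares a market factor, and a hypothetical "rivalry" factor is added to column 0 and subtracted from column 1 to mimic two competitors. The helper name `principal_components` is illustrative. Because the rivalry loading pattern (+1, -1, 0, 0, 0) is orthogonal to the all-ones market pattern, the second eigenvector should load on columns 0 and 1 with opposite signs.

```python
import numpy as np


def principal_components(returns):
    """Eigen-decompose the sample covariance of returns, largest eigenvalue first."""
    cov = np.cov(np.asarray(returns, dtype=float), rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]  # sort components by descending eigenvalue
    return eigvals[order], eigvecs[:, order]


rng = np.random.default_rng(1)
n_days = 1000
market = rng.normal(0.0, 0.01, n_days)    # common (systematic) factor
rivalry = rng.normal(0.0, 0.006, n_days)  # hypothetical: asset 0 gains when asset 1 loses
returns = market[:, None] + rng.normal(0.0, 0.002, (n_days, 5))
returns[:, 0] += rivalry
returns[:, 1] -= rivalry

eigvals, eigvecs = principal_components(returns)
pc2 = eigvecs[:, 1]
# Loadings on assets 0 and 1 have opposite signs (the overall sign of pc2 is arbitrary).
print(np.round(pc2, 2))
```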

Now see a plot that shows the stability of the eigenvalues themselves. Each line represents the proportion of variability explained by one of the principal components. These proportions are relatively stable, although there are a few spikes.
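
The proportion plotted for each component is its eigenvalue divided by the sum of all eigenvalues (the trace of the covariance matrix). A minimal NumPy sketch over trailing windows, assuming a hypothetical 60-day window and synthetic returns rather than the original stocks:

```python
import numpy as np


def rolling_explained_variance(returns, window):
    """Proportion of total variance explained by each PC, per trailing window.

    Each output row runs from the first (largest) PC to the last and sums to 1.
    """
    returns = np.asarray(returns, dtype=float)
    ratios = []
    for end in range(window, len(returns) + 1):
        cov = np.cov(returns[end - window:end], rowvar=False)
        eigvals = np.linalg.eigvalsh(cov)[::-1]  # largest eigenvalue first
        ratios.append(eigvals / eigvals.sum())   # eigenvalue / trace of covariance
    return np.array(ratios)


rng = np.random.default_rng(2)
market = rng.normal(0.0, 0.01, size=500)
returns = market[:, None] + rng.normal(0.0, 0.005, size=(500, 5))

ratios = rolling_explained_variance(returns, window=60)
print(ratios[:, 0].min(), ratios[:, 0].max())  # range of the 1st PC's share over time
```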

Ryan, this is a new perspective for me on the use of PCA. I’m more familiar with it in the context of dimensionality reduction for cluster analysis. How did you decide to keep all 4 principal components in the model? Or was that just for demonstration?

In the plot of the eigenvalues themselves, you mention that the proportions are relatively stable. On 3/2/2009 the 1st PC accounts for 94% of the variability, and on 2/18/2013 for 44%. What are you using to measure the stability? The graph doesn’t suggest this to me.
Another way to show the correlation would be to plot the change in the correlation coefficient for each stock in the portfolio, rather than the 1st PC. I think this would highlight the potential issue of multicollinearity.
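
One reading of this suggestion is to track the correlation coefficient of every pair of stocks over a trailing window and then look at how it changes. A minimal NumPy sketch under that interpretation (the pairwise reading, the 60-day window, and the synthetic data are all assumptions):

```python
import numpy as np


def rolling_pairwise_correlations(returns, window):
    """Correlation coefficient of every asset pair over trailing windows.

    Returns (pairs, series): pairs lists the (i, j) column indices, and series
    has shape (n_windows, n_pairs) with one correlation per pair per window.
    """
    returns = np.asarray(returns, dtype=float)
    n_assets = returns.shape[1]
    rows, cols = np.triu_indices(n_assets, k=1)  # each unordered pair once
    series = []
    for end in range(window, len(returns) + 1):
        corr = np.corrcoef(returns[end - window:end], rowvar=False)
        series.append(corr[rows, cols])
    return list(zip(rows.tolist(), cols.tolist())), np.array(series)


rng = np.random.default_rng(3)
market = rng.normal(0.0, 0.01, size=500)
returns = market[:, None] + rng.normal(0.0, 0.005, size=(500, 5))

pairs, corr_series = rolling_pairwise_correlations(returns, window=60)
corr_changes = np.diff(corr_series, axis=0)  # window-to-window change for each pair
```

Five stocks give ten pairs, so this produces ten lines to plot instead of five loadings.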

I agree with your statement about the relationship between AAPL and GOOG, which we can observe in the plot of the 2nd PC. It nicely captures that relationship with the 1st PC (and, as you say, the systematic market risk) excluded. I look forward to experimenting with this algo; thanks!

Hey Timothy, PCA is widely used in many areas of statistics, science, and engineering, in addition to economics and finance. The choice of four principal components is mostly for demonstration. The 'tail' principal components will have less explanatory power as their eigenvalues become small: the smaller the eigenvalue, the less variability is explained by that PC.
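
PCA itself doesn't say how many components to keep. One common heuristic, which is not what this algorithm does, is to keep the smallest number of components whose eigenvalues together explain some chosen share of the variance. A sketch, where the 90% threshold and the function name are hypothetical:

```python
import numpy as np


def n_components_for(eigvals, threshold=0.9):
    """Smallest k such that the top-k eigenvalues explain >= threshold of the variance."""
    vals = np.sort(np.asarray(eigvals, dtype=float))[::-1]  # largest first
    cumulative = np.cumsum(vals) / vals.sum()              # cumulative share explained
    k = int(np.searchsorted(cumulative, threshold)) + 1    # first index reaching threshold
    return min(k, len(vals))  # guard against rounding leaving the last share below 1.0


k = n_components_for([5.0, 2.0, 1.5, 1.0, 0.5], threshold=0.9)
```

With eigenvalues 5, 2, 1.5, 1 and 0.5, the cumulative shares are 50%, 70%, 85%, 95% and 100%, so a 90% threshold keeps four components.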

I am not 'measuring' stability, just making an observation from the graph. The proportions seem to be more stable than the eigenvectors, but you are right: there is considerable variability in the proportions. Can you say more about how the results may be affected by multicollinearity?
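
If one did want to put a number on 'stability', a simple option (hypothetical here, not what the algorithm computes) is to compare windows spaced a fixed number of days apart. The standard deviation of the 1st PC's share of variance summarizes how much the proportions move, and the absolute cosine similarity between consecutive first eigenvectors summarizes how much the direction moves (1.0 means unchanged; taking the absolute value ignores arbitrary sign flips). A minimal sketch on synthetic data, with an assumed 60-day window and 20-day step:

```python
import numpy as np


def top_pc(window_returns):
    """Share of variance and unit eigenvector of the largest principal component."""
    cov = np.cov(window_returns, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)  # ascending; eigenvectors have unit norm
    return eigvals[-1] / eigvals.sum(), eigvecs[:, -1]


def stability_summary(returns, window, step):
    """Std of the 1st PC's share, and |cosine| between consecutive 1st eigenvectors."""
    returns = np.asarray(returns, dtype=float)
    shares, vectors = [], []
    for end in range(window, len(returns) + 1, step):
        share, vec = top_pc(returns[end - window:end])
        shares.append(share)
        vectors.append(vec)
    vectors = np.array(vectors)
    # Dot product of unit vectors = cosine similarity; abs() removes the sign ambiguity.
    similarity = np.abs(np.sum(vectors[1:] * vectors[:-1], axis=1))
    return float(np.std(shares)), similarity


rng = np.random.default_rng(5)
market = rng.normal(0.0, 0.01, size=500)
returns = market[:, None] + rng.normal(0.0, 0.005, size=(500, 5))

share_std, similarity = stability_summary(returns, window=60, step=20)
```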

By "proportions" I understood you to mean the proportion of variability explained by each component. Or do you mean that the component values are stable in proportion to one another?

The multicollinearity issue I mentioned is that, intuitively, one can expect high correlation among the predictor variables, especially when they represent such a large proportion of the market cap. For example, a linear regression with the S&P as the response variable and the above five stocks as predictors will likely produce a model in which each predictor variable is statistically significant. However, a linear model of any one of the individual stocks could also be constructed from the remaining four (GE might be the one exception). We might observe this directly through a scatterplot matrix, but I think we are also seeing it in the first plot you posted, of the elements of the first eigenvector.
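
A standard way to check this is the variance inflation factor: regress each stock's returns on the other stocks' returns (plus an intercept) and compute VIF = 1 / (1 - R²). A VIF near 1 means a stock is not explained by the others; larger values mean it largely is. A minimal NumPy sketch with synthetic data (no real prices are used, and the function name is illustrative):

```python
import numpy as np


def variance_inflation_factors(returns):
    """VIF for each column: regress it on all other columns plus an intercept."""
    X = np.asarray(returns, dtype=float)
    n, k = X.shape
    vifs = []
    for j in range(k):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        design = np.column_stack([np.ones(n), others])  # intercept + remaining stocks
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        resid = y - design @ coef
        r2 = 1.0 - resid.var() / y.var()  # residuals have mean 0 because of the intercept
        vifs.append(1.0 / (1.0 - r2))
    return np.array(vifs)


rng = np.random.default_rng(4)
market = rng.normal(0.0, 0.01, size=1000)
correlated = market[:, None] + rng.normal(0.0, 0.005, size=(1000, 5))
independent = rng.normal(0.0, 0.01, size=(1000, 5))

vif_correlated = variance_inflation_factors(correlated)    # well above 1
vif_independent = variance_inflation_factors(independent)  # close to 1
```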