Karol -- yes that's a good observation and certainly not the behavior one would expect.
I didn't want to go into the intricacies of applying this to real-world data so as not to distract from the theory. There are several issues here:
1. The universe is way too small. If you were to plot the stock returns the same way as in the bullet plot, they would sit in the upper-left corner and appear to have much higher returns and lower volatility. That's why the optimization always puts all its weight behind them.
2. The mean is a terrible, terrible estimator -- especially for stock returns. In practice you'll often find people heavily regularizing the means toward some common value. In the extreme case you can set the mean vector to all 1s and essentially ignore its contribution; that way you get a weighting that only tries to minimize the variance (see the sketch after this list).
3. Covariance estimation is extremely noisy. df.cov() looks pretty harmless and you'd expect it to return a clean covariance matrix, but unfortunately it's very brittle. That's why you'll also want to apply regularization to the covariance matrix. See here for more details and tools to do that: http://scikit-learn.org/stable/modules/covariance.html
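To make point 2 a bit more concrete, here is a minimal sketch of what shrinking the means toward a common value could look like. The shrink_mean_returns name, the alpha parameter, and the returns_df DataFrame are purely illustrative assumptions, not code from this thread:

def shrink_mean_returns(returns_df, alpha=0.5):
    # Illustrative helper (not from the original post): blend each asset's
    # sample mean with the cross-sectional grand mean. alpha=0 keeps the
    # raw sample means, alpha=1 collapses everything to the common value.
    sample_means = returns_df.mean(axis=0)   # per-asset sample means
    grand_mean = sample_means.mean()         # common target value
    return (1 - alpha) * sample_means + alpha * grand_mean

The alpha=1 extreme is equivalent to setting every entry of the mean vector to the same number, which is what the shrink_means=True option in the optimization function further down does.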
In case it's useful to someone, here are some routines I wrote to estimate the covariance matrix in a more robust way:
import numpy as np
import pandas as pd
import sklearn.covariance

def cov2cor(X):
    # Convert a covariance matrix into the corresponding correlation matrix.
    D = np.zeros_like(X)
    d = np.sqrt(np.diag(X))
    np.fill_diagonal(D, d)
    DInv = np.linalg.inv(D)
    R = np.dot(np.dot(DInv, X), DInv)
    return R

def cov_robust(X):
    # Shrunk covariance estimate via Oracle Approximating Shrinkage.
    oas = sklearn.covariance.OAS()
    oas.fit(X)
    return oas.covariance_

def corr_robust(X):
    # Shrunk correlation matrix, labeled with the DataFrame's columns.
    cov = cov_robust(X)
    shrunk_corr = cov2cor(cov)
    return pd.DataFrame(shrunk_corr, index=X.columns, columns=X.columns)

def is_pos_def(x):
    # Check that a matrix is positive definite.
    return np.all(np.linalg.eigvals(x) > 0)
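For example, with a DataFrame of asset returns (called returns_df here purely for illustration, one row per date and one column per asset), these could be used along the lines of:

shrunk_cov = cov_robust(returns_df)      # OAS-shrunk covariance estimate
print(is_pos_def(shrunk_cov))            # sanity check before optimizing
shrunk_corr = corr_robust(returns_df)    # labeled shrunk correlation matrix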
And here is an updated optimization function that accepts a precomputed covariance matrix and can optionally shrink the means to 1:
import numpy as np
import cvxopt as opt
from cvxopt import blas, solvers

def markowitz(returns, cov=None, shrink_means=False):
    # `returns` is expected to have one row per asset.
    n = len(returns)
    returns = np.asmatrix(returns)

    # Risk-aversion parameters used to trace out the efficient frontier.
    N = 100
    mus = [10**(5.0 * t / N - 1.0) for t in range(N)]

    # Convert to cvxopt matrices.
    # solvers.qp minimizes (1/2) * w' * (mu*S) * w - pbar' * w.
    if cov is None:
        S = opt.matrix(np.cov(returns))
    else:
        S = opt.matrix(cov)

    if shrink_means:
        pbar = opt.matrix(1.0, (n, 1))
    else:
        pbar = opt.matrix(np.mean(returns, axis=1))

    # Create constraint matrices.
    # Gw <= h: every weight is non-negative (long-only).
    G = -opt.matrix(np.eye(n))  # negative n x n identity matrix
    h = opt.matrix(0.0, (n, 1))
    # Aw = b: weights sum to 1.
    A = opt.matrix(1.0, (1, n))
    b = opt.matrix(1.0)

    if not shrink_means:
        # Calculate efficient frontier weights using quadratic programming.
        portfolios = [solvers.qp(mu * S, -pbar, G, h, A, b)['x']
                      for mu in mus]
        # Calculate risks and returns along the frontier.
        returns = [blas.dot(pbar, x) for x in portfolios]
        risks = [np.sqrt(blas.dot(x, S * x)) for x in portfolios]
        # Fit a 2nd-degree polynomial to the frontier curve.
        m1 = np.polyfit(returns, risks, 2)
        x1 = np.sqrt(m1[2] / m1[0])
        # Calculate the optimal portfolio at that risk-aversion level.
        wt = solvers.qp(opt.matrix(x1 * S), -pbar, G, h, A, b)['x']
    else:
        # With shrunk means the objective reduces to minimizing variance.
        wt = solvers.qp(opt.matrix(S), -pbar, G, h, A, b)['x']

    return np.asarray(wt).ravel()
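Putting the pieces together, a call could look roughly like this. Again, returns_df is just an assumed DataFrame of asset returns (rows = dates, columns = assets); the function expects one row per asset, hence the transpose:

# Illustrative usage, not from the original post.
robust_cov = cov_robust(returns_df)          # shrunk covariance estimate
weights = markowitz(returns_df.values.T,     # one row per asset
                    cov=robust_cov,
                    shrink_means=True)       # minimum-variance weights
print(dict(zip(returns_df.columns, weights)))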
Do you regret asking already? ;) Unfortunately, these are the issues one has to deal with when actually applying clean theory to messy real-world data.