Quantopian's community platform is shutting down.
Beta calculation code: which is best, standard, log, or returns?

I found some beta calculation code. Which method do you think is best for developing low-beta algos?


```python
import numpy as np
import pandas as pd


def estimateBeta(priceY, priceX, algo='standard'):
    """Estimate the beta of stock Y relative to stock X.

    algo='standard': cov(x, y) / var(x) on daily log returns.
    algo='returns':  iterative linear regression on simple returns;
                     residuals outside a 3-sigma boundary are filtered out.
    algo='log':      straight-line fit of log price on log price.

    Parameters
    ----------
    priceY : price series of y (estimate the beta of this price)
    priceX : price series of x (usually the market)

    Returns
    -------
    beta : beta of stock Y relative to stock X
    """
    X = pd.DataFrame({'x': priceX, 'y': priceY})

    if algo == 'returns':
        # simple returns: today / yesterday - 1
        ret = (X / X.shift(1) - 1).dropna()
        x = ret['x'].values
        y = ret['y'].values
        # keep only days where the x return lies strictly between its
        # 20th and 80th percentiles (trims both tails, not just high values)
        low = np.percentile(x, 20)
        high = np.percentile(x, 80)
        iValid = (x > low) & (x < high)
        x = x[iValid]
        y = y[iValid]
        iteration = 1
        nrOutliers = 1
        # refit at most 9 times, dropping residuals beyond 3 sigma each pass
        while iteration < 10 and nrOutliers > 0:
            a, b = np.polyfit(x, y, 1)   # slope a is the beta estimate
            yf = np.polyval([a, b], x)
            err = yf - y
            idxOutlier = np.abs(err) > 3 * np.std(err)
            nrOutliers = idxOutlier.sum()
            beta = a
            # print('Iteration: %i beta: %.2f outliers: %i' % (iteration, beta, nrOutliers))
            x = x[~idxOutlier]
            y = y[~idxOutlier]
            iteration += 1
    elif algo == 'log':
        # NOTE: this regresses log *prices*, not log returns (see the replies)
        x = np.log(X['x'])
        y = np.log(X['y'])
        a, b = np.polyfit(x, y, 1)
        beta = a
    elif algo == 'standard':
        # beta = cov(x, y) / var(x) on daily log returns
        ret = np.log(X).diff().dropna()
        beta = ret['x'].cov(ret['y']) / ret['x'].var()
    else:
        raise ValueError("unknown beta algorithm type, use 'standard', 'log' or 'returns'")

    return beta
```
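The iterative 3-sigma trimming in the `returns` branch can be tried on its own. This is a sketch under assumptions: `iterative_slope` is a hypothetical helper, the synthetic data (true slope 0.8, ten injected bad points) are made up, and the 20th/80th percentile pre-filter is left out. Like the loop above, it refits at most 9 times and stops early once no residual lies beyond 3 standard deviations.

```python
import numpy as np


def iterative_slope(x, y, n_sigma=3.0, max_iter=9):
    """Hypothetical helper: refit a line after dropping residuals beyond n_sigma std devs."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slope = np.nan
    for _ in range(max_iter):
        slope, intercept = np.polyfit(x, y, 1)
        err = np.polyval([slope, intercept], x) - y
        outliers = np.abs(err) > n_sigma * np.std(err)
        if not outliers.any():
            break  # nothing left to trim
        x, y = x[~outliers], y[~outliers]
    return slope  # slope of the last fit, as in the original loop


# Synthetic returns (assumed): true slope 0.8 plus noise
rng = np.random.default_rng(7)
x = rng.normal(0.0, 0.01, 1000)
y = 0.8 * x + rng.normal(0.0, 0.005, 1000)
y[:10] += 0.2  # ten hypothetical bad ticks

print(f"plain OLS slope:       {np.polyfit(x, y, 1)[0]:.3f}")
print(f"3-sigma trimmed slope: {iterative_slope(x, y):.3f}")
```

Because the threshold is recomputed from the residuals of each fit, the loop tends to keep shaving a few points off the tails of ordinary normal noise until it hits the pass limit or happens to find no points beyond 3 sigma.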
5 responses

Are you doing a regression on log prices? That's a spurious regression and doesn't actually indicate a relationship. See: http://www.r-bloggers.com/spurious-regression-illustrated/
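To see why a regression on log prices can mislead, here is a minimal simulation, assuming two *independent* geometric random walks (the seed, volatility and sample length are arbitrary, and the helper names are hypothetical). By construction the true beta is zero: the slope fitted to log returns should sit near zero with a tiny R², while the slope fitted to log prices can land far from zero.

```python
import numpy as np


def simulate_independent_prices(n=2500, vol=0.01, seed=0):
    """Hypothetical helper: two independent geometric random walks starting at 100."""
    rng = np.random.default_rng(seed)
    log_ret = rng.normal(0.0, vol, size=(n, 2))  # independent daily log returns
    log_px = np.log(100.0) + np.cumsum(log_ret, axis=0)
    return np.exp(log_px[:, 0]), np.exp(log_px[:, 1])


def slope_and_r2(x, y):
    """OLS slope of y on x and the R^2 of that straight-line fit."""
    slope, _intercept = np.polyfit(x, y, 1)
    r2 = np.corrcoef(x, y)[0, 1] ** 2
    return slope, r2


px, py = simulate_independent_prices()

# Spurious: log price on log price (what the 'log' branch does)
level_beta, level_r2 = slope_and_r2(np.log(px), np.log(py))

# Sound: log return on log return
ret_beta, ret_r2 = slope_and_r2(np.diff(np.log(px)), np.diff(np.log(py)))

print(f"log-price fit:  beta={level_beta:.2f}  R^2={level_r2:.2f}")
print(f"log-return fit: beta={ret_beta:.2f}  R^2={ret_r2:.4f}")
```

Rerunning with a few different `seed` values shows the log-price slope and R² jumping around from run to run even though the two series are unrelated, which is the hallmark of a spurious regression.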

I don't think he is...? For arithmetic returns he computes today/yesterday - 1, and for log returns he computes log(today) - log(yesterday); both are correct.
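The two return definitions can be computed side by side with pandas; the short price series below is made up purely for illustration. Log returns are exactly `log(1 + r)` of the arithmetic returns, so they are always slightly smaller than the matching simple return and agree closely for small daily moves.

```python
import numpy as np
import pandas as pd

# Hypothetical closing prices, for illustration only
prices = pd.Series([100.0, 101.0, 99.5, 100.2, 102.0])

# Arithmetic (simple) returns: today / yesterday - 1
simple_ret = (prices / prices.shift(1) - 1).dropna()

# Log returns: log(today) - log(yesterday)
log_ret = np.log(prices).diff().dropna()

# The two are linked exactly by log_ret = log(1 + simple_ret)
print(pd.DataFrame({"simple": simple_ret,
                    "log": log_ret,
                    "log1p(simple)": np.log1p(simple_ret)}))
```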

As for which is better, there's a ton of back and forth:

https://quantivity.wordpress.com/2011/02/21/why-log-returns/
http://mathbabe.org/2011/08/30/why-log-returns/
http://www.wilmott.com/pdfs/011119_meucci.pdf

For calculating beta, I'm not sure it makes much practical difference... Meucci has a number of other "nugget" papers that cover tricky details of basic assumptions. Google Meucci "quant nugget" for a list.
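One rough way to check whether the choice matters for beta is to compute it both ways on the same data. The sketch below uses simulated prices, assuming a stock whose daily log returns are 1.2 times the market's plus independent noise (all parameters, and the helper `beta_from_returns`, are hypothetical). With daily-sized moves the two estimates typically come out very close; this illustrates the point under those assumptions and is not evidence about real data.

```python
import numpy as np
import pandas as pd


def beta_from_returns(ret_x, ret_y):
    """Hypothetical helper: beta of y on x as cov(x, y) / var(x)."""
    return ret_x.cov(ret_y) / ret_x.var()


rng = np.random.default_rng(42)
n = 2500
true_beta = 1.2  # assumed for the simulation
mkt_log_ret = rng.normal(0.0, 0.01, n)
stk_log_ret = true_beta * mkt_log_ret + rng.normal(0.0, 0.01, n)

# Build price paths from the simulated log returns
prices = pd.DataFrame({
    "x": 100.0 * np.exp(np.cumsum(mkt_log_ret)),
    "y": 100.0 * np.exp(np.cumsum(stk_log_ret)),
})

simple = (prices / prices.shift(1) - 1).dropna()  # today / yesterday - 1
logr = np.log(prices).diff().dropna()             # log(today) - log(yesterday)

beta_simple = beta_from_returns(simple["x"], simple["y"])
beta_log = beta_from_returns(logr["x"], logr["y"])
print(f"beta (simple returns): {beta_simple:.3f}")
print(f"beta (log returns):    {beta_log:.3f}")
```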

I was focusing on this part:

    elif algo=='log':  
        x = np.log(X['x'])  
        y = np.log(X['y'])  
        (a,b) = polyfit(x,y,1)  
        beta = a  

i.e. if algo == "log", it fits a straight line (a first-degree polynomial) of log price of Y against log price of X, rather than working with returns.
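For comparison, differencing the log prices before the `polyfit` call turns the `log` branch into a regression on log returns, and its slope then matches the `cov / var` formula in the `standard` branch, because the OLS slope is exactly cov(x, y) / var(x). A sketch, assuming made-up price series:

```python
import numpy as np
import pandas as pd

# Hypothetical price series for the market (x) and the stock (y)
X = pd.DataFrame({
    "x": [100.0, 101.0, 100.5, 102.0, 101.2, 103.0, 104.1, 103.5],
    "y": [50.0, 50.8, 50.1, 51.6, 50.9, 52.3, 53.2, 52.6],
})

# What the 'log' branch does: straight-line fit of log price on log price
level_slope = np.polyfit(np.log(X["x"]), np.log(X["y"]), 1)[0]

# Differencing first gives log returns; fit the straight line to those instead
ret = np.log(X).diff().dropna()
return_slope = np.polyfit(ret["x"], ret["y"], 1)[0]

# Same number as the 'standard' branch: cov(x, y) / var(x) on log returns
standard_beta = ret["x"].cov(ret["y"]) / ret["x"].var()

print(level_slope, return_slope, standard_beta)
```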

Ah yeah, I must have glazed over that. That doesn't look right.

Thanks, guys. I did indeed get spurious results from the log method, while the other two behaved similarly. I'll play around and post a new version so that others don't use this one ;)