Hi,
I understand the purpose of the line ranks = (R.T - R.T.mean()).T.mean(), but I am confused about what is in the rows and columns before it.
Let me know if I have misunderstood any of the following:
This adds market cap as a column, so rows = companies and columns = market cap.
pipe_columns = {
'Market Cap' : mkt_cap()
}
Keeps only the top 500 companies by market cap, among those with a market cap above 1e8 (still rows = companies, columns = market cap)
context.output = pipeline_output('pipeline')
context.output = context.output[context.output['Market Cap'] > 1e8]  # assign the result, otherwise the filter has no effect
context.output.sort(['Market Cap'], ascending=False, inplace=True)
context.security_list = context.output.head(500).index
log.info(context.security_list)
Takes the log of prices for the top 500 mkt cap companies as per the above matrix
prices = np.log(data.history(context.security_list, 'price', context.lookback, '1d')).dropna(axis=1)
Gives log returns over the return_window period
Also, what is the purpose of the second line? Doesn't .dropna() already do what np.isfinite does?
R = (prices / prices.shift(context.return_window)).dropna()
R = R[np.isfinite(R[R.columns])].fillna(0)
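To make the .dropna() vs np.isfinite question concrete, here is a minimal sketch on made-up data (the tickers AAA/BBB and all values are hypothetical, not taken from the algorithm). It relies on pandas' default behaviour, where inf is not treated as a missing value, so .dropna() leaves it in place. An inf could plausibly appear here if a log price in the denominator is exactly 0 (a price of 1.0), though I have not checked whether that happens in practice.

```python
import numpy as np
import pandas as pd

# Hypothetical ratio matrix with some non-finite entries.
R = pd.DataFrame(
    {"AAA": [1.00, np.inf, 1.02], "BBB": [0.99, 1.01, -np.inf]},
    index=pd.date_range("2020-01-01", periods=3),
)

# dropna() only removes NaN; +/-inf is not NaN, so every row survives.
after_dropna = R.dropna()

# Boolean-DataFrame indexing turns non-finite cells into NaN,
# and fillna(0) then replaces those NaN cells with 0.
cleaned = R[np.isfinite(R)].fillna(0)

print(after_dropna)
print(cleaned)
```

If this sketch matches what the algorithm sees, the second line is there to catch inf values that .dropna() would let through.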
My confusion is here
ranks = (R.T - R.T.mean()).T.mean()
My understanding is that the companies are in the rows and the log returns are in the columns. If that were so, transposing wouldn't be needed to get a cross-sectional average, since the default axis=0 would already give it. So I assume I'm wrong about what is in the rows and columns, but I can't work out why.
Can anyone explain why the log prices are in the rows and the company names are in the columns (if that is the case)?
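One way to test the orientation is a toy DataFrame. The sketch below assumes the layout I believe data.history returns for several assets and a single field: one row per date and one column per asset. That layout is an assumption here, and the tickers and numbers are made up.

```python
import numpy as np
import pandas as pd

# Assumed layout: rows = dates, columns = tickers (hypothetical names).
R = pd.DataFrame(
    {"AAA": [1.0, 2.0], "BBB": [3.0, 4.0], "CCC": [5.0, 9.0]},
    index=pd.date_range("2020-01-01", periods=2),
)

# R.T has tickers in rows and dates in columns, so .mean() (axis=0)
# averages across tickers and returns one value per date.
cross_section_mean = R.T.mean()

# Subtracting aligns that Series with the date columns of R.T;
# transposing back restores the dates-by-tickers layout.
demeaned = (R.T - cross_section_mean).T

# The final .mean() averages over dates, giving one score per ticker.
ranks = demeaned.mean()

# Same computation without transposing, for comparison.
equivalent = R.sub(R.mean(axis=1), axis=0).mean()

print(ranks)
```

Under that assumed layout, the transposes are only there so the per-date mean lines up with the columns during the subtraction, and ranks ends up indexed by ticker.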
Thanks