101 Alphas #2 with Parameter Optimization¶

$ (-1 * correlation(rank(delta(log(volume), 2)), rank(((close - open) / open)), 6)) $

This factor returns a negative value if the change in volume is highly correlated with intraday return. In other words, if volume increases (decreases) by a lot on days where the intraday return is high (low), this factor is negative.

I am postulating that the idea behind this factor is that large moves with heavy volume are liquidity demanding trades (ideally by uninformed traders). Traders providing liquidity in these instances would demand a premium/discount to take the other side to compensate for the risk that they may be trading with an informed trader or the risk of being stuck with an inventory too large. Note, this is quite the opposite of how technical analysis generally looks at the volume/price relationships (although I am oversimplifying a bit with this statement).

My in-sample data for this runs from 2003 to 2012. However, it should be noted that this paper was published in 2015. Therefore, any out-of-sample testing should be done on data after 2015, once the researcher gets to that stage. 2012 to 2015 could possibly be used as sort of a cross-validation set to tune hyper parameters if any kind of machine learning is used to tweak the factor.

Parameter Optimization¶

In this notebook, I will perform a bit of parameter optimization, in part to see what the best parameters or for performance. However, I am more interested in seeing how sensitive the performance of the factor is to changes in the input parameters. If performance is super sensitive to small changes in the inputs, then I would give a higher likelihood that the researchers overfit this factor.

To keep things simple for the moment, I will only adjust the correlation lookback window in the optimization. In the future, I may work on tweaking other parameters if I can find an efficient workflow for doing so.

# Typical imports for use with Pipeline
from quantopian.pipeline import Pipeline, CustomFactor
from quantopian.research import run_pipeline
from quantopian.pipeline.data.builtin import USEquityPricing
from quantopian.pipeline.data import Fundamentals  
from quantopian.pipeline.classifiers.fundamentals import Sector 
from quantopian.pipeline.filters import QTradableStocksUS, Q500US

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

import alphalens as al

class  VolumeChange(CustomFactor):
    """Factor returning the change in log volume as compared
    to (window_length - 1) days ago. Essentially, this is the
    percent change in volume."""
    inputs = [USEquityPricing.volume]
    window_length = 3
    window_safe=True
  
    def compute(self, today, asset_ids, out, volume):
        out[:] = np.log(volume[-1]) - np.log(volume[-3])
        
class IntradayReturn(CustomFactor):
    """Factor returning the return from today's open to 
    today's close"""
    inputs = [USEquityPricing.open, USEquityPricing.close]
    window_length = 1
    window_safe=True  
    def compute(self, today, asset_ids, out, open_, close):
        out[:] = close / open_ - 1

def make_alpha_2(mask, window_length=6):
    """Construct factor returning the negative of the rank correlation over the 
    past 'window_length' days between the intraday return and the VolumeChange.
    
    Parameters
    -----------
    mask: Filter
        Filter representing what assets get included in factor computation.
        
    Returns
    -------
    Factor
    
    Notes: This is a measure of whether returns are correlated with volume. It is
    negative when volume is stronger on up moves and light on down moves. It is 
    positive when volume is stronger on down moves and lighter on up moves.
        """
    class Alpha2(CustomFactor):
#         inputs = [VolumeChange().rank(), IntradayReturn().rank()]
#         window_length = 6

        def compute(self, today, asset_ids, out, volume_change, intraday_return):
            volume_change_df = pd.DataFrame(volume_change)
            intraday_return_df = pd.DataFrame(intraday_return)
            out[:]=-volume_change_df.corrwith(intraday_return_df)
        
    return Alpha2(mask=mask, 
                  inputs = [VolumeChange(mask=mask).rank(), 
                            IntradayReturn(mask=mask).rank()],
                  window_length=window_length
                 )

def make_pipeline(corr_param_range):
    base_universe = QTradableStocksUS()
#     base_universe = Fundamentals.symbol.latest.element_of(['GS', 'AAPL', 'XOM'])
    closed_end_funds = Fundamentals.share_class_description.latest.startswith('CE')
    universe = base_universe & ~closed_end_funds
    
    factor_dict = {}
    for i in corr_param_range:
        factor_dict['alpha_2_{}'.format(i)] = make_alpha_2(universe, i)

    factor_dict['sector_code'] = Sector(mask=universe)
    
    return Pipeline(columns=factor_dict, screen=universe)

start_date = '2003-01-01' 
end_date = '2012-12-31'
# end_date = '2003-01-10'
corr_param_range = [4,6,8,10,12,14,16,18,20]

result = run_pipeline(make_pipeline(corr_param_range), start_date, end_date, chunksize=504)  
col_order = []

# Reorder Columns
for i in corr_param_range:
    col_order.append('alpha_2_{}'.format(i))
col_order.append('sector_code')
result = result[col_order]

result.head()

Code to get `factor_data`¶

def get_al_prices(result, periods=(1,5,21)):
    assets = result.index.levels[1].unique()
    start_date = result.index.get_level_values(0)[0] 
    end_date = result.index.get_level_values(0)[-1]  + max(periods) * pd.tseries.offsets.BDay()
    pricing = get_pricing(assets, start_date, end_date, fields="open_price")
    return pricing 

def get_factor_data(result, 
                    factor_col, 
                    prices,
                    forward_returns,
                    quantiles=5,
                    bins=None, 
                    groupby=None, 
                    binning_by_group=False,
                    groupby_labels=None,
                    max_loss=0.35):

#     pricing = get_al_prices(result, periods)
    
#     factor_data = al.utils.get_clean_factor_and_forward_returns(factor=result[factor_col],
#                                                                 prices=pricing,
#                                                                 groupby=groupby,
#                                                                 binning_by_group=binning_by_group,
#                                                                 groupby_labels=groupby_labels,
#                                                                 quantiles=quantiles,
#                                                                 bins=bins,
#                                                                 periods=periods,
#                                                                 max_loss=max_loss)
    
    factor_data = al.utils.get_clean_factor(result[factor_col], 
                                            forward_returns,
                                            groupby=groupby,
                                            binning_by_group=binning_by_group,
                                            groupby_labels=groupby_labels,
                                            quantiles=quantiles,
                                            bins=bins,
                                            max_loss=max_loss)
    
    return factor_data

Optimize by Correlation Window¶

periods=(1,3,5,7,10,12,15,20)
prices = get_al_prices(result, periods)
forward_returns = al.utils.compute_forward_returns(result[result.columns[0]], prices, periods)

forward_returns.head()

# factor_data={}
ic_dict={}
for factor_col in result.columns:
    if factor_col != 'sector_code':
        print "-"*30 + "\nGetting Factor Data for '{}'".format(factor_col)
        factor_data = get_factor_data(result, 
                                      factor_col, 
                                      prices,
                                      forward_returns)
        print "-"*30 + "\nCalculating ICs for '{}'".format(factor_col)
        ic_dict[factor_col] = al.performance.mean_information_coefficient(factor_data)

------------------------------
Getting Factor Data for 'alpha_2_4'
Dropped 0.4% entries from factor data: 0.4% in forward returns computation and 0.0% in binning phase (set max_loss=0 to see potentially suppressed Exceptions).
max_loss is 35.0%, not exceeded: OK!
------------------------------
Calculating ICs for 'alpha_2_4'
------------------------------
Getting Factor Data for 'alpha_2_6'
Dropped 0.4% entries from factor data: 0.4% in forward returns computation and 0.0% in binning phase (set max_loss=0 to see potentially suppressed Exceptions).
max_loss is 35.0%, not exceeded: OK!
------------------------------
Calculating ICs for 'alpha_2_6'
------------------------------
Getting Factor Data for 'alpha_2_8'
Dropped 0.4% entries from factor data: 0.4% in forward returns computation and 0.0% in binning phase (set max_loss=0 to see potentially suppressed Exceptions).
max_loss is 35.0%, not exceeded: OK!
------------------------------
Calculating ICs for 'alpha_2_8'
------------------------------
Getting Factor Data for 'alpha_2_10'
Dropped 0.4% entries from factor data: 0.4% in forward returns computation and 0.0% in binning phase (set max_loss=0 to see potentially suppressed Exceptions).
max_loss is 35.0%, not exceeded: OK!
------------------------------
Calculating ICs for 'alpha_2_10'
------------------------------
Getting Factor Data for 'alpha_2_12'
Dropped 0.4% entries from factor data: 0.4% in forward returns computation and 0.0% in binning phase (set max_loss=0 to see potentially suppressed Exceptions).
max_loss is 35.0%, not exceeded: OK!
------------------------------
Calculating ICs for 'alpha_2_12'
------------------------------
Getting Factor Data for 'alpha_2_14'
Dropped 0.4% entries from factor data: 0.4% in forward returns computation and 0.0% in binning phase (set max_loss=0 to see potentially suppressed Exceptions).
max_loss is 35.0%, not exceeded: OK!
------------------------------
Calculating ICs for 'alpha_2_14'
------------------------------
Getting Factor Data for 'alpha_2_16'
Dropped 0.4% entries from factor data: 0.4% in forward returns computation and 0.0% in binning phase (set max_loss=0 to see potentially suppressed Exceptions).
max_loss is 35.0%, not exceeded: OK!
------------------------------
Calculating ICs for 'alpha_2_16'
------------------------------
Getting Factor Data for 'alpha_2_18'
Dropped 0.4% entries from factor data: 0.4% in forward returns computation and 0.0% in binning phase (set max_loss=0 to see potentially suppressed Exceptions).
max_loss is 35.0%, not exceeded: OK!
------------------------------
Calculating ICs for 'alpha_2_18'
------------------------------
Getting Factor Data for 'alpha_2_20'
Dropped 0.4% entries from factor data: 0.4% in forward returns computation and 0.0% in binning phase (set max_loss=0 to see potentially suppressed Exceptions).
max_loss is 35.0%, not exceeded: OK!
------------------------------
Calculating ICs for 'alpha_2_20'

ic_df = pd.DataFrame.from_dict(ic_dict)[col_order[:-1]]
ic_df

ic_df.loc['5D'].idxmax()

'alpha_2_16'

ic_df.plot();

import seaborn as sns

sns.heatmap(ic_df, annot=True, cmap='RdBu', vmin=-.01, vmax=.01)

<matplotlib.axes._subplots.AxesSubplot at 0x7f09b5c31ad0>

Tearsheet on Original Params¶

Correlation_window = 6 days

# prices, factor_data = get_factor_data(result, 'alpha_2')
factor_data = get_factor_data(result, 
                              ['alpha_2_6'], 
                              prices,
                              forward_returns)

Dropped 0.4% entries from factor data: 0.4% in forward returns computation and 0.0% in binning phase (set max_loss=0 to see potentially suppressed Exceptions).
max_loss is 35.0%, not exceeded: OK!

al.tears.create_full_tear_sheet(factor_data, long_short=True, group_neutral=False )

Quantiles Statistics

Returns Analysis

<matplotlib.figure.Figure at 0x7f09b59f3810>

Information Analysis

Turnover Analysis

Tearsheet on Optimized Params¶

Correlation_window = 16 Days

result.columns

Index([u'alpha_2_4', u'alpha_2_6', u'alpha_2_8', u'alpha_2_10', u'alpha_2_12',
       u'alpha_2_14', u'alpha_2_16', u'alpha_2_18', u'alpha_2_20',
       u'sector_code'],
      dtype='object')

factor_data = get_factor_data(result, 
                              ['alpha_2_16'], 
                              prices,
                              forward_returns)

Dropped 0.4% entries from factor data: 0.4% in forward returns computation and 0.0% in binning phase (set max_loss=0 to see potentially suppressed Exceptions).
max_loss is 35.0%, not exceeded: OK!

factor_data.head()

al.tears.create_full_tear_sheet(factor_data.drop(['3D', '7D', '12D', '15D', '20D'], axis=1), 
                                long_short=True, group_neutral=False )

Quantiles Statistics

Returns Analysis

<matplotlib.figure.Figure at 0x7f09eb518090>

Information Analysis

Turnover Analysis

		alpha_2_4	alpha_2_6	alpha_2_8	alpha_2_10	alpha_2_12	alpha_2_14	alpha_2_16	alpha_2_18	alpha_2_20	sector_code
2003-01-02 00:00:00+00:00	Equity(2 [ARNC])	0.202107	0.240665	-0.176664	-0.301140	-0.239590	-0.177081	-0.091721	-0.092662	-0.141505	101
	Equity(24 [AAPL])	-0.679728	-0.021000	0.214151	-0.009809	-0.014433	-0.156374	-0.180991	-0.132036	-0.109547	311
	Equity(41 [ARCB])	-0.989867	-0.535938	-0.544902	-0.224799	-0.152206	-0.169425	-0.057055	-0.071686	0.062592	310
	Equity(60 [ABS])	0.062825	0.045906	0.285614	0.223836	0.221915	0.022063	-0.078990	0.048286	-0.059117	205
	Equity(62 [ABT])	-0.483985	-0.017441	0.203240	0.254046	0.189048	0.085416	0.094814	-0.025710	-0.060899	206

		1D	3D	5D	7D	10D	12D	15D	20D
date	asset
2003-01-02 00:00:00+00:00	Equity(2 [ARNC])	0.022567	0.059515	-0.046867	-0.012123	-0.035583	-0.057731	-0.057731	-0.141545
	Equity(24 [AAPL])	0.030631	0.029928	0.018126	0.037516	-0.010538	-0.010538	-0.008431	-0.011803
	Equity(31 [ABAX])	0.041850	0.083150	0.069659	0.029460	0.026982	0.010738	-0.040198	0.000000
	Equity(39 [DDC])	0.010651	0.039744	0.014238	-0.021414	-0.004653	-0.011772	0.078480	0.065699
	Equity(41 [ARCB])	0.072531	0.086505	0.087304	0.076790	0.045382	-0.026395	-0.074927	-0.071422

	alpha_2_4	alpha_2_6	alpha_2_8	alpha_2_10	alpha_2_12	alpha_2_14	alpha_2_16	alpha_2_18	alpha_2_20
1D	0.003254	0.003735	0.004258	0.003816	0.004303	0.004392	0.004393	0.004439	0.004360
3D	0.004920	0.006562	0.006746	0.006705	0.007213	0.007156	0.007333	0.007258	0.007115
5D	0.005548	0.007183	0.007520	0.007788	0.008084	0.008186	0.008216	0.008079	0.007898
7D	0.005438	0.006946	0.007501	0.007732	0.008063	0.008078	0.008069	0.007813	0.007775
10D	0.004768	0.006439	0.006953	0.007176	0.007472	0.007451	0.007345	0.007350	0.007581
12D	0.004447	0.005994	0.006439	0.006692	0.006944	0.006888	0.006908	0.007069	0.007269
15D	0.003852	0.005273	0.005759	0.005897	0.006156	0.006289	0.006502	0.006769	0.007131
20D	0.003233	0.004494	0.004924	0.005311	0.005719	0.006072	0.006539	0.006968	0.007388

	min	max	mean	std	count	count %
factor_quantile
1	-1.000000	-0.307079	-0.672667	0.137022	917070	20.022300
2	-0.587819	0.005523	-0.324010	0.096262	915526	19.988590
3	-0.312260	0.255733	-0.045512	0.092879	915557	19.989267
4	-0.047081	0.539635	0.239701	0.100056	915526	19.988590
5	0.242650	1.000000	0.619084	0.153605	916564	20.011253

	1D	3D	5D	7D	10D	12D	15D	20D
Ann. alpha	0.010	0.014	0.013	0.013	0.011	0.010	0.008	0.006
beta	0.004	0.006	0.004	0.002	0.001	0.002	0.002	0.002
Mean Period Wise Return Top Quantile (bps)	0.548	0.804	0.732	0.690	0.570	0.474	0.348	0.257
Mean Period Wise Return Bottom Quantile (bps)	-0.416	-0.579	-0.543	-0.513	-0.510	-0.453	-0.386	-0.318
Mean Period Wise Spread (bps)	0.964	1.383	1.275	1.203	1.080	0.927	0.735	0.575

	1D	3D	5D	7D	10D	12D	15D	20D
IC Mean	0.004	0.007	0.007	0.007	0.006	0.006	0.005	0.004
IC Std.	0.036	0.036	0.035	0.035	0.035	0.035	0.034	0.034
Risk-Adjusted IC	0.104	0.184	0.205	0.197	0.184	0.173	0.154	0.131
t-stat(IC)	5.217	9.251	10.279	9.885	9.251	8.665	7.711	6.564
p-value(IC)	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000
IC Skew	0.064	0.093	0.013	0.100	0.030	0.068	0.071	0.123
IC Kurtosis	0.539	0.813	0.651	0.297	0.097	0.163	0.323	0.277

	10D	12D	15D	1D	20D	3D	5D	7D
Quantile 1 Mean Turnover	0.800	0.800	0.801	0.341	0.802	0.597	0.743	0.798
Quantile 2 Mean Turnover	0.801	0.801	0.802	0.591	0.803	0.752	0.789	0.799
Quantile 3 Mean Turnover	0.801	0.801	0.802	0.636	0.803	0.773	0.800	0.800
Quantile 4 Mean Turnover	0.801	0.802	0.802	0.593	0.803	0.752	0.790	0.800
Quantile 5 Mean Turnover	0.799	0.801	0.801	0.341	0.802	0.596	0.741	0.797

	min	max	mean	std	count	count %
factor_quantile
1	-1.000000	-0.164926	-0.424189	0.120868	917135	20.022172
2	-0.364373	-0.010808	-0.192479	0.058487	915604	19.988748
3	-0.199384	0.141005	-0.039808	0.054075	915624	19.989185
4	-0.046081	0.300724	0.114688	0.058803	915604	19.988748
5	0.124331	1.000000	0.356780	0.128347	916630	20.011147

	1D	5D	10D
Ann. alpha	0.016	0.016	0.011
beta	0.004	0.004	0.006
Mean Period Wise Return Top Quantile (bps)	0.857	0.789	0.594
Mean Period Wise Return Bottom Quantile (bps)	-0.523	-0.701	-0.465
Mean Period Wise Spread (bps)	1.380	1.491	1.061