This notebook provides a set of functions for exploring how a combination of uncorrelated, possibly "interacting" factors can produce an enhanced signal. It also provides some tools for detecting interaction effects between factors. <br><br>
Note: The following link does a good job of illustrating how to interpret factor interaction plots (although it is in a completely different context from the finance field). https://courses.washington.edu/smartpsy/interactions.htm
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import matplotlib.cm as cm
from scipy import stats
import alphalens as al
These will be used later when analyzing the combination of factors.
def mean_return_by_quantile(factor_data,
                            by_date=False,
                            by_group=False,
                            demeaned=True,
                            group_adjust=False,
                            factor_groupers=['factor_quantile']):
    """
    Computes mean returns for factor quantiles across
    provided forward returns columns.

    Parameters
    ----------
    factor_data : pd.DataFrame - MultiIndex
        A MultiIndex DataFrame indexed by date (level 0) and asset (level 1),
        containing the values for a single alpha factor, forward returns for
        each period, the factor quantile/bin that factor value belongs to, and
        (optionally) the group the asset belongs to.
        - See full explanation in utils.get_clean_factor_and_forward_returns
    by_date : bool
        If True, compute quantile bucket returns separately for each date.
    by_group : bool
        If True, compute quantile bucket returns separately for each group.
    demeaned : bool
        Compute demeaned mean returns (long short portfolio)
    group_adjust : bool
        Returns demeaning will occur on the group level.
    factor_groupers : list
        List of column names (strings) for the factor quantiles to group by.

    Returns
    -------
    mean_ret : pd.DataFrame
        Mean period wise returns by specified factor quantile.
    std_error_ret : pd.DataFrame
        Standard error of returns by specified quantile.
    """
    if group_adjust:
        grouper = [factor_data.index.get_level_values('date')] + ['group']
        factor_data = al.utils.demean_forward_returns(factor_data, grouper)
    elif demeaned:
        factor_data = al.utils.demean_forward_returns(factor_data)
    else:
        factor_data = factor_data.copy()

    # Copy the list so the appends below never mutate the caller's list
    # (or the shared default argument) between calls.
    grouper = list(factor_groupers)
    if by_date:
        grouper.append(factor_data.index.get_level_values('date'))

    if by_group:
        grouper.append('group')

    group_stats = factor_data.groupby(grouper)[
        al.utils.get_forward_returns_columns(factor_data.columns)] \
        .agg(['mean', 'std', 'count'])

    mean_ret = group_stats.T.xs('mean', level=1).T

    # Standard error of the mean: std / sqrt(count)
    std_error_ret = group_stats.T.xs('std', level=1).T \
        / np.sqrt(group_stats.T.xs('count', level=1).T)

    return mean_ret, std_error_ret
def plot_multi_factor_quantile_returns(mean_ret_by_quantile, period, ax=None):
    """
    Plots mean period wise returns for factor quantiles.

    Parameters
    ----------
    mean_ret_by_quantile : pd.DataFrame
        DataFrame with quantiles, (group) and mean period wise return values.
    period : pandas.Timedelta or string
        Length of period for which the returns are computed (e.g. 1 day)
        if 'period' is a string it must follow pandas.Timedelta constructor
        format (e.g. '1 days', '1D', '30m', '3h', '1D1h', etc)
    ax : matplotlib.Axes, optional
        Axes upon which to plot.

    Returns
    -------
    ax : matplotlib.Axes
    """
    if ax is None:
        fig, ax = plt.subplots(figsize=(6, 6))

    # Rows are the first quantile grouper, columns the second.
    sns.heatmap(mean_ret_by_quantile[period].unstack(), annot=True,
                cmap=cm.coolwarm_r, ax=ax, center=0)
    ax.set(title="Mean {} Returns".format(period))

    return ax
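As a rough, alphalens-free illustration of what `mean_return_by_quantile` computes when two quantile columns are passed as `factor_groupers`, the sketch below groups a tiny hand-made table by both columns and derives the mean and standard error of the `1D` return. The table `toy` and its values are invented for illustration only.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: four observations falling into two quantile buckets.
toy = pd.DataFrame({
    'factor_1_quantile': [1, 1, 2, 2],
    'factor_2_quantile': [1, 1, 1, 1],
    '1D': [0.01, 0.03, -0.02, 0.00],
})

# Same aggregation pattern as in the function above: mean, std and count
# per bucket.
group_stats = (toy.groupby(['factor_1_quantile', 'factor_2_quantile'])['1D']
                  .agg(['mean', 'std', 'count']))

mean_ret = group_stats['mean']
# Standard error of the mean: std / sqrt(count)
std_error_ret = group_stats['std'] / np.sqrt(group_stats['count'])

print(mean_ret)
print(std_error_ret)
```

Grouping by both quantile columns at once is what produces one mean return per quantile intersection, which the heat map then displays.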
Let's generate some random, uncorrelated factor values along with a return stream that is a function of the factor values and their interaction.
$ r = \beta_1f_1 + \beta_2f_2 + \beta_3f_1f_2 + \epsilon $
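One way to read this equation is that the sensitivity of $r$ to $f_1$ is $\beta_1 + \beta_3 f_2$, so it depends on the level of $f_2$. The NumPy-only sketch below checks this numerically by comparing the regression slope of returns on factor 1 when factor 2 is positive versus negative. The seed, sample size, and coefficient values are arbitrary assumptions, and `slope` is a helper defined just for this example.

```python
import numpy as np

rng = np.random.RandomState(42)  # fixed seed, chosen arbitrarily
n = 100000
beta_1, beta_2, beta_3 = 0.05, 0.0, 0.05  # illustrative coefficients

f1 = rng.normal(size=n)
f2 = rng.normal(size=n)
r = beta_1 * f1 + beta_2 * f2 + beta_3 * f1 * f2 + rng.normal(0, 0.02, size=n)


def slope(x, y):
    """Least-squares slope of y on x."""
    return np.polyfit(x, y, 1)[0]


high = f2 > 0
slope_high = slope(f1[high], r[high])
slope_low = slope(f1[~high], r[~high])

# For a standard normal, E[f2 | f2 > 0] = sqrt(2 / pi), so the expected slopes
# are beta_1 + beta_3 * sqrt(2 / pi) and beta_1 - beta_3 * sqrt(2 / pi).
expected_shift = beta_3 * np.sqrt(2 / np.pi)
print("slope when f2 > 0:  {:.4f} (expected about {:.4f})".format(
    slope_high, beta_1 + expected_shift))
print("slope when f2 <= 0: {:.4f} (expected about {:.4f})".format(
    slope_low, beta_1 - expected_shift))
```

A slope that differs between the two halves is the signature of an interaction; with $\beta_3 = 0$ the two slopes should agree up to sampling noise.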
def generate_factor(n_stocks, n_periods):
    """Generate random factor values for given number of stocks and periods

    Parameters
    ----------
    n_stocks : int
        Number of stocks in simulation
    n_periods : int
        Number of (business) days

    Returns
    -------
    pd.Series
        Multi-index series of factor values (index by date, then asset)
    """
    # Independent standard-normal draws, one per (date, asset) pair
    factor = np.random.normal(0, size=n_stocks * n_periods)
    # pd.date_range replaces the DatetimeIndex(start=...) constructor form,
    # which newer pandas versions no longer accept
    date_idx = pd.date_range(start='2003-01-01', periods=n_periods, freq='B')
    idx = pd.MultiIndex.from_product([date_idx, range(n_stocks)])
    factor = pd.Series(factor, idx)
    factor.index.names = ['date', 'asset']
    return factor
def generate_simulated_returns(factor_1, factor_2, factor_1_coef, factor_2_coef, interaction_coef):
    """Generate simulated returns as a function of the factor_1, factor_2, and factor_1*factor_2
    values.

    Parameters
    ----------
    factor_1, factor_2 : pd.Series
        Series indexed by date, then asset containing factor values
    factor_1_coef, factor_2_coef, interaction_coef : float
        The "True" Factor loadings for the simulated return stream.

    Returns
    -------
    pd.Series
        Daily return series indexed by date and then asset
    """
    ret = (factor_1_coef * factor_1) + (factor_2_coef * factor_2) + \
        (interaction_coef * factor_1 * factor_2)
    noise = np.random.normal(0, 0.02, size=len(factor_1))
    ret = ret + noise
    return pd.Series(ret, index=factor_1.index)
def simulate_and_plot_results(n_stocks, n_periods, factor_1_coef, factor_2_coef, interaction_coef):
    """Perform Entire Simulation and Plot Results in One Step"""
    # Use the function arguments rather than the notebook-level globals
    factor_1 = generate_factor(n_stocks, n_periods)
    factor_2 = generate_factor(n_stocks, n_periods)
    sim_returns = generate_simulated_returns(factor_1, factor_2, factor_1_coef, factor_2_coef, interaction_coef)

    factor_data_1 = al.utils.get_clean_factor(factor_1, pd.DataFrame({'1D': sim_returns}))
    factor_data_2 = al.utils.get_clean_factor(factor_2, pd.DataFrame({'1D': sim_returns}))
    factor_data_1.rename(columns={'factor': 'factor_1', 'factor_quantile': 'factor_1_quantile'}, inplace=True)
    factor_data_2.rename(columns={'factor': 'factor_2', 'factor_quantile': 'factor_2_quantile'}, inplace=True)
    multi_factor_data = factor_data_1.join(factor_data_2[['factor_2', 'factor_2_quantile']])

    mean_ret_by_q = mean_return_by_quantile(multi_factor_data,
                                            factor_groupers=['factor_1_quantile', 'factor_2_quantile'])[0]

    # Single-argument print(...) calls behave the same under Python 2 and 3
    print("-----------------------------------------------------------")
    print("Mean Return by Factor Quantile for Each Factor Individually")
    print("-----------------------------------------------------------")
    print(mean_ret_by_q.groupby(level=0).mean())
    print(mean_ret_by_q.groupby(level=1).mean())
    print("----------------------------------------------------------------")
    print("Mean 1 Day Returns for Each Factor 1 and 2 Quantile Intersection")
    print("----------------------------------------------------------------")

    plot_multi_factor_quantile_returns(mean_ret_by_q, '1D', ax=None)
    mean_ret_by_q['1D'].unstack().plot(title='Factor Interaction Plot')
    plt.gca().set_ylabel('Return');
Let's choose some parameters for our simulation and then generate the simulated factor values and stock returns. For this simulation, I am going to set the factor_2 coefficient to 0. In other words, the value of factor 2 will not be predictive of future returns. However, the interaction coefficient will have a positive loading.
N_STOCKS = 5
N_PERIODS = 1000
FACTOR_1_COEF = 0.05
FACTOR_2_COEF = 0
INTERACTION_COEF = 0.05
factor_1 = generate_factor(N_STOCKS, N_PERIODS)
factor_2 = generate_factor(N_STOCKS, N_PERIODS)
sim_returns = generate_simulated_returns(factor_1, factor_2, FACTOR_1_COEF, FACTOR_2_COEF, INTERACTION_COEF)
This is simply exploratory, just to see what kind of return distribution was generated. There appears to be some positive excess kurtosis.
fig, ax = plt.subplots(ncols=2)
ax[0].hist(sim_returns, bins=30);
ax[0].set(title='Distribution of Simulated Returns')
stats.probplot(sim_returns, plot=ax[1])
stats.describe(sim_returns)
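One plausible source of the excess kurtosis is the interaction term itself: the product of two independent standard normal variables has a theoretical excess kurtosis of 6, so any return stream that loads on $f_1 f_2$ inherits fatter tails than a normal distribution. The NumPy-only sketch below illustrates this; `excess_kurtosis` is a helper written for this example, and the seed and sample size are arbitrary assumptions.

```python
import numpy as np


def excess_kurtosis(x):
    """Sample excess kurtosis: fourth standardized moment minus 3."""
    z = (x - x.mean()) / x.std()
    return np.mean(z ** 4) - 3.0


rng = np.random.RandomState(0)  # fixed seed, chosen arbitrarily
n = 200000
f1 = rng.normal(size=n)
f2 = rng.normal(size=n)

# Same functional form as the simulated returns, with factor_2_coef = 0
r = 0.05 * f1 + 0.05 * f1 * f2 + rng.normal(0, 0.02, size=n)

print("normal draw:       {:.2f}".format(excess_kurtosis(f1)))
print("product f1 * f2:   {:.2f}".format(excess_kurtosis(f1 * f2)))
print("simulated returns: {:.2f}".format(excess_kurtosis(r)))
```

Setting the interaction coefficient to zero in this sketch should bring the excess kurtosis of the simulated returns back toward zero, since the remaining terms are all normally distributed.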
Let's verify that the correlation between the factors is approximately zero (in a finite sample it will not be exactly zero).
h = sns.jointplot(factor_1, factor_2, annot_kws={'title': 'Factor Correlation'});
h.set_axis_labels('factor_1', 'factor_2', fontsize=16);
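The visual check can be complemented with a number. The sketch below, which uses freshly drawn NumPy arrays rather than the notebook's `factor_1` and `factor_2`, computes the sample correlation and compares it against a rough standard error of $1/\sqrt{n}$ for a correlation near zero. The seed is an arbitrary assumption.

```python
import numpy as np

rng = np.random.RandomState(1)  # fixed seed, chosen arbitrarily
n = 5000  # 5 stocks x 1000 periods, matching the parameters used here
f1 = rng.normal(size=n)
f2 = rng.normal(size=n)

corr = np.corrcoef(f1, f2)[0, 1]
# Rough standard error of a sample correlation when the true value is zero
approx_se = 1.0 / np.sqrt(n)

print("sample correlation: {:.4f} (approx. standard error {:.4f})".format(
    corr, approx_se))
```

A sample correlation within a few standard errors of zero is consistent with the factors being uncorrelated.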
Rather than generating full alphalens output for these, I show a simple scatter plot with a fitted regression line for each factor and for the interaction term, to check whether each one is predictive of future returns on its own.
fig, axes = plt.subplots(nrows=2, ncols=2)
for factor, ax, title in zip([factor_1, factor_2, factor_1 * factor_2], axes.flat,
                             ['Factor 1', 'Factor 2', 'Interaction']):
    # Pass x and y by keyword; newer seaborn versions do not accept them positionally
    sns.regplot(x=factor, y=sim_returns, ax=ax)
    ax.set(title=title, xlabel='Factor Value', ylabel='Return')
fig.tight_layout()
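The scatter plots look at one term at a time. A complementary check, sketched below with plain NumPy, is to regress returns on both factors and their product jointly and inspect the estimated coefficients and t-statistics. This is only an illustrative ordinary least squares fit via the normal equations on freshly simulated data, not part of alphalens; the seed and the variable names are assumptions.

```python
import numpy as np

rng = np.random.RandomState(7)  # fixed seed, chosen arbitrarily
n = 5000
true_beta = np.array([0.05, 0.0, 0.05])  # factor 1, factor 2, interaction

f1 = rng.normal(size=n)
f2 = rng.normal(size=n)
X = np.column_stack([f1, f2, f1 * f2])
r = X.dot(true_beta) + rng.normal(0, 0.02, size=n)

# Design matrix with an intercept column, solved via the normal equations
X_design = np.column_stack([np.ones(n), X])
xtx_inv = np.linalg.inv(X_design.T.dot(X_design))
beta_hat = xtx_inv.dot(X_design.T).dot(r)

# Classical OLS standard errors and t-statistics
resid = r - X_design.dot(beta_hat)
sigma2 = resid.dot(resid) / (n - X_design.shape[1])
t_stats = beta_hat / np.sqrt(sigma2 * np.diag(xtx_inv))

for name, b, t in zip(['intercept', 'factor_1', 'factor_2', 'interaction'],
                      beta_hat, t_stats):
    print("{:<12s} coef = {:+.4f}   t = {:+.1f}".format(name, b, t))
```

In this setup the coefficient on factor 2 should be close to zero while the interaction coefficient is clearly significant, which is the pattern the scatter plots suggest.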
factor_data_1 = al.utils.get_clean_factor(factor_1, pd.DataFrame({'1D': sim_returns}))
factor_data_2 = al.utils.get_clean_factor(factor_2, pd.DataFrame({'1D': sim_returns}))
factor_data_1.rename(columns={'factor': 'factor_1', 'factor_quantile': 'factor_1_quantile'}, inplace=True)
factor_data_2.rename(columns={'factor': 'factor_2', 'factor_quantile': 'factor_2_quantile'}, inplace=True)
multi_factor_data = factor_data_1.join(factor_data_2[['factor_2', 'factor_2_quantile']])
multi_factor_data.head()
mean_ret_by_q = mean_return_by_quantile(multi_factor_data,
                                        factor_groupers=['factor_1_quantile', 'factor_2_quantile'])[0]
# Single-argument print(...) calls behave the same under Python 2 and 3
print("Mean Return by Factor Quantile for Each Factor Individually")
print("-----------------------------------------------------------")
print(mean_ret_by_q.groupby(level=0).mean())
print(mean_ret_by_q.groupby(level=1).mean())
print("Mean 1 Day Returns for Each Factor 1 and 2 Quantile Intersection")
print("----------------------------------------------------------------")
plot_multi_factor_quantile_returns(mean_ret_by_q, '1D', ax=None)
The plot below shows the same data as the heat map above in a different form. The key is to look at how the slope of each line changes across values of factor_2_quantile. Because the slope changes with factor_2_quantile, the plot suggests a "non-additive" interaction between factor 1 and factor 2. In fact, this is a case where factor 2 has no predictive ability by itself, yet combining it with factor 1 can enhance the predictive power of the overall model.
mean_ret_by_q['1D'].unstack().plot(title='Factor Interaction Plot');
plt.ylabel('Return');
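The change in slope can also be measured directly rather than judged by eye. The sketch below rebuilds a quantile grid from freshly simulated data using only pandas and NumPy (so it does not depend on `multi_factor_data`), then fits a straight line through each factor 2 quantile's column of mean returns. The names `grid` and `slopes`, the seed, and the use of `pd.qcut` over the pooled sample are assumptions made for this illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.RandomState(0)  # fixed seed, chosen arbitrarily
n = 100000
f1 = rng.normal(size=n)
f2 = rng.normal(size=n)
# factor_2_coef = 0, positive interaction coefficient
r = 0.05 * f1 + 0.05 * f1 * f2 + rng.normal(0, 0.02, size=n)

df = pd.DataFrame({
    'q1': pd.qcut(f1, 5, labels=False) + 1,  # factor 1 quintile, 1..5
    'q2': pd.qcut(f2, 5, labels=False) + 1,  # factor 2 quintile, 1..5
    'ret': r,
})

# Rows: factor 1 quintile, columns: factor 2 quintile
grid = df.groupby(['q1', 'q2'])['ret'].mean().unstack()

# Slope of mean return against factor 1 quintile, one per factor 2 quintile
slopes = grid.apply(lambda col: np.polyfit(col.index.values, col.values, 1)[0])
print(slopes)
```

With a positive interaction coefficient the slopes should rise steadily from the lowest to the highest factor 2 quintile; roughly equal slopes would point to a purely additive relationship.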
Note: If you want to eliminate the individual factor exposures and keep only exposure to the "interaction factor", it might make sense to neutralize them by going long the (Q5, Q5) and (Q1, Q1) bins while going short the (Q1, Q5) and (Q5, Q1) bins.
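A hedged sketch of why this corner portfolio isolates the interaction: averaging the two long corners and subtracting the average of the two short corners cancels the individual factor effects, leaving only the contribution of the $f_1 f_2$ term. The helpers `quantile_grid` and `interaction_spread` below are hypothetical names written for this example, using freshly simulated data with arbitrary seeds.

```python
import numpy as np
import pandas as pd


def quantile_grid(beta_1, beta_2, beta_3, n=100000, seed=0):
    """Mean simulated return for each (factor 1, factor 2) quintile pair."""
    rng = np.random.RandomState(seed)
    f1 = rng.normal(size=n)
    f2 = rng.normal(size=n)
    r = (beta_1 * f1 + beta_2 * f2 + beta_3 * f1 * f2
         + rng.normal(0, 0.02, size=n))
    df = pd.DataFrame({
        'q1': pd.qcut(f1, 5, labels=False) + 1,
        'q2': pd.qcut(f2, 5, labels=False) + 1,
        'ret': r,
    })
    return df.groupby(['q1', 'q2'])['ret'].mean().unstack()


def interaction_spread(grid):
    """Average of the long corners minus average of the short corners."""
    longs = (grid.loc[5, 5] + grid.loc[1, 1]) / 2.0
    shorts = (grid.loc[1, 5] + grid.loc[5, 1]) / 2.0
    return longs - shorts


# Both factors predictive, with and without an interaction term
spread_with = interaction_spread(quantile_grid(0.05, 0.05, 0.05))
spread_without = interaction_spread(quantile_grid(0.05, 0.05, 0.0))

print("spread with interaction:    {:+.4f}".format(spread_with))
print("spread without interaction: {:+.4f}".format(spread_without))
```

When the interaction coefficient is zero the spread should be close to zero even though both factors are individually predictive, which is what makes the corner portfolio a cleaner bet on the interaction alone.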
N_STOCKS = 5
N_PERIODS = 1000
FACTOR_1_COEF = 0.05
FACTOR_2_COEF = 0.05
INTERACTION_COEF = 0.
simulate_and_plot_results(N_STOCKS, N_PERIODS, FACTOR_1_COEF, FACTOR_2_COEF, INTERACTION_COEF)
Note how both factors seem to have an effect on returns, but the effect is "additive": the slope does not change appreciably as factor_2_quantile varies.
N_STOCKS = 5
N_PERIODS = 1000
FACTOR_1_COEF = 0.05
FACTOR_2_COEF = 0.05
INTERACTION_COEF = 0.05
simulate_and_plot_results(N_STOCKS, N_PERIODS, FACTOR_1_COEF, FACTOR_2_COEF, INTERACTION_COEF)
The individual positive effects of both factors and the interaction effect are clearly visible.