Notebook

Exploring Market Trends: Seasonality

Fama (1970) introduced efficient market hypothesis (EMH), stating the prices of securities fully reflect available information. Therefore, investors buying securities in an efficient market should expect to obtain an equilibrium rate of return. Later he introduced three different form of market efficiency: weak, semi-strong, and strong. Anomalies are empirical results that seem to be inconsistent with maintained theories of asset-pricing behavior. One class of market anomalies is seasonality and changes in market trend during different seasons. Seasonality in stock returns is a closely related to week-form of market efficiency. There are multiple seasonality effects (aka calendar effects):

  • Weekend effect
  • Monthly effect
  • Holidays effect

Here, we attempt to explore monthly change in stock market returns. The following table demonstrates market return at each month for the past 300 plus years!(Source: SeekingAlpha)

Average Return % 316 years 82 years 50 years 14 years
1693-2009 1929-2011 1961-2011 1999-2012
January 0.69 1.0 1.2 -1.6
February 0.09 0.0 0.0 -1.3
March -0.03 0.4 1.1 2.7
April 0.49 1.4 2 2.7
May 0.02 -0.2 -0.1 -0.8
June -0.12 0.5 -0.6 -1.22
July -0.31 1.5 0.9 0.34
August 0.44 0.8 0.2 0.39
September -0.49 -1.3 -0.8 -1.28
October -0.5 0.0 0.5 1.32
November 0.35 0.8 1.2 1.2
December 0.81 1.5 1.5 2.6
Winter 2.42 5.20 7.20 6.36
Summer -0.96 1.28 0.09 -1.27
Whole Year 1.44 6.55 7.29 5.01
Difference Winter-Summer 3.38 3.92 7.11 7.63

First, we are going to look at SPY calendar anomaly and observe possible trend changes in every month. Data available for this analysis starts from 2012 so we have 16 years of history to study. We are going to calculate monthly return of each year and plot with box plot.

In [1]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

start = '2002-01-01'
end = '2018-01-01'
Symbol='SPY'
SPY = get_pricing(Symbol, fields='price', start_date=start, end_date=end,frequency='daily')

r = SPY.pct_change()
Monthly_Returns = r.groupby((r.index.year,r.index.month)).sum()
In [2]:
Monthly_Returns_List=[]
for i in range(len(Monthly_Returns)):
    Monthly_Returns_List.append({'Year':Monthly_Returns.index[i][0],'Month':Monthly_Returns.index[i][1],
                                 'Monthly_Return': Monthly_Returns[i] })
Monthly_Returns_List=pd.DataFrame(Monthly_Returns_List,
                                  columns=('Year','Month','Monthly_Return'))


Monthly_Returns_List.boxplot(column='Monthly_Return', by='Month')
ax = plt.gca()
labels = [item.get_text() for item in ax.get_xticklabels()]
labels=['JAN','FEB','MAR','APR','MAY','JUN','JUL','AUG','SEP','OCT','NOV','DEC']

ax.set_xticklabels(labels)

plt.tick_params(axis='both', which='major', labelsize=15)
plt.show()

Observations:

  • Overall market return trend exhibits a wave like pattern: market returns seem to increase from January to April followed by decrease in average returns in May and June. August to December exhibits steady increase in market returns.
  • Highest range of return (high to low range of box plot) in the past 16+ years occured in January, July, and August.
  • October demonstrates low donwside trade probability as low part of the box plot is closely situated outside the first quantile of the data. Therefore, it might provide good opportunity for investor to get in market or add to their investments. However, proper backtesting is needed to prove this hypothesis.
  • Months of June and Decmber provide the smallest interquartile range (IQR). Coincidently, the afroementioned months are half year and end of the trading year.

Now we will utilize the Augmented Dickey-Fuller Test for stationarity. The null hypothesis states that large p values indicate non-stationarity and smaller p values indicate stationarity. (We will be using 0.05 as selected alpha value.)

In [3]:
from statsmodels.tsa.stattools import adfuller
result = adfuller(SPY)
print('ADF Statistic: %f' % result[0])
print('p-value: %f' % result[1])
print('Critical Values:')
for key, value in result[4].items():
    print('\t%s: %.3f' % (key, value))
ADF Statistic: 1.938960
p-value: 0.998588
Critical Values:
	5%: -2.862
	1%: -3.432
	10%: -2.567

As suspected, ADF test revealed the time series data (SPY stock data) are non stationary meaning time series is a stochastic process whose unconditional joint probability distribution changes when shifted in time. Now we know our time series is non staitionary, we attempt to exploit seasonality trends using time series decomposition.

We shall think of the time series $y_t$ as comprising three components: a seasonal component, a trend-cycle component (containing both trend and cycle), and a remainder component (containing anything else in the time series). For example, if we assume an additive model, then we can write: $$y_t=S_t+T_t+E_t$$ where ytyt is the data at period $t$, $S_t$ is the seasonal component at period $t$, $T_t$ is the trend-cycle component at period $t$ and $E_t$ is the remainder (or irregular or error) component at period $t$. Alternatively, a multiplicative model would be written as $$y_t=S_t×T_t×E_t$$

For more detailed information visit here

In this notebook I am using additive decompositional model to extract seasonal model. In order to claculate seasonal effect, I used frequency decomposition of 252 meanining, trend is repeated every 252 days (5 trading days and 9 holidays results in 252 trading day).

In [4]:
import statsmodels.api
import statsmodels as sm

res = sm.tsa.seasonal.seasonal_decompose(SPY,model='additive', freq=252)
resplot = res.plot()
/usr/local/lib/python2.7/dist-packages/statsmodels/tsa/filters/filtertools.py:28: VisibleDeprecationWarning: using a non-integer number instead of an integer will result in an error in the future
  return np.r_[[np.nan] * head, x, [np.nan] * tail]

As it can be seen from the above chart, overall trend of the market was increasing with the downward trend in 2008-2009. The seasonal trend repeating every year indicate vally and peak. Now, we attempt to isolate this trend and inspect it further:

In [5]:
SPY_Seasonality=res.seasonal[res.seasonal.index.year == 2017]
MA1=SPY_Seasonality.rolling(window=20).mean()
plt.plot(SPY_Seasonality, label='SPY Seasonality')
plt.plot(MA1, label='20D MA')
plt.title("Seasonality of SPY", fontsize=18, fontweight='bold')
plt.legend()
plt.show()

Observations:

  • As it can be seen from the above seasonality chart, the seasonality trend is like a wave with major high happening around May and major lows happening around October.
  • Market expands from beginning of the year to Spring, then it follows by a period of contraction to mid fall.
  • The seasonality trend reveals change in expectation of market performance during the year.

Now we isolated seasonality trend we can, adjust market trend in regard to seasonality. Since we used additive model to identify seasonality, we can simply deduct seasonality from stock data (SPY) and adjust for seasonality. In periods where market has strong bull or bear (upward or downward) trends, seasonality effects might be too weak to observe. However, if market exhibits range bound behavior it, such effect can be more evident. Here, we will look at 2017 data (strong bull market) and 2015 data (sideways behavior)

In [6]:
plt.plot(SPY[SPY.index.year == 2017]+SPY_Seasonality,color='red',label='Seasonality Adjusted SPY');
plt.plot(SPY[SPY.index.year == 2017],label='SPY')
plt.legend(loc=4)
plt.show()
In [7]:
plt.plot(SPY[SPY.index.year == 2015]+res.seasonal[res.seasonal.index.year == 2015],color='red',label='Seasonality Adjusted SPY');
plt.plot(SPY[SPY.index.year == 2015],label='SPY')
plt.legend(loc=4)
plt.show()

Now we can simply test the return of a simple strategy based on seasonality. We are going to buy SPY in October and sell the entire holdings in May or simply comparing return of SPY from October of each year to May of next year.

In [8]:
import datetime
Min_Month=SPY_Seasonality.idxmin().month
Max_Month=SPY_Seasonality.idxmax().month

Index=SPY.index
Years = list(set(Index.year))
Years.sort()

Returns = d = {'Market Return' : pd.Series(index=Years),'Strategy Return' : pd.Series( index=Years)}
Returns = pd.DataFrame(Returns)

for index, row in Returns.iterrows():
    Data=SPY[SPY.index.year == index]
    row['Market Return']=((Data.iloc[-1]-Data.iloc[0])*100/Data.iloc[0])
    if index<Years[-1]:
        # purchase at first day of the min month (this year)
        Purchase=SPY.loc[(SPY.index.year==index) & (SPY.index.month==Min_Month)][0]
        # sell at the first day of max month (next year)
        Sell=SPY.loc[(SPY.index.year==index+1) & (SPY.index.month==Max_Month)][0]
        row['Strategy Return']=(Sell-Purchase)*100/Purchase

ax = Returns[['Market Return','Strategy Return']].plot(kind='bar', title ="Seasonality Strategy vs Market Return", figsize=(15, 10), legend=True, fontsize=14)
ax.set_xlabel("Year", fontsize=12)
ax.set_ylabel("% Returns", fontsize=12)
plt.show()
In [9]:
Returns.hist();

Looking at returns of our strategy we can see; this strategy is not as profitable as simply buying and holding for entire year. During market crash of 2008, this strategy lost less money than average market return. However, this might be due to start month of market crash (further analysis is required). One should note that this strategy is extremely simple without considering proper entry and exit point within each month. Therefore, in order to fully understand the profitability and risk of this strategy, in depth analysis is required which is outside the scope of this article.

Now we will analyze different market sectors and their respective seasonality and measure the correlation to SPY seasonality:

In [10]:
Sectors={'XLB','XLY','XLF','IYR','XLP','XLV','XLU','IYZ','XLE','XLI','XLK'}
Sectors_Data = get_pricing(Sectors, fields='price', start_date=start, end_date=end,frequency='daily')
In [11]:
df_ = pd.DataFrame(index=SPY_Seasonality.index, columns=Sectors)
for i in range(len(Sectors)):
    Sectors_res = sm.tsa.seasonal.seasonal_decompose( Sectors_Data[[i]],model='additive', freq=252)
    Seasonality=Sectors_res.seasonal[Sectors_res.seasonal.index.year == 2017]
    df_[list(Sectors)[i]]=Seasonality
In [12]:
import scipy.stats as ss

Summary=[]
Legend={'XLB','XLY','XLF','IYR','XLP','XLV','XLU','IYZ','XLE','XLI','XLK'}
NUM_COLORS = len(Sectors)

cm = plt.get_cmap('gist_ncar')
fig = plt.figure()
ax = fig.add_subplot(111)
ax.set_color_cycle([cm(1.*i/NUM_COLORS) for i in range(NUM_COLORS)])

for i in range(NUM_COLORS):
    ax.plot(df_[[i]].rolling(window=20).mean())
    Summary.append({'Sector': list(df_[[i]].columns)[0], 'Min': np.min(df_[[i]])[0], 'Max': np.max(df_[[i]])[0],\
                   'Range':np.max(df_[[i]])[0]-np.min(df_[[i]])[0],'Std':np.std(df_[[i]])[0], 'Corr to SPY Seasonality': ss.spearmanr(SPY_Seasonality, df_[[i]])[0] })

ax.plot(SPY_Seasonality.rolling(window=20).mean(), color='black', linestyle='-',marker='*')
plt.legend(Legend)
plt.show()
Summary=pd.DataFrame(Summary, columns=('Sector','Min','Max','Range','Std','Corr to SPY Seasonality'))
/usr/local/lib/python2.7/dist-packages/matplotlib/cbook.py:137: MatplotlibDeprecationWarning: The set_color_cycle attribute was deprecated in version 1.5. Use set_prop_cycle instead.
  warnings.warn(message, mplDeprecation, stacklevel=1)

SPY seasonality in the above picture overlaied with black line.

I used Spearman Rank Correlation to identify correlation of seasonality of different market sectors to seasonality of market itself. As it can be seen from the tables below, consumer discretionary has highest correlation to market seasonality and real estate sector has the least seasonality correlation to market.

In [13]:
Summary.sort_values('Corr to SPY Seasonality', inplace=True)
Summary
Out[13]:
Sector Min Max Range Std Corr to SPY Seasonality
6 IYR -1.929332 1.369324 3.298656 0.696192 0.645698
10 IYZ -0.539593 0.467019 1.006612 0.261734 0.717054
0 XLK -0.480028 0.391183 0.871211 0.192528 0.724222
8 XLU -0.746300 0.563885 1.310185 0.293114 0.736387
7 XLV -1.068718 0.632143 1.700862 0.335469 0.743066
3 XLE -1.886580 1.867639 3.754219 0.898096 0.780286
2 XLF -0.348368 0.403559 0.751927 0.147102 0.819879
9 XLP -0.455054 0.469881 0.924935 0.232815 0.921289
4 XLB -1.178544 1.080118 2.258661 0.509609 0.925252
1 XLI -1.115932 0.960775 2.076707 0.436234 0.933973
5 XLY -0.987193 1.067920 2.055112 0.439450 0.950799

The cumulative histogram of seasonality of selected market sectors are listed below.

In [14]:
from scipy import stats
plt.close(fig)
fig = plt.figure()
ax = fig.add_subplot(111)

stats.probplot(SPY_Seasonality, plot=ax);
stats.probplot(df_['XLE'], plot=ax);
stats.probplot(df_['XLK'], plot=ax);
stats.probplot(df_['XLY'], plot=ax);
stats.probplot(df_['IYR'], plot=ax);


ax.get_lines()[0].set_markerfacecolor('red')
ax.get_lines()[2].set_markerfacecolor('green')
ax.get_lines()[4].set_markerfacecolor('yellow')
ax.get_lines()[6].set_markerfacecolor('black')
ax.get_lines()[8].set_markerfacecolor('darkcyan')

ax.get_lines()[1].remove()
ax.get_lines()[2].remove()
ax.get_lines()[3].remove()
ax.get_lines()[4].remove()
ax.get_lines()[5].remove()
ax.legend({'SPY','XLE','XLK','XLY','IYR'},loc=4);

We can use empirical time series model to identify the effect of each month. The model is defined as below: $$ R=\beta+\alpha*{jan} \times r*{jan}+\alpha*{feb} \times r*{feb}+...\alpha*{nov} \times r*{nov}$$

All we need to do is to find returns of each month then use regression analysis to find the coefficients. $\beta$ in this case can be considered as December return

In [15]:
start = '2002-01-01'
end = '2018-01-01'
Symbol='SPY'
SPY = get_pricing(Symbol, fields='price', start_date=start, end_date=end,frequency='daily')

r = SPY.pct_change()


g=r.groupby((r.index.year,r.index.month)).sum()
Monthly_Data=g.unstack()
Yearly_Data=r.groupby((r.index.year)).sum()
labels=['JAN','FEB','MAR','APR','MAY','JUN','JUL','AUG','SEP','OCT','NOV','DEC']
Monthly_Data.columns=labels
In [16]:
np.shape(Yearly_Data)
Out[16]:
(16,)
In [17]:
np.shape(Monthly_Data)
Out[17]:
(16, 12)
In [18]:
import statsmodels.api as sm

X = Monthly_Data[['JAN','FEB','MAR','APR','MAY','JUN','JUL','AUG','SEP','OCT','NOV']]
    
X = sm.add_constant(X)

model11 = sm.OLS(Yearly_Data, X).fit()
In [19]:
model11.summary()
/usr/local/lib/python2.7/dist-packages/scipy/stats/stats.py:1535: UserWarning: kurtosistest only valid for n>=20 ... continuing anyway, n=16
  "anyway, n=%i" % int(n))
Out[19]:
<caption>OLS Regression Results</caption>
Dep. Variable: Equity(8554 [SPY]) R-squared: 0.993
Model: OLS Adj. R-squared: 0.975
Method: Least Squares F-statistic: 55.21
Date: Fri, 06 Jul 2018 Prob (F-statistic): 0.000750
Time: 15:03:38 Log-Likelihood: 45.850
No. Observations: 16 AIC: -67.70
Df Residuals: 4 BIC: -58.43
Df Model: 11
Covariance Type: nonrobust
coef std err t P>|t| [95.0% Conf. Int.]
const -0.0152 0.024 -0.635 0.560 -0.082 0.051
JAN 1.0550 0.284 3.713 0.021 0.266 1.844
FEB 1.5913 0.578 2.753 0.051 -0.013 3.196
MAR 1.0025 0.452 2.218 0.091 -0.252 2.257
APR 2.2060 0.648 3.403 0.027 0.406 4.006
MAY 0.7309 0.332 2.202 0.092 -0.191 1.652
JUN 0.6836 0.495 1.380 0.240 -0.692 2.059
JUL 0.4306 0.622 0.692 0.527 -1.296 2.158
AUG 0.5346 0.529 1.011 0.369 -0.934 2.003
SEP 1.3605 0.276 4.922 0.008 0.593 2.128
OCT 0.5622 0.303 1.858 0.137 -0.278 1.402
NOV 1.9634 0.719 2.730 0.052 -0.033 3.960
Omnibus: 1.546 Durbin-Watson: 2.152
Prob(Omnibus): 0.462 Jarque-Bera (JB): 0.652
Skew: 0.494 Prob(JB): 0.722
Kurtosis: 3.052 Cond. No. 182.
In [20]:
model11.params.plot(kind='bar', title ="Coeffcients of Regressio Analysis", figsize=(15, 10), legend=True, fontsize=14);

Looking at the model parameters we can see, month of April/November has the highest coefficient while July has the least coefficient. This shows possibility of higher return in months with higher coefficient and vice versa. This outcome is consistent with our analysis on average monthly return of the market bar chart presented at the beginning of this article.

It is important to note during this study; the remainder component of seasonality has not been studied because it requires more in-depth statistical analysis which is outside the scope of this study. Now let's see how market reacted in 2018 so far:

In [21]:
start = '2018-01-01'
end = '2018-07-01'
Symbol='SPY'
SPY_2018 = get_pricing(Symbol, fields='price', start_date=start, end_date=end,frequency='daily')

r_2018 = SPY_2018.pct_change()
Monthly_Returns_2018 = r_2018.groupby((r_2018.index.year,r_2018.index.month)).sum()
Monthly_Returns_2018.plot(kind='bar', title ="Seasonality Strategy vs Market Return", figsize=(15, 10), legend=True, fontsize=14)
Out[21]:
<matplotlib.axes._subplots.AxesSubplot at 0x7fe58562fd50>