This research is published in partnership with Quantpedia, an online resource for discovering new trading ideas.
You can view the full Quantpedia series in the library along with other research and strategies.
Whitepaper authors: Shahram Amini, Vijay Singal
Whitepaper source: http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2589966
From Quantpedia:
> The paper states that it is generally accepted that managers have more information about the firm than investors. Given this information asymmetry, managers can make informed decisions about corporate actions such as equity offerings or repurchases. The announcement of stock repurchase or secondary equity offering is voluntary and can be easily moved by a few weeks or months. Therefore timing of SEO or repurchase announcement before earnings announcement could be perceived as important information about future performance of stock during earnings announcement period.
While equity issues are not yet available through Quantopian, I find evidence of earnings predictability similar to the authors with positive returns of 1.11% over a 25-day window (-10, +15) for earnings following a buyback announcement. The results hold true for different time windows (0, +15) and sample selection criteria. However, unlike the authors, I find the highest positive returns for earnings that are (5, 15) days after a buyback announcement. Lastly, because the data used in this notebook is up till present day, it may be useful to view this as an "out-of-sample" validation of the authors' original work.
Most financial analysts will agree that managers have an informational advantage about their company and that this advantage helps them schedule corporate actions to maximize value for shareholders. The authors, Shahram Amini and Vijay Singal, propose that one way managers act on this informational edge is by timing corporate actions like share buybacks and issues close to the date of an earnings announcement.
>Assuming no predictability, the average market reaction to earnings announcements should not be significantly different from zero.
The authors show that these strategically timed corporate actions result in earnings that still surprise the market. My research finds similar results, with positive raw returns of 1.115% in a (-10, +15) day window for buyback announcements. This earnings predictability is consistent through robustness tests for regulated firms, small cap securities, different return time windows, and the 2010 ~ 2016 period.
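The null hypothesis in the quote above (with no predictability, the mean reaction is zero) can be checked with a one-sample t-statistic. The sketch below is illustrative only: `t_statistic` is a hypothetical helper that does not appear in this notebook, and `sample_returns` holds made-up placeholder values rather than results from the study.

```python
import math

def t_statistic(returns):
    """One-sample t-statistic for H0: the mean return is zero."""
    n = len(returns)
    mean = sum(returns) / float(n)
    # Sample variance with Bessel's correction (n - 1).
    var = sum((r - mean) ** 2 for r in returns) / float(n - 1)
    std_err = math.sqrt(var / n)
    return mean / std_err

# Placeholder event-window returns (hypothetical, for illustration only).
sample_returns = [0.02, -0.01, 0.03, 0.00, 0.01]
t_stat = t_statistic(sample_returns)
```

A larger absolute t-statistic is stronger evidence against a zero mean reaction; with real data the returns would be the per-event window returns from the event study.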
>In contrast to their study, we believe that earnings predictability, if any, must occur in the earnings announcement immediately following equity issues or buyback announcements where the superiority of managerial information is likely to be more evident.
So while share issues weren't analyzed in this study, the authors' research as well as my own suggest positive indicators for earnings predictability following buyback announcements. Using a sample size of 5,900 buybacks from 2007 till 2016, I find an average 1.00% positive return over a (-10, +15) day window, with average returns increasing to ~2.00% from 2010 ~ 2016, for earnings reports that have occurred 16 ~ 30 days after a buyback announcement. And like the authors, I find this earnings predictability surprising for a number of reasons. First, the positive returns around earnings after a buyback don't depend on market reactions to earnings (e.g. earnings surprise, PEAD). Second, if this predictability is significant, it means that the market is not efficiently pricing the buyback announcement and that the effect seems to compound on the earnings announcement.
You can navigate the rest of this notebook as follows:
# Premium Versions
from quantopian.interactive.data.eventvestor import buyback_auth
from quantopian.interactive.data.eventvestor import earnings_calendar as ev_earnings
# Sample Versions
# from quantopian.interactive.data.eventvestor import buyback_auth_free as buyback_auth
# from quantopian.interactive.data.eventvestor import earnings_calendar_free as ev_earnings
fundamentals = init_fundamentals()
# Aggregating the data into yearly slices as well as one full aggregate DataFrame
yearly_buyback_data, all_buyback_data, \
yearly_earnings_data, all_earnings_data = get_yearly_and_all_buyback_earnings_data()
The methodology behind the study is based on the idea that (1) if a buyback is announced today, the stock's price is expected to increase and (2) if the stock's price increases on the following earnings announcement, it would provide support for market inefficiency surrounding the corporate action.
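Both steps of the methodology come down to measuring a stock's return over a window of trading days around an event date. A minimal sketch of that calculation follows, assuming simple daily returns in a plain list; `window_return` is a hypothetical helper and not the notebook's own implementation.

```python
def window_return(daily_returns, day_zero, start, end):
    """Compound daily returns over event days start..end (inclusive).

    daily_returns : list of simple daily returns, in trading-day order
    day_zero      : position of the event date (t=0) in that list
    """
    lo = day_zero + start
    hi = day_zero + end
    if lo < 0 or hi >= len(daily_returns):
        return None  # not enough data around the event
    total = 1.0
    for r in daily_returns[lo:hi + 1]:
        total *= 1.0 + r
    return total - 1.0

returns = [0.01, 0.00, 0.02, -0.01, 0.01]
# Event at position 2; a (-1, +1) window compounds positions 1..3.
r = window_return(returns, day_zero=2, start=-1, end=1)
```

Windows such as (-10, +15) or (0, +15) used later in this notebook are the same calculation with different `start` and `end` values.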
The data used in this Research Notebook is sourced from EventVestor's Buyback Authorizations and Earnings Calendar datasets. The sample version is available from 2007 up till 2014 while the premium version is available up till present day.
for i, year in enumerate(yearly_buyback_data):
    df = yearly_buyback_data[year].dropna()
    data = {year: {'N': df.asof_date.count(),
                   'Median Buybacks as % of SO': df['Percent of SO'].median(),
                   'Median Total Value (Mill)': df['Total Value (Mill)'].median(),
                   'Median Market Cap (Mill)': df['Market Cap (Mill)'].median()}}
    if i == 0:
        stats_df = pd.DataFrame(data)
        total_df = df
    else:
        stats_df[year] = pd.DataFrame(data)
        total_df = total_df.append(df)

total_stats = {'total': {'N': total_df.asof_date.count(),
                         'Median Buybacks as % of SO': total_df['Percent of SO'].median(),
                         'Median Total Value (Mill)': total_df['Total Value (Mill)'].median(),
                         'Median Market Cap (Mill)': total_df['Market Cap (Mill)'].median()}}
stats_df = stats_df.T
stats_df.index = stats_df.index.order()
stats_df.hist()
stats_df = stats_df.T
stats_df['total'] = pd.DataFrame(total_stats)
stats_df.T
Overall, there were 5,900 buybacks studied between 2007 ~ 2016. Firms announced a median buyback of 6.03% during that time as opposed to the 5.34% found in the paper.
Next I look at the distribution of buybacks around earnings announcements.
The authors use a window of +/- 30 trading days, so to replicate a similar window I use a range of 30 trading days on each side of the earnings announcement.
bins = range(-30, 35, 5)
days_away = get_days_away(all_buyback_data, all_earnings_data)
days_away = days_away[abs(days_away) <= 30]
ax = days_away.hist(bins=bins, color='r', alpha=.6, label="All Buybacks")
all_buyback_data_filtered = all_buyback_data[all_buyback_data["Percent of SO"] > .05]
days_away = get_days_away(all_buyback_data_filtered, all_earnings_data)
days_away = days_away[abs(days_away) <= 30]
days_away.hist(bins=bins, color='g', alpha=.6, label='Buybacks > 5%')
plt.title("Distribution of Buybacks around Earnings Announcements")
plt.ylabel("Number of Buybacks Announced")
plt.xlabel("Number of Days")
plt.legend()
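The `get_days_away` helper called above is not shown in this section. As a rough illustration of the quantity it measures, the sketch below counts the signed number of trading days between an earnings date and a buyback date. Both the `days_away` function and its sign convention (positive when the buyback follows earnings) are assumptions made for this example, as is the toy trading calendar.

```python
import bisect
from datetime import date

def days_away(buyback_date, earnings_date, trading_days):
    """Signed number of trading days from earnings to the buyback.

    Positive: buyback announced after earnings; negative: before.
    trading_days must be a sorted list of dates containing both dates.
    """
    i = bisect.bisect_left(trading_days, earnings_date)
    j = bisect.bisect_left(trading_days, buyback_date)
    return j - i

# Toy calendar: the weekdays of one week plus the following Monday.
calendar = [date(2016, 5, 2), date(2016, 5, 3), date(2016, 5, 4),
            date(2016, 5, 5), date(2016, 5, 6), date(2016, 5, 9)]
gap = days_away(date(2016, 5, 9), date(2016, 5, 4), calendar)
```

Counting positions in a trading calendar rather than subtracting dates keeps weekends and holidays out of the distance.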
For all buyback announcements, the majority occur after the earnings announcement. As shown in the Returns section, this often happens after the stock's price performs abnormally poorly following an earnings report.
For the rest of this study, only buybacks > 5% are looked at. This is (1) to include only important corporate actions and (2) to keep all buybacks as a criterion for the Robustness test section.
This next section is a closer look at buybacks > 5%.
"""
Filter out for only buybacks that are > 5%
"""
bins = range(-30, 31, 5)
all_buyback_data_filtered = all_buyback_data[all_buyback_data["Percent of SO"] > .05]
days_away = get_days_away(all_buyback_data_filtered, all_earnings_data)
days_away = days_away[abs(days_away) <= 30]
ax = days_away.hist(bins=bins, color='g', alpha=.6, label='Buybacks > 5%')
plt.title("Distribution of Buybacks around Earnings Announcements")
plt.ylabel("Number of Buybacks Announced")
plt.xlabel("Number of Days")
plt.legend()
"""
Label each buyback as percentage sample count of whole
"""
rects = ax.patches
labels = ["label%d" % i for i in xrange(len(rects))]
cut_bins = np.linspace(days_away.min(), days_away.max(), len(bins))
groups = days_away.groupby(pd.cut(days_away, bins, right=False))
x = groups.count()/days_away.count()
for i, rect in enumerate(rects):
    height = rect.get_height()
    label = "%0.2f%%" % (100*x.iloc[i])
    ax.text(rect.get_x() + rect.get_width()/2, height + 5, label, ha='center', va='bottom')
"""
Print a few summary statistics about the buyback auth dataset
"""
data_freq = pd.DataFrame(groups.mean())
data_freq = data_freq.rename(columns={0: "Mean"})
data_freq['Count'] = groups.count()
data_freq['Week'] = (data_freq['Mean']/7).apply(lambda x: int(x))
data_freq['Percentage'] = data_freq['Count']/data_freq['Count'].sum()*100
data_freq.index.name = 'Interval'
data_freq['Cumulative %'] = data_freq.Percentage.cumsum()
data_freq = data_freq.drop('Mean', axis=1)
data_freq.name = "Buybacks"
print "Buybacks Distribution"
data_freq
So it seems that the bulk of buybacks tend to occur after an earnings announcement. However, about 22% of announcements are still made in the 30 day window prior to an earnings announcement.
On both a long (-10, +15) and short (-1, 1) time window, I find positive raw returns similar to what the authors have documented (1.15% and .0046%) looking at firms with earnings that have been announced 16 ~ 30 days after a buyback.
Abnormal returns are much lower for this specific time window at ~0%. One explanation for this difference is that this study covers a much more recent date range than the original, and the window for alpha following a corporate action may have shortened since then: for earnings announcements (5, 15) days after a buyback rather than (16, 30) days, the abnormal returns improve significantly.
This initial study shows the returns looking at buybacks greater than 5% but as you'll see in the Robustness section, the returns increase when that criterion is expanded to include all buybacks and different time windows. Specifically, the (5, 15) day criterion for earnings had the highest returns.
The figures below show the cumulative and abnormal returns (assuming risk-free = 0) for both buyback and earnings announcements. Although the focus of this research is earnings, buybacks are shown to highlight the evidence for positive abnormal returns around this corporate action.
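As a worked illustration of the abnormal returns referred to here, the sketch below follows the same idea as the `get_cum_returns` helper in this notebook: the daily abnormal return is the stock's return minus beta times the benchmark's return, and the daily values are then compounded. The function names `ols_beta` and `cumulative_abnormal_return` and the toy data are assumptions for this example, and the sketch leaves out the notebook's rule of setting an insignificant beta to zero.

```python
def ols_beta(stock, bench):
    """Slope of an ordinary least squares fit of stock on benchmark returns."""
    n = float(len(stock))
    mean_s = sum(stock) / n
    mean_b = sum(bench) / n
    cov = sum((s - mean_s) * (b - mean_b) for s, b in zip(stock, bench))
    var = sum((b - mean_b) ** 2 for b in bench)
    return cov / var

def cumulative_abnormal_return(stock, bench, beta):
    """Compound the daily abnormal returns (stock - beta * benchmark)."""
    total = 1.0
    for s, b in zip(stock, bench):
        total *= 1.0 + (s - beta * b)
    return total - 1.0

# Toy data: the stock return is built as 0.01 + 2 * benchmark return.
bench = [0.01, -0.01, 0.02, 0.00]
stock = [0.03, -0.01, 0.05, 0.01]
beta = ols_beta(stock, bench)
car = cumulative_abnormal_return(stock, bench, beta)
```

With this construction the beta is 2 and each daily abnormal return is 1%, so the cumulative abnormal return is simply four days of 1% compounded.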
# Find different earnings announcement dependent on length away from buybacks
days_to_study = {
    '16 to 31 days after': [16, 31],
    '16 to 31 days before': [-16, -31]
}
earnings_study_results = {}
buybacks_study_results = {}
data_for_earnings = {}
data_for_buybacks = {}
for name, days in days_to_study.iteritems():
    earnings, buybacks = find_earnings_announcements(all_buyback_data_filtered,
                                                     all_earnings_data,
                                                     days=days)
    data_for_earnings[name] = earnings
    data_for_buybacks[name] = buybacks

for name, earnings in data_for_earnings.iteritems():
    print "------------------------------------------------------------------"
    print "Day 0: Earnings with buyback announcements %s earnings" % name
    results = custom_event_study(earnings, end_date='2016-06-01', use_liquid_stocks=False,
                                 days_before=10, days_after=15)
    earnings_study_results[name] = results
    plt.show()

for name, buybacks in data_for_buybacks.iteritems():
    print "------------------------------------------------------------------"
    print "Day 0: Buybacks with buyback announcements %s earnings" % name
    results = custom_event_study(buybacks, end_date='2016-06-01', use_liquid_stocks=False,
                                 days_before=10, days_after=15)
    buybacks_study_results[name] = results
    plt.show()
The "Day 0: Earnings with buyback announcements 16 to 31 days before earnings" event study plots show a significant upward movement in stock price from before the earnings announcement up till 14 days after.
It's interesting to note that when buybacks happened 16 to 31 days after the earnings announcement, there seems to be a reversal in the stock price, which suggests that buybacks are likely announced after a poor market reaction to earnings.
The table below highlights the abnormal and raw returns for earnings announcements 16 ~ 31 days before/after buyback announcements.
return_intervals = [(0, 1), (-1, 1), (0, 5), (0, 15), (-5, 5), (-10, 15)]
print "Day 0: Earnings Announcement Date"
print "----------------------------------------------------------"
print "Returns of Repurchasing Firms assuming risk-free rate is 0"
print "----------------------------------------------------------"
get_returns_table(earnings_study_results, return_intervals)
So you can see that for buybacks that happened after an earnings announcement, the market's reaction tended to be negative leading up to the buyback and vice versa for buybacks that happened prior.
Up till now, my research supports that there is evidence of earnings predictability following stock buyback announcements. This section is dedicated to running my methodology through a series of robustness tests by altering the sample slices of data passed in as inputs.
The first three tests are taken from the original paper; the last three have been included for this notebook:
>1. Regulated firms account for a significant fraction of our sample and were not excluded from the sample used to estimate the main results. Financial firms (SIC codes 6000–6999) may issue equity or repurchase their own stock to meet capital requirements and regulations.
4. 2007 ~ 2009 were extremely volatile years for the financial markets. In this robustness test, I consider years 2010 ~ 2016 exclusively.
5. The main study was conducted on buybacks > 5%. This test expands that to include all levels of buybacks.
6. This final test looks at buybacks from a window range of 1 ~ 30 days before an earnings announcement. I include this in order to maximize the sample size of buyback announcements to increase the frequency of trades (as event-based strategies tend to be on the lower end of turnovers).
Each of these tests will be presented as individual plots documenting before/after earnings announcements similar to the Results section and a summary table is presented towards the end.
robustness_tests = {}
"""
Exclude Financial Firms with SIC codes 6000 - 6999
"""
test_type = "Excluding firms with SIC codes 6000 - 6999"
print "------------------------------------------------------------------"
print test_type
earnings_study_results = {}
data_for_rb_tests = data_for_earnings.copy()
for name, earnings in data_for_rb_tests.iteritems():
    earnings_sic = earnings.apply(lambda row: insert_sic_code(row), axis=1)
    earnings_sic = earnings_sic[(earnings_sic['sic'] < 6000) | (earnings_sic['sic'] > 6999)]
    print "------------------------------------------------------------------"
    print "Day 0: Earnings with buyback announcements %s earnings" % name
    results = custom_event_study(earnings_sic, end_date='2016-06-01', use_liquid_stocks=False,
                                 days_before=10, days_after=15)
    earnings_study_results[name] = results
    plt.show()
robustness_tests[test_type] = get_returns_table(earnings_study_results, return_intervals)
"""
Excluding small cap or illiquid securities
"""
test_type = "Excluding small cap and illiquid securities"
print "------------------------------------------------------------------"
print "------------------------------------------------------------------"
print test_type
earnings_study_results = {}
data_for_rb_tests = data_for_earnings.copy()
for name, earnings in data_for_rb_tests.iteritems():
    print "--------"
    print "Day 0: Earnings with buyback announcements %s earnings" % name
    results = custom_event_study(earnings, end_date='2016-06-01', use_liquid_stocks=True,
                                 days_before=10, days_after=15, top_liquid=1000)
    earnings_study_results[name] = results
    plt.show()
robustness_tests[test_type] = get_returns_table(earnings_study_results, return_intervals)
"""
Look at buybacks 5 ~ 15 versus 16 ~ 30
"""
days_to_study_two = {
'5 to 15 days after': [5, 15],
'5 to 15 days before': [-5, -15]
}
data_for_earnings_two = {}
earnings_study_results = {}
for name, days in days_to_study_two.iteritems():
    earnings, buybacks = find_earnings_announcements(all_buyback_data_filtered,
                                                     all_earnings_data,
                                                     days=days)
    data_for_earnings_two[name] = earnings

test_type = "Looking at a 5 ~ 15 day time window"
print "------------------------------------------------------------------"
print "------------------------------------------------------------------"
print test_type
# Results are stored only after each event study runs for this test
for name, earnings in data_for_earnings_two.iteritems():
    print "------------------------------------------------------------------"
    print "Day 0: Earnings with buyback announcements %s earnings" % name
    results = custom_event_study(earnings, end_date='2016-06-01', use_liquid_stocks=False,
                                 days_before=10, days_after=15)
    earnings_study_results[name] = results
    plt.show()
robustness_tests[test_type] = get_returns_table(earnings_study_results, return_intervals)
"""
Excluding 2007 ~ 2009
"""
test_type = "Excluding 2007 ~ 2009"
print "------------------------------------------------------------------"
print "------------------------------------------------------------------"
print test_type
earnings_study_results = {}
data_for_rb_tests = data_for_earnings.copy()
for name, earnings in data_for_rb_tests.iteritems():
    print "--------"
    print "Day 0: Earnings with buyback announcements %s earnings" % name
    results = custom_event_study(earnings, end_date='2016-06-01', start_date='2010-01-01',
                                 use_liquid_stocks=False,
                                 days_before=10, days_after=15)
    earnings_study_results[name] = results
    plt.show()
robustness_tests[test_type] = get_returns_table(earnings_study_results, return_intervals)
"""
Including all buybacks, not just buybacks > 5%
"""
test_type = "All buyback announcements"
print "------------------------------------------------------------------"
print "------------------------------------------------------------------"
print test_type
data_for_earnings_three = {}
for name, days in days_to_study.iteritems():
    earnings, buybacks = find_earnings_announcements(all_buyback_data,
                                                     all_earnings_data,
                                                     days=days)
    data_for_earnings_three[name] = earnings

earnings_study_results = {}
data_for_rb_tests = data_for_earnings_three
for name, earnings in data_for_rb_tests.iteritems():
    print "--------"
    print "Day 0: Earnings with buyback announcements %s earnings" % name
    # Note: start_date also restricts this test to 2010 onward
    results = custom_event_study(earnings, end_date='2016-06-01', start_date='2010-01-01',
                                 use_liquid_stocks=False,
                                 days_before=10, days_after=15)
    earnings_study_results[name] = results
    plt.show()
robustness_tests[test_type] = get_returns_table(earnings_study_results, return_intervals)
"""
Look at buybacks 1 ~ 30
"""
days_to_study_four = {
'1 to 30 days after': [1, 31],
'1 to 30 days before': [-1, -31]
}
data_for_earnings_four = {}
earnings_study_results = {}
for name, days in days_to_study_four.iteritems():
    earnings, buybacks = find_earnings_announcements(all_buyback_data_filtered,
                                                     all_earnings_data,
                                                     days=days)
    data_for_earnings_four[name] = earnings

test_type = "Looking at buybacks 1 ~ 30 days before/after"
print "------------------------------------------------------------------"
print "------------------------------------------------------------------"
print test_type
# Results are stored only after each event study runs for this test
for name, earnings in data_for_earnings_four.iteritems():
    print "------------------------------------------------------------------"
    print "Day 0: Earnings with buyback announcements %s earnings" % name
    results = custom_event_study(earnings, end_date='2016-06-01', use_liquid_stocks=False,
                                 days_before=10, days_after=15)
    earnings_study_results[name] = results
    plt.show()
robustness_tests[test_type] = get_returns_table(earnings_study_results, return_intervals)
pd.concat(dict([(key, df.T) for key, df in robustness_tests.iteritems()]), axis=1)
The robustness tests show positive evidence for earnings predictability, especially when looking at returns over a longer time window (-10, +15).
The "5 to 15 days" test appears to have the highest abnormal returns at 2.67% over (-10, +15); however, that sample size is also the smallest (N=138).
The core framework of the strategy is provided by Quantpedia and vetted by the study:
>The investment univese consists of stocks from NYSE/AMEX/Nasdaq (no ADRs, CEFs or REITs), bottom 25% of firms by market cap are dropped. Each quarter, investor looks for companies which announce stock repurchase program (with announced buyback for at least 5% of outstanding stocks) during days -30 to -15 before earnings announcement date for each company. Investor goes long stocks with announced buybacks during days -10 to +15 around earnings announcement. Portfolio is equally weighted and rebalanced daily.
Using the core idea above and experience from prior research and my robustness tests, these are a few ways to start building the strategy:
Base Universe based off Nathan Wolfe's framework:
Strategy Ideas:
The best performing strategy will be posted along with the notebook, but if you'd like to see the other strategies, please email me at SLEE @ Quantopian.com
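The Quantpedia rule quoted above can be sketched as a small selection function. This is only an illustration under stated assumptions: the function names, the dictionary inputs, and the convention that offsets are trading days relative to the earnings date (negative means before) are all hypothetical, and buyback size is expressed as a fraction of shares outstanding.

```python
def qualifies(buyback_offset, percent_of_so):
    """Buyback of at least 5% announced 30 to 15 days before earnings."""
    return -30 <= buyback_offset <= -15 and percent_of_so >= 0.05

def in_holding_window(day_offset):
    """Hold from 10 days before to 15 days after the earnings date."""
    return -10 <= day_offset <= 15

def target_weights(events, today_offsets):
    """Equal-weight every qualifying stock currently in its holding window.

    events        : {symbol: (buyback_offset, percent_of_so)}
    today_offsets : {symbol: today's offset from that stock's earnings date}
    """
    longs = [s for s, (offset, pct) in sorted(events.items())
             if qualifies(offset, pct) and in_holding_window(today_offsets[s])]
    if not longs:
        return {}
    weight = 1.0 / len(longs)
    return dict((s, weight) for s in longs)

# Hypothetical tickers and values, for illustration only.
events = {'AAA': (-20, 0.06), 'BBB': (-20, 0.03),
          'CCC': (-16, 0.10), 'DDD': (-5, 0.08), 'EEE': (-30, 0.05)}
today_offsets = {'AAA': -3, 'BBB': -3, 'CCC': 20, 'DDD': 0, 'EEE': -10}
weights = target_weights(events, today_offsets)
```

Recomputing the weights each day gives the daily rebalancing described in the quote, since stocks enter and leave the holding window as their earnings dates approach and pass.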
>Assuming no predictability, the average market reaction to earnings announcements should not be significantly different from zero.
The results in the authors' study raise interesting questions about market efficiency surrounding corporate actions. Earnings are a highly studied event with much of the alpha in traditional earnings strategies squeezed out. However, the research here suggests that there is some level of predictability surrounding earnings and corporate actions: the (-10, 15) day abnormal returns for earnings after buybacks > 5% are ~0% which increases to ~2% when looking at earnings with buybacks at any point in a (-15, -5) window.
>From the difference in market reactions, we can conclude that the market adjustment to corporate actions is incomplete, and can result in predictability of earnings announcements.
That being said, there are many ways to expand this research, including the authors' suggestions (found in the paper). I have plans to include equity issues (like the original paper), 13D Filings, and other corporate actions to study the more general effects of market efficiency surrounding corporate actions.
Finally and as always, thanks for reading. I'd love to hear your feedback about this research and ways you think we can improve it. Feel free to reach out to me at slee @ quantopian.com.
from __future__ import division
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
from datetime import timedelta
from odo import odo
import scipy.stats  # calc_beta calls scipy.stats.linregress
import math
from quantopian.pipeline import Pipeline
from quantopian.research import run_pipeline
from quantopian.pipeline.classifiers.morningstar import Sector
from quantopian.pipeline.data import morningstar as mstar
from quantopian.pipeline.data.builtin import USEquityPricing
from quantopian.pipeline.factors import CustomFactor, AverageDollarVolume, SimpleMovingAverage
from quantopian.pipeline.filters.morningstar import IsPrimaryShare
def filter_universe(min_price = 0., min_volume = 0.):
"""
Computes a security universe based on nine different filters:
1. The security is common stock
2 & 3. It is not limited partnership - name and database check
4. The database has fundamental data on this stock
5. Not over the counter
6. Not when issued
7. Not depository receipts
8. Is Primary share
9. Has high dollar volume
Returns
-------
high_volume_tradable - zipline.pipeline.factor.Rank
A ranked AverageDollarVolume factor that's filtered on the nine criteria
"""
common_stock = mstar.share_class_reference.security_type.latest.eq('ST00000001')
not_lp_name = ~mstar.company_reference.standard_name.latest.matches('.* L[\\. ]?P\.?$')
not_lp_balance_sheet = mstar.balance_sheet.limited_partnership.latest.isnull()
have_data = mstar.valuation.market_cap.latest.notnull()
not_otc = ~mstar.share_class_reference.exchange_id.latest.startswith('OTC')
not_wi = ~mstar.share_class_reference.symbol.latest.endswith('.WI')
not_depository = ~mstar.share_class_reference.is_depositary_receipt.latest
primary_share = IsPrimaryShare()
# Combine the above filters.
tradable_filter = (common_stock & not_lp_name & not_lp_balance_sheet &
have_data & not_otc & not_wi & not_depository & primary_share)
price = SimpleMovingAverage(inputs=[USEquityPricing.close],
window_length=252, mask=tradable_filter)
volume = SimpleMovingAverage(inputs=[USEquityPricing.volume],
window_length=252, mask=tradable_filter)
full_filter = tradable_filter & (price >= min_price) & (volume >= min_volume)
high_volume_tradable = AverageDollarVolume(
window_length=252,
mask=full_filter
).rank(ascending=False)
return high_volume_tradable
class SidFactor(CustomFactor):
"""
Workaround to screen by sids in pipeline
Credit: Luca
"""
inputs = []
window_length = 1
sids = []
def compute(self, today, assets, out):
out[:] = np.in1d(assets, self.sids)
def get_liquid_universe_of_stocks(start_date, end_date, top_liquid=500):
    """
    Gets the top X number of securities based on the criteria defined in
    `filter_universe`

    Parameters
    ----------
    start_date : string or pd.datetime
        Starting date for universe computation.
    end_date : string or pd.datetime
        End date for universe computation.
    top_liquid : int, optional
        Limit universe to the top N most liquid names in time period.
        Based on the 252 day AverageDollarVolume rank from `filter_universe`

    Returns
    -------
    security_universe : list
        List of securities that match the universe criteria
    """
    pipe = Pipeline()
    pipe.add(AverageDollarVolume(window_length=1), 'liquidity')
    pipe.set_screen((filter_universe() < top_liquid))
    data = run_pipeline(pipe, start_date=start_date, end_date=end_date)
    security_universe = data.index.levels[1].unique().tolist()
    return security_universe
def get_cum_returns(prices, sid, date, days_before, days_after, benchmark_sid):
    """
    Calculates cumulative and abnormal returns for the sid & benchmark

    Parameters
    ----------
    prices : pd.DataFrame
        Pricing history DataFrame obtained from `get_pricing`. Index should
        be the datetime index and sids should be columns.
    sid : int or zipline.assets._assets.Equity object
        Security that returns are being calculated for.
    date : datetime object
        Date that will be used as t=0 for cumulative return calculations. All
        returns will be calculated around this date.
    days_before, days_after : int
        Days before/after to be used to calculate returns for.
    benchmark_sid : int or zipline.assets._assets.Equity object
        Benchmark that abnormal returns are measured against.

    Returns
    -------
    sid_returns : pd.Series
        Cumulative returns time series from days_before ~ days_after from date
        for sid
    bench_returns : pd.Series
        Cumulative returns time series for benchmark sid
    abnormal_returns : pd.Series
        Abnormal cumulative returns time series for sid compared against benchmark
    """
    day_zero_index = prices.index.searchsorted(date)
    starting_index = max(day_zero_index - days_before, 0)
    ending_index = min(day_zero_index + days_after + 1, len(prices.index) - 1)

    if starting_index < 0 or ending_index >= len(prices.index):
        return None

    if sid == benchmark_sid:
        temp_price = prices.iloc[starting_index:ending_index,:].loc[:,[sid]]
    else:
        temp_price = prices.iloc[starting_index:ending_index,:].loc[:,[sid, benchmark_sid]]

    beta = calc_beta(sid, benchmark_sid, temp_price)
    if beta is None:
        return None

    daily_ret = temp_price.pct_change().fillna(0)
    # Abnormal return: stock return minus beta times the benchmark return
    daily_ret['abnormal_returns'] = daily_ret[sid] - beta*daily_ret[benchmark_sid]
    cum_returns = (daily_ret + 1).cumprod() - 1

    try:
        # If there's not enough data for event study,
        # return None
        cum_returns.index = range(starting_index - day_zero_index,
                                  ending_index - day_zero_index)
    except Exception:
        return None

    # The index now holds event days, so .ix[0] selects day 0 (the event date)
    sid_returns = cum_returns[sid] - cum_returns[sid].ix[0]
    bench_returns = cum_returns[benchmark_sid] - cum_returns[benchmark_sid].ix[0]
    abnormal_returns = cum_returns['abnormal_returns'] - cum_returns['abnormal_returns'].ix[0]

    return sid_returns, bench_returns, abnormal_returns
def calc_beta(sid, benchmark, price_history):
    """
    Calculate beta amounts for each security

    Parameters
    ----------
    sid : int or zipline.assets._assets.Equity object
        Security that beta is being calculated for.
    benchmark : int or zipline.assets._assets.Equity object
        Benchmark that will be used to determine beta against
    price_history: pd.DataFrame
        DataFrame that contains pricing history for benchmark and
        sid. Index is a datetimeindex and columns are sids. Should
        already be truncated for date_window used to calculate beta.

    Returns
    -------
    beta : float
        Beta of security against benchmark calculated over the time
        window contained in price_history. None if the two return
        series have no overlapping observations.
    """
    if sid == benchmark:
        return 1.0

    # Despite the variable names, these hold daily returns, not prices
    stock_prices = price_history[sid].pct_change().dropna()
    bench_prices = price_history[benchmark].pct_change().dropna()
    aligned_prices = bench_prices.align(stock_prices, join='inner')
    bench_prices = aligned_prices[0]
    stock_prices = aligned_prices[1]
    bench_prices = np.array(bench_prices.values)
    stock_prices = np.array(stock_prices.values)
    bench_prices = np.reshape(bench_prices, len(bench_prices))
    stock_prices = np.reshape(stock_prices, len(stock_prices))

    if len(stock_prices) == 0:
        return None

    # Regress stock returns on benchmark returns; the slope is beta
    regr_results = scipy.stats.linregress(y=stock_prices, x=bench_prices)
    beta = regr_results[0]
    p_value = regr_results[3]
    # Treat a statistically insignificant slope (p > 0.05) as zero beta
    if p_value > 0.05:
        beta = 0.
    return beta
def define_xticks(days_before, days_after):
    """
    Defines a neat xtick label axis on multiples of 2 using X days before
    and X days after.

    Parameters
    ----------
    days_before : int
        Positive integer detailing the numbers of days before event date
    days_after : int
        Positive integer detailing the number of days after an event date

    Returns
    -------
    list : List of integers on multiples of 2 from -days_before + 1
        up to days_after - 1
    """
    day_numbers = range(-days_before + 1, days_after)
    xticks = [d for d in day_numbers if d % 2 == 0]
    return xticks
def plot_distribution_of_events(event_data, date_column, start_date, end_date):
"""
Plots the distribution of events
Parameters
----------
event_data : pd.DataFrame
DataFrame that contains the events data with date and sid columns as
a minimum. See interactive tutorials on quantopian.com/data
date_column : String
String that labels the date column to be used for the event. e.g. `asof_date`
start_date, end_date : Datetime
Start and end date to be used for the cutoff for the distribution plots
"""
event_data = event_data[(event_data[date_column] > start_date) &
(event_data[date_column] < end_date)]
s = pd.Series(event_data[date_column])
sns.set_palette('coolwarm')
distribution = s.groupby([s.dt.year, s.dt.month]).count()
distribution.plot(kind="bar", grid=False,
color=sns.color_palette())
plt.title("Distribution of events in time")
plt.ylabel("Number of events")
plt.xlabel("Date")
return distribution
def plot_cumulative_returns(cumulative_returns, days_before, days_after):
"""
Plots a cumulative return chart
Parameters
----------
cumulative_returns : pd.Series
Series that contains the cumulative returns time series from
days_before ~ days_after from date for sid. See `get_cum_returns`
days_before, days_after : int
Positive integer detailing the numbers of days before/after event date
"""
xticks = define_xticks(days_before, days_after)
cumulative_returns.plot(xticks=xticks)
plt.grid(b=None, which=u'major', axis=u'y')
plt.title("Cumulative Return before and after event")
plt.xlabel("Window Length (t)")
plt.ylabel("Cumulative Return (r)")
plt.legend(["N=%s" % cumulative_returns.name], loc='best')
def plot_cumulative_returns_with_error_bars(cumulative_returns, returns_with_error,
days_before, days_after, abnormal=False):
"""
Plots a cumulative return chart with error bars. Can choose between abnormal returns
and simple returns
Parameters
----------
cumulative_returns : pd.Series
Series that contains the cumulative returns time series from
days_before ~ days_after from date for sid. See `get_cum_returns`
returns_with_error: pd.Series
Series that contains the standard deviation of returns passed in through
`cumulative_returns`. See `get_returns`
days_before, days_after : int
Positive integer detailing the numbers of days before/after event date
abnormal : Boolean, optional
If True, will plot labels indicating an abnormal returns chart
"""
xticks = define_xticks(days_before, days_after)
returns_with_error.ix[:-1] = 0
plt.errorbar(cumulative_returns.index, cumulative_returns, xerr=0, yerr=returns_with_error)
plt.grid(b=None, which=u'major', axis=u'y')
if abnormal:
plt.title("Cumulative Abnormal Return before and after event with error")
else:
plt.title("Cumulative Return before and after event with error")
plt.xlabel("Window Length (t)")
plt.ylabel("Cumulative Return (r)")
plt.legend()

def plot_cumulative_returns_against_benchmark(cumulative_returns,
                                              benchmark_returns,
                                              days_before, days_after):
    """
    Plots a cumulative return chart against the benchmark returns

    Parameters
    ----------
    cumulative_returns, benchmark_returns : pd.Series
        Series that contains the cumulative returns time series from
        days_before ~ days_after from date for sid/benchmark. See `get_cum_returns`
    days_before, days_after : int
        Positive integer detailing the number of days before/after event date
    """
    xticks = define_xticks(days_before, days_after)
    cumulative_returns.plot(xticks=xticks, label="Event")
    benchmark_returns.plot(xticks=xticks, label='Benchmark')
    plt.title("Benchmark cum returns versus Event")
    plt.ylabel("% Cumulative Return")
    plt.xlabel("Time Window")
    plt.legend(["Event", 'Benchmark'], loc='best')
    plt.grid(b=None, which=u'major', axis=u'y')

def plot_cumulative_abnormal_returns(cumulative_returns,
                                     abnormal_returns,
                                     days_before, days_after):
    """
    Plots a cumulative return chart against the abnormal returns

    Parameters
    ----------
    cumulative_returns, abnormal_returns : pd.Series
        Series that contains the cumulative returns time series against abnormal returns
        from days_before ~ days_after from date for sid. See `get_cum_returns`
    days_before, days_after : int
        Positive integer detailing the number of days before/after event date
    """
    xticks = define_xticks(days_before, days_after)
    abnormal_returns.plot(xticks=xticks, label="Abnormal Average Cumulative")
    cumulative_returns.plot(xticks=xticks, label="Simple Average Cumulative")
    # Reference lines at the first and the maximum abnormal return
    plt.axhline(y=abnormal_returns.ix[0], linestyle='--', color='black', alpha=.3)
    plt.axhline(y=abnormal_returns.max(), linestyle='--', color='black', alpha=.3)
    plt.title("Cumulative Abnormal Returns versus Cumulative Returns")
    plt.ylabel("% Cumulative Return")
    plt.xlabel("Time Window")
    plt.grid(b=None, which=u'major', axis=u'y')
    plt.legend(["Abnormal Average Cumulative", "Simple Average Cumulative"], loc='best')

def get_returns(event_data, benchmark, date_column, days_before, days_after,
                use_liquid_stocks=False, top_liquid=1000):
    """
    Calculates cumulative returns, benchmark returns, abnormal returns, and
    volatility for cumulative and abnormal returns

    Parameters
    ----------
    event_data : pd.DataFrame
        DataFrame that contains the events data with date and sid columns as
        a minimum. See interactive tutorials on quantopian.com/data
    benchmark : string, int, zipline.assets._assets.Equity object
        Security to be used as benchmark for returns calculations.
    date_column : String
        String that labels the date column to be used for the event. e.g. `asof_date`
    days_before, days_after : int
        Positive integer detailing the number of days before/after event date
    use_liquid_stocks : Boolean
        If set to True, it will filter out any securities found in `event_data`
        according to the filters found in `filter_universe`
    top_liquid : Int
        If use_liquid_stocks is True, top_liquid determines the top X amount of stocks
        to return ranked on liquidity

    Returns
    -------
    cumulative_returns, benchmark_returns, abnormal_returns,
    returns_volatility, abnormal_returns_volatility : pd.Series
    valid_sids : list
        Used to graph distribution of events (in case of use_liquid_stocks flag)
    """
    cumulative_returns = []
    benchmark_returns = []
    abnormal_returns = []
    valid_sids = []
    liquid_stocks = None
    print "Running Event Study"
    for i, row in event_data[['sid', date_column]].iterrows():
        sid, date = row
        # Convert trading days to calendar days, plus 10 extra days just to be sure
        extra_days_before = math.ceil(days_before * 365.0/252.0) + 10
        start_date = date - timedelta(days=extra_days_before)
        extra_days_after = math.ceil(days_after * 365.0/252.0) + 10
        end_date = date + timedelta(days=extra_days_after)

        if use_liquid_stocks:
            # The liquid universe is computed once, as of the first event date
            if liquid_stocks is None:
                liquid_stocks = get_liquid_universe_of_stocks(date, date, top_liquid=top_liquid)
            if sid not in liquid_stocks:
                continue
        valid_sids.append(sid)

        # duplicated columns would break get_cum_returns
        pr_sids = set([sid, benchmark])
        prices = get_pricing(pr_sids, start_date=start_date,
                             end_date=end_date, fields='open_price')
        # Align each date with the next session's open price
        prices = prices.shift(-1)
        if date in prices.index:
            results = get_cum_returns(prices, sid, date, days_before, days_after, benchmark)
            if results is None:
                print "Discarding event for %s on %s" % (symbols(sid), date)
                continue
            sid_returns, b_returns, ab_returns = results
            cumulative_returns.append(sid_returns)
            benchmark_returns.append(b_returns)
            abnormal_returns.append(ab_returns)

    sample_size = len(cumulative_returns)
    # Cross-sectional dispersion and mean across events for each day in the window
    returns_volatility = pd.concat(cumulative_returns, axis=1).std(axis=1)
    abnormal_returns_volatility = pd.concat(abnormal_returns, axis=1).std(axis=1)
    benchmark_returns = pd.concat(benchmark_returns, axis=1).mean(axis=1)
    abnormal_returns = pd.concat(abnormal_returns, axis=1).mean(axis=1)
    cumulative_returns = pd.concat(cumulative_returns, axis=1).mean(axis=1)
    cumulative_returns.name = sample_size
    return (cumulative_returns, benchmark_returns, abnormal_returns,
            returns_volatility, abnormal_returns_volatility, valid_sids)
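
The calendar-day padding that `get_returns` applies around each event can be sketched in isolation. This is a minimal illustration using only the standard library: `padded_window` and `buffer_days` are hypothetical names that do not appear in the notebook, and the 365/252 ratio is the same rough trading-day to calendar-day conversion used in the loop above.

```python
import math
from datetime import datetime, timedelta


def padded_window(event_date, days_before, days_after, buffer_days=10):
    """Return (start, end) calendar dates meant to cover a trading-day window.

    Hypothetical helper: mirrors the padding arithmetic in `get_returns`.
    """
    # Roughly 252 trading days per 365 calendar days, rounded up,
    # plus a fixed buffer to absorb holidays and weekends
    pad_before = int(math.ceil(days_before * 365.0 / 252.0)) + buffer_days
    pad_after = int(math.ceil(days_after * 365.0 / 252.0)) + buffer_days
    return (event_date - timedelta(days=pad_before),
            event_date + timedelta(days=pad_after))


# Illustrative event date and the (-10, +15) window discussed in the text
start, end = padded_window(datetime(2014, 6, 2), days_before=10, days_after=15)
```

With these inputs the window is padded by 25 calendar days before and 32 after the event, which leaves room to slice out the exact trading-day window once prices are loaded.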

def custom_event_study(event_data, date_column='asof_date',
                       start_date='2007-01-01', end_date='2014-01-01',
                       benchmark=None, days_before=10, days_after=10, top_liquid=500,
                       use_liquid_stocks=True, plot_error=False):
    """
    Calculates simple & cumulative returns for events and plots stock price movement
    before and after the event date.

    Parameters
    ----------
    event_data : pd.DataFrame
        DataFrame that contains the events data with date and sid columns as
        a minimum. See interactive tutorials on quantopian.com/data
    date_column : String
        String that labels the date column to be used for the event. e.g. `asof_date`
    start_date, end_date : str or Datetime
        Start and end date to be used for the cutoff for the event study
    benchmark : int or zipline.assets._assets.Equity object
        Security to be used as benchmark for returns calculations. Defaults to SPY
        when None. See `get_returns`
    days_before, days_after : int
        Days before/after to be used to calculate returns for.
    top_liquid : Int
        If use_liquid_stocks is True, top_liquid determines the top X amount of stocks
        to return ranked on liquidity
    use_liquid_stocks : Boolean
        If set to True, it will filter out any securities found in `event_data`
        according to the filters found in `filter_universe`
    plot_error : Boolean
        If True, also plots the cumulative and abnormal returns with error bars
    """
    # Check the type first so a non-DataFrame input fails with the intended message
    if (not isinstance(event_data, pd.DataFrame) or date_column not in event_data
            or 'sid' not in event_data):
        raise KeyError("event_data not properly formatted for event study. Please make sure "
                       "date_column and 'sid' are both present in the DataFrame")
    if isinstance(benchmark, str):
        raise TypeError("Benchmark must be an equity object. Please use symbols('ticker') to "
                        "set your benchmark")
    if benchmark is None:
        benchmark = symbols('SPY')

    print "Formatting Data"
    start_date = pd.to_datetime(start_date)
    end_date = pd.to_datetime(end_date)
    event_data = event_data[(event_data[date_column] > start_date) &
                            (event_data[date_column] < end_date)]
    event_data.sid = event_data.sid.apply(lambda x: int(x))

    print "Getting Plots"
    cumulative_returns, benchmark_returns, abnormal_returns, returns_volatility, \
        abnormal_returns_volatility, valid_sids = get_returns(event_data, benchmark, date_column,
                                                              days_before, days_after,
                                                              use_liquid_stocks=use_liquid_stocks,
                                                              top_liquid=top_liquid)
    event_data = event_data[event_data.sid.isin(valid_sids)]

    plt.subplot(5, 2, 1)
    distribution = plot_distribution_of_events(event_data, date_column, start_date, end_date)
    plt.subplot(5, 2, 2)
    plot_cumulative_returns(cumulative_returns, days_before, days_after)
    plt.subplot(5, 2, 5)
    plot_cumulative_returns_against_benchmark(cumulative_returns, benchmark_returns,
                                              days_before, days_after)
    plt.subplot(5, 2, 6)
    plot_cumulative_abnormal_returns(cumulative_returns, abnormal_returns,
                                     days_before, days_after)
    if plot_error:
        plt.subplot(5, 2, 9)
        plot_cumulative_returns_with_error_bars(cumulative_returns, returns_volatility,
                                                days_before, days_after)
        plt.subplot(5, 2, 10)
        plot_cumulative_returns_with_error_bars(cumulative_returns, abnormal_returns_volatility,
                                                days_before, days_after, abnormal=True)
    return cumulative_returns, benchmark_returns, abnormal_returns, returns_volatility, \
        abnormal_returns_volatility, valid_sids, distribution

def get_shares_outstanding(symbol, date):
    # Shares outstanding as of the day before `date`
    qry = query(
        fundamentals.valuation.shares_outstanding
    ).filter(fundamentals.share_class_reference.symbol == symbol)
    return get_fundamentals(qry, date - timedelta(days=1)).iloc[0]

def convert_units(row):
    # Adds market cap, total buyback value (both in millions) and the share of
    # shares outstanding being bought back to a buyback row
    symbol = row['symbol']
    date = row['asof_date']
    buyback_unit = row['buyback_units']
    try:
        shares_outstanding = get_shares_outstanding(symbol, date)
        open_price = get_pricing(symbol, date, date, fields='open_price').iloc[0]
        market_cap = open_price * shares_outstanding
    except Exception:
        # No fundamentals or pricing available: keep the row but leave the fields empty
        row['Market Cap (Mill)'] = row['Total Value (Mill)'] = row['Percent of SO'] = None
        return row

    if buyback_unit == '$M':
        total_bought = row['buyback_amount'] * 1000000.0
        percent_bought = total_bought/market_cap
    elif buyback_unit == "Mshares":
        percent_bought = row['buyback_amount']/shares_outstanding
        total_bought = row['buyback_amount'] * open_price
    elif buyback_unit == '%':
        percent_bought = row['buyback_amount']/100.0
        total_bought = percent_bought * market_cap
    else:
        # Unrecognized unit: nothing to convert
        row['Market Cap (Mill)'] = row['Total Value (Mill)'] = row['Percent of SO'] = None
        return row

    row['Market Cap (Mill)'] = market_cap/1000000.0
    row['Total Value (Mill)'] = total_bought/1000000.0
    try:
        row['Percent of SO'] = percent_bought.iloc[0]
    except Exception:
        row['Percent of SO'] = None
    return row

def get_yearly_and_all_buyback_earnings_data():
    # Here, I do two things
    # 1. Get data for each year to graph yearly trends
    # 2. Aggregate that data above into one full DataFrame
    yearly_buyback_data = {}
    yearly_earnings_data = {}
    all_earnings_data = pd.DataFrame()
    all_buyback_data = pd.DataFrame()
    start_year = buyback_auth.asof_date.min().year
    end_year = buyback_auth.asof_date.max().year

    # Separate out each year into separate DataFrames
    for year in range(start_year, end_year+1):
        buyback_year = buyback_auth[(buyback_auth.asof_date >= pd.to_datetime("%s-01-01" % year)) &
                                    (buyback_auth.asof_date <= pd.to_datetime("%s-12-31" % year))]
        df = odo(buyback_year, pd.DataFrame)
        df = df[df['buyback_units'].isin(['%', '%Mshares', '$M'])].apply(
            lambda row: convert_units(row), axis=1)
        yearly_buyback_data[year] = df[df['Percent of SO'] < 1]

        earnings_year = ev_earnings[(ev_earnings.asof_date >= pd.to_datetime("%s-01-01" % year)) &
                                    (ev_earnings.asof_date <= pd.to_datetime("%s-12-31" % year)) &
                                    (ev_earnings.symbol.isin(df.symbol.unique()))]
        yearly_earnings_data[year] = odo(earnings_year, pd.DataFrame)

    # Take data from above and aggregate it all into a single DataFrame
    for year, df in yearly_earnings_data.iteritems():
        all_earnings_data = all_earnings_data.append(df)
        all_buyback_data = all_buyback_data.append(yearly_buyback_data[year])

    all_earnings_data = all_earnings_data.sort(columns=['asof_date'])
    all_buyback_data = all_buyback_data.sort(columns=['asof_date']).dropna(subset=['sid'])
    return yearly_buyback_data, all_buyback_data, yearly_earnings_data, all_earnings_data

def get_earnings_date(buyback_row, all_earnings_data):
    # Finds the earnings announcement closest in time to a buyback announcement.
    # Assumes all_earnings_data is sorted by asof_date.
    buyback_date = buyback_row['asof_date']
    sid = buyback_row['sid']
    earnings_for_sid = all_earnings_data[all_earnings_data['sid'] == sid]
    if len(earnings_for_sid.index) == 0:
        return None, None, None
    try:
        # First earnings date on or after the buyback date
        idx = earnings_for_sid['asof_date'].searchsorted(buyback_date)[0]
        earnings_date = earnings_for_sid.iloc[idx]['asof_date']
        # The earnings date just before the buyback may be closer, so take the nearer one
        if idx != 0:
            earnings_date_2 = earnings_for_sid.iloc[idx - 1]['asof_date']
            if abs(earnings_date_2 - buyback_date) < abs(earnings_date - buyback_date):
                earnings_date = earnings_date_2
    except IndexError:
        # The buyback falls after every known earnings date, so use the last one
        earnings_date = earnings_for_sid.iloc[-1]['asof_date']
    return buyback_date, earnings_date, earnings_for_sid
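
The nearest-date lookup in `get_earnings_date` can be illustrated without the Quantopian data. The sketch below is not the notebook's code: `nearest_date` is a hypothetical helper, and it swaps pandas `searchsorted` for the standard library's `bisect_left`, which returns the same left insertion point for a sorted list. The dates are made up for the example.

```python
from bisect import bisect_left
from datetime import date


def nearest_date(sorted_dates, target):
    """Return the entry of `sorted_dates` closest to `target` (None if empty).

    Hypothetical helper; `sorted_dates` must be sorted in ascending order.
    """
    if not sorted_dates:
        return None
    # Index of the first date on or after the target
    idx = bisect_left(sorted_dates, target)
    if idx == len(sorted_dates):
        # Target is later than every date: fall back to the last one
        return sorted_dates[-1]
    candidate = sorted_dates[idx]
    if idx > 0:
        previous = sorted_dates[idx - 1]
        # Strict comparison: on a tie the later date is kept
        if abs(previous - target) < abs(candidate - target):
            candidate = previous
    return candidate


# Illustrative data only
earnings = [date(2014, 1, 28), date(2014, 4, 29), date(2014, 7, 29)]
buyback = date(2014, 4, 10)
nearest = nearest_date(earnings, buyback)
# Same sign convention as the notebook: buyback date minus earnings date
days_away = (buyback - nearest).days
```

Here the buyback precedes the nearest earnings date, so `days_away` is negative; a positive value would mean the buyback was announced after earnings.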

def get_days_away(all_buyback_data, all_earnings_data):
    # Days between each buyback and its nearest earnings date, keyed by sid
    # (positive when the buyback comes after the earnings announcement)
    days_away = pd.Series()
    for i, buyback_row in all_buyback_data.iterrows():
        buyback_date, earnings_date, earnings_for_sid = get_earnings_date(buyback_row, all_earnings_data)
        if buyback_date is None:
            continue
        days_away = days_away.append(
            pd.Series({buyback_row['sid']: (buyback_date - earnings_date).days}))
    return days_away

def find_earnings_announcements(all_buyback_data, all_earnings_data, days=(16, 31)):
    earnings_announcements = pd.DataFrame()
    buyback_announcements = pd.DataFrame()
    # Sort a copy so neither the default nor the caller's sequence is mutated
    days = sorted(days)
    # Half-open range: days[0] is included, days[-1] is not
    day_range = range(days[0], days[-1])
    for i, buyback_row in all_buyback_data.iterrows():
        buyback_date, earnings_date, earnings_for_sid = get_earnings_date(buyback_row, all_earnings_data)
        if buyback_date is None:
            continue
        days_away = (buyback_date - earnings_date).days
        # If days away in day range, save the event
        if days_away in day_range:
            earnings_for_sid = earnings_for_sid[earnings_for_sid.asof_date == earnings_date]
            earnings_announcements = earnings_announcements.append(earnings_for_sid)
            buyback_announcements = buyback_announcements.append(buyback_row)
    return earnings_announcements, buyback_announcements

def insert_sic_code(row):
    # Looks up the SIC industry code for the row's symbol as of the event date
    qry = query(
        fundamentals.asset_classification.sic
    ).filter(fundamentals.share_class_reference.symbol == row['symbol'])
    try:
        row['sic'] = get_fundamentals(qry, row['asof_date']).iloc[0][0]
    except:
        row['sic'] = None
    return row

def get_returns_table(earnings_study_results, return_intervals):
    return_table = {}
    for event_study in earnings_study_results:
        cumulative_returns, benchmark_returns, abnormal_returns, returns_volatility, \
            abnormal_returns_volatility, valid_sids, distribution = earnings_study_results[event_study]
        for i, interval in enumerate(return_intervals):
            start, end = interval
            # Convert cumulative returns to growth factors, then take the
            # return earned between the two ends of the interval
            raw_temp = cumulative_returns + 1
            raw_returns = "%.4f" % ((raw_temp.ix[end] - raw_temp.ix[start])/raw_temp.ix[start])
            ab_temp = abnormal_returns + 1
            ab_returns = "%.4f" % ((ab_temp.ix[end] - ab_temp.ix[start])/ab_temp.ix[start])
            temp_table = {'Raw': raw_returns, 'Abnormal (CAPM)': ab_returns}
            N = distribution.sum()
            return_table[("%s N=%s" % (event_study, N), interval)] = temp_table
    return pd.DataFrame(return_table).T.unstack()
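
The interval arithmetic inside `get_returns_table` amounts to dividing two growth factors. A minimal sketch, assuming cumulative returns are stored as simple returns relative to the start of the window: `interval_return` is a hypothetical helper, and the dictionary below holds made-up numbers rather than results from the study.

```python
def interval_return(cumulative_returns, start, end):
    """Return earned between day `start` and day `end` of the event window.

    Hypothetical helper: `cumulative_returns` maps a day offset to the
    cumulative simple return up to that day.
    """
    growth_start = 1.0 + cumulative_returns[start]
    growth_end = 1.0 + cumulative_returns[end]
    # Equivalent to growth_end / growth_start - 1
    return (growth_end - growth_start) / growth_start


# Illustrative values only: 2% by day 0 and 4.04% by day +15
cumulative = {-10: 0.0, 0: 0.02, 15: 0.0404}
post_event = interval_return(cumulative, 0, 15)
full_window = interval_return(cumulative, -10, 15)
```

Dividing by the starting growth factor is what makes a (0, +15) figure comparable with a (-10, +15) figure: each is measured from its own starting level rather than from the first day of the window.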