It has been widely reported that companies with women in senior management and on the board of directors perform better than companies without. Credit Suisse's Gender 3000 report looks at gender diversity in 3,000 companies across 40 countries. According to this report, at the end of 2013 women accounted for 12.9% of top management (CEOs and directors reporting to the CEO), and 12.7% of boards had gender diversity. Additionally, companies with more than one woman on the board have returned a compound 3.7% a year since 2005, outperforming those that have none.
These kinds of reports quickly lead to the question: what would happen if you invested in companies with female CEOs?
The first challenge was finding a data source. Ideally, to build an algorithm to do this investing for me, I would need an evolving data source, one that is updated with CEO gender on a fairly regular basis. But to get started, I decided to just look for a historical listing of female CEOs of public companies. After a little bit of Google searching, I found Catalyst's (http://www.catalyst.org/) Bottom Line Research Project (http://www.catalyst.org/knowledge/bottom-line-0), which indicated the data supporting the report was available in their library. I reached out to the team there, explained what I was trying to do, and within a day or two had a PDF listing all of the women who had served as CEO of Fortune 1000 companies, dating back to 1995.
Some manual work was required to get the list of women into Excel. I also needed the start and end date of each CEO's tenure in the position, as well as the ticker symbol. Google was particularly helpful, and within a few hours I was set to start exploring the data set.
The first step was importing the data into the research platform, along with the basic libraries I knew I would need.
#Import the libraries needed for the analysis.
import pandas as pd
import numpy as np
from datetime import datetime, timedelta
import matplotlib.pyplot as pyplot
import pytz
from pytz import timezone
from zipline import TradingAlgorithm
from zipline.api import (order_target_percent, record, symbol, history, add_history, get_datetime, get_open_orders,
get_order, order_target_value, order, order_target )
from zipline.finance.slippage import FixedSlippage
#Import my csv and rename some of the columns
CEOs = local_csv('FemaleCEOs_v3.csv', raw=True)
CEOs.rename(columns={'SID':'Ticker', 'Start Date':'start_date', 'End Date':'end_date'}, inplace=True)
CEOs[0:10]
To get an understanding of the data, I wanted to know how many new female CEOs there are each year. Quantopian's pricing data goes back to 2002, so I am working with 12 years of information. Since 2002, there have been 77 female CEOs in the Fortune 1000 who run (or have run) public companies. Was that enough to be interesting?
CEOs_data = CEOs
# Count the CEO tenures starting and ending in each year.
CEOs_data['year_started'] = pd.DatetimeIndex(CEOs_data['start_date']).year
CEOs_data['year_ended'] = pd.DatetimeIndex(CEOs_data['end_date']).year
starts_per_year = CEOs_data['year_started'].value_counts(sort=False)
ends_per_year = CEOs_data['year_ended'].value_counts(sort=False)
# Net change in the number of sitting female CEOs each year.
net_per_year = starts_per_year.subtract(ends_per_year, fill_value=0)
net_per_year[2014] = 2  # manual adjustment for 2014
net_per_year.plot(kind='bar')
Plotting the data shows that the trend is headed in the right direction, but also that the majority of the women who have served as CEOs of Fortune 1000 companies have done so since 2008. At this point it isn't clear to me whether this will have an impact on the analysis; it's just worth noting.
The next few cells are devoted to scrubbing the data. I didn't do these all at the beginning; they evolved over time. In the interest of keeping the notebook organized, and this conversation interesting, I've grouped them all at the beginning of the notebook.
# First I need to convert the date values in the csv to datetime objects in UTC timezone.
CEOs['start_date'] = CEOs['start_date'].apply(lambda row: pd.to_datetime(str(row), utc=True))
CEOs['end_date'] = CEOs['end_date'].apply(lambda row: pd.to_datetime(str(row), utc=True))
# Then I want to check if any of the dates are weekends.
# If they are a weekend, I move them to the following Monday.
def check_date(row):
    week_day = row.isoweekday()
    if week_day == 6:
        row = row + timedelta(days=2)
    elif week_day == 7:
        row = row + timedelta(days=1)
    return row
CEOs['start_date'] = CEOs['start_date'].apply(check_date)
CEOs['end_date'] = CEOs['end_date'].apply(check_date)
# We need to deal with the dates that are outside of our pricing data range.
# For people who started prior to 01/02/2002, I changed their start date to 01/02/2002.
# I also changed any future-dated end dates to 12/01/2014, just to be safe.
def change_date(row):
    start_date = row['start_date']
    end_date = row['end_date']
    # Two independent ifs: a row could need both its start and end date clamped.
    if start_date < pd.to_datetime("2002-01-02", utc=True):
        row['start_date'] = pd.to_datetime("2002-01-02", utc=True)
    if end_date > pd.to_datetime("2015-01-01", utc=True):
        row['end_date'] = pd.to_datetime("2014-12-01", utc=True)
    return row
CEOs = CEOs.apply(change_date, axis=1)
# I then add a new column called SID, which is the Security Identifier.
# Since ticker symbols are not unique across all time, the SID ensures we have the right company.
# I use the ticker and the start date to search for the security object.
def get_SID(row):
    temp_ticker = row['Ticker']
    start_date = row['start_date'].tz_localize('UTC')
    row['SID'] = symbols(temp_ticker, start_date)
    return row
CEOs = CEOs.apply(get_SID, axis=1)
CEOs.sort(columns='start_date')
Now I have a clean data set, with all of the data I need and hopefully nothing that will trip me up.
My first goal was to plot each company's historical price, with markers showing where the female CEO's tenure started and where it ended. The idea was to look for any trends and to see if I could spot anything interesting manually.
To do this, I needed all of the historical pricing data for each of these companies stored in a new dataframe.
# I make a series out of just the SIDs.
SIDs = CEOs.SID
# Then call get_pricing on the series of SIDs and store the results in a new dataframe called prices.
prices = get_pricing(
    SIDs,
    start_date='2002-01-01',
    end_date='2014-12-31',
    fields='close_price',
    handle_missing='ignore'
)
prices[0:3]
Next I need to get the pricing data and start and end dates plotted on a chart for an individual security.
I decided to do this one security at a time, both because I think this is a big use case for research users and I wanted to see how it was done, and because I thought it would help me get a feel for the data.
Ultimately, while this is interesting, educational, and fun to see, it doesn't tell me much. The market drop in 2008 is generally a huge factor.
security = 128 #found this by hand 2351, 128, 6330, 3490, 24819
adm_df = CEOs[(CEOs['SID'] == security)]
sec_df = prices[security]
fig = pyplot.figure()
ax2 = fig.add_subplot(212)
start_date = adm_df['start_date']
end_date = adm_df['end_date']
prices[security].plot(ax=ax2, figsize=(16, 15))
ax2.plot(start_date, prices.ix[start_date][security], '^', markersize=20, color='m')
ax2.plot(end_date, prices.ix[end_date][security], 'v', markersize=20, color='m')
pyplot.legend()
print(start_date)
print(end_date)
Based on this research, I decided the best thing to do was write a simple algo, see if it was interesting, and then iterate. The simplest algo I could write would just buy some number of shares the day a female CEO started her job and sell them when she left the position. Just to prove to myself that I could, I outlined what that might look like.
def buy_sell(row):
    todays_date = pd.to_datetime('2005-10-03')
    start_date = row['start_date']
    end_date = row['end_date']
    sid = row['SID']
    if start_date == todays_date:
        print("Buy!")
        print(sid)
    elif end_date == todays_date:
        print("Sell!")
        print(sid)
    return row
CEOs = CEOs.apply(lambda row: buy_sell(row), axis=1)
Success! I can figure out the right date to buy and sell my securities! For a backtest, the code needs to be a little more complicated: here I am manually setting a date, but for my backtest the date will be fed into the algorithm.
Once I had the outline, I decided to try writing an algorithm in the research environment using zipline to test its results. The first version of my algo was purposefully as simple as I could make it: check the date and find any CEOs who started or ended their tenure on that date. If a CEO started that day, buy 500 shares; if she ended her tenure that day, sell the entire position.
# First I create a list of all the company SIDs that I want to use.
tickers_to_use = CEOs.SID
# I then get a dataframe of the historical pricing information for those companies.
data = get_pricing(
    tickers_to_use,
    start_date='2002-01-01',
    end_date='2014-12-31',
    fields='close_price',
    handle_missing='log'
)
#Pull out a couple of tickers that are having issues because of a known bug
CEOs = CEOs[(CEOs['Ticker'] != ('RDA'))]
CEOs = CEOs[(CEOs['Ticker'] != ('WEN'))]
CEOs = CEOs[(CEOs['Ticker'] != ('GAS'))]
"""
This is where I initialize my algorithm
"""
def initialize_first(context):
    # Load the CEO data and a counter for the number of stocks held as global variables.
    context.CEOs = CEOs
    context.stocks_held = 0
"""
handle_data is the function that runs every minute (or day), looking to make trades
"""
def handle_data_first(context, data):
    # Get today's date.
    today = get_datetime()
    # Get dataframes with just the companies whose start_date (or end_date) is today.
    start_today = context.CEOs[context.CEOs.start_date == today]
    end_today = context.CEOs[context.CEOs.end_date == today]
    # iterrows then iterates through the rows of my start_today dataframe; for each row it:
    #: 1. Creates a variable for the current SID and the current ticker
    #: 2. Determines if the SID is in our pricing data (i.e. do we have pricing data for this SID today)
    #: 3. If it is, increases stocks_held by 1 and buys 500 shares of that stock
    for idx, row in start_today.iterrows():
        current_sid = row['SID']
        current_ticker = row['Ticker']
        if current_sid in data:
            #print 'On {} buy {}'.format(today, current_ticker)
            context.stocks_held = context.stocks_held + 1
            order_id = order_target(current_sid, 500)
    # We then do the same thing for my end_today dataframe to determine what we should sell.
    for idx, row in end_today.iterrows():
        current_sid = row['SID']
        current_ticker = row['Ticker']
        if current_sid in data:
            #print 'On {} sell {} of {}'.format(today, context.portfolio.positions[current_sid], current_ticker)
            context.stocks_held = context.stocks_held - 1
            order_id = order_target(current_sid, 0)
"""
Here's where we will instantiate the Trading Algorithm and run our simulation
"""
#: We tell zipline to run the algo using initialize and handle_data as our two functions
algo_obj = TradingAlgorithm(
    initialize=initialize_first,
    handle_data=handle_data_first
)
"""
Plotting function for plotting our transactions and our long/short positions
"""
#: We then get the results from Zipline and plot them
def analyze(context, perf):
    fig = pyplot.figure()
    ax1 = fig.add_subplot(211)
    perf.portfolio_value.plot(ax=ax1, figsize=(14, 12))
    ax1.set_ylabel('portfolio value in $', fontsize=20)
    # Keep only the days with transactions, then split them into buys and sells.
    perf_trans = perf.ix[[t != [] for t in perf.transactions]]
    buys = perf_trans.ix[[t[0]['amount'] > 0 for t in perf_trans.transactions]]
    sells = perf_trans.ix[[t[0]['amount'] < 0 for t in perf_trans.transactions]]
    ax1.plot(buys.index, perf.portfolio_value.ix[buys.index],
             '^', markersize=10, color='m')
    ax1.plot(sells.index, perf.portfolio_value.ix[sells.index],
             'v', markersize=10, color='k')
    pyplot.legend(loc=0)
    pyplot.show()
#: Custom Plotting Function
algo_obj._analyze = analyze
#: Run the simulation
perf_manual = algo_obj.run(data)
Wow! That looks pretty good! It goes up and to the right... and up and to the right by a lot. It seems remarkable, right?
Not so fast. We really need to consider how the market performed during the same time period.
For the purposes of this exercise, I decided to use the S&P 500 as my benchmark. You could argue that there were other benchmarks that might be better suited, but this was the easiest. For a first pass, that seemed reasonable.
#: First we need to get the data of the S&P500, since this is going to be our benchmark.
data_SPY = get_pricing(
    ['SPY'],
    start_date='2002-01-01',
    end_date='2015-02-10',
    fields='close_price',
    frequency='daily'
)
"""
This cell creates an extremely simple handle_data that will keep 100%
of our portfolio in SPY, which I'll plot against the algorithm defined above.
"""
#: Here I'm defining the algo that I have above so I can run with a new graphing method
my_algo = TradingAlgorithm(
    initialize=initialize_first,
    handle_data=handle_data_first
)
def bench_initialize(context):
    context.first_bar = True
#: Define a simple handle_data that will keep 100% in SPY
def bench_handle(context, data):
    if context.first_bar:
        order_target_percent(8554, 1)  # 8554 is the SID for SPY
        context.first_bar = False
#: Define the algo that will run as the benchmark
bench_algo = TradingAlgorithm(
    initialize=bench_initialize,
    handle_data=bench_handle
)
#: Create a figure to plot on the same graph
fig = pyplot.figure()
ax1 = fig.add_subplot(211)
#: Create our plotting algorithm
def my_algo_analyze(context, perf):
    perf.portfolio_value.plot(ax=ax1, label="My Algo")
def bench_algo_analyze(context, perf):
    perf.portfolio_value.plot(ax=ax1, label="Benchmark")
#: Insert our analyze methods
my_algo._analyze = my_algo_analyze
bench_algo._analyze = bench_algo_analyze
# Run algorithms
my_algo.run(data)
bench_algo.run(data_SPY)
#: Plot the graph
ax1.set_ylabel('portfolio value in $', fontsize=20)
ax1.set_title("Cumulative Return", fontsize=20)
ax1.legend(loc='best')
fig.tight_layout()
pyplot.show()
Would you look at that? My algo is beating the S&P 500!
I feel pretty amazing at this point. I know little to nothing about the market, but I'm beating it by investing in women.
Then I chatted with a couple of professional quants. I explained the basics of what I was doing and learned that buying a fixed number of shares really isn't what a professional would do.
I needed to modify my algo so that it rebalances based on the number of stocks in my portfolio. When I own one stock, it should be 100% of my portfolio; when I own two stocks, each should be 50% of my portfolio. As the number of stocks in my portfolio changes, the target weight of each stock should change too.
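The rebalancing rule itself is simple enough to sketch outside of zipline. Here is a minimal illustration (the helper name and tickers are mine, not part of the notebook) of how the target value per position shrinks as stocks are added:

```python
def equal_weight_targets(portfolio_value, holdings):
    """Split the portfolio value evenly across all current holdings."""
    if not holdings:
        return {}
    per_stock = portfolio_value / float(len(holdings))
    return {stock: per_stock for stock in holdings}

# One stock gets 100% of the portfolio...
print(equal_weight_targets(100000.0, ['AAA']))
# ...two stocks get 50% each, and so on as positions are added.
print(equal_weight_targets(100000.0, ['AAA', 'BBB']))
```

In the algo below, this is what the rebalancing loop does via `order_target_value`: every time the set of held stocks changes, each position is reset to `portfolio_value / num_stocks`.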
I also learned about slippage, and needed to add some protection for that at this point.
"""
This is where I initialize my algorithm
"""
from zipline.api import order
from zipline.finance.slippage import FixedSlippage
def initialize(context):
    # Load the CEO data and set up lists tracking current holdings and today's trades.
    context.CEOs = CEOs
    context.current_stocks = []
    context.stocks_to_order_today = []
    context.stocks_to_sell_today = []
    context.set_slippage(FixedSlippage(spread=0))
"""
handle_data is the function that runs every minute (or day), looking to make trades
"""
from zipline.api import order
def handle_data(context, data):
    #: Reset my order and sell lists at the start of each day.
    context.stocks_to_order_today = []
    context.stocks_to_sell_today = []
    # Get today's date.
    today = get_datetime()
    # Get just the companies whose start_date (or end_date) is today.
    context.stocks_to_order_today = context.CEOs.SID[context.CEOs.start_date == today].tolist()
    context.stocks_to_sell_today = context.CEOs.SID[context.CEOs.end_date == today].tolist()
    context.stocks_to_sell_today = [s for s in context.stocks_to_sell_today if s is not None]
    context.stocks_to_order_today = [s for s in context.stocks_to_order_today if s is not None]
    # If there are stocks that need to be bought or sold today
    if len(context.stocks_to_order_today) > 0 or len(context.stocks_to_sell_today) > 0:
        # print "-----------------------------------"
        # print today
        # print "cash = %s" % context.portfolio.cash
        # print "current stocks = %s" % len(context.current_stocks)
        # print "stocks to sell = %s" % len(context.stocks_to_sell_today)
        # print "stocks to buy = %s" % len(context.stocks_to_order_today)
        # First sell any that need to be sold, and remove them from current_stocks.
        for stock in context.stocks_to_sell_today:
            if stock in data:
                if stock in context.current_stocks:
                    order_target(stock, 0)
                    context.current_stocks.remove(stock)
                    # print "Selling %s" % stock
        # Then add any I am buying to current_stocks.
        for stock in context.stocks_to_order_today:
            context.current_stocks.append(stock)
        # Then rebalance the portfolio so I have an equal amount of each stock in current_stocks.
        for stock in context.current_stocks:
            if stock in data:
                # Calculate the value to hold in each position.
                portfolio_value = context.portfolio.portfolio_value
                num_stocks = len(context.current_stocks)
                value_to_buy = portfolio_value / num_stocks
                # print "Buying and/or rebalancing %s at value = %s" % (stock, value_to_buy)
                order_target_value(stock, value_to_buy)
"""
This cell creates an extremely simple handle_data that will keep 100%
of our portfolio in SPY, which I'll plot against the algorithm defined above.
"""
#: Here I'm defining the algo that I have above so I can run with a new graphing method
my_algo = TradingAlgorithm(
    initialize=initialize,
    handle_data=handle_data
)
def bench_initialize(context):
    context.first_bar = True
#: Define a simple handle_data that will keep 100% in SPY
def bench_handle(context, data):
    if context.first_bar:
        order_target_percent(8554, 1)  # 8554 is the SID for SPY
        context.first_bar = False
#: Define the algo that will run as the benchmark
bench_algo = TradingAlgorithm(
    initialize=bench_initialize,
    handle_data=bench_handle
)
#: Create a figure to plot on the same graph
fig = pyplot.figure()
ax1 = fig.add_subplot(211)
#: Create our plotting algorithm
def my_algo_analyze(context, perf):
    perf.portfolio_value.plot(ax=ax1, label="Fortune 1000 Companies with Female CEOs")
def bench_algo_analyze(context, perf):
    perf.portfolio_value.plot(ax=ax1, label="Benchmark")
#: Insert our analyze methods
my_algo._analyze = my_algo_analyze
bench_algo._analyze = bench_algo_analyze
# Run algorithms
returns = my_algo.run(data)
bench_algo.run(data_SPY)
#: Plot the graph
ax1.set_ylabel('portfolio value in $', fontsize=20)
ax1.set_title("Cumulative Return", fontsize=20)
ax1.legend(loc='best')
fig.tight_layout()
pyplot.show()
This just keeps getting better. It's really almost too good to be true.
Let's do a couple of quick checks to ensure that I am buying and selling securities and that my leverage isn't out of control.
# Look at the number of positions over time.
def find_rows(row):
    positions = [pos for pos in row['positions'] if pos['amount'] > 0]
    row['position_len'] = len(positions)
    return row
returns['position_len'] = 0
returns = returns.apply(lambda row: find_rows(row), axis=1)
returns.position_len.plot()
# Look at the leverage over time.
returns.gross_leverage.plot()
Now that I have an algo that looks this good, I want to understand why.
One suggestion was to look for similarities, besides the gender of their CEO, among the companies I invested in, such as their sector.
I pulled and plotted the Morningstar sector codes for each company.
sectors = local_csv('CEOs_sector_output.csv')
sector_count = sectors['Sector'].value_counts(sort=False)
sector_count.plot(kind='bar')
It does look like I have a slight bias towards consumer cyclical companies. These include companies such as GM, eBay, The New York Times, and Ann Taylor Stores.
The next question to ask might be, "Is my sector weighting responsible for the performance?" Using XLY, a consumer discretionary ETF, we can compare how consumer companies have done against the S&P 500 over the same time period.
consumer = get_pricing(
    ['XLY', 'SPY'],
    start_date='2002-01-02',
    end_date='2015-02-01',
    fields='close_price'
)
def cum_returns(df):
    return (1 + df).cumprod() - 1
cum_returns(consumer.pct_change()).plot()
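As a quick sanity check on the compounding in cum_returns (this snippet is plain Python, separate from the notebook): a +10% day followed by a -5% day compounds to +4.5%, not the +5% a naive sum of returns would suggest, which is why the cumulative product is the right tool here.

```python
def cum_return(daily_returns):
    """Compound a sequence of simple daily returns into one total return."""
    total = 1.0
    for r in daily_returns:
        total *= 1 + r
    return total - 1

print(round(cum_return([0.10, -0.05]), 6))  # 0.045
print(round(0.10 + (-0.05), 6))             # 0.05 -- the naive sum overstates it
```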
It looks like consumer discretionary companies have done well in the last 12 years. It's possible that this is part of the success of this strategy. However, there are also a number of female CEOs in Industrials (Lockheed Martin, General Dynamics, etc.), Technology (Yahoo, Xerox, IBM, HP, etc.), and Utilities (American Water Works, Portland General Electric, Alliant Energy) as well.
Developing a 'sector neutral' version of this algo would be a good next step. One way to do this would be to force my portfolio to hold an equal weight in each sector, as opposed to an equal weight in each stock. Doing this would help me determine whether there is sector bias in this approach and plan for different market shifts in the future.
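A sketch of what that weighting scheme might look like, assuming we have a mapping from stock to Morningstar sector (the helper and the example tickers/sectors here are hypothetical, not from the notebook): split the portfolio equally across sectors first, then equally across the stocks within each sector.

```python
from collections import defaultdict

def sector_neutral_weights(stock_sectors):
    """Equal weight per sector, then equal weight per stock within each sector.

    stock_sectors maps stock -> sector label; returns stock -> portfolio weight.
    """
    if not stock_sectors:
        return {}
    by_sector = defaultdict(list)
    for stock, sector in stock_sectors.items():
        by_sector[sector].append(stock)
    sector_weight = 1.0 / len(by_sector)
    weights = {}
    for stocks in by_sector.values():
        for stock in stocks:
            weights[stock] = sector_weight / len(stocks)
    return weights

# Three consumer cyclical names and one utility: the lone utility gets a full
# half of the portfolio, and the three consumer names split the other half.
weights = sector_neutral_weights({
    'GM': 'Consumer Cyclical',
    'EBAY': 'Consumer Cyclical',
    'ANN': 'Consumer Cyclical',
    'POR': 'Utilities',
})
print(weights['POR'])  # 0.5
```

Comparing this against the equal-weight-per-stock version would show directly how much of the performance comes from the consumer cyclical tilt.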
Another next step would be finding the right benchmark. The S&P 500 is a fine place to start, but some better options might be:
Additionally, in order to live trade this algorithm, an updating data feed would be needed: something that updates when CEOs change and includes their gender. If I can find such a data set covering a broader universe of stocks, I will absolutely look to evaluate this outside of the Fortune 1000, in the interest of developing an algo I can live trade.