
Investing In Women-led Fortune 1000 Companies

It has been widely reported that companies with women in senior management and on the board of directors perform better than companies without. Credit Suisse's Gender 3000 report looks at gender diversity in 3,000 companies across 40 countries. According to this report, at the end of 2013 women accounted for 12.9% of top management (CEOs and directors reporting to the CEO) and held 12.7% of board positions. Additionally, companies with more than one woman on the board have returned a compound 3.7% a year more than those with none since 2005.
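To put that 3.7% figure in perspective, a small annual edge compounds into a large cumulative gap. A back-of-the-envelope calculation, assuming a nine-year window (2005 through 2013):

```python
# Back-of-the-envelope: a hypothetical 3.7% annual outperformance,
# compounded over an assumed nine-year window (2005 through 2013).
annual_edge = 0.037
years = 9
cumulative = (1 + annual_edge) ** years - 1
print(round(cumulative, 3))  # 0.387, i.e. roughly 39% cumulative outperformance
```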

These kinds of reports quickly lead to the question: what would happen if you invested in companies with female CEOs?

The first challenge was finding a data source. Ideally, in order to create an algorithm to do this investing for me, I would need an evolving data source, one that is updated with CEO gender on a fairly regular basis. But to get started, I decided to just look for a historical listing of female CEOs of public companies. After a little bit of Google searching, I found Catalyst's (http://www.catalyst.org/) Bottom Line Research Project (http://www.catalyst.org/knowledge/bottom-line-0), which indicated the data supporting the report was available in their library. I reached out to the team there, explained what I was trying to do, and within a day or two had a PDF listing all of the women who had served as CEO of Fortune 1000 companies dating back to 1995.

Some manual work was required to get the list of women into Excel. I also needed the start and end date of each CEO's tenure in the position, as well as the ticker symbol. Google was particularly helpful, and within a few hours I was set to start exploring the data set.


The first step was to import the data into the research platform, along with some of the basic libraries I knew I would need.

In [324]:
#Import the libraries needed for the analysis.

import pandas as pd
import numpy as np
from datetime import datetime, timedelta
import matplotlib.pyplot as pyplot

import pytz
from pytz import timezone

from zipline import TradingAlgorithm
from zipline.api import (order_target_percent, record, symbol, history, add_history, get_datetime, get_open_orders,
                        get_order, order_target_value, order, order_target )
from zipline.finance.slippage import FixedSlippage


#Import my csv and rename some of the columns 
CEOs = local_csv('FemaleCEOs_v3.csv', raw=True)
CEOs.rename(columns={'SID':'Ticker', 'Start Date':'start_date', 'End Date':'end_date'}, inplace=True)

CEOs[0:10]
Out[324]:
CEO Company Name Ticker start_date end_date Months
0 Patricia A Woertz Archer Daniels Midland Company (ADM) ADM 5/1/06 12/31/14 103
1 Patricia Russo Lucent ALU 12/1/06 8/1/08 20
2 Katherine Krill AnnTaylor Stores Corporation ANN 10/1/05 12/31/14 110
3 Angela Braly WellPoint ANTM 6/1/07 8/1/12 62
4 Judy McReynolds Arkansas Best Corp. ARCB 1/2/10 12/31/14 59
5 Andrea Jung Avon Product AVP 11/4/99 4/22/12 149
6 Sheri McCoy Avon Product AVP 4/23/12 12/31/14 32
7 Susan N. Story American Water Works Company AWK 5/9/14 12/31/14 7
8 Gayla Delly Benchmark Electronics BHE 1/2/12 12/31/14 35
9 Elizabeth Smith Bloomin' Brands BLMN 8/9/12 12/31/14 61

How many new female CEOs are there each year?

To get an understanding of the data, I wanted to know how many new female CEOs there are each year. Quantopian's pricing data goes back to 2002, so I am working with 12 years of information. Since 2002, there have been 77 female CEOs in the Fortune 1000 who run (or have run) public companies. Was that enough to be interesting?


In [325]:
CEOs_data = CEOs
CEOs_data['year_started'] = pd.DatetimeIndex(CEOs_data['start_date']).year
CEOs_data['year_ended'] = pd.DatetimeIndex(CEOs_data['end_date']).year

# Count the CEOs starting and ending in each year; the difference is the net change per year.
temp1 = CEOs_data['year_started'].value_counts(sort=False)
temp2 = CEOs_data['year_ended'].value_counts(sort=False)
temp4 = temp1.subtract(temp2, fill_value=0)
# Manual correction for 2014, where most tenures are capped at 12/31/14.
temp4[2014] = 2

temp4.plot(kind='bar')
Out[325]:
<matplotlib.axes._subplots.AxesSubplot at 0x7fa299dcc150>

Plotting the data shows that the trend is headed in the right direction, but that the majority of the women who have served as CEOs of Fortune 1000 companies have done so since 2008. At this point, it isn't clear to me if this will have an impact on the analysis. It's just worth noting.


Data Scrubbing

The next few cells are devoted to scrubbing the data. I didn't do these all at the beginning; they evolved over time. In the interest of keeping the notebook organized, and this conversation interesting, I've grouped them all at the beginning of the notebook.


In [326]:
# First I need to convert the date values in the csv to datetime objects in UTC timezone. 

CEOs['start_date'] = CEOs['start_date'].apply(lambda row: pd.to_datetime(str(row), utc=True))
CEOs['end_date'] = CEOs['end_date'].apply(lambda row: pd.to_datetime(str(row), utc=True))
In [327]:
# Then I want to check whether any of the dates fall on a weekend. 
# If a date is a weekend, I move it to the following Monday. 

def check_date(row):
    week_day = row.isoweekday()
    if week_day == 6:
        row = row + timedelta(days=2)
    elif week_day == 7:
        row = row + timedelta(days=1)
    return row

CEOs['start_date'] = CEOs['start_date'].apply(check_date)
CEOs['end_date'] = CEOs['end_date'].apply(check_date)
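For what it's worth, the same weekend adjustment can be done in one vectorized call with NumPy's `busday_offset` (a sketch, not what the notebook uses; with the default week mask it treats only Saturday and Sunday as non-business days):

```python
import numpy as np
import pandas as pd

def roll_to_weekday(dates):
    # Roll any Saturday/Sunday forward to the following Monday;
    # weekday dates come back unchanged.
    days = pd.to_datetime(dates).values.astype('datetime64[D]')
    rolled = np.busday_offset(days, 0, roll='forward')
    return pd.Series(rolled, index=dates.index).astype('datetime64[ns]')
```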
In [328]:
# We need to deal with dates that are outside of our pricing data range.
# For CEOs who started prior to 01/02/2002, I changed their start date to 01/02/2002.
# I also changed any future-dated end dates to 12/1/2014, just to be safe. 

def change_date(row): 
    start_date = row['start_date']
    end_date = row['end_date']
    
    # Use two independent checks (not elif), so a row with both an early
    # start and a future-dated end gets both corrections.
    if start_date < pd.to_datetime("2002-01-02", utc=True):
        row['start_date'] = pd.to_datetime("2002-01-02", utc=True)
    if end_date > pd.to_datetime("2015-01-01", utc=True):
        row['end_date'] = pd.to_datetime("2014-12-01", utc=True)
    return row

CEOs = CEOs.apply(change_date, axis=1)
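The same clamping can be written without a row-wise `apply`, using `clip` for early starts and `mask` for future-dated ends. A sketch on an illustrative DataFrame (column names match the notebook's; the data here is made up):

```python
import pandas as pd

FIRST_DAY = pd.to_datetime('2002-01-02', utc=True)      # start of pricing data
FUTURE_CUTOFF = pd.to_datetime('2015-01-01', utc=True)
SAFE_END = pd.to_datetime('2014-12-01', utc=True)

def clamp_dates(df):
    df = df.copy()
    # Starts before the pricing window are moved up to its first day.
    df['start_date'] = df['start_date'].clip(lower=FIRST_DAY)
    # Future-dated ends are pulled back to a safe end date.
    df['end_date'] = df['end_date'].mask(df['end_date'] > FUTURE_CUTOFF, SAFE_END)
    return df
```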
In [329]:
# I then add a new column called SID, which is the Security Identifier.
# Since ticker symbols are not unique across all time, the SID ensures we have the right company. 
# I use the ticker and the start date to search for the security object.

def get_SID(row):
    temp_ticker = row['Ticker']
    start_date = row['start_date'].tz_localize('UTC')

    row['SID'] = symbols(temp_ticker, start_date)
    return row
            
    
CEOs = CEOs.apply(get_SID, axis=1)
CEOs.sort(columns='start_date')

   
Out[329]:
CEO Company Name Ticker start_date end_date Months year_started year_ended SID
5 Andrea Jung Avon Product AVP 2002-01-02 2012-04-23 149 1999 2012 Security(660 [AVP])
18 Margaret Whitman eBay EBAY 2002-01-02 2008-03-03 122 1998 2008 Security(24819 [EBAY])
21 Paula G. Rosput AGL Resources Inc. GAS 2002-01-02 2006-01-02 65 2000 2006 Security(3103 [GAS])
24 S. Marce Fuller Genon Energy GEN 2002-01-02 2005-09-30 74 1999 2005 Security(22720 [GEN])
49 Pamela Kirby Quintiles Transnational Q 2002-01-02 2003-09-25 29 2001 2003 Security(17104 [Q])
58 Cinda Hallman Spherion SFN 2002-01-02 2004-04-01 36 2001 2004 Security(21809 [SFN])
66 Debra Cafaro Ventas VTR 2002-01-02 2014-12-31 189 1999 2014 Security(18821 [VTR])
70 Anne Mulcahy Xerox Corp XRX 2002-01-02 2010-05-20 105 2001 2010 Security(8354 [XRX])
75 Mary Forte Zale ZLC 2002-08-01 2006-01-30 41 2002 2006 Security(10069 [ZLC])
43 Patricia Gallup PC Connection PCCC 2002-09-02 2011-08-08 107 2002 2011 Security(18471 [PCCC])
47 Dona Davis Young Phoenix Companies PNX 2003-01-02 2009-04-15 75 2003 2009 Security(22832 [PNX])
50 Mary Sammons Rite Aid Corp RAD 2003-06-25 2010-06-23 84 2003 2010 Security(6330 [RAD])
19 Mary Agnes Wilderotter Citizens Communications FTR 2004-11-01 2014-12-31 121 2004 2014 Security(2069 [FTR])
41 Janet L. Robinson The New York Times Company NYT 2004-12-27 2012-01-02 84 2004 2011 Security(5551 [NYT])
37 Patricia Kampling Alliant Energy LNT 2005-04-01 2014-12-31 116 2005 2014 Security(18584 [LNT])
51 Susan Ivey Reynolds American RAI 2005-06-27 2011-02-01 68 2005 2011 Security(20277 [RAI])
2 Katherine Krill AnnTaylor Stores Corporation ANN 2005-10-03 2014-12-31 110 2005 2014 Security(430 [ANN])
34 Linda A. Lang Jack in the Box JACK 2005-10-03 2014-01-02 99 2005 2014 Security(20740 [JACK])
20 Mary Agnes Wilderotter Frontier Communications FTR 2006-01-02 2014-12-31 107 2006 2014 Security(2069 [FTR])
76 Mary Burton Zale ZLC 2006-01-30 2007-12-03 23 2006 2007 Security(10069 [ZLC])
56 Claire Babrowski Radio Shack RSH 2006-02-01 2006-07-05 5 2006 2006 Security(21550 [RSH])
48 Peggy Y. Fowler Portland General Electric POR 2006-04-13 2009-03-02 107 2006 2009 Security(28318 [POR])
0 Patricia A Woertz Archer Daniels Midland Company (ADM) ADM 2006-05-01 2014-12-31 103 2006 2014 Security(128 [ADM])
27 Constance H. Lau Hawaiian Electric Industries Inc. HE 2006-05-02 2014-12-31 103 2006 2014 Security(3509 [HE])
38 Irene Rosenfeld Mondelez International MDLZ 2006-06-26 2014-12-31 102 2006 2014 Security(22802 [MDLZ])
44 Indra Nooyi PepsiCo PEP 2006-10-02 2014-12-31 98 2006 2014 Security(5885 [PEP])
69 Christina Gold Western Union Holdings WU 2006-10-02 2010-09-01 47 2006 2010 Security(32603 [WU])
67 Kerrii B. Anderson Wendy's International WEN 2006-11-09 2008-09-01 22 2006 2008 Security(8146 [WEN])
1 Patricia Russo Lucent ALU 2006-12-01 2008-08-01 20 2006 2008 Security(273 [ALU])
61 Carol Meyrowitz TJX Corp TJX 2007-01-29 2014-12-31 95 2007 2014 Security(7457 [TJX])
... ... ... ... ... ... ... ... ... ...
63 Kathleen A. Ligocki Tower Automotive TOWR 2010-10-25 2007-08-01 48 2010 2007 Security(40257 [TOWR])
35 Beth E. Mooney KeyCorp KEY 2011-05-02 2014-12-31 43 2011 2014 Security(4221 [KEY])
10 Diane M. Sullivan Brown Shoe Company BWS 2011-05-26 2014-12-31 43 2011 2014 Security(1195 [BWS])
59 Debra Reed Sempra Energy SRE 2011-06-27 2014-12-31 42 2011 2014 Security(24778 [SRE])
13 Denise Morrison Campbell Soup CPB 2011-08-01 2014-12-31 40 2011 2014 Security(1795 [CPB])
11 Sandra Cochran Cracker Barrel Old Country Store CBRL 2011-09-12 2014-12-31 39 2011 2014 Security(1308 [CBRL])
28 Meg Whitman HP HPQ 2011-09-22 2014-12-31 39 2011 2014 Security(3735 [HPQ])
33 Denise Ramos ITT ITT 2011-10-03 2014-12-31 38 2011 2014 Security(14081 [ITT])
22 Gracia Martore Gannett GCI 2011-10-06 2014-12-31 38 2011 2014 Security(3128 [GCI])
72 Gretchen McClain Xylem XYL 2011-10-24 2013-09-09 23 2011 2013 Security(42023 [XYL])
8 Gayla Delly Benchmark Electronics BHE 2012-01-02 2014-12-31 35 2012 2014 Security(856 [BHE])
30 Virgina Rometty IBM IBM 2012-01-02 2014-12-31 35 2012 2014 Security(3766 [IBM])
39 Heather Bresch Mylan MYL 2012-01-02 2014-12-31 35 2012 2014 Security(5166 [MYL])
6 Sheri McCoy Avon Product AVP 2012-04-23 2014-12-31 32 2012 2014 Security(660 [AVP])
74 Marissa Mayer Yahoo YHOO 2012-07-17 2014-12-31 29 2012 2014 Security(14848 [YHOO])
9 Elizabeth Smith Bloomin' Brands BLMN 2012-08-09 2014-12-31 61 2012 2014 Security(43283 [BLMN])
14 Andrea Ayers Convergys CVG 2012-10-02 2014-12-31 26 2012 2014 Security(19203 [CVG])
40 Wellington J. Denahan-Norris Annaly Capital Management NLY 2012-11-05 2014-12-31 25 2012 2014 Security(17702 [NLY])
12 Linda Massman Clearwater Paper CLW 2013-01-02 2014-12-31 23 2013 2014 Security(37775 [CLW])
23 Phebe Novakovic General Dynamics GD 2013-01-02 2014-12-31 23 2013 2014 Security(3136 [GD])
36 Marillyn Hewson Lockheed Martin LMT 2013-01-02 2014-12-31 23 2013 2014 Security(12691 [LMT])
65 Kimberly Bowers CST Brands VLO 2013-01-02 2014-12-31 23 2013 2014 Security(7990 [VLO])
62 Sheryl Palmer Taylor Morrison Home TMHC 2013-04-22 2014-12-31 88 2013 2014 Security(44433 [TMHC])
17 Lynn Good Duke Energy DUK 2013-07-01 2014-12-31 17 2013 2014 Security(2351 [DUK])
64 Mary Dillon Ulta Salon Cosmetics & Fragrance ULTA 2013-07-01 2014-12-31 17 2013 2014 Security(34953 [ULTA])
26 Lauralee Martin HCP HCP 2013-10-03 2014-12-31 14 2013 2014 Security(3490 [HCP])
25 Mary Barra GM GM 2014-01-15 2014-12-31 11 2014 2014 Security(40430 [GM])
52 Susan Cameron Reynolds American RAI 2014-05-01 2014-12-31 7 2014 2014 Security(20277 [RAI])
7 Susan N. Story American Water Works Company AWK 2014-05-09 2014-12-31 7 2014 2014 Security(36098 [AWK])
55 Barbara Rentler Ross Stores ROST 2014-06-02 2014-12-31 6 2014 2014 Security(6546 [ROST])

77 rows × 9 columns


Now I have a clean data set, with all of the data I need and hopefully nothing that will trip me up.

Understanding the Data One Company At A Time

My first goal was to see, for each company, its historical price plotted on a graph, with markers for where the female CEO's tenure started and where it ended. The idea was to look for any trends and to see if I could determine anything interesting manually.

To do this, I needed all of the historical pricing data for each of these companies stored in a new dataframe.


In [ ]:
# I make a series out of just the SIDs. 
SIDs = CEOs.SID

# Then call get_pricing on the series of SIDs and store the results in a new dataframe called prices. 
prices = get_pricing(
    SIDs,     
    start_date='2002-01-01', 
    end_date='2014-12-31',
    fields ='close_price',
    handle_missing='ignore'
)

prices[0:3]

Next I need to get the pricing data and start and end dates plotted on a chart for an individual security.

I decided to do this one security at a time, both because I think this is a big use case for research users and I wanted to see how it was done, and because I thought it would help me get a feel for the data.

Ultimately, while this is interesting, educational, and fun to see, it doesn't tell me much. In general, the market drop in 2008 is a huge factor.


In [ ]:
security = 128  #found this by hand 2351, 128, 6330, 3490, 24819
adm_df = CEOs[(CEOs['SID'] == security)]
sec_df = prices[security]
In [ ]:
fig = pyplot.figure()
ax2 = fig.add_subplot(212)
start_date = adm_df['start_date']
end_date = adm_df['end_date']
prices[security].plot(ax=ax2, figsize=(16, 15))

ax2.plot(start_date, prices.ix[start_date][security], '^', markersize=20, color='m')
ax2.plot(end_date, prices.ix[end_date][security], 'v', markersize=20, color='m')

pyplot.legend()
print start_date
print end_date

Writing an Outline of my Algo

Based on this research, I decided the best thing to do was write a simple algo, see if it was interesting, and then iterate. I decided that the simplest algo I could write would just buy some number of shares the day a female CEO started her job and sell them when she left the position. Just to prove to myself that I could, I outlined what that might look like.


In [ ]:
def buy_sell(row):
    todays_date = pd.to_datetime('2005-10-03')
    start_date = row['start_date']
    end_date = row['end_date']
    sid = row['SID']
    if start_date == todays_date:
        print ("Buy!")
        print sid
    elif end_date == todays_date:
        print ("Sell!")
        print sid
    return row
    
In [ ]:
CEOs = CEOs.apply(lambda row:buy_sell(row), axis=1)

Success! I can figure out the right date to buy and sell my securities! For a backtest, the code needs to be a little more complicated. Here I am manually setting a date, but for my backtest this will be fed into the algorithm.

Writing V1 of my Algo

Once I had the outline, I decided to try writing an algorithm in the research environment using zipline to test its results. The first version of my algo was purposefully as simple as I could make it: check the date and find any CEOs who started or ended their tenure on that date. If a CEO started on that date, buy 500 shares; if she ended her tenure on that date, sell the entire position.


In [330]:
# First I create a list of all the companies SIDs that I want to use. 

tickers_to_use = CEOs.SID
In [331]:
# I then get a dataframe of the historical pricing information for those companies.

data = get_pricing(
    tickers_to_use,     
    start_date='2002-01-01', 
    end_date='2014-12-31',
    fields ='close_price',
    handle_missing='log'
                  )
In [332]:
# Remove a few tickers that are having issues because of a known bug
CEOs = CEOs[(CEOs['Ticker'] != ('RDA'))]
CEOs = CEOs[(CEOs['Ticker'] != ('WEN'))]
CEOs = CEOs[(CEOs['Ticker'] != ('GAS'))]
In [333]:
"""
    This is where I initialize my algorithm
    
"""

def initialize_first(context):    

    #load the CEO data and a variable to count the number of stocks held at any time as global variables
    context.CEOs = CEOs
    context.stocks_held = 0
In [334]:
"""
    Handle data is the function that runs every minute (or day), looking to make trades
    
"""

def handle_data_first(context, data):

    # get today's date
    today = get_datetime()
        
    # get a dataframe with just the companies where start_date (or end date) is today
    start_today = context.CEOs[context.CEOs.start_date==today]
    end_today = context.CEOs[context.CEOs.end_date==today]
    
    
    # Iterrows then iterates through the rows of my start_today dataframe, for each row it
    #: 1. Creates a variable for the current SID and the current ticker
    #: 2. Determines if the SID is in our pricing data (ie. do we have pricing data for this SID for today) 
    #: 3. If it is, it increases stocks_held by 1 and buys 500 shares of that stock
        
    for idx, row in start_today.iterrows():
        current_sid = row['SID']
        current_ticker = row['Ticker']          
                
        if current_sid in data:
            #print 'On {} buy {}'.format(today, current_ticker)
            context.stocks_held = context.stocks_held + 1
            order_id = order_target(current_sid, 500)

            
    # We then do the same thing for my end_today dataframe to determine what we should sell            
    
    for idx, row in end_today.iterrows():
        current_sid = row['SID']
        current_ticker = row['Ticker']         
                
        if current_sid in data: 
            #print 'On {} sell {} of {}'.format(today, context.portfolio.positions[current_sid], current_ticker)
            
            context.stocks_held = context.stocks_held - 1
            order_id = order_target(current_sid, 0)       

                
In [335]:
"""
    Here's where we will instantiate the Trading Algorithm and run our simulation
"""

#: We tell zipline to run the algo using initialize and handle_data as our two functions
algo_obj = TradingAlgorithm(
    initialize=initialize_first,
    handle_data=handle_data_first
)

"""
    Plotting function for plotting our transactions and our long/short positions
"""

#: We then get the results from Zipline and plot them
def analyze(context, perf):
    
    fig = pyplot.figure()
    ax1 = fig.add_subplot(211)
    perf.portfolio_value.plot(ax = ax1, figsize=(14,12))
    ax1.set_ylabel('portfolio value in $', fontsize=20)
    
    perf_trans = perf.ix[[t != [] for t in perf.transactions]]
    buys = perf_trans.ix[[t[0]['amount'] > 0 for t in perf_trans.transactions]]
    sells = perf_trans.ix[[t[0]['amount'] < 0 for t in perf_trans.transactions]]

    ax1.plot(buys.index, perf.portfolio_value.ix[buys.index],
             '^', markersize=10, color='m')
    ax1.plot(sells.index, perf.portfolio_value.ix[sells.index],
             'v', markersize=10, color='k')
    
    pyplot.legend(loc=0)
    pyplot.show()

#: Custom Plotting Function
algo_obj._analyze = analyze

#: Run the simulation
perf_manual = algo_obj.run(data)
[2015-02-11 02:15] INFO: Performance: Simulated 3273 trading days out of 3273.
[2015-02-11 02:15] INFO: Performance: first open: 2002-01-02 14:31:00+00:00
[2015-02-11 02:15] INFO: Performance: last close: 2014-12-31 21:00:00+00:00

Wow! That looks pretty good! It goes up and to the right... and up and to the right by A LOT. It seems remarkable, right?

Not so fast. We really need to consider how the market performed during the same time period.

Comparing V1 Against a Benchmark

For the purposes of this exercise, I decided to use the S&P 500 as my benchmark. You could argue that there were other benchmarks that might be better suited, but this was the easiest. For a first pass, that seemed reasonable.


In [346]:
#: First we need to get the data of the S&P500, since this is going to be our benchmark.

data_SPY = get_pricing(['SPY'],
                       start_date='2002-01-01',
                       end_date='2015-02-10',
                       fields='close_price',
                       frequency='daily')
In [337]:
"""
    This cell creates an extremely simple handle_data that will keep 100% 
    of our portfolio in the SPY and I'll plot against the algorithm defined above.
"""

#: Here I'm defining the algo that I have above so I can run with a new graphing method
my_algo = TradingAlgorithm(
    initialize=initialize_first, 
    handle_data=handle_data_first
)

def bench_initialize(context):
    context.first_bar = True

#: Define a simple handle_data that will keep 100% in SPY
def bench_handle(context, data):
    if context.first_bar:
        order_target_percent(8554, 1)
        context.first_bar = False
    else:
        pass

#: Define the algo that will run as the benchmark
bench_algo = TradingAlgorithm(
    initialize=bench_initialize,
    handle_data=bench_handle
)

#: Create a figure to plot on the same graph
fig = pyplot.figure()
ax1 = fig.add_subplot(211)

#: Create our plotting algorithm
def my_algo_analyze(context, perf):
    perf.portfolio_value.plot(ax = ax1, label="My Algo")
def bench_algo_analyze(context, perf):
    perf.portfolio_value.plot(ax = ax1, label="Benchmark")

#: Insert our analyze methods
my_algo._analyze = my_algo_analyze 
bench_algo._analyze = bench_algo_analyze

# Run algorithms
my_algo.run(data)
bench_algo.run(data_SPY)

#: Plot the graph
ax1.set_ylabel('portfolio value in $', fontsize=20)
ax1.set_title("Cumulative Return", fontsize=20)
ax1.legend(loc='best')
fig.tight_layout()
pyplot.show()
[2015-02-11 02:17] INFO: Performance: Simulated 3273 trading days out of 3273.
[2015-02-11 02:17] INFO: Performance: first open: 2002-01-02 14:31:00+00:00
[2015-02-11 02:17] INFO: Performance: last close: 2014-12-31 21:00:00+00:00
[2015-02-11 02:18] INFO: Performance: Simulated 3227 trading days out of 3227.
[2015-02-11 02:18] INFO: Performance: first open: 2002-01-02 14:31:00+00:00
[2015-02-11 02:18] INFO: Performance: last close: 2014-10-24 20:00:00+00:00

Would you look at that? My algo is beating the S&P 500!

I feel pretty amazing at this point. I know little to nothing about the market, but I'm beating it by investing in women.

Then I chatted with a couple of professional quants. I explained the basics of what I was doing and learned that buying a set number of shares really isn't what a professional would do.

Trying Another Approach at Ordering

I need to modify my algo so that it rebalances based on the number of stocks in my portfolio. When I own one stock, it should be 100% of my portfolio. When I own two stocks, they should each be 50% of my portfolio. As the number of stocks in my portfolio changes, the target weight of each stock should change too.

I also learned about slippage, and needed to add some protection for that at this point.
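The rebalancing rule described above boils down to an equal-weight target of 1/N for each held stock. A minimal standalone sketch (hypothetical helper, not the notebook's actual code):

```python
def equal_weight_targets(holdings):
    # Each currently held stock gets an equal fraction of the portfolio:
    # one stock -> 100%, two stocks -> 50% each, and so on.
    n = len(holdings)
    return {stock: 1.0 / n for stock in holdings} if n else {}
```

The handle_data version in the next cells applies the same idea by ordering each stock to a target value of portfolio_value / N.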


In [338]:
"""
    This is where I initialize my algorithm
    
"""

from zipline.api import order
from zipline.finance.slippage import FixedSlippage


def initialize(context):    
    #load the CEO data and a variable to count the number of stocks held at any time as global variables
    
    context.CEOs = CEOs
    context.current_stocks = []
    context.stocks_to_order_today = []
    context.stocks_to_sell_today = []
    context.set_slippage(FixedSlippage(spread=0))
    
In [339]:
"""
    Handle data is the function that runs every minute (or day), looking to make trades
"""
from zipline.api import order

def handle_data(context, data):
    #: Set my order and sell dictionaries to empty at the start of any day. 
    context.stocks_to_order_today = []
    context.stocks_to_sell_today = []

    # Get today's date.
    today = get_datetime()
        
    # Get a dataframe with just the companies where start_date (or end date) is today.
    context.stocks_to_order_today = context.CEOs.SID[context.CEOs.start_date==today].tolist()
    context.stocks_to_sell_today= context.CEOs.SID[context.CEOs.end_date==today].tolist()
    context.stocks_to_sell_today = [s for s in context.stocks_to_sell_today if s is not None]
    context.stocks_to_order_today = [s for s in context.stocks_to_order_today if s is not None]
    
    # If there are stocks that need to be bought or sold today
    if len(context.stocks_to_order_today) > 0 or len(context.stocks_to_sell_today) > 0:
#         print "-----------------------------------"
#         print today
#         print "cash = %s" % context.portfolio.cash
#         print "current stocks = %s" % len(context.current_stocks)
#         print "stocks to sell = %s" % len(context.stocks_to_sell_today) 
#         if len(context.stocks_to_sell_today) > 0:
#             print "Stocks to sell %s" % context.stocks_to_sell_today
#         print "stocks to buy = %s" % len(context.stocks_to_order_today) 
#         if len(context.stocks_to_order_today) > 0:
#             print "Stocks to order today %s" % context.stocks_to_order_today
        
        
        # First sell any that need to be sold, and remove them from current_stocks. 
        for stock in context.stocks_to_sell_today: 
            if stock in data:
                if stock in context.current_stocks:
                    order_target(stock,0)
                    context.current_stocks.remove(stock) 
                    #print "Selling %s" % stock
        
        # Then add any I am buying to current_stocks.
        for stock in context.stocks_to_order_today: 
            context.current_stocks.append(stock) 
        
        # Then rebalance the portfolio so I have an equal amount of each stock in current_stocks.
        for stock in context.current_stocks: 
            if stock in data: 
                #print "Buying and/or rebalancing %s at target weight %s" % (stock, target_weight)

                #calculate the value to buy
                portfolio_value = context.portfolio.portfolio_value
                num_stocks = len(context.current_stocks)
                value_to_buy = portfolio_value/num_stocks
                
                #print "Buying and/or rebalancing %s at value = %s" % (stock, value_to_buy)
                order_target_value(stock,value_to_buy)
                
        
                        
In [347]:
"""
    This cell will create an extremely simple handle_data that will keep 100% 
    of our portfolio in the SPY, and I'll plot it against the algorithm defined above.
"""

#: Here I'm defining the algo that I have above so I can run with a new graphing method
my_algo = TradingAlgorithm(
    initialize=initialize, 
    handle_data=handle_data
)

def bench_initialize(context):
    context.first_bar = True

#: Define a simple handle_data that will keep 100% in SPY
def bench_handle(context, data):
    if context.first_bar:
        order_target_percent(8554, 1)
        context.first_bar = False
    else:
        pass

#: Define the algo that will run as the benchmark
bench_algo = TradingAlgorithm(
    initialize=bench_initialize,
    handle_data=bench_handle
)

#: Create a figure to plot on the same graph
fig = pyplot.figure()
ax1 = fig.add_subplot(211)

#: Create our plotting algorithm
def my_algo_analyze(context, perf):
    perf.portfolio_value.plot(ax = ax1, label="Fortune 1000 Companies with Female CEOs")
def bench_algo_analyze(context, perf):
    perf.portfolio_value.plot(ax = ax1, label="Benchmark")

#: Insert our analyze methods
my_algo._analyze = my_algo_analyze 
bench_algo._analyze = bench_algo_analyze

# Run algorithms
returns = my_algo.run(data)
bench_algo.run(data_SPY)

#: Plot the graph
ax1.set_ylabel('portfolio value in $', fontsize=20)
ax1.set_title("Cumulative Return", fontsize=20)
ax1.legend(loc='best')
fig.tight_layout()
pyplot.show()
[2015-02-11 14:30] INFO: Performance: Simulated 3273 trading days out of 3273.
[2015-02-11 14:30] INFO: Performance: first open: 2002-01-02 14:31:00+00:00
[2015-02-11 14:30] INFO: Performance: last close: 2014-12-31 21:00:00+00:00
[2015-02-11 14:31] INFO: Performance: Simulated 3298 trading days out of 3300.
[2015-02-11 14:31] INFO: Performance: first open: 2002-01-02 14:31:00+00:00
[2015-02-11 14:31] INFO: Performance: last close: 2015-02-10 21:00:00+00:00

This just keeps getting better. It's really almost too good to be true.

Let's do a couple of quick checks to ensure that I am buying and selling securities and that my leverage isn't out of control.


In [341]:
#Look at the number of positions over time. 

def find_rows(row):
    positions = [pos for pos in row['positions'] if pos['amount'] > 0]
    row['position_len'] = len(positions)
    return row

returns['position_len']=0
returns = returns.apply(lambda row: find_rows(row), axis=1)

returns.position_len.plot()
Out[341]:
<matplotlib.axes._subplots.AxesSubplot at 0x7fa299255ad0>
In [342]:
# Look at the leverage over time. 
returns.gross_leverage.plot()
Out[342]:
<matplotlib.axes._subplots.AxesSubplot at 0x7fa2a036f990>

Understanding Why?

Now that I have an algo that looks so amazing, I am trying to understand why.

One suggestion was to look for similarities among the companies I invested in besides the gender of their CEO, such as their sector.

I pulled and plotted the Morningstar sector codes for each company.


In [343]:
sectors = local_csv('CEOs_sector_output.csv')
sector_count = sectors['Sector'].value_counts(sort=False)
sector_count.plot(kind='bar')
Out[343]:
<matplotlib.axes._subplots.AxesSubplot at 0x7fa29a6e0c50>

It does look like I have a slight bias towards consumer cyclical companies. These include companies such as GM, eBay, The New York Times, and AnnTaylor Stores.

The next question to ask might be, "Is my sector weighting responsible for the performance?" Using XLY, a consumer discretionary ETF, we can compare how consumer companies did against the S&P 500 over the same time period.


In [344]:
consumer = get_pricing(['XLY','SPY'],
           start_date = '2002-01-02', 
           end_date = '2015-02-01', 
           fields = 'close_price')

def cum_returns(df):
    return (1 + df).cumprod() - 1

cum_returns(consumer.pct_change()).plot()
Out[344]:
<matplotlib.axes._subplots.AxesSubplot at 0x7fa298e0e0d0>

It looks like consumer discretionary companies have done well in the last 12 years. It's possible that this is part of the success of this strategy. However, there are also a number of female CEOs in Industrials (Lockheed Martin, General Dynamics, etc.), Technology (Yahoo, Xerox, IBM, HP, etc.), and Utilities (American Water Works, Portland General Electric, Alliant Energy) as well.

Next Steps

Developing a 'sector-neutral' version of this algo would be a good next step. One way to do this would be to force my portfolio to have an equal weight in each sector, as opposed to an equal weight in each stock. Doing this would help me determine whether there is sector bias in this approach and plan for different market shifts in the future.
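As a sketch of what that sector-neutral weighting could look like (hypothetical helper; the sector labels are illustrative, not from the notebook): give each sector an equal slice of the portfolio, then split that slice equally among the sector's stocks.

```python
def sector_neutral_weights(stock_sectors):
    # stock_sectors maps stock -> sector name. Each sector gets an equal
    # slice of the portfolio, split equally among that sector's stocks.
    by_sector = {}
    for stock, sector in stock_sectors.items():
        by_sector.setdefault(sector, []).append(stock)
    per_sector = 1.0 / len(by_sector)
    return {
        stock: per_sector / len(stocks)
        for stocks in by_sector.values()
        for stock in stocks
    }
```

So with two consumer cyclical stocks and one industrial, each sector gets 50% of the portfolio, and the two consumer names split their half.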

Another next step would be finding the right benchmark. The S&P 500 is a fine place to start, but some better options might be:

  • All the other companies in the Fortune 1000
  • The Fortune 1000 as a whole
  • Something highly correlated to the Fortune 1000

Additionally, in order to live trade this algorithm, an updating data feed would be needed: something that updates when CEOs change and includes their gender. If I can find such a data set covering a broader set of stocks, I will absolutely look to evaluate this outside of the Fortune 1000 in the interest of developing an algo I can live trade.