Notebook

Investing in Women-Led Companies

It has been widely reported that companies with women in senior management and on the board of directors perform better than companies without. Credit Suisse’s Gender 3000 report looks at gender diversity in 3,000 companies across 40 countries. According to this report, at the end of 2013 women accounted for 12.9% of top management (CEOs and directors reporting to the CEO) and held 12.7% of board seats. Additionally, “Companies with more than one woman on the board have returned a compound 3.7% a year over those that have none since 2005.”

These kinds of reports quickly lead to the question, “What would happen if you invested in companies with female CEOs?”

Earlier this year I shared two different versions of my notebook (here (https://www.quantopian.com/posts/research-investing-in-women-led-fortune-1000-companies) and here (https://www.quantopian.com/posts/research-an-update-to-investing-in-women-led-companies)). This third version shows the ways in which I have continued to explore this strategy.

Play with this

This notebook is entirely clonable, and all of my data is available to you here (https://www.dropbox.com/sh/qb0qjhzhbbmoaxq/AACiXyN25R0QKg7Js6T2IDbra?dl=0). Please feel free to try it out on your own and share other versions back.

The Data

The data backing this research was provided by Catalyst’s (http://www.catalyst.org/) Bottom Line Research Project (http://www.catalyst.org/knowledge/bottom-line-0).

In [15]:
#Import the libraries needed for the analysis.

import pandas as pd
import numpy as np
from datetime import datetime, timedelta
import matplotlib.pyplot as pyplot
import pytz
from pytz import timezone
from zipline import TradingAlgorithm
from zipline.api import (order_target_percent, record, symbol, history, add_history, get_datetime, 
                        get_open_orders, get_order, order_target_value, order, order_target)
from zipline.finance.slippage import FixedSlippage



#Import my csv and rename some of the columns 
CEOs = local_csv('FemaleCEOs_v6.csv')
CEOs.rename(columns={'SID':'Ticker', 'Start Date':'start_date', 'End Date':'end_date'}, inplace=True)

#Below you see some basic information and a preview of this dataframe. 
print "Number of CEOs = %s" % len(CEOs)
print "Number of Companies = %s" % CEOs['Ticker'].nunique()
CEOs
Number of CEOs = 80
Number of Companies = 74
Out[15]:
CEO Company Name Ticker start_date end_date
0 Patricia A Woertz Archer Daniels Midland Company (ADM) ADM 5/1/06 12/31/14
1 Patricia Russo Lucent ALU 12/1/06 8/1/08
2 Katherine Krill AnnTaylor Stores Corporation ANN 10/1/05 12/31/14
3 Angela Braly WellPoint ANTM 6/1/07 8/1/12
4 Judy McReynolds Arkansas Best Corp. ARCB 1/2/10 12/31/14
5 Andrea Jung Avon Product AVP 11/4/99 4/22/12
6 Sheri McCoy Avon Product AVP 4/23/12 12/31/14
7 Susan N. Story American Water Works Company AWK 5/9/14 12/31/14
8 Gayla Delly Benchmark Electronics BHE 1/3/12 12/31/14
9 Elizabeth Smith Bloomin' Brands BLMN 8/9/12 12/31/14
10 Diane M. Sullivan Brown Shoe Company CAL 5/26/11 12/31/14
11 Sandra Cochran Cracker Barrel Old Country Store CBRL 9/12/11 12/31/14
12 Linda Massman Clearwater Paper CLW 1/2/13 12/31/14
13 Denise Morrison Campbell Soup CPB 8/1/11 12/31/14
14 Andrea Ayers Convergys CVG 10/2/12 12/31/14
15 Ellen Kullman DuPont DD 1/2/09 12/31/14
16 Sara Mathew Dun & Bradstreet Inc. DNB 1/2/10 10/7/13
17 Lynn Good Duke Energy DUK 7/1/13 12/31/14
18 Margaret Whitman eBay EBAY 1/1/98 3/1/08
19 Mary Agnes Wilderotter Citizens Communications FTR 11/1/04 12/31/14
20 Mary Agnes Wilderotter Frontier Communications FTR 1/3/06 12/31/14
21 Paula G. Rosput AGL Resources Inc. GAS 8/1/00 1/3/06
22 Gracia Martore Gannett GCI 10/6/11 12/31/14
23 Phebe Novakovic General Dynamics GD 1/2/13 12/31/14
24 S. Marce Fuller Genon Energy GEN 7/1/99 9/30/05
25 Mary Barra GM GM 1/15/14 12/31/14
26 Lauralee Martin HCP HCP 10/3/13 12/31/14
27 Constance H. Lau Hawaiian Electric Industries Inc. HE 5/2/06 12/31/14
28 Meg Whitman HP HPQ 9/22/11 12/31/14
29 Mindy F. Grossman HSN HSNI 8/20/08 12/31/14
... ... ... ... ... ...
50 Mary Sammons Rite Aid Corp RAD 6/25/03 6/23/10
51 Susan Ivey Reynolds American RAI 6/26/05 2/1/11
52 Susan Cameron Reynolds American RAI 5/1/14 12/31/14
53 Mary Berner Reader's Digest Association RDA 11/10/10 4/25/11
54 Amy Miles Regal Entertainment RGC 6/30/09 1/21/14
55 Barbara Rentler Ross Stores ROST 6/1/14 12/31/14
56 Claire Babrowski Radio Shack RSH 2/1/06 7/5/06
57 Tamara L. Lundgren Schnitzer Steel Industries SCHN 12/1/08 12/31/14
58 Cinda Hallman Spherion SFN 4/1/01 4/1/04
59 Debra Reed Sempra Energy SRE 6/27/11 12/31/14
60 Lynn Laverty Elsenhans Sunoco SUN 8/8/08 3/1/12
61 Carol Meyrowitz TJX Corp TJX 1/28/07 12/31/14
62 Sheryl Palmer Taylor Morrison Home TMHC 4/20/13 12/31/14
63 Mary Dillon Ulta Salon Cosmetics & Fragrance ULTA 7/1/13 12/31/14
64 Kimberly Bowers CST Brands VLO 1/2/13 12/31/14
65 Debra Cafaro Ventas VTR 3/5/99 12/31/14
66 Kerrii B. Anderson Wendy's International WEN 11/9/06 9/2/08
67 Laura J. Alber Williams-Sonoma WSM 5/26/10 12/31/14
68 Christina Gold Western Union Holdings WU 10/2/06 9/1/10
69 Anne Mulcahy Xerox Corp XRX 8/1/01 7/1/09
70 Ursula Burns Xerox Corp XRX 7/1/09 12/31/14
71 Gretchen McClain Xylem XYL 10/22/11 9/9/13
72 Carol Bartz Yahoo YHOO 1/13/09 9/9/11
73 Marissa Mayer Yahoo YHOO 7/17/12 12/31/14
74 Mary Forte Zale ZLC 8/1/02 1/30/06
75 Mary Burton Zale ZLC 1/30/06 12/1/07
76 Carleton Fiorina HP HP 7/1/99 2/8/05
77 Paula Rosput Reynolds Safeco SAF 1/3/06 9/2/08
78 Stephanie Streeter Banta BN 10/1/02 1/3/07
79 Dorrit J Bern Charming Shoppes CHRS 2/4/02 7/9/08

80 rows × 5 columns

Scrubbing the Data

In [16]:
# First I need to convert the date values in the csv to datetime objects in UTC timezone. 

CEOs['start_date'] = CEOs['start_date'].apply(lambda row: pd.to_datetime(str(row), utc=True))
CEOs['end_date'] = CEOs['end_date'].apply(lambda row: pd.to_datetime(str(row), utc=True))
In [17]:
# Then I want to check if any of the dates are weekends. 
# If they are a weekend, I move them to the following Monday. 

def check_date(row):
    week_day = row.isoweekday()
    if week_day == 6:
        row = row + timedelta(days=2)
    elif week_day == 7:
        row = row + timedelta(days=1)
    return row

CEOs['start_date'] = CEOs['start_date'].apply(check_date)
CEOs['end_date'] = CEOs['end_date'].apply(check_date)
In [18]:
# We need to deal with the dates that are outside of our pricing data range.
# For CEOs who started prior to 01/02/2002, I change their start date to 01/02/2002.
# I also change any future-dated end dates to 12/1/2014, just to be safe. 

def change_date(row): 
    start_date = row['start_date']
    end_date = row['end_date']
    
    if start_date < pd.to_datetime("2002-01-02", utc=True):
        row['start_date'] = pd.to_datetime("2002-01-02", utc=True)
    if end_date > pd.to_datetime("2015-01-01", utc=True):
        row['end_date'] = pd.to_datetime("2014-12-01", utc=True)
    return row

CEOs = CEOs.apply(change_date, axis=1)
In [21]:
# I then add a new column called SID, which is the security identifier.
# Since ticker symbols are not unique across all time, the SID ensures we have the right company. 
# I use the ticker and the start date to look up the security object.

def get_SID(row):
    temp_ticker = row['Ticker']
    start_date = row['start_date'].tz_localize('UTC')

    row['SID'] = symbols(temp_ticker, start_date)
    return row

CEOs = CEOs.apply(get_SID, axis=1)
#CEOs.sort(columns='Ticker')

My Algo - a refresher

The algo as written below buys when a CEO comes into the position and sells when she leaves. It rebalances based on the number of stocks in my portfolio: when I own one stock, it is 100% of my portfolio; when I own two stocks, they are each 50% of my portfolio. As the number of stocks in my portfolio changes, the target weight of each stock changes too.

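The equal-weight rule described above can be sketched on its own, outside zipline (a minimal illustration; `equal_weight_targets` is a hypothetical helper, not part of the algo itself):

```python
def equal_weight_targets(holdings, portfolio_value):
    """Dollar target for each held stock under equal weighting."""
    if not holdings:
        return {}
    per_stock = portfolio_value / float(len(holdings))
    return dict((stock, per_stock) for stock in holdings)

# One stock gets 100% of the portfolio; two stocks get 50% each.
print(equal_weight_targets(['ADM'], 100000.0))         # {'ADM': 100000.0}
print(equal_weight_targets(['ADM', 'AVP'], 100000.0))  # 50000.0 each
```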

In [10]:
"""
    This is where I initialize my algorithm
    
"""

from zipline.api import order
from zipline.finance.slippage import FixedSlippage


def initialize(context):    
    #load the CEO data and a variable to count the number of stocks held at any time as global variables
    
    context.CEOs = CEOs
    context.current_stocks = []
    context.stocks_to_order_today = []
    context.stocks_to_sell_today = []
    context.set_slippage(FixedSlippage(spread=0))
    
In [11]:
"""
    handle_data is the function that runs every minute (or day) looking to make trades
"""
from zipline.api import order

def handle_data(context, data):
    #: Set my order and sell lists to empty at the start of each day. 
    context.stocks_to_order_today = []
    context.stocks_to_sell_today = []

    # Get today's date.
    today = get_datetime()
        
    # Get the SIDs of the companies whose start_date (or end_date) is today.
    context.stocks_to_order_today = context.CEOs.SID[context.CEOs.start_date==today].tolist()
    context.stocks_to_sell_today = context.CEOs.SID[context.CEOs.end_date==today].tolist()
    context.stocks_to_sell_today = [s for s in context.stocks_to_sell_today if s is not None]
    context.stocks_to_order_today = [s for s in context.stocks_to_order_today if s is not None]
    
    # If there are stocks that need to be bought or sold today
    if len(context.stocks_to_order_today) > 0 or len(context.stocks_to_sell_today) > 0:
        
        # First sell any that need to be sold, and remove them from current_stocks. 
        for stock in context.stocks_to_sell_today: 
            if stock in data:
                if stock in context.current_stocks:
                    order_target(stock,0)
                    context.current_stocks.remove(stock) 
                    #print "Selling %s" % stock
        
        # Then add any I am buying to current_stocks.
        for stock in context.stocks_to_order_today: 
            context.current_stocks.append(stock) 
        
        # Then rebalance the portfolio so I have an equal amount of each stock in current_stocks.
        for stock in context.current_stocks: 
            if stock in data: 
                #print "Buying and/or rebalancing %s at target weight %s" % (stock, target_weight)

                #calculate the value to buy
                portfolio_value = context.portfolio.portfolio_value
                num_stocks = len(context.current_stocks)
                value_to_buy = portfolio_value/num_stocks
                
                #print "Buying and/or rebalancing %s at value = %s" % (stock, value_to_buy)
                order_target_value(stock,value_to_buy)            
                        
In [12]:
"""
    This cell will create an extremely simple handle_data that will keep 100% 
    of our portfolio into the SPY and I'll plot against the algorithm defined above.
"""
# I set the start and end date I want my algo to run for
start_algo = '2002-01-01'
end_algo = '2014-12-31'

# I make a series out of just the SIDs. 
SIDs = CEOs.SID

# Then call get_pricing on the series of SIDs and store the results in a new dataframe called prices. 
data = get_pricing(
    SIDs,     
    start_date= start_algo, 
    end_date= end_algo,
    fields ='close_price',
    handle_missing='ignore'
)

#: Here I'm defining the algo that I have above so I can run with a new graphing method
my_algo = TradingAlgorithm(
    initialize=initialize, 
    handle_data=handle_data
)

#: Create a figure to plot on the same graph
fig = pyplot.figure()
ax1 = fig.add_subplot(211)

#: Create our plotting algorithm
def my_algo_analyze(context, perf):
    perf.portfolio_value.plot(ax = ax1, label="Fortune 1000 Women-Led Companies")

#: Insert our analyze methods
my_algo._analyze = my_algo_analyze 

# Run algorithms
returns = my_algo.run(data)

#: Plot the graph
ax1.set_ylabel('portfolio value in $', fontsize=20)
ax1.set_title("Cumulative Return", fontsize=20)
ax1.legend(loc='best')
fig.tight_layout()
pyplot.show()
[2015-05-30 15:53] INFO: Performance: Simulated 3273 trading days out of 3273.
[2015-05-30 15:53] INFO: Performance: first open: 2002-01-02 14:31:00+00:00
[2015-05-30 15:53] INFO: Performance: last close: 2014-12-31 21:00:00+00:00

Benchmarks

To get a benchmark, I'm using a function, get_backtest, which pulls the full results of a backtest in from the Quantopian IDE. In this case, my algorithm does nothing other than set a benchmark, so the backtest hands me a benchmark series with all the work already done.
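For reference, the same kind of benchmark series can be computed directly from a Series of SPY closing prices (a plain-pandas sketch with made-up prices; `cumulative_return` is a hypothetical helper, while get_backtest itself is Quantopian-specific):

```python
import pandas as pd

def cumulative_return(closes):
    """Cumulative return series from a Series of daily closing prices."""
    return closes / closes.iloc[0] - 1

spy = pd.Series([100.0, 102.0, 105.0, 110.0])
print(cumulative_return(spy).iloc[-1])  # 0.1, i.e. a 10% cumulative return
```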


In [31]:
benchmark_bt = get_backtest('54ef94a65457f30f0b4db137')
100% Time: 0:00:05|###########################################################|

I plot the cumulative returns of this benchmark against those of my algo to compare their relative performance.


In [14]:
#: Create a figure to plot on the same graph
fig = pyplot.figure()
ax1 = fig.add_subplot(211)

#: Plot the graph
benchmark_bt.risk.benchmark_period_return.plot(label="SPY")
my_algo.perf_tracker.cumulative_risk_metrics.algorithm_cumulative_returns.plot(label="Fortune 1000 Women-Led Companies")
ax1.set_ylabel('% Cumulative Return', fontsize=20)
ax1.set_title("Cumulative Return", fontsize=20)
ax1.legend(loc='best')
fig.tight_layout()
pyplot.show()

Returns

I calculate the returns for the benchmark and my algo, and then the difference between them.


In [15]:
bench_tot_return = benchmark_bt.risk.benchmark_period_return.iloc[-1]
algo_tot_return = my_algo.perf_tracker.cumulative_risk_metrics.algorithm_cumulative_returns.iloc[-1]

bench_pct_ret = bench_tot_return * 100
algo_pct_ret = algo_tot_return * 100
bench_algo_diff = (algo_tot_return - bench_tot_return) * 100

print "Algo Percent Returns %s" % algo_pct_ret
print "Benchmark Percent Returns %s" % bench_pct_ret
print "Difference %s" % bench_algo_diff
Algo Percent Returns 339.9930954
Benchmark Percent Returns 122.305952214
Difference 217.687143186

Yahoo & Alibaba

In September of 2014, Alibaba completed the largest IPO in history. Yahoo had a sizeable investment in the company, and saw a nice bump in its stock price.

A couple of people have asked, "What if you remove Yahoo and Alibaba? Is this all due to the incredible performance there?"

Here you can see the historical price of Yahoo, and the great performance since late 2013.

In [24]:
# Plot Yahoo's price history, marking the CEO start and end dates.
yhoo_df = CEOs[CEOs['Ticker'] == 'YHOO']
security = yhoo_df['SID'].iloc[0]
start_date = yhoo_df['start_date']
end_date = yhoo_df['end_date']

fig = pyplot.figure()
ax2 = fig.add_subplot(212)
data[security].plot(ax=ax2, figsize=(16, 15), color='g')

ax2.plot(start_date, data.ix[start_date][security], '^', markersize=20, color='b', linestyle='')
ax2.plot(end_date, data.ix[end_date][security], 'v', markersize=20, color='b', linestyle='')

pyplot.legend()
Out[24]:
<matplotlib.legend.Legend at 0x7f7bf67b9d10>
In [22]:
#Remove Yahoo
CEOs_yhoo = CEOs[(CEOs['Ticker'] != ('YHOO'))]
In [28]:
"""
    This cell will create an extremely simple handle_data that will keep 100% 
    of our portfolio into the SPY and I'll plot against the algorithm defined above.
"""
# I set the start and end date I want my algo to run for
start_algo = '2002-01-01'
end_algo = '2014-12-31'

# I make a series out of just the SIDs. 
SIDs = CEOs_yhoo.SID

# Then call get_pricing on the series of SIDs and store the results in a new dataframe called prices. 
data = get_pricing(
    SIDs,     
    start_date= start_algo, 
    end_date= end_algo,
    fields ='close_price',
    handle_missing='ignore'
)

#: Here I'm defining the algo that I have above so I can run with a new graphing method
my_algo_yhoo = TradingAlgorithm(
    initialize=initialize, 
    handle_data=handle_data
)

#: Insert our analyze methods
my_algo_yhoo._analyze = my_algo_analyze 

# Run algorithms
returns_yhoo = my_algo_yhoo.run(data)
[2015-05-30 16:06] INFO: Performance: Simulated 3273 trading days out of 3273.
[2015-05-30 16:06] INFO: Performance: first open: 2002-01-02 14:31:00+00:00
[2015-05-30 16:06] INFO: Performance: last close: 2014-12-31 21:00:00+00:00
In [29]:
#: Create a figure to plot on the same graph
fig = pyplot.figure()
ax1 = fig.add_subplot(211)

#: Plot the graph
benchmark_bt.risk.benchmark_period_return.plot(label="SPY")
my_algo_yhoo.perf_tracker.cumulative_risk_metrics.algorithm_cumulative_returns.plot(label="Fortune 1000 Women-Led Companies")
ax1.set_ylabel('% Cumulative Return', fontsize=20)
ax1.set_title("Cumulative Return", fontsize=20)
ax1.legend(loc='best')
fig.tight_layout()
pyplot.show()
In [30]:
bench_tot_return = benchmark_bt.risk.benchmark_period_return.iloc[-1]
algo_tot_return = my_algo_yhoo.perf_tracker.cumulative_risk_metrics.algorithm_cumulative_returns.iloc[-1]

bench_pct_ret = bench_tot_return * 100
algo_pct_ret = algo_tot_return * 100
bench_algo_diff = (algo_tot_return - bench_tot_return) * 100

print "Algo Percent Returns %s" % algo_pct_ret
print "Benchmark Percent Returns %s" % bench_pct_ret
print "Difference %s" % bench_algo_diff
Algo Percent Returns 320.0392926
Benchmark Percent Returns 122.305952214
Difference 197.733340386

From this we can see that removing Yahoo did have some impact on the algo's performance, but it wasn't responsible for the outsized returns.

The reason is that by this point in the algo, there are approximately 45 companies in the portfolio, so Yahoo represents only about 1/45th of my total investment. The chart below shows the number of companies I hold during each year the algo is running.
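That arithmetic can be made explicit (illustrative only; `single_name_weight` is a hypothetical helper):

```python
def single_name_weight(num_holdings):
    """Weight of one name in an equal-weight portfolio of num_holdings stocks."""
    return 1.0 / num_holdings

# With ~45 equal-weight holdings, any single name is about 2.22% of the portfolio.
print(round(single_name_weight(45) * 100, 2))  # 2.22
```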

How many companies do I hold each year?

In [23]:
from pandas.tseries.offsets import YearBegin
CEOs['year_ended'] = pd.DatetimeIndex(CEOs['end_date']).year
CEOs['year_started'] = pd.DatetimeIndex(CEOs['start_date']).year


counts = pd.Series(index=pd.date_range('2002-01-01', '2015-01-01', freq=YearBegin(1)))
for year in counts.index:
    counts[year] = len(CEOs[(CEOs.start_date <= year) & (CEOs.end_date >= year)])
    
counts.plot(kind = 'bar') 
Out[23]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f4a8e8d5b10>

Remove the outliers

Someone else asked me to remove the top and bottom outliers. Here I remove the top 3 and the bottom 3.

In [41]:
# Remove the top 3 and bottom 3 performers
top_3 = ['HSNI', 'VTR', 'TJX']
bottom_3 = ['NYT', 'RAD', 'Q']
CEOs_outliers = CEOs[~CEOs['Ticker'].isin(top_3 + bottom_3)]
In [43]:
"""
    This cell will create an extremely simple handle_data that will keep 100% 
    of our portfolio into the SPY and I'll plot against the algorithm defined above.
"""
# I set the start and end date I want my algo to run for
start_algo = '2002-01-01'
end_algo = '2014-12-31'

# I make a series out of just the SIDs. 
SIDs = CEOs_outliers.SID

# Then call get_pricing on the series of SIDs and store the results in a new dataframe called prices. 
data = get_pricing(
    SIDs,     
    start_date= start_algo, 
    end_date= end_algo,
    fields ='close_price',
    handle_missing='ignore'
)

#: Here I'm defining the algo that I have above so I can run with a new graphing method
my_algo_outliers = TradingAlgorithm(
    initialize=initialize, 
    handle_data=handle_data
)

#: Insert our analyze methods
my_algo_outliers._analyze = my_algo_analyze 

# Run algorithms
returns_outliers = my_algo_outliers.run(data)
[2015-05-30 18:39] INFO: Performance: Simulated 3273 trading days out of 3273.
[2015-05-30 18:39] INFO: Performance: first open: 2002-01-02 14:31:00+00:00
[2015-05-30 18:39] INFO: Performance: last close: 2014-12-31 21:00:00+00:00
In [44]:
#: Create a figure to plot on the same graph
fig = pyplot.figure()
ax1 = fig.add_subplot(211)

#: Plot the graph
benchmark_bt.risk.benchmark_period_return.plot(label="SPY")
my_algo_outliers.perf_tracker.cumulative_risk_metrics.algorithm_cumulative_returns.plot(label="Fortune 1000 Women-Led Companies")
ax1.set_ylabel('% Cumulative Return', fontsize=20)
ax1.set_title("Cumulative Return", fontsize=20)
ax1.legend(loc='best')
fig.tight_layout()
pyplot.show()
In [45]:
bench_tot_return = benchmark_bt.risk.benchmark_period_return.iloc[-1]
algo_tot_return = my_algo_outliers.perf_tracker.cumulative_risk_metrics.algorithm_cumulative_returns.iloc[-1]

bench_pct_ret = bench_tot_return * 100
algo_pct_ret = algo_tot_return * 100
bench_algo_diff = (algo_tot_return - bench_tot_return) * 100

print "Algo Percent Returns %s" % algo_pct_ret
print "Benchmark Percent Returns %s" % bench_pct_ret
print "Difference %s" % bench_algo_diff
Algo Percent Returns 267.6509217
Benchmark Percent Returns 122.305952214
Difference 145.344969486

Again we can see that removing the top and bottom performers does impact the performance, but not enough to be the entire reason for it.


Sector Analysis

From the beginning (in fact included in the first post of this algo) I knew that the sector weighting of the algorithm was a bit suspect. As you can see from the chart below, almost 30% of the companies in my portfolio were in the consumer space.

In [16]:
sectors = local_csv('CEOs_sector_output_v2.csv')
sector_count = sectors['sector'].value_counts(sort=False)
sector_count.plot(kind='bar')
Out[16]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f0d07c13a10>

The next question I asked was, "Is my sector weighting responsible for the performance?"

Using XLY, a consumer discretionary ETF, we can compare how consumer companies performed against the S&P 500 over the same time period.

From this we can see that the consumer sector outperformed the broader market during this period.


In [19]:
consumer = get_pricing(['XLY','SPY'],
           start_date = '2002-01-02', 
           end_date = '2015-02-01', 
           fields = 'close_price')

def cum_returns(df):
        return (1 + df).cumprod() - 1
        
cum_returns(consumer.pct_change()).plot()
Out[19]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f0d0792ffd0>

Sector Neutral Algo

This sector-neutral version of the algo attempts to remove the bias towards consumer companies that the original algo has. It does this by first determining the number of sectors the portfolio holds each time it rebalances, then dividing the portfolio value by the number of sectors. It then counts the number of companies in each sector, and divides each sector's share of the portfolio by the number of companies in that sector.

This ensures that all sectors are invested in equally.
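Sketched outside zipline, the sector-neutral weighting works like this (a minimal illustration with made-up tickers and sectors; `sector_neutral_targets` is a hypothetical helper, not the algo's actual code):

```python
def sector_neutral_targets(stock_sectors, portfolio_value):
    """stock_sectors maps stock -> sector id. Each sector gets an equal
    share of the portfolio, split equally among that sector's stocks."""
    value_per_sector = portfolio_value / float(len(set(stock_sectors.values())))
    counts = {}
    for sector in stock_sectors.values():
        counts[sector] = counts.get(sector, 0) + 1
    return dict((stock, value_per_sector / counts[sector])
                for stock, sector in stock_sectors.items())

# Two consumer names and one utility: each sector gets $50k,
# so the utility gets $50k while each consumer name gets $25k.
targets = sector_neutral_targets(
    {'TJX': 'consumer', 'ROST': 'consumer', 'DUK': 'utilities'}, 100000.0)
print(targets)
```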

In [26]:
sectors_data = local_csv('CEOs_sector_output_v2.csv')

def get_sec_SID(row):
    temp_sid = row['SID']
    row['SID'] = symbols(temp_sid)
    return row

sectors_data = sectors_data.apply(get_sec_SID, axis=1)

CEOs = pd.merge(CEOs, sectors_data, how='left')
In [27]:
"""
    This is where I initialize my algorithm
    
"""

from zipline.api import order
from zipline.finance.slippage import FixedSlippage


def initialize(context):    
    #load the CEO data and a variable to count the number of stocks held at any time as global variables
    
    context.CEOs = CEOs
    context.current_stocks = []
    context.stocks_to_order_today = []
    context.stocks_to_sell_today = []
    context.set_slippage(FixedSlippage(spread=0))
    context.num_sectors = 0
In [28]:
"""
    handle_data is the function that runs every minute (or day) looking to make trades
"""
from zipline.api import order

def handle_data(context, data):
    #: Set my order and sell lists to empty at the start of each day. 
    context.stocks_to_order_today = []
    context.stocks_to_sell_today = []
    current_CEOs = context.CEOs

    # Get today's date.
    today = get_datetime()
        
    # Get the SIDs of the companies whose start_date (or end_date) is today.
    context.stocks_to_order_today = context.CEOs.SID[context.CEOs.start_date==today].tolist()
    context.stocks_to_sell_today = context.CEOs.SID[context.CEOs.end_date==today].tolist()
    context.stocks_to_sell_today = [s for s in context.stocks_to_sell_today if s is not None]
    context.stocks_to_order_today = [s for s in context.stocks_to_order_today if s is not None]
    
    
    
    # If there are stocks that need to be bought or sold today
    if (len(context.stocks_to_order_today) > 0) or (len(context.stocks_to_sell_today) > 0):
        #print "----------- today is %s --------- " % today
        #print "number of companies to buy = %s" % len(context.stocks_to_order_today)
        #print "number of companies to sell = %s" % len(context.stocks_to_sell_today)        
        
        # First sell any that need to be sold, and remove them from current_stocks. 
        for stock in context.stocks_to_sell_today: 
            if stock in data:
                if stock in context.current_stocks:
                    order_target(stock,0)
                    context.current_stocks.remove(stock) 
                    #print "Selling %s" % stock
        
        # Then add any I am buying to current_stocks.
        for stock in context.stocks_to_order_today: 
            context.current_stocks.append(stock)

        #get the list of current CEOs so that we can find the sector information
        current_CEOs = context.CEOs[context.CEOs.SID.isin(context.current_stocks)]
                        
        #count the number of sectors
        context.num_sectors = current_CEOs.sector_id.nunique()
        #print "*******************"
        #print "number of sectors = %s" % context.num_sectors
        
        #calculate the value to buy
        #get the current portfolio value
        portfolio_value = context.portfolio.portfolio_value
                
        #get the value to be invested in each sector
        value_per_sector = portfolio_value/context.num_sectors
        
        #series of sectors and the number of companies in the sector
        sector_count = current_CEOs['SID'].groupby(current_CEOs['sector_id']).count()
            
        # Then rebalance the portfolio so I have an equal amount of each stock in current_stocks.
        for stock in context.current_stocks: 
            if stock in data: 
                #print "++++++++++++++++++++++++++"
                #print "Buying and/or rebalancing %s" % (stock)
                
                #get the sector of the current company
                current_company_sector = context.CEOs.sector_id[(context.CEOs.SID == stock)].iloc[0]
                #print "current company sector %s" % current_co_sector  
                
                #num_comp_in_sector = current_CEOs.sector_id[current_CEOs.sector_id == current_co_sector].count()
                num_companies_in_sector = sector_count.loc[current_company_sector]        
                #print "number of companies in the sector = %s" % num_comp_in_sector
                
                #calculate the amount to invest in this company                
                value_to_buy = value_per_sector/num_companies_in_sector
                #print "Buying and/or rebalancing %s at value = %s" % (stock, value_to_buy)
                order_target_value(stock,value_to_buy)            
                        
In [29]:
"""
    This cell gets the historical pricing data for all the SIDs in my universe. 
    Then kicks off my algo using that data. 
"""
# I set the start and end date I want my algo to run for
start_algo = '2002-01-01'
end_algo = '2014-12-30'

# I make a series out of just the SIDs. 
SIDs = CEOs.SID

# Then call get_pricing on the series of SIDs and store the results in a new dataframe called prices. 
data = get_pricing(
    SIDs,     
    start_date= start_algo, 
    end_date= end_algo,
    fields ='close_price',
    handle_missing='ignore'
)

#: Here I'm defining the algo that I have above so I can run with a new graphing method
my_algo = TradingAlgorithm(
    initialize=initialize, 
    handle_data=handle_data
)

# Run algorithms
returns = my_algo.run(data)
[2015-05-30 19:00] INFO: Performance: Simulated 3272 trading days out of 3272.
[2015-05-30 19:00] INFO: Performance: first open: 2002-01-02 14:31:00+00:00
[2015-05-30 19:00] INFO: Performance: last close: 2014-12-30 21:00:00+00:00
In [32]:
#: Create a figure to plot on the same graph
fig = pyplot.figure()
ax1 = fig.add_subplot(211)

#: Plot the graph
benchmark_bt.risk.benchmark_period_return.plot(label="SPY")
my_algo.perf_tracker.cumulative_risk_metrics.algorithm_cumulative_returns.plot(label="Fortune 1000 Women-Led Companies")
ax1.set_ylabel('% Cumulative Return', fontsize=20)
ax1.set_title("Cumulative Return", fontsize=20)
ax1.legend(loc='best')
fig.tight_layout()
pyplot.show()
In [33]:
bench_tot_return = benchmark_bt.risk.benchmark_period_return.iloc[-1]
algo_tot_return = my_algo.perf_tracker.cumulative_risk_metrics.algorithm_cumulative_returns.iloc[-1]

bench_pct_ret = bench_tot_return * 100
algo_pct_ret = algo_tot_return * 100
bench_algo_diff = (algo_tot_return - bench_tot_return) * 100

print "Algo Percent Returns %s" % algo_pct_ret
print "Benchmark Percent Returns %s" % bench_pct_ret
print "Difference %s" % bench_algo_diff
Algo Percent Returns 275.7693224
Benchmark Percent Returns 122.305952214
Difference 153.463370186

From this, we can see that the first version's heavy weighting toward the consumer sector does impact performance. But again, it isn't the sole reason for it.

WIL and PAX

There are at least two existing funds with a gender focus, and I've been told there are as many as 17 gender-focused investment products.

The Pax Global Women’s Leadership Index (PXWIX) is the first broad-market index of the highest-rated companies in the world in advancing women’s leadership.

The Women In Leadership index (WIL) tracks a weighted index of 85 U.S.-based companies that are listed on the NYSE or NASDAQ, have market capitalizations of at least $250 million, and have a woman CEO or a board of directors that’s at least 25% female.

Here is a look at them, plotted against the SPY.

I don't have any explanation for why their performance is so different from my algo's. The Pax fund is global, so it certainly isn't an apples-to-apples comparison. The WIL uses somewhat different criteria as well.

In [55]:
funds = local_csv("Womens_Funds.csv", date_column='Date')
funds = funds.sort_index(ascending=True)
funds['SPY'] = get_pricing('SPY', start_date='2002-01-02', end_date='2015-02-19', fields='close_price')

def cum_returns(df):
        return (1 + df).cumprod() - 1
        
cum_returns(funds.pct_change()).plot()
Out[55]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f0cf6cb4f50>

Next Steps

I think one of the most interesting questions I have been asked about this strategy is whether it is actually an investment in women CEOs or in CEO turnover.

From here, I plan to do an event study looking at changes in CEOs and how that might work as an investment strategy.
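One way such an event study might start is by averaging cumulative returns in a window around each CEO-change date (a sketch under assumed inputs; `event_window_returns`, the window size, and the synthetic prices are all hypothetical):

```python
import pandas as pd

def event_window_returns(closes, event_dates, window=20):
    """Average cumulative return over a +/- `window` trading-day span
    around each event date. `closes` is a Series indexed by date."""
    rets = closes.pct_change()
    windows = []
    for date in event_dates:
        if date not in rets.index:
            continue
        loc = rets.index.get_loc(date)
        # Skip events too close to the edges of the price history.
        if loc - window < 0 or loc + window >= len(rets):
            continue
        chunk = rets.iloc[loc - window:loc + window + 1].reset_index(drop=True)
        windows.append((1 + chunk.fillna(0)).cumprod() - 1)
    return pd.concat(windows, axis=1).mean(axis=1)

# Synthetic example: 100 business days of steadily rising prices,
# with one "CEO change" in the middle.
dates = pd.date_range('2014-01-01', periods=100, freq='B')
closes = pd.Series(range(100, 200), index=dates, dtype=float)
avg = event_window_returns(closes, [dates[50]], window=5)
print(len(avg))  # 11 points: 5 days before, the event day, 5 days after
```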