TradeableUS Methodology

by Gil Wassermann

The goal of this project is to create a universe of the most tradeable securities with a view to optimizing pipeline performance and reducing noisy data caused by untradeable assets. If a robust, tradeable universe can be established, users will be able to create better, more reliable algorithms.

A first pass of this process is completed in a series of steps:

  • Amalgamate existing research on universe filtration into a single zipline filter and apply to Pipeline output. (Tradeability Filter)
  • Clean any sector bias (Sector Filter)
  • Remove stocks in a robust manner until the desired number of securities in the universe has been reached

After this initial universe is created, securities are only removed if they fail to meet the tradeability filter. If a stock is removed, the proposed replacement is the most liquid stock that passes the tradeability filter and is not already in the universe. Each proposed stock is checked against the sector exposure limit: if adding it would not exceed the limit, it joins the universe; otherwise, the next most liquid stock is proposed.
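
The replacement procedure described above can be sketched in plain Python. This is a minimal illustration rather than the notebook's implementation: the function name `propose_replacements`, the `(symbol, sector, liquidity)` tuples, and the toy tickers are all assumptions made up for the example.

```python
def propose_replacements(sector_counts, candidates, slots, max_per_sector):
    """Fill `slots` open positions with the most liquid eligible candidates.

    sector_counts  : dict mapping sector -> number of securities already held
    candidates     : list of (symbol, sector, liquidity) tuples that pass the
                     tradeability filter but are not yet in the universe
    max_per_sector : hypothetical cap on the number of securities per sector
    """
    counts = dict(sector_counts)  # work on a copy, leave the input untouched
    added = []
    # Propose candidates from most to least liquid.
    ranked = sorted(candidates, key=lambda c: c[2], reverse=True)
    for symbol, sector, _liquidity in ranked:
        if len(added) == slots:
            break
        # Accept only if the sector would stay within its cap.
        if counts.get(sector, 0) + 1 <= max_per_sector:
            counts[sector] = counts.get(sector, 0) + 1
            added.append(symbol)
    return added


# Toy data: Technology is already at its cap of 2 securities.
held = {'Technology': 2, 'Energy': 1}
pool = [('AAA', 'Technology', 9e6), ('BBB', 'Energy', 5e6), ('CCC', 'Utilities', 7e6)]
replacements = propose_replacements(held, pool, slots=2, max_per_sector=2)
print(replacements)
```

In this toy run the most liquid candidate is passed over because its sector is full, and the two open slots go to the next candidates in liquidity order.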

The create_tradeable method allows you to customize both the number of desired securities in the universe and the sector exposure threshold. The former allows you to create a Tradeable500US, Tradeable1500US etc., while the latter allows you to set a target percentage to limit the influence of particular industry groups in the alpha generation process. Included in this notebook are some graphics to observe sector exposures.
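
The sector exposure threshold translates into a maximum number of securities per sector. The sketch below mirrors the clamping rule used by `sector_filters` in this notebook; the helper name `sector_threshold` and the `n_sectors` argument (11, the number of named sectors) are assumptions introduced for illustration.

```python
import math


def sector_threshold(tradeable_count, sector_exposure_limit, n_sectors=11):
    """Maximum number of securities allowed in any single sector."""
    even_share = 1.0 / n_sectors
    if sector_exposure_limit > 1.0:
        # A limit above 100% is meaningless; cap it at the whole universe.
        return tradeable_count
    if sector_exposure_limit < even_share:
        # With every sector capped below an even split, the caps could not
        # add up to the full universe, so fall back to an even split.
        limit = even_share
    else:
        limit = sector_exposure_limit
    return int(math.ceil(limit * tradeable_count))


print(sector_threshold(500, 0.2))    # a 20% limit on a 500-stock universe
print(sector_threshold(500, 0.05))   # limit below 1/11, raised to an even split
print(sector_threshold(1500, 2.0))   # limit above 1, capped at the universe size
```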

The filters used are:

  • Has high volume traded
  • Is primary share
  • Has substantial market cap (>$300m)
  • Not a depositary receipt
  • Is common stock
  • Is not traded over the counter
  • Not just issued (non-IPO)
  • Not a limited partnership (two filters here)
  • Is financially viable (positive sum of the last four quarters' earnings)
  • Is liquid (also guard against recent IPOs)
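
The last filter depends on how missing prices are averaged. Here is a minimal sketch in plain Python, assuming a toy five-day window (the notebook uses 252 trading days) and made-up dollar volumes. Skipping missing days, as a NaN-aware mean does, lets a recent listing look as liquid as an established stock, while counting missing days as zero drags its average down.

```python
import math


def mean_skipping_missing(values):
    """Average only the days that have data (NaN-aware mean)."""
    present = [v for v in values if not math.isnan(v)]
    return sum(present) / len(present)


def mean_zero_filled(values):
    """Average over the full window, counting missing days as zero."""
    return sum(0.0 if math.isnan(v) else v for v in values) / len(values)


nan = float('nan')
# Daily dollar volume; the recent listing has data for the last two days only.
established = [400000.0, 400000.0, 400000.0, 400000.0, 400000.0]
recent_listing = [nan, nan, nan, 400000.0, 400000.0]

cutoff = 250000  # liquidity cutoff used in the notebook
print(mean_skipping_missing(recent_listing) > cutoff)  # judged on two days of data
print(mean_zero_filled(recent_listing) > cutoff)       # judged on the full window
print(mean_zero_filled(established) > cutoff)
```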

To remain sector neutral, we create a filter that only allows us to retrieve the maximum number of equities per sector (given by the sector_exposure_limit) and then we take the tradeable_count most liquid assets in the past month from this list.
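
The two-step selection described above (cap each sector, then rank what remains by liquidity) can be sketched without the Pipeline API. The function name `select_universe` and the toy securities below are hypothetical; the notebook itself expresses the same idea with Pipeline filters.

```python
from collections import defaultdict


def select_universe(securities, tradeable_count, max_per_sector):
    """securities: list of (symbol, sector, liquidity) tuples."""
    by_sector = defaultdict(list)
    for sec in securities:
        by_sector[sec[1]].append(sec)

    # Step 1: keep only the most liquid `max_per_sector` names in each sector.
    capped = []
    for members in by_sector.values():
        members.sort(key=lambda s: s[2], reverse=True)
        capped.extend(members[:max_per_sector])

    # Step 2: from the capped list, take the most liquid names overall.
    capped.sort(key=lambda s: s[2], reverse=True)
    return [symbol for symbol, _sector, _liquidity in capped[:tradeable_count]]


toy = [
    ('T1', 'Technology', 90), ('T2', 'Technology', 80), ('T3', 'Technology', 70),
    ('E1', 'Energy', 60), ('U1', 'Utilities', 50),
]
universe = select_universe(toy, tradeable_count=3, max_per_sector=2)
print(universe)
```

In the toy data, `T3` is more liquid than `E1` but is dropped in step 1 because Technology already has two names, which is the trade-off the sector cap makes.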

More information about the filter process can be found here:

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import math
from datetime import timedelta, date
from quantopian.research import run_pipeline
from quantopian.pipeline import Pipeline
from quantopian.pipeline.data.builtin import USEquityPricing
from quantopian.pipeline.factors import AverageDollarVolume, CustomFactor, Latest
from quantopian.pipeline.filters.morningstar import IsPrimaryShare
from quantopian.pipeline.data import morningstar as mstar
from quantopian.pipeline.classifiers.morningstar import Sector


# Constants that need to be global
COMMON_STOCK= 'ST00000001'

SECTOR_NAMES = {
 101: 'Basic Materials',
 102: 'Consumer Cyclical',
 103: 'Financial Services',
 104: 'Real Estate',
 205: 'Consumer Defensive',
 206: 'Healthcare',
 207: 'Utilities',
 308: 'Communication Services',
 309: 'Energy',
 310: 'Industrials',
 311: 'Technology' ,
}
        
# Average Dollar Volume without nanmean, so that recent IPOs are truly removed
class ADV_adj(CustomFactor):
    inputs = [USEquityPricing.close, USEquityPricing.volume]
    window_length = 252
    
    def compute(self, today, assets, out, close, volume):
        close[np.isnan(close)] = 0
        out[:] = np.mean(close * volume, 0)
        
        
def universe_filters():
    """
    Create a Filter implementing common acceptance criteria.
    
    Returns
    -------
    zipline.Filter
        Filter to control tradeability
    """

    # Equities with an average dollar volume over the past 252 trading days greater than $750,000.
    high_volume = (AverageDollarVolume(window_length=252) > 750000)
    
    # Not Misc. sector:
    sector_check = Sector() != -1.
    
    # Equities that morningstar lists as primary shares.
    # NOTE: This will return False for stocks not in the morningstar database.
    primary_share = IsPrimaryShare()
    
    # Equities for which morningstar's most recent Market Cap value is above $300m.
    have_market_cap = mstar.valuation.market_cap.latest > 300000000
    
    # Equities not listed as depositary receipts by morningstar.
    # Note the inversion operator, `~`, at the start of the expression.
    not_depositary = ~mstar.share_class_reference.is_depositary_receipt.latest
    
    # Equities that are listed as common stock (as opposed to, say, preferred stock).
    # This is our first string column. The .eq method used here produces a Filter returning
    # True for all asset/date pairs where security_type produced a value of 'ST00000001'.
    common_stock = mstar.share_class_reference.security_type.latest.eq(COMMON_STOCK)
    
    # Equities whose exchange id does not start with OTC (Over The Counter).
    # startswith() is a new method available only on string-dtype Classifiers.
    # It returns a Filter.
    not_otc = ~mstar.share_class_reference.exchange_id.latest.startswith('OTC')
    
    # Equities whose symbol (according to morningstar) ends with .WI
    # This generally indicates a "When Issued" offering.
    # endswith() works similarly to startswith().
    not_wi = ~mstar.share_class_reference.symbol.latest.endswith('.WI')
    
    # Equities whose company name ends with 'LP' or a similar string.
    # The .matches() method uses the standard library `re` module to match
    # against a regular expression.
    not_lp_name = ~mstar.company_reference.standard_name.latest.matches(r'.* L[\. ]?P\.?$')
    
    # Equities with a null entry for the balance_sheet.limited_partnership field.
    # This is an alternative way of checking for LPs.
    not_lp_balance_sheet = mstar.balance_sheet.limited_partnership.latest.isnull()
    
    # Highly liquid assets only. Also eliminates IPOs in the past 12 months
    # Use new average dollar volume so that unrecorded days are given value 0
    # and not skipped over
    # S&P Criterion
    liquid = ADV_adj() > 250000
    
    # Add logic when global markets supported
    # S&P Criterion
    domicile = True
    
    universe_filter = (high_volume & primary_share & have_market_cap & not_depositary &
                       common_stock & not_otc & not_wi & not_lp_name & not_lp_balance_sheet &
                       liquid & domicile)
    
    return universe_filter


def sector_filters(tradeable_count, sector_exposure_limit):
    """
    Mask for Pipeline in create_tradeable. Limits each sector so as not to be over-exposed

    Parameters
    ----------
    tradeable_count : int
        Target number of constituent securities in universe
    sector_exposure_limit: float
        Target threshold for any particular sector
    Returns
    -------
    zipline.Filter
        Filter to control sector exposure
    """
    
    # set thresholds
    if sector_exposure_limit < ((1. / len(SECTOR_NAMES))):
        threshold = int(math.ceil((1. / len(SECTOR_NAMES)) * tradeable_count))
    elif sector_exposure_limit > 1.:
        threshold = tradeable_count
    else:
        threshold = int(math.ceil(sector_exposure_limit * tradeable_count))
     
    # retrieve sector codes
    sector = Sector()
    
    # for each sector, keep at most `threshold` of the most liquid equities
    liquidity = AverageDollarVolume(window_length=21)
    sector_trims = [liquidity.top(threshold, mask=sector.eq(code))
                    for code in sorted(SECTOR_NAMES)]
    
    # an equity passes if it makes the cut within its own sector
    sector_filter = sector_trims[0]
    for trim in sector_trims[1:]:
        sector_filter = sector_filter | trim
    return sector_filter
    


# Method to create a tradeable universe of a certain size on a certain date
def create_tradeable(tradeable_count=500, sector_exposure_limit=0.15, date='2015-01-01'):
    """
    Computes a given number of the most tradeable stocks and presents them as a tradeable universe.

    Parameters
    ----------
    tradeable_count : int
        Target number of constituent securities in universe
    sector_exposure_limit: float
        Target threshold for any particular sector
    date: string
        YYYY-MM-DD for date on which to run the universe

    Returns
    -------
    tradeable_secs : pd.Series
        Equity objects of securities to be included in the TradeableUS universe.
    """
    
    # create Pipeline
    tradeable_pipe = Pipeline()    
    sector = Sector()
    
    # add the monthly average dollar volume traded, used to rank securities by liquidity
    tradeable_pipe.add(AverageDollarVolume(window_length=21), 'Liquidity')
    
    # add filters to the pipe to weed out untradeable stocks
    tradeable_filter = universe_filters()
    sector_filter = sector_filters(tradeable_count, sector_exposure_limit)
    tradeable_pipe.set_screen(tradeable_filter & sector_filter)
    tradeable_pipe_results = run_pipeline(tradeable_pipe, date, date)

    # if the desired number of securities is larger than the number of filtered securities, then just return
    # filtered securities as this is the maximum number of tradeable equities in the entire stock universe
    if len(tradeable_pipe_results.index) < tradeable_count:
        return pd.Series(tradeable_pipe_results.index.get_level_values(1).values)
    else:
        # sort returns a new frame, so keep the result and read securities in liquidity order
        ranked_results = tradeable_pipe_results.sort('Liquidity', ascending=False)
        tradeable_secs = pd.Series(ranked_results.index.get_level_values(1).values)
        return tradeable_secs.head(tradeable_count)

def tradeable_sector_analysis(t_set, date):
    """
    Quick visualization of sector exposures in the universe

    Parameters
    ----------
    t_set : pd.Series
        Index of every constituent of universe
    date: string
        YYYY-MM-DD for date on which to run the analysis of the universe        

    """
    
    # run pipeline with sector and close price
    pipe = Pipeline()
    pipe.add(Latest(inputs=[USEquityPricing.close]), 'Close')
    pipe.add(Sector(), 'Sector')
    results = run_pipeline(pipe, date, date)
    
    # get the results only for those in the tradeable universe
    results.index = results.index.get_level_values(1)
    results = results.loc[t_set.as_matrix(),:]
    
    # group data
    sector_groups = results.groupby(by='Sector')
    sector_counts = sector_groups.count()
    xticks = [SECTOR_NAMES.get(i) for i in sector_counts.index]
    
    # create bar chart of number of companies in each sector
    ax_freq = sector_counts.plot(kind='bar', color='c')
    ax_freq.set_xticklabels(xticks, rotation=45)
    ax_freq.set_ylabel('Frequency')
    ax_freq.set_title('Sector Frequencies')
    ax_freq.legend().set_visible(False)
    ax_prop = sector_counts.plot(kind='pie', subplots=True, labels=xticks, colormap='Blues')
    ax_prop[0].set_ylabel('');
    

def update_universe(tradeable_0, tradeable_count, sector_exposure_limit, date, timedelta_days):
    """
    Takes in one universe and returns another timedelta_days later
    
    Parameters
    ----------
    tradeable_0 : pd.Series
        Equity objects of securities to be included in the TradeableUS universe       
    tradeable_count : int
        Desired number of securities in universe       
    sector_exposure_limit : float
        Target threshold for any particular sector
    date : datetime
        datetime object of date that tradeable_0 was run
    timedelta_days : int
        number of days until the next update of the universe

    Returns
    -------
    turnover : float
        For analysis purposes. Calculates what fraction of the universe
        has changed between time periods
    tradeable_1_index : pd.Series
        Index of securities to be included in the TradeableUS universe for the next time period
    """
    
    # Run pipeline for next month
    full_pipe = Pipeline()
    full_pipe.add(AverageDollarVolume(window_length=21), 'Liquidity')
    full_pipe.add(Sector(), 'Sector')
    tradeable_filters = universe_filters()
    full_pipe.set_screen(tradeable_filters)
    full_results = run_pipeline(full_pipe, date +
                                timedelta(days=timedelta_days) , date + timedelta(days=timedelta_days))
    
    
    # remove time component of multiindex
    full_results.index = full_results.index.get_level_values(1)
    
    # get results in tradeable_0 in the next period
    tradeable_0_results = full_results.loc[tradeable_0.tolist(),:]
    
    # remove nan values, show up if tradeable_0 securities have fallen out of index
    tradeable_0_results = tradeable_0_results.dropna()
    
    # group by sector for sector neutrality threshold
    tradeable_0_sector_counts = tradeable_0_results.groupby('Sector').count()
    
    # get threshold
    threshold = int(math.ceil(tradeable_count * sector_exposure_limit))
    
    # list of securities to add ranked by liquidity
    add_list = full_results.drop(tradeable_0_results.index.get_values().tolist())
    add_list = add_list.sort('Liquidity', ascending=False)
    
    # number of securities to add
    to_add = tradeable_count - len(tradeable_0_results.index)
    turnover = float(to_add) / float(tradeable_count)
    
    # create variable for index values as list
    tradeable_1_index = tradeable_0_results.index.get_values().tolist()
    
    # loop through proposed additions, most liquid first
    for i in range(len(add_list.index)):
        
        # if no more securities to add
        if to_add == 0:
            break
        
        candidate = add_list.iloc[i]
        sector_count = tradeable_0_sector_counts.loc[candidate['Sector'], 'Liquidity']
        
        # if addition would not break sector exposure limit
        if sector_count + 1 <= threshold:
            tradeable_1_index.append(candidate.name)
            # use a single .loc call so the count is updated in place
            tradeable_0_sector_counts.loc[candidate['Sector'], 'Liquidity'] += 1
            to_add -= 1
    
    # always return, even if the candidates run out before the universe is full
    return turnover, pd.Series(tradeable_1_index)

Tradeable500US

Let us look at the Tradeable500US and get a quick overview of its constituents. Then we will have a look at its turnover (the number of new equities in an update over the total number of equities in the universe).
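
Turnover as defined above can be written as a small set calculation. This is only an illustrative sketch with made-up tickers: the notebook's `update_universe` computes the same fraction from the number of open slots, which agrees with this definition whenever every slot is filled.

```python
def turnover(previous, current):
    """Fraction of the current universe that was not in the previous one."""
    new_names = set(current) - set(previous)
    return len(new_names) / float(len(current))


before = ['A', 'B', 'C', 'D']
after = ['A', 'B', 'C', 'E']  # 'D' dropped out and 'E' replaced it
print(turnover(before, after))
```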

In [2]:
tradeable_0 = create_tradeable(500, 0.2, '2015-01-01')
tradeable_sector_analysis(tradeable_0, '2015-01-01')
In [3]:
# create tradeable universe
tradeable500US = create_tradeable(500, 0.2, date(2003,1,1))
turnovers500US = []

# iterate over months
for month in (date(2003, 1, 1) + timedelta(days=30*n) for n in range(155)):
    turnover, tradeable_next = update_universe(tradeable500US, 500, 0.2, month, 30)
    tradeable500US = tradeable_next
    turnovers500US.append(turnover)

# plot results
months = range(len(turnovers500US))
plt.plot(months, turnovers500US)

plt.axhline(np.mean(turnovers500US), color='r')
plt.title('Monthly Turnover Tradeable500US')
plt.xlabel('Months Elapsed')
plt.ylabel('Turnover');

As we can see above, our universe is not overweight in any particular sector and the average turnover is less than 0.3%, which corresponds to roughly 1.5 securities per month. It should also be noted that the spike in turnover occurs around 70 months after January 2003, which corresponds to late 2008 (the collapse of Lehman Brothers). Even in this unstable macroeconomic state, the universe only sees 1.4% turnover (7 securities).
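
The figures quoted above can be checked with a few lines of arithmetic. This sketch assumes each step of the loop is exactly 30 days, as in the cell above, so the mapping from step count to calendar date is only approximate.

```python
from datetime import date, timedelta

universe_size = 500

# An average turnover of 0.3% expressed as a number of securities.
avg_changed = 0.003 * universe_size

# Seven replacements expressed as a turnover fraction.
peak_turnover = 7 / float(universe_size)

# Calendar date reached after 70 steps of 30 days from the start date.
spike_date = date(2003, 1, 1) + timedelta(days=30 * 70)

print(avg_changed, peak_turnover, spike_date)
```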

Tradeable1500US

In [4]:
tradeable_0 = create_tradeable(1500, 0.2, '2015-01-01')
tradeable_sector_analysis(tradeable_0, '2015-01-01')
In [5]:
# create tradeable universe
tradeable1500US = create_tradeable(1500, 0.2, date(2003,1,1))
turnovers1500US = []

# iterate over months
for month in (date(2003, 1, 1) + timedelta(days=30*n) for n in range(155)):
    turnover, tradeable_next = update_universe(tradeable1500US, 1500, 0.2, month, 30)
    tradeable1500US = tradeable_next
    turnovers1500US.append(turnover)

# plot results
months = range(len(turnovers1500US))
plt.plot(months, turnovers1500US)

plt.axhline(np.mean(turnovers1500US), color='r')
plt.title('Monthly Turnover Tradeable1500US')
plt.xlabel('Months Elapsed')
plt.ylabel('Turnover');

Once again we see a spike in turnover during the financial crisis and an average turnover of well below 1%.

The material on this website is provided for informational purposes only and does not constitute an offer to sell, a solicitation to buy, or a recommendation or endorsement for any security or strategy, nor does it constitute an offer to provide investment advisory or other services by Quantopian.

In addition, the content of the website neither constitutes investment advice nor offers any opinion with respect to the suitability of any security or any specific investment. Quantopian makes no guarantees as to accuracy or completeness of the views expressed in the website. The views are subject to change, and may have become unreliable for various reasons, including changes in market conditions or economic circumstances.