Earlier this week we shipped several new Pipeline API features that combine to make it significantly easier to perform grouped operations on Factors. The most common use for the new functionality will likely be construction of sector-normalized Factors, but the new additions to the Pipeline toolbox have far broader application.
There are two important new additions to the Pipeline API: Factor normalization methods, and a Classifier expression type. It's easiest to understand the value of Classifiers after seeing a simple normalization example.

demean and zscore

Many Factors produce results that are not directly comparable with the results of other Factors. A technical indicator like RSI might produce an output bounded between 1 and 100, whereas a fundamental ratio might produce a value that can be any real number. When we want to incorporate multiple incommensurable factors into a single model, it's often helpful to apply a normalization step to the Factor outputs to make direct comparisons more meaningful.
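As a toy illustration of the problem (this is plain NumPy, not the Pipeline API, and the values are made up), two factors on very different scales become directly comparable once each is normalized:

```python
import numpy as np

# Hypothetical outputs from two factors with incompatible scales.
rsi_like = np.array([25.0, 50.0, 75.0, 90.0])    # bounded indicator, e.g. 1-100
ratio_like = np.array([-0.3, 0.02, 0.15, 1.8])   # unbounded fundamental ratio

def normalize(values):
    """Z-score: subtract the mean, divide by the standard deviation."""
    return (values - values.mean()) / values.std()

# Both normalized arrays have mean 0 and standard deviation 1, so their
# values live on the same scale and can be combined in a single model.
print(normalize(rsi_like))
print(normalize(ratio_like))
```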
The first major feature in this release is the addition of two new methods on the Factor base class: demean and zscore. These methods make it easier to normalize the results of Factor computations.
demean

demean() is the simpler of the two new normalization methods. Calling demean() on a factor produces a new factor that first computes the original factor, and then subtracts each day's mean over all assets from that day's output.
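The de-meaning step itself is simple enough to sketch in plain NumPy (this is an illustration with made-up numbers, not the Pipeline implementation):

```python
import numpy as np

# One hypothetical day of factor outputs across four assets.
day_outputs = np.array([0.05, -0.02, 0.12, 0.01])

# Subtract the daily cross-sectional mean from each asset's output.
demeaned = day_outputs - day_outputs.mean()

# After subtracting the daily mean, the outputs average to zero.
print(demeaned)
```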
In this example, we compare the results of a 30-Day returns factor, before and after de-meaning.
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from quantopian.pipeline import Pipeline
from quantopian.pipeline.factors import Returns
from quantopian.research import run_pipeline
def demean_example():
    returns = Returns(window_length=30)
    demeaned_returns = returns.demean()
    # Our pipeline includes the original returns factor, the demeaned version,
    # and the difference between them.
    return Pipeline(
        columns={
            'vanilla': returns,
            'demeaned': demeaned_returns,
            'diff': returns - demeaned_returns,
        }
    )
# Since demean() just subtracts the daily mean from the 'vanilla' column, we expect
# the difference between 'vanilla' and 'demeaned' to be constant within each day.
results0 = run_pipeline(demean_example(), '2014', '2014-03-01')
results0.head()
# The value of that constant difference should be the daily mean.
# Note that 0.36717 on day one here matches up with the difference in the frame above.
results0['vanilla'].groupby(level=0).mean().head()
# A nice property of a de-meaned Factor is that it is centered about 0.
results0['demeaned'].groupby(level=0).mean().head()
zscore

zscore() is only slightly more complex than demean(). In addition to subtracting the daily mean from each output, it also divides by the daily standard deviation. If an asset has a Z-Score of 1 on a given day for some factor, then the value of that factor was one standard deviation above the daily mean. Similarly, an asset with a Z-Score of -1 was one standard deviation below the daily mean.
The API for zscore is identical to that of demean().
def zscore_example():
    returns = Returns(window_length=30)
    zscored_returns = returns.zscore()
    return Pipeline(
        columns={
            'vanilla': returns,
            'zscored': zscored_returns,
        },
        screen=returns.notnull(),
    )
results1 = run_pipeline(zscore_example(), '2014', '2014-03-01')
results1.head()
results1.describe()
Often we only want to consider some portion of the full tradeable universe when computing a normalization. It can be useful, for example, to exclude extreme outliers from mean and standard deviation computations. We often also want to ignore assets that are hard to trade, either because they're illiquid or because they're nonstandard share classes.
Both demean and zscore accept an optional mask argument, which can be passed a Filter. When a Filter is supplied as a mask, we treat all locations where the Filter produced False as though those locations had NaN values in the data being normalized. Since demean and zscore already know how to ignore NaNs, providing a mask has the effect of removing the masked values from our normalization calculation.
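The False-becomes-NaN semantics can be sketched in plain NumPy (a conceptual illustration with made-up values, not the Pipeline internals):

```python
import numpy as np

# Hypothetical factor outputs; 1000.0 is an extreme outlier.
values = np.array([1.0, 2.0, 3.0, 1000.0])
# Hypothetical Filter output: exclude the outlier.
passed = np.array([True, True, True, False])

# Locations where the Filter produced False are treated as NaN.
masked = np.where(passed, values, np.nan)

# NaN-aware statistics then exclude the masked values entirely.
mean_all = np.nanmean(values)     # distorted by the outlier
mean_masked = np.nanmean(masked)  # outlier ignored
print(mean_all, mean_masked)
```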
In this example, we compute the same 30-day returns factor. We then Z-Score those returns, ignoring non-primary share classes, stocks with low dollar volume, and stocks with very high or low returns.
from quantopian.pipeline.factors import AverageDollarVolume
from quantopian.pipeline.filters.morningstar import IsPrimaryShare
def masked_zscore_returns_example():
    returns = Returns(window_length=30)
    is_liquid = AverageDollarVolume(window_length=30).percentile_between(25, 100)
    is_primary = IsPrimaryShare()
    no_returns_outliers = returns.percentile_between(2, 98)
    base_universe = is_liquid & no_returns_outliers & is_primary
    masked_zscored = returns.zscore(mask=base_universe)
    return Pipeline(
        columns={'masked_zscored': masked_zscored, 'returns': returns},
        screen=masked_zscored.notnull(),
    )
results2 = run_pipeline(masked_zscore_returns_example(), '2014', '2014-03-01')
results2.head()
results2.describe()
import seaborn as sns
import matplotlib.pyplot as plt
def zscore_histogram(axis, series, ylabel=None):
    plot = sns.distplot(series, ax=axis, kde=False)
    plot.set_yscale('log')
    plot.grid(False)
    if ylabel:
        plot.set_ylabel(ylabel)
    return plot
fig, plots = plt.subplots(ncols=2, sharey=True)
zscore_histogram(plots[0], results1.zscored, ylabel="# of Assets in Z-Score Range")
zscore_histogram(plots[1], results2.masked_zscored)
sns.despine(fig=fig, top=True, right=True)
When Z-Scoring without masking, the vast majority of assets have Z-Scores near 0 (note that the above plots are log-scale), because a small number of outliers distort the distribution significantly.
After masking, our Z-Scores are much more uniformly distributed.
Another common scenario encountered when working with financial data is the need to transform Factor results based on some method of labelling assets. For example, when comparing assets based on some fundamental ratio, it might make more sense to compare each asset to other assets in the same industry instead of comparing against the full universe of assets. Or we might want to compare across companies of approximately the same size.
The second major feature that's been added in this release is a new core expression type: Classifier. Whereas Factors are expressions producing numerical values, and Filters are expressions producing boolean (True/False) values, Classifiers are expressions that produce labels, which can then be used as grouping keys for another expression.
Both demean() and zscore() accept an optional groupby parameter, which can be passed a Classifier. Providing a groupby causes row-normalizations to be applied on groups of assets that all received the same label from the grouping classifier.
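To make the semantics concrete, here is a rough pandas analogue of grouped z-scoring (an illustration with made-up data, not the actual Pipeline implementation):

```python
import pandas as pd

# Hypothetical one-day cross-section of factor values with sector labels.
data = pd.DataFrame({
    'sector': ['tech', 'tech', 'energy', 'energy'],
    'value':  [10.0, 20.0, 100.0, 300.0],
})

def zscore(group):
    """Z-score a group against its own mean and standard deviation."""
    return (group - group.mean()) / group.std()

# Each sector is normalized independently, so the very different scales
# of the two sectors no longer distort each other's scores.
data['grouped_z'] = data.groupby('sector')['value'].transform(zscore)
print(data)
```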
There are currently three ways to construct a Classifier:

The .latest attribute of any morningstar column of dtype int64 produces a Classifier. There are currently nine such columns:
morningstar.asset_classification.cannaics
morningstar.asset_classification.morningstar_economy_sphere_code
morningstar.asset_classification.morningstar_industry_code
morningstar.asset_classification.morningstar_industry_group_code
morningstar.asset_classification.morningstar_sector_code
morningstar.asset_classification.naics
morningstar.asset_classification.sic
morningstar.asset_classification.stock_type
morningstar.asset_classification.style_box
More information on each of these columns can be found in the Fundamentals API Reference.
There are two new directly-importable Classifier subclasses:

quantopian.pipeline.classifiers.morningstar.Sector
quantopian.pipeline.classifiers.morningstar.SuperSector

These built-in classifiers produce the same output as morningstar_sector_code.latest and morningstar_economy_sphere_code.latest, respectively. However, because we expect these to be the most commonly-used classifier columns, Sector and SuperSector provide a few special facilities beyond what's available from the generic .latest classifiers:
Sector and SuperSector have hand-written docstrings that are accessible via __doc__ or the ? magic. Sector and SuperSector also provide symbolic names for their labels as class-level attributes. For example, Sector.BASIC_MATERIALS is set to 101, the sector code used by Morningstar for companies in the materials space. SuperSector provides similar symbolic names; for example, SuperSector.CYCLICAL is set to 1.

There are several new Factor methods that produce classifiers by ranking and bucketing stocks based on quantiles of a Factor. The most general of these methods is Factor.quantiles(), which takes an integer indicating how many buckets to use. For example, if we wanted to group securities into small, medium, and large cap buckets, we could do MarketCap().quantiles(3).
Factor.quartiles(), Factor.quintiles(), and Factor.deciles() have also been added. These are simple convenience aliases for quantiles(4), quantiles(5), and quantiles(10), respectively.
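The bucketing behavior can be approximated with pandas.qcut (a rough analogue with made-up data; the quantile-family methods only differ in the bucket count):

```python
import pandas as pd

# Hypothetical market caps for six securities.
market_caps = pd.Series([1e6, 5e6, 2e7, 8e7, 3e8, 1e9])

# Bucket into three equal-sized groups labeled 0 through 2,
# analogous to quantiles(3).
tercile = pd.qcut(market_caps, 3, labels=False)
print(tercile.tolist())  # -> [0, 0, 1, 1, 2, 2]
```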
from quantopian.pipeline.data.morningstar import asset_classification, valuation
from quantopian.pipeline.classifiers.morningstar import Sector
# These produce the same data, but Sector has symbolic constants and hand-written docs:
sector_generic = asset_classification.morningstar_sector_code
sector_builtin = Sector()
print (
"Docs for built-in Sector class:\n" + sector_builtin.__doc__
)
print "Symbolic Constants:"
dir(sector_builtin)[1:12]
print "The Basic Materials sector code is %d." % Sector.BASIC_MATERIALS
print (
"Docs for generic sector (this is the docstring that's generated "
"for all columns):\n" + sector_generic.__doc__
)
sector_generic
In this example, we load the most recently-known earnings yield for each asset, and compare the effect of Z-Scoring over the whole universe vs Z-Scoring by sector group.
from quantopian.pipeline.data.morningstar import valuation_ratios
def grouped_earnings_yield_example():
    sector = Sector()
    earning_yield = valuation_ratios.earning_yield.latest
    zscored_naive = earning_yield.zscore()
    zscored_grouped = earning_yield.zscore(groupby=sector)
    return Pipeline(
        columns={
            'sector': sector,
            'yield': earning_yield,
            'yield_zscored': zscored_naive,
            'yield_zscored_grouped': zscored_grouped,
        },
        screen=zscored_grouped.notnull(),
    )
yields = run_pipeline(grouped_earnings_yield_example(), '2014', '2014-03')
yields.head()
yields.describe()
One thing that should look immediately suspicious about the output above is that our non-grouped Z-Scores are all very close to each other. This is usually an indication that large outliers are compressing our results by inflating the standard deviation of the data.
Plotting the magnitude of the min and max by sector quickly confirms that we do indeed have some large outliers:
# Slice off the first few dates for visualization
yields_initial = (
    yields['2014-01-02':'2014-01-06']
    .reset_index()
    .rename(columns=dict(level_0='date', level_1='asset'))
)
fig, (max_plot, min_plot) = plt.subplots(2, 1)
# Draw the maximum yield by sector on each date.
sns.barplot(
    x='date',
    y='yield',
    hue='sector',
    data=yields_initial,
    ci=None,
    estimator=np.max,
    log=True,
    ax=max_plot,
    palette="Set3",
)
max_plot.set_ylabel('Maximum Yield by Sector')
max_plot.set_xlabel('Date')
max_plot.set_ylim(1e-1, 1e2);
max_plot.legend(ncol=3, title='Sector Code')
# Draw the minimum yield by sector on each date.
plot = sns.barplot(
    x='date',
    y='yield',
    hue='sector',
    data=yields_initial,
    ci=None,
    estimator=lambda arr: abs(np.min(arr)),
    log=True,
    ax=min_plot,
    palette="Set3",
)
min_plot.set_ylabel('Minimum Yield (Magnitude) by Sector')
min_plot.set_xlabel('Date')
min_plot.set_ylim(1e-1, 1e7);
min_plot.legend(ncol=3, title='Sector Code');
Note that the high bars in the above are actually much more extreme than they appear, since they're plotted on a log scale!
We can better see the flattening effect of Z-Scoring with outliers, as well as the effect of normalizing by sector, by plotting the difference between the max and min values for each sector:
def plot_range_by_sector(data, ycol, axis, ylimits, log_scale):
    """
    Generate a bar chart of data[ycol], with a bar for each unique value in data['sector'].
    The height of each bar is the difference of the sector max minus the sector min each day.
    """
    plot = sns.barplot(
        x='date',
        y=ycol,
        hue='sector',
        data=data,
        ax=axis,
        ci=None,
        estimator=lambda row: abs(np.max(row) - np.min(row)),
        palette="Set3",
    )
    plot.set_ylim(*ylimits)
    if log_scale:
        plot.set_yscale('log')
    plot.set_ylabel("(Max - Min): " + ycol)
    plot.legend(ncol=3, title='Sector Code')
    return plot
fig, plots = plt.subplots(3, 1, figsize=(14, 20))
plot_range_by_sector(
    yields_initial,
    ycol='yield',
    axis=plots[0],
    ylimits=(1e-2, 1e6),
    log_scale=True,
)
plot_range_by_sector(
    yields_initial,
    ycol='yield_zscored',
    ylimits=(0, 80),
    axis=plots[1],
    log_scale=False,
)
plot_range_by_sector(
    yields_initial,
    ycol='yield_zscored_grouped',
    ylimits=(0, 80),
    axis=plots[2],
    log_scale=False,
    # The trailing ; suppresses the notebook's automatic repr output.
);
In the raw yields, we can see that there's an enormous outlier (note that the raw yield chart is log-scale) on day 1 in sector 308 (Communication Services). This outlier single-handedly inflates the standard deviation over the whole distribution enough to compress the min/max values in the ungrouped Z-Scores (the center plot) to almost zero.
In the sector-grouped Z-Scores (the bottom plot), the effects of the large outlier are contained within the sector, which allows the values in other sectors to better reflect the diversity of observed yields.
What the above plot doesn't show, however, is that the distribution of values within sectors with outliers is still compressed considerably. What we'd really like to do is apply our grouped normalization, while also removing extreme values from the distribution.
A crude filter picks out the worst offenders easily:
# Note in particular the (likely erroneous) value of -400797 on day 1 for HMTV. This is the big
# day 1 outlier in the charts above.
yields[(yields['yield'] < -10) | (yields['yield'] > 10)]
The mask and groupby parameters can be used together to perform grouped transformations while ignoring undesired values.
def masked_grouped_earnings_yield_example():
    sector = Sector()
    earning_yield = valuation_ratios.earning_yield.latest
    # We could also have done something like earning_yield.percentile_between(1, 99) here.
    without_outliers = (-10 < earning_yield) & (earning_yield < 10)
    zscored_masked = earning_yield.zscore(mask=without_outliers)
    zscored_masked_grouped = earning_yield.zscore(mask=without_outliers, groupby=sector)
    return Pipeline(
        columns={
            'sector': sector,
            'yield': earning_yield,
            'yield_zscored_masked': zscored_masked,
            'yield_zscored_masked_grouped': zscored_masked_grouped,
        },
        screen=zscored_masked_grouped.notnull(),
    )
masked_yields = run_pipeline(masked_grouped_earnings_yield_example(), '2014', '2014-03')
masked_yields.head()
# Note that the min/max on `yield` are now between -10 and 10.
masked_yields.describe()
The above examples all use built-in classifiers. It's also possible to create a Classifier from any existing Factor via the quantiles method. In this example, we create a classifier from market-cap deciles and use it to group a demeaned returns factor.
from quantopian.pipeline.data.morningstar import valuation
def quantiles_example():
    market_cap = valuation.market_cap.latest
    market_cap_decile = market_cap.deciles()  # Equivalent to market_cap.quantiles(10)
    returns = Returns(window_length=30)
    excess_returns = returns.demean(
        mask=returns.percentile_between(1, 99),
        groupby=market_cap_decile,  # Grouping by a computed classifier works as expected.
    )
    return Pipeline(
        columns={
            'market_cap': market_cap,
            'market_cap_decile': market_cap_decile,  # Classifiers can be set as output columns.
            'returns': returns,
            'excess_returns': excess_returns,
        },
        screen=excess_returns.notnull(),
    )
quantiles_results = run_pipeline(quantiles_example(), '2014', '2014-03')
quantiles_results.head()
With our quantiles classifier, we can confirm the well-known fact that small cap stocks tend to outperform large cap stocks:
returns_by_decile = quantiles_results.groupby(['market_cap_decile'])['returns'].mean()
plot = sns.barplot(
    x=returns_by_decile.index,
    y=returns_by_decile.values,
    color='red',
)
plot.set_title("30-Day Returns by Market Cap Decile", fontsize=16)
sns.despine(plot.figure)
plot.grid(False)
plot.set_xlabel("Market Cap Decile", fontsize=14);
fig, plot = plt.subplots(1, 1, figsize=(14, 8))
colors = sns.color_palette('RdBu_r', 10)
for decile, color in enumerate(colors):
    sns.distplot(
        quantiles_results['excess_returns'][quantiles_results['market_cap_decile'] == decile],
        hist=False,
        color=color,
        label=decile,
        ax=plot,
    )
plot.legend(title='Decile', fontsize=14)
plot.set_xlabel('Excess Return over Decile Mean', fontsize=14)
sns.despine(fig=fig)
plot.grid(False)
plot.set_yticklabels([]);
plot.set_title('Distribution of Returns by Market Cap Decile', fontsize=16);
Classifiers have been a planned extension to the Pipeline API since the early days of the design. I think they're one of the last missing pieces to supporting many truly sophisticated quant workflows, so I'm excited to finally be able to see what the community builds with these features.
In the coming weeks, we hope to ship a slew of additional small improvements to classifiers, including groupby support for Factor.rank(), and support for non-integer columns (especially strings) from the morningstar database.
For more info on working with Classifiers and normalizations, there's a new Help Docs section on Normalization, and new API Reference entries for Classifier, the new builtins, Factor.demean, and Factor.zscore.