
RBICS Focus

FactSet’s Revere Business Industry Classification System (RBICS) is a comprehensive, structured taxonomy designed to classify global companies precisely. RBICS Focus is a dataset that maps thousands of the most liquid publicly traded companies worldwide to a single sector based on their primary line of business, using revenue as the key factor in determining that primary line of business. On Quantopian, RBICS Focus sectors are available at three levels of granularity:

  • Level 1: Economy
  • Level 2: Sector
  • Level 3: Subsector
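As a rough illustration of the revenue-based "focus" idea, the sketch below picks the highest-revenue business line of a hypothetical company and reads off its label at each of the three levels. The company, its revenue figures, and the helper names (`primary_path`, `label_at`) are made up for illustration; the classification paths reuse labels that appear in this notebook's output. Choosing the single largest revenue line is a simplification, not FactSet's actual assignment methodology.

```python
# Level numbers and their names as listed above.
LEVEL_NAMES = {1: "Economy", 2: "Sector", 3: "Subsector"}


def primary_path(revenue_by_path):
    """Return the (economy, sector, subsector) path with the largest revenue.

    Hypothetical helper: a simplified stand-in for "revenue determines the
    primary line of business", not FactSet's published methodology.
    """
    if not revenue_by_path:
        raise ValueError("need at least one business line")
    return max(revenue_by_path, key=revenue_by_path.get)


def label_at(path, level):
    """Return the label at RBICS level 1, 2 or 3 of a classification path."""
    if level not in LEVEL_NAMES:
        raise ValueError("level must be 1, 2 or 3")
    return path[level - 1]


# Hypothetical company: revenue (arbitrary units) per business line.
company_revenue = {
    ("Technology", "Hardware", "Computer Hardware and Storage"): 150.0,
    ("Healthcare", "Healthcare Services", "Miscellaneous Healthcare"): 30.0,
    ("Industrials", "Industrial Services",
     "Cargo Transportation and Infrastructure Services"): 20.0,
}

focus = primary_path(company_revenue)
for level in sorted(LEVEL_NAMES):
    print(LEVEL_NAMES[level] + ": " + label_at(focus, level))
```

With these numbers the company is labeled Technology / Hardware / Computer Hardware and Storage at levels 1, 2 and 3.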

RBICS Focus data is available via the Pipeline API, which means it can be accessed in Research and the IDE.

Dataset Overview

The RBICS Focus dataset has 7 fields (accessible as BoundColumn attributes):

  • l1_id (dtype str): Economy classification code based on business focus.
  • l1_name (dtype str): Economy classification name based on business focus.
  • l2_id (dtype str): Sector classification code based on business focus.
  • l2_name (dtype str): Sector classification name based on business focus.
  • l3_id (dtype str): Subsector classification code based on business focus.
  • l3_name (dtype str): Subsector classification name based on business focus.
  • asof_date (dtype datetime64[ns]): The start date (date when the record first applies) of the classification.
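In the sample output below, each code extends its parent's code by two digits (for example 45 → 4515 → 451510 for Metal Products). Assuming that pattern holds across the dataset (it is inferred from the rows shown in this notebook, not taken from FactSet documentation), parent codes can be derived by truncating a level-3 code. Grouping on codes rather than names can also matter because some names repeat across levels: AAME's sector and subsector are both named Insurance. The helpers `derive_ids` and `is_nested` are hypothetical.

```python
def derive_ids(l3_id):
    """Split a 6-digit level-3 code into its level-1/2/3 codes.

    Assumption: codes nest by 2-digit prefixes (45 -> 4515 -> 451510), as in
    the sample rows of this notebook.
    """
    if len(l3_id) != 6 or not l3_id.isdigit():
        raise ValueError("expected a 6-digit level-3 code, got %r" % (l3_id,))
    return {"l1_id": l3_id[:2], "l2_id": l3_id[:4], "l3_id": l3_id}


def is_nested(l1_id, l2_id, l3_id):
    """Check that each code starts with its parent's code."""
    return l2_id.startswith(l1_id) and l3_id.startswith(l2_id)


# (l1_id, l2_id, l3_id) triples copied from the sample output.
sample_rows = [
    ("45", "4515", "451510"),  # ARNC: Metal Products
    ("30", "3015", "301510"),  # AAME: Insurance
    ("55", "5515", "551520"),  # AAPL: Computer Hardware and Storage
    ("35", "3515", "351520"),  # ABAX: Miscellaneous Healthcare
]

for l1, l2, l3 in sample_rows:
    assert is_nested(l1, l2, l3)
    assert derive_ids(l3) == {"l1_id": l1, "l2_id": l2, "l3_id": l3}
```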

The following cell constructs and runs a pipeline that gets the latest value for all available fields in the RBICS Focus dataset.

In [1]:
from quantopian.pipeline import Pipeline
from quantopian.pipeline.data.factset import RBICSFocus
from quantopian.pipeline.domain import US_EQUITIES
from quantopian.research import run_pipeline

pipe = Pipeline(
    columns={
        'l1_name': RBICSFocus.l1_name.latest,
        'l2_name': RBICSFocus.l2_name.latest,
        'l3_name': RBICSFocus.l3_name.latest,
        'l1_id': RBICSFocus.l1_id.latest,
        'l2_id': RBICSFocus.l2_id.latest,
        'l3_id': RBICSFocus.l3_id.latest,
        'asof_date': RBICSFocus.asof_date.latest,
    },
    domain=US_EQUITIES,
)

# Reminder: there is a trailing 1-year holdout on this dataset.
df = run_pipeline(pipe, '2016-05-01', '2017-05-05')
In [2]:
df.head()
Out[2]:
                                                 asof_date   l1_id  l1_name               l2_id  l2_name                      l3_id   l3_name
2016-05-02 00:00:00+00:00  Equity(2 [ARNC])      2003-05-23  45     Non-Energy Materials  4515   Mining and Mineral Products  451510  Metal Products
                           Equity(21 [AAME])     2012-07-06  30     Finance               3015   Insurance                    301510  Insurance
                           Equity(24 [AAPL])     2008-08-25  55     Technology            5515   Hardware                     551520  Computer Hardware and Storage
                           Equity(25 [ARNC_PR])  NaT         None   None                  None   None                         None    None
                           Equity(31 [ABAX])     2009-06-30  35     Healthcare            3515   Healthcare Services          351520  Miscellaneous Healthcare
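The `asof_date` column records when each classification first applies, and ARNC_PR shows `NaT`/`None` because it has no RBICS Focus record as of that date. A simplified way to think about what `.latest` returns on a given pipeline date is "the most recent record whose start date is on or before that date". The sketch below implements that rule with Python's `bisect` module. It ignores any separate knowledge-date handling the real Pipeline loader may apply, and the reclassification history and the `latest_as_of` helper are hypothetical.

```python
import bisect
from datetime import date


def latest_as_of(history, query_date):
    """Return the label of the most recent record starting on or before query_date.

    history: list of (asof_date, label) pairs, in any order.
    Returns None when no record applies yet (shown as NaT/None in pipeline output).
    """
    ordered = sorted(history)
    start_dates = [start for start, _ in ordered]
    # bisect_right counts the records whose start date is <= query_date.
    i = bisect.bisect_right(start_dates, query_date)
    return ordered[i - 1][1] if i > 0 else None


# Hypothetical economy history for one security, including a reclassification.
history = [
    (date(2017, 1, 1), "Industrials"),
    (date(2003, 5, 23), "Non-Energy Materials"),
]

print(latest_as_of(history, date(2016, 5, 2)))  # Non-Energy Materials
print(latest_as_of(history, date(2017, 3, 1)))  # Industrials
print(latest_as_of(history, date(2000, 1, 1)))  # None
```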

Universe Segmentation Example

The following cell constructs and runs a pipeline over Canadian equities. It first screens down to a base universe: the top 50% by market cap of 'tradable' equities, defined with EquityMetadata as primary listings whose security type is 'SHARE'. It then computes each equity's 1-week return relative to the mean 1-week return of its RBICS economy (level 1 classification).
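The base-universe step described above can be approximated outside Pipeline: keep equities whose security type is 'SHARE' and that are primary listings, then keep those whose market cap lies between the 50th and 100th percentiles of that tradable set. This sketch uses linear-interpolation percentiles (NumPy's default method); whether `percentile_between` computes its bounds in exactly this way is an assumption. The tickers, market caps, and the 'OTHER' security type are hypothetical.

```python
def percentile(values, q):
    """Percentile with linear interpolation between closest ranks (0 <= q <= 100)."""
    xs = sorted(values)
    if not xs:
        raise ValueError("no values")
    pos = (len(xs) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def base_universe(stocks, min_pct=50, max_pct=100):
    """stocks: ticker -> (market_cap, security_type, is_primary).

    Returns the tickers that are 'tradable' (SHARE + primary listing) and whose
    market cap falls within the [min_pct, max_pct] percentiles of the tradable set.
    """
    tradable = dict(
        (ticker, mcap)
        for ticker, (mcap, sec_type, is_primary) in stocks.items()
        if sec_type == "SHARE" and is_primary
    )
    if not tradable:
        return set()
    low = percentile(tradable.values(), min_pct)
    high = percentile(tradable.values(), max_pct)
    return set(t for t, m in tradable.items() if low <= m <= high)


# Hypothetical inputs: 'OTHER' stands for any non-'SHARE' security type.
stocks = {
    "AAA": (10.0, "SHARE", True),
    "BBB": (20.0, "SHARE", True),
    "CCC": (30.0, "SHARE", True),
    "DDD": (40.0, "SHARE", True),
    "PPP": (99.0, "OTHER", True),   # excluded: not a 'SHARE'
    "SSS": (80.0, "SHARE", False),  # excluded: not the primary listing
}

print(sorted(base_universe(stocks)))  # ['CCC', 'DDD']
```

Note that the large non-'SHARE' and non-primary entries are removed before the percentile bounds are computed, mirroring how `mask=is_tradable` restricts the ranking in the cell below.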

In [3]:
from quantopian.pipeline import Pipeline
from quantopian.pipeline.data.factset import EquityMetadata, Fundamentals, RBICSFocus
from quantopian.pipeline.factors import Returns
from quantopian.pipeline.domain import CA_EQUITIES
from quantopian.research import run_pipeline

# Create a latest market cap factor.
mcap = Fundamentals.mkt_val.latest

# Create a pipeline filter for 'tradable' stocks.
is_tradable = (
    EquityMetadata.security_type.latest.eq('SHARE') 
    & EquityMetadata.is_primary.latest
)

# Create a base universe filter that selects the top 50% of our 'tradable'  
# equities based on market cap.
base_universe = mcap.percentile_between(50, 100, mask=is_tradable)

# RBICS 'economy' classification.
rbics_economy = RBICSFocus.l1_name.latest

# 1-week returns factor.
returns_1w = Returns(window_length=6)

# 1-week returns minus the economy mean 1-week return.
returns_less_economy_mean = returns_1w.demean(groupby=rbics_economy, mask=base_universe)

# Build a pipeline over the Canadian equities domain and screen down to
# a set of stocks that pass our base_universe filter.
pipe = Pipeline(
    columns={
        'rbics_economy': rbics_economy,
        'returns_1w': returns_1w,
        'returns_less_economy_mean': returns_less_economy_mean,
        
        # Adding this column to display the economy mean return.
        'economy_mean_return_1w': returns_1w - returns_less_economy_mean,
    },
    domain=CA_EQUITIES,
    screen=base_universe,
)

df = run_pipeline(pipe, '2015-05-05', '2017-05-05')
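`Returns(window_length=6)` spans six daily closes, i.e. five day-over-day changes, which is one trading week. Assuming the factor computes the percent change from the first to the last close in its window, the arithmetic for a single equity looks like this. The prices and the `trailing_return` helper are hypothetical.

```python
def trailing_return(closes, window_length=6):
    """Percent change from the first to the last close of the trailing window.

    With window_length=6 the window holds 6 closes, i.e. 5 daily changes
    (one trading week).
    """
    if len(closes) < window_length:
        raise ValueError("need at least %d closes" % window_length)
    window = closes[-window_length:]
    return (window[-1] - window[0]) / float(window[0])


# Hypothetical daily closes, oldest first; only the last 6 are used.
closes = [95.0, 100.0, 101.0, 99.0, 102.0, 103.0, 105.0]
print(round(trailing_return(closes), 6))  # 0.05
```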
In [4]:
df.head()
Out[4]:
                                                             economy_mean_return_1w  rbics_economy         returns_1w  returns_less_economy_mean
2015-05-05 00:00:00+00:00  Equity(1178883868150594 [KAR])                  0.023086  Non-Energy Materials    0.018657                  -0.004430
                           Equity(1178892628414550 [CET])                  0.022729  Energy                 -0.012766                  -0.035495
                           Equity(1178896755081796 [TMB])                  0.023086  Non-Energy Materials   -0.089286                  -0.112372
                           Equity(1178900628985687 [MIO.H])                0.023086  Non-Energy Materials    0.000000                  -0.023086
                           Equity(1178900948861510 [CGLD])                 0.023086  Non-Energy Materials   -0.043478                  -0.066565
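In this output, `economy_mean_return_1w` equals `returns_1w` minus `returns_less_economy_mean` (for KAR, 0.018657 - (-0.004430) = 0.023087, which matches 0.023086 up to display rounding), and every Non-Energy Materials row shares the same economy mean. The sketch below reproduces the grouping logic of `demean(groupby=...)` in plain Python for a hypothetical universe; restricting the input dictionary to the universe plays the role of `mask=`. Skipping unlabeled equities is a choice made in this sketch, not a statement about how Pipeline treats missing labels, and `demean_by_group` is a hypothetical helper.

```python
from collections import defaultdict


def demean_by_group(values, groups):
    """Subtract each group's mean from its members' values.

    values: asset -> value (already restricted to the universe, like mask=).
    groups: asset -> group label; assets with a None label are skipped here.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for asset, value in values.items():
        label = groups.get(asset)
        if label is not None:
            sums[label] += value
            counts[label] += 1
    means = dict((g, sums[g] / counts[g]) for g in sums)
    return dict(
        (asset, value - means[groups[asset]])
        for asset, value in values.items()
        if groups.get(asset) is not None
    )


# Hypothetical 1-week returns and economy labels.
returns_1w = {"A": 0.02, "B": 0.04, "C": -0.01, "D": 0.03}
economy = {"A": "Energy", "B": "Energy", "C": "Finance", "D": None}

demeaned = demean_by_group(returns_1w, economy)
# As in the pipeline above, the group mean is recoverable as value - demeaned.
group_mean = dict((a, returns_1w[a] - demeaned[a]) for a in demeaned)
```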

What Labels Exist at Each Level?

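The following cells pull all three RBICS Focus levels for the QTradableStocksUS (QTU, a US-only universe) and then count how many QTU members carry each economy (level 1), sector (level 2), and subsector (level 3) label on a single day (01/05/2017):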

In [5]:
from quantopian.pipeline import Pipeline
from quantopian.pipeline.data.factset import RBICSFocus
from quantopian.pipeline.domain import US_EQUITIES
from quantopian.pipeline.filters import QTradableStocksUS
from quantopian.research import run_pipeline

pipe = Pipeline(
    columns={
        'l1_name': RBICSFocus.l1_name.latest,
        'l2_name': RBICSFocus.l2_name.latest,
        'l3_name': RBICSFocus.l3_name.latest,
    },
    domain=US_EQUITIES,
    screen=QTradableStocksUS(),
)

df = run_pipeline(pipe, '2015-05-05', '2017-05-05')
In [6]:
df.head()
Out[6]:
                                              l1_name               l2_name                      l3_name
2015-05-05 00:00:00+00:00  Equity(2 [ARNC])   Non-Energy Materials  Mining and Mineral Products  Metal Products
                           Equity(24 [AAPL])  Technology            Hardware                     Computer Hardware and Storage
                           Equity(31 [ABAX])  Healthcare            Healthcare Services          Miscellaneous Healthcare
                           Equity(39 [DDC])   None                  None                         None
                           Equity(41 [ARCB])  Industrials           Industrial Services          Cargo Transportation and Infrastructure Services
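Not every QTU member has an RBICS Focus label (DDC shows `None` at all three levels in the output above), and such rows are dropped by `.drop(np.nan)` when labels are counted in the next cells. A quick coverage check over the five rows shown could look like this in plain Python; the rows are copied from Out[6], and the `coverage` helper is hypothetical.

```python
# Rows copied from the df.head() output above: (ticker, l1_name, l2_name, l3_name).
rows = [
    ("ARNC", "Non-Energy Materials", "Mining and Mineral Products", "Metal Products"),
    ("AAPL", "Technology", "Hardware", "Computer Hardware and Storage"),
    ("ABAX", "Healthcare", "Healthcare Services", "Miscellaneous Healthcare"),
    ("DDC", None, None, None),
    ("ARCB", "Industrials", "Industrial Services",
     "Cargo Transportation and Infrastructure Services"),
]


def coverage(rows, level):
    """Fraction of rows with a non-missing label at RBICS level 1, 2 or 3."""
    labeled = sum(1 for row in rows if row[level] is not None)
    return labeled / float(len(rows))


unlabeled = [row[0] for row in rows if row[1] is None]
print([coverage(rows, level) for level in (1, 2, 3)])  # [0.8, 0.8, 0.8]
print(unlabeled)  # ['DDC']
```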
In [7]:
import numpy as np
In [8]:
# Count QTU members per economy label on 01/05/2017. .drop(np.nan) removes the
# entry for equities that have no RBICS label.
economy_counts = df.loc[('2017-01-05', slice(None)), :].groupby('l1_name').size().drop(np.nan)

print('Number of RBICS economy labels in the QTU on 01/05/2017:')
print(len(economy_counts))
print('')

print('RBICS economy label frequencies in the QTU on 01/05/2017:')
economy_counts.sort_values(ascending=False)
Number of RBICS economy labels in the QTU on 01/05/2017:
13

RBICS economy label frequencies in the QTU on 01/05/2017:
Out[8]:
l1_name
Finance                   425
Healthcare                263
Technology                220
Industrials               211
Non-Energy Materials      156
Consumer Cyclicals        129
Consumer Non-Cyclicals    124
Energy                    121
Consumer Services          94
Business Services          61
Utilities                  60
Telecommunications         32
Other                       0
dtype: int64
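The 'Other' row has a count of 0 but still counts toward the 13 labels printed above (the same applies to 'General or Multi-Industry Revenue' in the sector and subsector counts). This appears to be because Pipeline returns string labels as pandas categoricals, and grouping on a categorical reports every category, including ones with no rows on the selected day. The sketch below mimics that behavior with `collections.Counter` and shows the filtering step that keeps only labels actually present; in the notebook, `economy_counts[economy_counts > 0]` would be the pandas equivalent. The category list and labels are a hypothetical subset, and the helper names are illustrative.

```python
from collections import Counter


def counts_over_categories(labels, categories):
    """Count labels for every category, reporting 0 for unused ones.

    Missing labels (None) are ignored, like .drop(np.nan) in the cells above.
    """
    observed = Counter(label for label in labels if label is not None)
    return dict((category, observed[category]) for category in categories)


def nonempty(counts):
    """Keep only categories that actually occur."""
    return dict((label, n) for label, n in counts.items() if n > 0)


# Hypothetical category list and labels for a single day.
categories = ["Finance", "Healthcare", "Other"]
labels = ["Finance", "Finance", "Healthcare", None]

all_counts = counts_over_categories(labels, categories)
print("%d labels, %d non-empty" % (len(all_counts), len(nonempty(all_counts))))
# 3 labels, 2 non-empty
```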
In [9]:
# Count QTU members per sector label on 01/05/2017, dropping unlabeled equities.
sector_counts = df.loc[('2017-01-05', slice(None)), :].groupby('l2_name').size().drop(np.nan)

print('Number of RBICS sector labels in the QTU on 01/05/2017:')
print(len(sector_counts))
print('')

print('RBICS sector label frequencies in the QTU on 01/05/2017:')
sector_counts.sort_values(ascending=False)
Number of RBICS sector labels in the QTU on 01/05/2017:
32

RBICS sector label frequencies in the QTU on 01/05/2017:
Out[9]:
l2_name
Real Estate                                167
Industrial Manufacturing                   131
Biopharmaceuticals                         115
Software and Consulting                    111
Upstream Energy                            100
Banking                                     96
Healthcare Services                         84
Mining and Mineral Products                 82
Industrial Services                         80
Hospitality Services                        69
Healthcare Equipment                        64
Business Services                           61
Utilities                                   60
Electronic Components and Manufacturing     59
Investment Services                         59
Insurance                                   58
Food and Tobacco Production                 53
Chemical, Plastic and Rubber Materials      53
Hardware                                    50
Specialty Finance and Services              45
Consumer Retail                             44
Consumer Goods                              36
Food and Staples Retail                     34
Telecommunications                          32
Consumer Vehicles and Parts                 27
Household Products                          26
Media and Publishing Services               25
Miscellaneous Retail                        22
Downstream and Midstream Energy             21
Manufactured Products                       21
Household Services                          11
General or Multi-Industry Revenue            0
dtype: int64
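The economy and sector tables are consistent with each other: both sum to 1,896 labeled QTU members, and economy totals can be rebuilt by summing their sectors. The sketch below does this for three economies using counts copied from Out[9]. The sector-to-economy mapping is an assumption chosen because its totals match Out[8] exactly (Insurance under Finance, Healthcare Services under Healthcare, and Hardware under Technology are also confirmed by the sample rows earlier); it is not taken from FactSet documentation, and `roll_up` is a hypothetical helper.

```python
from collections import defaultdict

# Sector counts copied from Out[9] (QTU, 01/05/2017).
sector_counts = {
    "Real Estate": 167, "Banking": 96, "Investment Services": 59,
    "Insurance": 58, "Specialty Finance and Services": 45,
    "Biopharmaceuticals": 115, "Healthcare Services": 84,
    "Healthcare Equipment": 64,
    "Software and Consulting": 111, "Hardware": 50,
    "Electronic Components and Manufacturing": 59,
}

# Assumed sector -> economy mapping (inferred from matching totals).
sector_to_economy = {
    "Real Estate": "Finance", "Banking": "Finance",
    "Investment Services": "Finance", "Insurance": "Finance",
    "Specialty Finance and Services": "Finance",
    "Biopharmaceuticals": "Healthcare", "Healthcare Services": "Healthcare",
    "Healthcare Equipment": "Healthcare",
    "Software and Consulting": "Technology", "Hardware": "Technology",
    "Electronic Components and Manufacturing": "Technology",
}


def roll_up(child_counts, parent_of):
    """Sum child-level counts into their parent labels."""
    totals = defaultdict(int)
    for child, n in child_counts.items():
        totals[parent_of[child]] += n
    return dict(totals)


economy_totals = roll_up(sector_counts, sector_to_economy)
print(sorted(economy_totals.items()))
# [('Finance', 425), ('Healthcare', 263), ('Technology', 220)]
```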
In [10]:
# Count QTU members per subsector label on 01/05/2017, dropping unlabeled equities.
subsector_counts = df.loc[('2017-01-05', slice(None)), :].groupby('l3_name').size().drop(np.nan)

print('Number of RBICS subsector labels in the QTU on 01/05/2017:')
print(len(subsector_counts))
print('')

print('RBICS subsector label frequencies in the QTU on 01/05/2017:')
subsector_counts.sort_values(ascending=False)
Number of RBICS subsector labels in the QTU on 01/05/2017:
90

RBICS subsector label frequencies in the QTU on 01/05/2017:
Out[10]:
l3_name
Real Estate Investment Trusts (REITs)               147
United States Banks                                  84
Machinery Manufacturing                              71
Hospitality Services                                 69
Software                                             64
Fossil Fuel Exploration and Production               60
Investment Services                                  59
System-Specific Biopharmaceuticals                   59
Insurance                                            58
Energy Utilities                                     56
Specialty and Performance Chemicals                  43
Other Professional Services                          39
Specialty Finance                                    38
Food and Beverage Production                         38
Support Activities for Oil and Gas Operations        36
Internet and Data Services                           35
Aerospace and Defense Manufacturing                  35
Non-System-Specific Biopharmaceuticals               31
Metal Ore Mining                                     31
Semiconductor Manufacturing                          28
Other Medical Devices                                27
Healthcare Support Services                          27
Consumer Vehicles and Parts                          27
Metal Products                                       26
Cargo Transportation and Infrastructure Services     26
Other Biopharmaceuticals                             25
Media and Publishing Services                        25
Patient Care                                         25
General Medical Devices                              25
Apparel and Accessory Products                       22
                                                   ... 
Food and Beverage Retail                              9
Electronics and Entertainment Retail                  9
Passenger Transportation                              9
Leisure Goods Products                                8
Mortgage Banking                                      7
Business Support Services                             7
Information Technology Distribution                   7
Finance Software and Services                         7
Real Estate Investment and Services                   6
Manufacturing Equipment and Services                  6
Household Appliances and Furnishings Production       6
Health and Personal Care Retail                       6
International Banks                                   5
Forestry and Paper Products                           5
Delivery and Logistics Services                       5
Waste Management Services                             5
Tobacco Production                                    5
Other Retail                                          5
Water Utilities                                       4
Household Appliances and Tools                        4
Coal and Uranium Mining                               4
Minerals                                              3
Consumer Electronics                                  2
Electronic Equipment Manufacturing                    1
Other Materials                                       1
Plastic and Rubber Products                           1
Other Mining                                          1
Mixed Metal and Mineral Products                      1
Mixed Chemicals                                       0
General or Multi-Industry Revenue                     0
dtype: int64
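Several subsectors contain only one QTU member on this date (for example Electronic Equipment Manufacturing). Demeaning within a one-member group always yields 0, so grouping at level 3 can wash out the signal for such names. One possible workaround, sketched below with hypothetical assets and an arbitrary `min_size`, is to fall back to the sector label when a subsector group is too small. Labels are prefixed with their level because some names (such as Insurance) appear at more than one level. The `grouping_labels` helper is hypothetical.

```python
from collections import Counter


def grouping_labels(assets, min_size=5):
    """Pick a grouping label per asset: subsector if its group is big enough, else sector.

    assets: asset -> (l2_name, l3_name). min_size is an arbitrary threshold.
    Labels are prefixed with their level so equal names at different levels
    (e.g. 'Insurance') cannot collide.
    """
    l3_sizes = Counter(l3 for _, l3 in assets.values())
    return dict(
        (asset, "l3:" + l3 if l3_sizes[l3] >= min_size else "l2:" + l2)
        for asset, (l2, l3) in assets.items()
    )


# Hypothetical assets: two insurers share a subsector, one miner is alone.
assets = {
    "INS1": ("Insurance", "Insurance"),
    "INS2": ("Insurance", "Insurance"),
    "MIN1": ("Mining and Mineral Products", "Metal Products"),
}

labels = grouping_labels(assets, min_size=2)
print(sorted(labels.items()))
```

Here both insurers keep the subsector label `l3:Insurance`, while the lone miner falls back to `l2:Mining and Mineral Products`.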

Usable in the Contest

RBICS Focus data is available in Research and in IDE pipelines (in the IDE, only for the US_EQUITIES domain). Therefore, algorithms that use RBICS Focus data are eligible for the contest and will be considered for an allocation.

The RBICS Focus dataset can be used in many ways, such as constructing a tradable universe, segmenting a universe, or creating sector-specific factors. The dataset spans the global equity domains supported on Quantopian, although some securities (such as DDC and ARNC_PR in the outputs above) have no classification. Try exploring the data and see if you can find ways to include it in existing strategies or build new strategies around it!
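As one example of a sector-aware factor, a raw score can be standardized within each RBICS economy so that each equity is compared only against its economy peers. Pipeline factors provide a `zscore` method with a `groupby` argument for this purpose; the plain-Python sketch below only shows the arithmetic on hypothetical scores. It uses the population standard deviation (an assumption about the convention) and returns None for groups whose spread is zero, such as single-member groups; `zscore_by_group` is a hypothetical helper.

```python
import math
from collections import defaultdict


def zscore_by_group(scores, groups):
    """Standardize scores within each group: (x - group mean) / group std.

    Uses the population standard deviation; groups with zero spread
    (including single-member groups) get None.
    """
    members = defaultdict(list)
    for asset, score in scores.items():
        members[groups[asset]].append(score)
    stats = {}
    for label, xs in members.items():
        mean = sum(xs) / float(len(xs))
        std = math.sqrt(sum((x - mean) ** 2 for x in xs) / len(xs))
        stats[label] = (mean, std)
    result = {}
    for asset, score in scores.items():
        mean, std = stats[groups[asset]]
        result[asset] = (score - mean) / std if std > 0 else None
    return result


# Hypothetical raw scores and economy labels.
scores = {"A": 1.0, "B": 3.0, "C": 5.0}
economy = {"A": "Energy", "B": "Energy", "C": "Finance"}
print(sorted(zscore_by_group(scores, economy).items()))
# [('A', -1.0), ('B', 1.0), ('C', None)]
```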