The datasets described herein are proprietary to FactSet Research Systems, Inc. ("FactSet") and may not be copied or distributed. The datasets made available to Quantopian by FactSet are not exhaustive of FactSet's data, products, software, and/or services.

RBICS Focus: Revenue-Based Sector Classification Data

FactSet’s Revere Business Industry Classification System (RBICS) is a comprehensive structured taxonomy designed to offer precise classification of global companies. RBICS Focus is a dataset containing single-sector mappings of thousands of the most liquid, publicly traded companies worldwide based on their primary lines of business. It uses revenues as the key factor in determining a company’s primary line of business. On Quantopian, RBICS Focus sectors are available at three levels of granularity:

Level 1: Economy
Level 2: Sector
Level 3: Subsector

RBICS Focus data is available via the Pipeline API, which means it can be accessed in Research and the IDE.

Properties

Coverage: All supported countries on Quantopian
Data Frequency: Daily
Update Frequency: Daily (updated overnight after each trading day)
Timespan: North America - 2004 to present. Start dates for other regions can be found here
Point-in-time start: November 2018
Holdout: 1 year

Data Holdout Period

Like other FactSet-sourced datasets, RBICS Focus has a holdout period, in this case a trailing 1-year holdout. This means that the most recent year of data is not accessible in Research or the IDE. However, algorithms that use RBICS Focus data may be submitted to the contest, and all contest scoring uses the entire, up-to-date dataset. Similarly, Quantopian evaluates algorithms that use RBICS Focus data for funding against the full dataset.
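As a rough illustration of the trailing holdout, the sketch below computes the most recent date that would fall outside a 1-year holdout and clamps a requested end date to it. The helper names (latest_accessible_date, clamp_end_date) are hypothetical, and the exact cutoff rule Quantopian applies is an assumption here, not documented behavior.

```python
from datetime import date

HOLDOUT_YEARS = 1  # the RBICS Focus holdout length stated in this document


def latest_accessible_date(today, holdout_years=HOLDOUT_YEARS):
    """Return `today` shifted back by the holdout (hypothetical helper).

    Assumption: the holdout is measured in calendar years from `today`.
    """
    try:
        return today.replace(year=today.year - holdout_years)
    except ValueError:
        # Feb 29 has no counterpart in a non-leap year; fall back to Feb 28.
        return today.replace(year=today.year - holdout_years, day=28)


def clamp_end_date(requested_end, today):
    """Pull a requested end date back so it does not fall inside the holdout."""
    return min(requested_end, latest_accessible_date(today))
```

A helper like this could be used to choose an end date for a Research query that stays outside the holdout window.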

Point-In-Time

RBICS Focus data has been collected and stored in a point-in-time fashion on Quantopian since November 2018. This corresponds to when Quantopian started downloading and storing the data on a nightly basis. RBICS Focus data prior to November 2018 is timestamped with a 1-day delay to emulate the delay that is expected in the point-in-time segment of the data.
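To make the effect of the 1-day delay concrete, here is a minimal sketch of a point-in-time lookup in plain Python. It assumes the delay is one calendar day and that a record becomes visible on its asof_date plus that delay. The record values ("Economy A", "Economy B") and the function known_classification are made up for illustration and are not part of the Quantopian API.

```python
from bisect import bisect_right
from datetime import date, timedelta

DELAY = timedelta(days=1)  # assumption: the delay is one calendar day

# Hypothetical records for a single asset, sorted by asof_date:
# (asof_date, economy name). The names are placeholders, not real RBICS names.
records = [
    (date(2010, 1, 4), "Economy A"),
    (date(2015, 6, 1), "Economy B"),
]


def known_classification(records, sim_date, delay=DELAY):
    """Return the classification a simulation could see on `sim_date`.

    A record counts as known only once its asof_date plus the delay has
    been reached. Returns None if no record is known yet.
    """
    available_from = [asof + delay for asof, _ in records]
    idx = bisect_right(available_from, sim_date)
    return records[idx - 1][1] if idx else None
```

Under these assumptions, a reclassification dated 2015-06-01 would first be visible to a simulation on 2015-06-02, which avoids lookahead on the day the change takes effect.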

Methodology

To overcome disparate and non-standardized company disclosure, FactSet created a normalized global industry classification structure. Standardized industry definitions are applied to companies globally. Only primary sources of information disclosed directly by companies via regulatory filings, investor reports, and company press releases are used. FactSet analysts are trained to interpret this information consistently and enter it into a system with built-in quality and error-checking features.

Data quality is monitored using a combination of system and human quality controls. FactSet uses technology such as an internally developed document reader with customizable searching and translation tools to make data collection and review more effective. Ultimately, information quality results from the patented taxonomy and from well-trained analysts who follow the methodology while exercising judgement to ensure that material data is collected and properly assimilated.

Usage

The RBICSFocus dataset is a pipeline DataSet. The columns of the RBICSFocus dataset can be used like any other BoundColumn in a pipeline.

Example

This code snippet constructs and runs a pipeline that computes the difference between an asset's 1-week return and the 1-week mean return of all assets with the same economy classification. Note that this example uses Factor.demean to group assets by economy classification and subtract the mean return of the group. Make sure to run it in Research.

from quantopian.pipeline import Pipeline
from quantopian.pipeline.data.factset import RBICSFocus
from quantopian.pipeline.domain import US_EQUITIES
from quantopian.pipeline.factors import Returns
from quantopian.research import run_pipeline

# Latest known Level 1 (economy) classification code for each asset.
economy_focus = RBICSFocus.l1_id.latest
# A 6-day window spans 5 daily price changes, i.e. one trading week.
returns_1w = Returns(window_length=6)

# Subtract each economy group's mean return from its members' returns.
returns_1w_less_economy_mean = returns_1w.demean(groupby=economy_focus)

pipe = Pipeline(
    columns={
        'economy_focus': economy_focus,
        'returns_1w_less_economy_mean': returns_1w_less_economy_mean,
    },
    domain=US_EQUITIES,
)

# Run in Research; the result is a DataFrame indexed by date and asset.
df = run_pipeline(pipe, '2015-05-05', '2016-05-05')
print(df.head())
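For readers who want to see what a grouped demean computes on a single day, the following is a minimal plain-Python sketch that runs outside Quantopian. It is not the Pipeline implementation of Factor.demean and does not attempt to reproduce its handling of missing values or masks. The tickers, return values, and economy codes are made up for illustration.

```python
from collections import defaultdict


def demean_by_group(values, groups):
    """Subtract each group's mean from the values of its members.

    values: dict mapping asset -> return
    groups: dict mapping asset -> group label (e.g. an economy code)
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for asset, value in values.items():
        label = groups[asset]
        sums[label] += value
        counts[label] += 1
    return {
        asset: value - sums[groups[asset]] / counts[groups[asset]]
        for asset, value in values.items()
    }


# Hypothetical one-day cross-section: two assets in each of two economies.
returns = {"AAA": 0.04, "BBB": 0.02, "CCC": -0.01, "DDD": 0.03}
economy = {"AAA": "10", "BBB": "10", "CCC": "20", "DDD": "20"}

demeaned = demean_by_group(returns, economy)
```

Each asset's result is its return relative to the average of its own group, so the values within every group sum to zero (up to floating-point error).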

The attached notebook provides a similar example as well as some analysis of the RBICS Focus dataset.

Pipeline Datasets and Columns

Dataset

RBICSFocus - The RBICSFocus dataset is a pipeline dataset that provides access to revenue-based sector classifications.

Fields

The RBICSFocus dataset has 7 fields (accessible as BoundColumn attributes):

l1_id (dtype str) - Economy classification code based on business focus.
l1_name (dtype str) - Economy classification name based on business focus.
l2_id (dtype str) - Sector classification code based on business focus.
l2_name (dtype str) - Sector classification name based on business focus.
l3_id (dtype str) - Subsector classification code based on business focus.
l3_name (dtype str) - Subsector classification name based on business focus.
asof_date (dtype datetime64[ns]) - The start date (date when the record first applies) of the classification.

Other Notes

Currently, non-US data is only available in Pipeline in Research. The IDE only has access to US equity data at this time.
This is currently the best documentation for the RBICS Focus dataset. We are working on a new set of documentation that will document all data integrations in one place.