The datasets described herein are proprietary to FactSet Research Systems, Inc. ("FactSet") and may not be copied or distributed. The datasets made available to Quantopian by FactSet are not exhaustive of FactSet's data, products, software, and/or services.
RBICS Focus: Revenue-Based Sector Classification Data
FactSet’s Revere Business Industry Classification System (RBICS) is a comprehensive, structured taxonomy designed to classify global companies precisely. RBICS Focus is a dataset that maps thousands of the most liquid, publicly traded companies worldwide to a single sector each, based on their primary line of business. Revenue is the key factor in determining a company’s primary line of business. On Quantopian, RBICS Focus sectors are available at three levels of granularity:
- Level 1: Economy
- Level 2: Sector
- Level 3: Subsector
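To illustrate how the chosen level controls grouping granularity, the plain-Python sketch below groups a few assets at each level. The tickers and labels are invented placeholders, not actual RBICS classifications, and `group_by_level` is a hypothetical helper, not part of the Pipeline API.

```python
from collections import defaultdict

# Hypothetical per-asset classifications: (ticker, level 1, level 2, level 3).
# The labels are illustrative placeholders, not actual RBICS names.
ASSETS = [
    ("AAA", "Economy A", "Sector A1", "Subsector A1a"),
    ("BBB", "Economy A", "Sector A1", "Subsector A1b"),
    ("CCC", "Economy A", "Sector A2", "Subsector A2a"),
    ("DDD", "Economy B", "Sector B1", "Subsector B1a"),
]


def group_by_level(assets, level):
    """Group tickers by their classification at level 1, 2, or 3."""
    if level not in (1, 2, 3):
        raise ValueError("level must be 1, 2, or 3")
    groups = defaultdict(list)
    for ticker, *labels in assets:
        groups[labels[level - 1]].append(ticker)
    return dict(groups)


# Finer levels split the same assets into more, smaller groups.
for level in (1, 2, 3):
    print(level, group_by_level(ASSETS, level))
```

With these placeholder records, level 1 yields 2 groups, level 2 yields 3, and level 3 yields 4.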
RBICS Focus data is available via the Pipeline API, which means it can be accessed in Research and the IDE.
Properties
- Coverage: All supported countries on Quantopian
- Data Frequency: Daily
- Update Frequency: Daily (updated overnight after each trading day)
- Timespan: North America - 2004 to present. Start dates for other regions can be found here
- Point-in-time start: November 2018
- Holdout: 1 year
Data Holdout Period
Like other FactSet-sourced datasets, RBICS Focus has a holdout period, in this case a trailing 1-year holdout. This means that the most recent year of data is not accessible in Research or the IDE. However, algorithms that use RBICS Focus data can still be submitted to the contest, and all contest scoring is done using the entire, up-to-date dataset. Similarly, Quantopian evaluates algorithms that use RBICS Focus data for funding using the full dataset.
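As a rough illustration of what a trailing 1-year holdout means for date ranges, the sketch below computes the latest date that would still be visible in Research. The helper names are hypothetical, and treating the cutoff as inclusive (and mapping Feb 29 to Feb 28) are assumptions, not documented Quantopian behavior.

```python
from datetime import date


def holdout_cutoff(today):
    """Hypothetical helper: the date exactly one year before `today`.

    Assumption: under a trailing 1-year holdout, data dated after this
    cutoff is hidden in Research and the IDE.
    """
    try:
        return today.replace(year=today.year - 1)
    except ValueError:
        # `today` is Feb 29 and the previous year has no Feb 29.
        return today.replace(year=today.year - 1, day=28)


def visible_in_research(record_date, today):
    """Assumption: records dated on or before the cutoff are visible."""
    return record_date <= holdout_cutoff(today)


print(holdout_cutoff(date(2019, 6, 15)))  # 2018-06-15
```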
Point-In-Time
RBICS Focus data has been collected and stored in a point-in-time fashion on Quantopian since November 2018. This corresponds to when Quantopian started downloading and storing the data on a nightly basis. RBICS Focus data prior to November 2018 is timestamped with a 1-day delay to emulate the delay that is expected in the point-in-time segment of the data.
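The 1-day delay on backfilled data can be pictured as follows: a record's timestamp is its `asof_date` plus one day, and a simulation can use the record only once that timestamp has passed. This is a minimal sketch; modeling the delay as one calendar day (rather than one trading day) is an assumption, and the function names are hypothetical.

```python
from datetime import date, timedelta

# Assumption: the 1-day delay is one calendar day; the source does not say
# whether it is a calendar day or a trading day.
BACKFILL_DELAY = timedelta(days=1)


def emulated_timestamp(asof_date):
    """Hypothetical: timestamp assigned to a backfilled (pre-November 2018) record."""
    return asof_date + BACKFILL_DELAY


def is_visible(asof_date, sim_date):
    """A record is usable on `sim_date` only once its timestamp has passed."""
    return emulated_timestamp(asof_date) <= sim_date


asof = date(2015, 3, 2)  # hypothetical classification start date
print(is_visible(asof, date(2015, 3, 2)))  # False: same day as asof_date
print(is_visible(asof, date(2015, 3, 3)))  # True: one day later
```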
Methodology
To overcome disparate and non-standardized company disclosures, FactSet created a normalized global industry classification structure and applies standardized industry definitions to companies globally. Only primary sources of information disclosed directly by companies, such as regulatory filings, investor reports, and company press releases, are used. FactSet analysts are trained to interpret this information consistently and enter it into a system with built-in quality and error-checking features.
Data quality is monitored using a combination of system and human quality controls. FactSet uses technology such as an internally developed document reader with customizable search and translation tools to make data collection and review more effective. Ultimately, information quality rests on the patented taxonomy and on well-trained analysts who follow the methodology while exercising judgement to ensure that material data is collected and properly assimilated.
Usage
The RBICSFocus dataset is a pipeline DataSet. The columns of the RBICSFocus dataset can be used like any other BoundColumn in a pipeline.
Example
This code snippet constructs and runs a pipeline that computes the difference between an asset's 1-week return and the mean 1-week return of all assets with the same economy classification. Note that this example uses Factor.demean to group assets by economy classification and subtract each group's mean return. Make sure to run it in Research.
```python
from quantopian.pipeline import Pipeline
from quantopian.pipeline.data.factset import RBICSFocus
from quantopian.pipeline.domain import US_EQUITIES
from quantopian.pipeline.factors import Returns
from quantopian.research import run_pipeline

# Level 1 (economy) classification code of each asset, used as a classifier.
economy_focus = RBICSFocus.l1_id.latest

# 6 daily prices give the return over 5 trading days (one week).
returns_1w = Returns(window_length=6)

# Subtract the mean 1-week return of each asset's economy group.
returns_1w_less_economy_mean = returns_1w.demean(groupby=economy_focus)

pipe = Pipeline(
    columns={
        'economy_focus': economy_focus,
        'returns_1w_less_economy_mean': returns_1w_less_economy_mean,
    },
    domain=US_EQUITIES,
)

df = run_pipeline(pipe, '2015-05-05', '2016-05-05')
print(df.head())
```
The attached notebook provides a similar example as well as some analysis of the RBICS Focus dataset.
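To make the grouped demeaning step concrete, the sketch below reproduces the per-group arithmetic in plain Python on a single day's values. It is an illustration, not Pipeline's implementation: it omits Factor.demean's optional mask argument and does not handle missing values. The returns and economy codes are hypothetical.

```python
from collections import defaultdict


def demean_by_group(values, groups):
    """Subtract each group's mean from every value in that group.

    values: dict mapping asset -> factor value (e.g. a 1-week return)
    groups: dict mapping asset -> group label (e.g. an economy code)
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for asset, value in values.items():
        sums[groups[asset]] += value
        counts[groups[asset]] += 1
    means = {label: sums[label] / counts[label] for label in sums}
    return {asset: value - means[groups[asset]] for asset, value in values.items()}


# Hypothetical 1-week returns and economy codes for four assets.
returns_1w = {"AAA": 0.04, "BBB": 0.02, "CCC": -0.01, "DDD": 0.03}
economy = {"AAA": "E1", "BBB": "E1", "CCC": "E2", "DDD": "E2"}

for asset, value in demean_by_group(returns_1w, economy).items():
    print(asset, round(value, 4))
```

Group E1 has a mean of 0.03, so AAA and BBB become +0.01 and -0.01; group E2 has a mean of 0.01, so CCC and DDD become -0.02 and +0.02. Within each group the demeaned values sum to zero.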
Pipeline Datasets and Columns
Dataset
RBICSFocus
- The RBICSFocus dataset is a pipeline dataset that provides access to revenue-based sector classifications.
Fields
The RBICSFocus dataset has 7 fields (accessible as BoundColumn attributes):
- l1_id (dtype str) - Economy classification code based on business focus.
- l1_name (dtype str) - Economy classification name based on business focus.
- l2_id (dtype str) - Sector classification code based on business focus.
- l2_name (dtype str) - Sector classification name based on business focus.
- l3_id (dtype str) - Subsector classification code based on business focus.
- l3_name (dtype str) - Subsector classification name based on business focus.
- asof_date (dtype datetime64[ns]) - The start date of the classification (the date when the record first applies).
Other Notes
- Currently, non-US data is only available in Pipeline in Research. The IDE only has access to US equity data at this time.
- This is currently the best documentation for the RBICS Focus dataset. We are working on a new set of documentation that will cover all data integrations in one place.