The datasets described herein are proprietary to FactSet Research Systems, Inc. ("FactSet") and may not be copied or distributed. The datasets made available to Quantopian by FactSet are not exhaustive of FactSet's data, products, software, and/or services.
FactSet Estimates
Today, we added Pipeline API support for FactSet Consensus Estimates and FactSet Actuals.
We believe that this release is an exciting opportunity for the Quantopian community. Using the new estimates data, you can create signals and strategies that were not previously possible. For example, you can:
- Account for current analyst expectations of future earnings.
- Examine how analyst projections of future earnings have changed over time.
- Compute earnings surprises by comparing estimated values to company-provided actuals.
Algorithms that use FactSet Estimates data are eligible for entry into the Quantopian Contest. They are also eligible to be considered for allocations.
Background: What Are Estimates and Why Do They Matter?
Publicly traded companies issue quarterly, semi-annual, and annual financial reports. These reports provide investors with quantitative measures of a company's financial performance, including important metrics like Earnings per Share (EPS) and Cash Flow per Share (CFPS).
Investors want to know how companies might perform in the future, so they pay analysts to make estimates of companies' future performance. These estimates are usually tied to a particular fiscal quarter or year, allowing investors to compare estimates with the actual values reported by companies.
For large companies, there are usually many analysts making estimates at a given time for any given fiscal period. A common method for working with estimates data is to aggregate the estimates from individual analysts into a single consensus estimate. There are many ways to summarize per-analyst estimates into a consensus value, but commonly-computed statistics include mean, median, high, low, the number of estimates, and standard deviation.
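As a rough illustration of the aggregation described above, the sketch below summarizes a handful of made-up per-analyst EPS estimates into consensus statistics. The function name, dictionary keys, and sample numbers are assumptions chosen for illustration; they are not FactSet's methodology or column names.

```python
import statistics


def consensus(estimates):
    """Summarize per-analyst estimates into common consensus statistics."""
    if not estimates:
        raise ValueError("need at least one estimate")
    return {
        "mean": statistics.mean(estimates),
        "median": statistics.median(estimates),
        "high": max(estimates),
        "low": min(estimates),
        "num_est": len(estimates),
        # Population standard deviation; a data vendor may use the
        # sample version instead (assumption).
        "std_dev": statistics.pstdev(estimates),
    }


# Hypothetical EPS estimates from five analysts for one fiscal quarter.
analyst_eps = [1.10, 1.15, 1.20, 1.25, 1.30]
summary = consensus(analyst_eps)
print(summary)
```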
When working with estimates data in a simulation (e.g. in a backtest or a pipeline), we usually think about estimate periods relative to the current simulation perspective. A common notation for talking about relative periods is to use "FQ1" for "next-to-be-announced quarter", "FQ2" for "two quarters out", and "FQ0" for "most-recently-announced quarter". Similarly, "FY1" means "next-to-be-announced year", and "FY0" means "most-recently-announced year".
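To make the relative notation concrete, here is a minimal sketch that maps an offset (0 for FQ0, 1 for FQ1, and so on) to a fiscal quarter as seen from a given simulation date. The calendar, labels, and the `relative_period` helper are hypothetical, and the sketch assumes the quarters are sorted by report date and that future report dates are known.

```python
from datetime import date

# Hypothetical fiscal quarters for one company: (period label, report date).
quarters = [
    ("2015Q1", date(2015, 4, 28)),
    ("2015Q2", date(2015, 7, 28)),
    ("2015Q3", date(2015, 10, 27)),
    ("2015Q4", date(2016, 1, 26)),
    ("2016Q1", date(2016, 4, 26)),
]


def relative_period(quarters, today, offset):
    """Return the period label at `offset` as seen from `today`.

    Offset 0 is the most recently announced quarter (FQ0), offset 1 is
    the next quarter to be announced (FQ1), and so on. Returns None if
    the requested period falls outside the known calendar.
    """
    # Indices of quarters whose results were announced on or before `today`.
    announced = [i for i, (_, reported) in enumerate(quarters) if reported <= today]
    if not announced:
        return None
    idx = announced[-1] + offset
    if 0 <= idx < len(quarters):
        return quarters[idx][0]
    return None


today = date(2015, 9, 1)
fq0 = relative_period(quarters, today, 0)
fq1 = relative_period(quarters, today, 1)
fq2 = relative_period(quarters, today, 2)
print(fq0, fq1, fq2)
```

As the simulation date moves past the next report date, every relative label shifts forward by one quarter, which is why the same offset refers to different fiscal periods on different days.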
API Additions
This release includes a few additions to the Quantopian API.
We've added a new DataSetFamily class to the Pipeline API. A DataSetFamily is a collection of DataSet objects that all have the same columns. You can get a DataSet from a DataSetFamily by calling the family's .slice method. Our first two new dataset families are PeriodicConsensus and Actuals, both importable from quantopian.pipeline.data.factset.estimates.
New Module: quantopian.pipeline.data.factset.estimates
We've added a new estimates submodule to the existing factset module. The estimates submodule has two public attributes: PeriodicConsensus and Actuals.
The structure of quantopian.pipeline.data.factset is now:
- quantopian.pipeline.data.factset (module)
  - EquityMetadata (DataSet)
  - Fundamentals (DataSet)
  - RBICSFocus (DataSet)
  - estimates (module)
    - PeriodicConsensus (DataSetFamily)
    - Actuals (DataSetFamily)
NOTE: This is the first time we've added a submodule to the top-level factset module. For more information on the reason behind this decision, see the notebook attached below.
New Concept: DataSet Families
Before today, when we added new data to the Pipeline API, we did so by adding one or more new DataSets. Each new DataSet was a simple collection of named "columns", and you could access columns by name with expressions like EquityPricing.close or RBICSFocus.l1_id.
Representing datasets as collections of named columns works well for simple tables with only a handful of columns, but for more complex datasets we need a more expressive model. As we worked on integrating FactSet Estimates, we found that the existing DataSet API has two important shortcomings that we wanted to address:
- DataSet doesn't provide a way to group related columns. (1)
- DataSet doesn't provide a way to programmatically select columns.
To solve these problems, we've introduced a new kind of object to the Pipeline API: DataSetFamily.
A DataSetFamily is like a collection of DataSets, where each member dataset has the same columns. Each member of a family is identified by a tuple of named attributes, which we call its coordinates. To select a member from a dataset family, you call the family's .slice method, passing the coordinates of the desired member.
For example, to select the dataset containing data for FQ1 earnings per share, you would write:
from quantopian.pipeline.data.factset.estimates import PeriodicConsensus
# PeriodicConsensus is a DataSetFamily, fq1_eps is a DataSet.
fq1_eps = PeriodicConsensus.slice(item='EPS', freq='qf', offset=1)
You can also pass slice coordinates positionally:
fq1_eps = PeriodicConsensus.slice('EPS', 'qf', 1)
For more examples and details on the design of DataSetFamily, see the attached notebook.
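To build intuition for how coordinates identify a member of a family, here is a toy model written in plain Python. It is not Quantopian's implementation: `ToyFamily`, `ToyDataSet`, and the column names other than `mean` are hypothetical stand-ins. It only shows that keyword and positional coordinates select the same member, and that every member shares the same columns.

```python
from collections import namedtuple

# A toy stand-in for a sliced dataset: its coordinates plus shared columns.
ToyDataSet = namedtuple("ToyDataSet", ["coords", "columns"])


class ToyFamily:
    """Toy model of a dataset family (illustration only)."""

    def __init__(self, coord_names, columns):
        self.coord_names = tuple(coord_names)
        self.columns = tuple(columns)

    def slice(self, *args, **kwargs):
        # Accept coordinates positionally, by keyword, or a mix of both.
        if len(args) > len(self.coord_names):
            raise TypeError("too many coordinates")
        coords = dict(zip(self.coord_names, args))
        for name, value in kwargs.items():
            if name not in self.coord_names or name in coords:
                raise TypeError("unexpected or duplicate coordinate: %s" % name)
            coords[name] = value
        missing = [n for n in self.coord_names if n not in coords]
        if missing:
            raise TypeError("missing coordinates: %s" % ", ".join(missing))
        # Store coordinates in declaration order so equal slices compare equal.
        ordered = tuple((n, coords[n]) for n in self.coord_names)
        return ToyDataSet(coords=ordered, columns=self.columns)


toy_consensus = ToyFamily(["item", "freq", "offset"], ["mean", "median", "high", "low"])
by_keyword = toy_consensus.slice(item="EPS", freq="qf", offset=1)
by_position = toy_consensus.slice("EPS", "qf", 1)
other_member = toy_consensus.slice("CFPS", "af", 0)
print(by_keyword == by_position)
```

Because a member is selected by ordinary values rather than by attribute name, it is straightforward to build slices in a loop, for example over several offsets or items.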
New Dataset Families: PeriodicConsensus and Actuals
Our first two dataset families are PeriodicConsensus and Actuals, both importable from quantopian.pipeline.data.factset.estimates. The PeriodicConsensus family provides aggregated consensus estimates from industry analysts. The Actuals family provides company-reported values for estimated fields, organized in a way that makes it easy to compare with consensus estimates.
PeriodicConsensus and Actuals are dataset families, so they need to be sliced in order to produce a usable dataset. Slices from both PeriodicConsensus and Actuals have three coordinates:
- Item: The estimated/reported company metric, e.g. 'EPS', 'CFPS'. See the Data Reference for the full set of items you can choose from.
- Frequency: The reporting frequency of the estimate or actual. Choices are 'qf' (quarterly), 'saf' (semi-annual), and 'af' (annual). Note that quarterly and annual reports have many more estimates than semi-annual ones.
- Period Offset: The relative offset between the current simulation time and the estimated/reported period. A period offset of 1 means "next period to be reported". A period offset of 0 means "most recently reported period".
PeriodicConsensus can be sliced with positive offsets (resulting in datasets containing forward-looking estimates) or with zero or negative offsets (resulting in historical estimates for already-announced periods). Actuals only supports slicing with offsets less than or equal to 0.
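The coordinate rules above can be summarized in a small checker. This is a sketch under stated assumptions: `slice_problems` is a hypothetical helper, not part of the Quantopian API, and it encodes only the constraints described in this post (the three frequency codes and the offset restriction on Actuals). It does not check item names, which are listed in the Data Reference.

```python
VALID_FREQS = ("qf", "saf", "af")


def slice_problems(family, freq, offset):
    """Return a list of problems with the given slice coordinates.

    `family` is the family name as a string. An empty list means the
    coordinates satisfy the rules described in the text.
    """
    problems = []
    if freq not in VALID_FREQS:
        problems.append("freq must be one of %s" % ", ".join(VALID_FREQS))
    if not isinstance(offset, int):
        problems.append("offset must be an integer")
    elif family == "Actuals" and offset > 0:
        # Companies have not yet reported values for future periods.
        problems.append("Actuals only supports offsets <= 0")
    return problems


forward_consensus = slice_problems("PeriodicConsensus", "qf", 1)
forward_actuals = slice_problems("Actuals", "qf", 1)
bad_freq = slice_problems("Actuals", "monthly", 0)
print(forward_consensus, forward_actuals, bad_freq)
```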
To learn more about PeriodicConsensus and Actuals, see the Data Reference.
Contest & Allocations
Consensus Estimates and Actuals are both available in backtesting, so you can use them in the contest. FactSet Estimates data is the first of its kind on Quantopian, and strategies written using this dataset are likely to be uncorrelated with strategies that are already running in the contest. For this reason, we strongly encourage you to familiarize yourself with this dataset and to try entering estimates-based strategies in the contest. To make that easier, we are increasing the limit on the number of contest entries to 5 per person, so that you can make a new entry with FactSet Estimates without having to withdraw one of your existing entries. Algorithms that use FactSet Estimates are eligible to be considered for an allocation.
Attached is an example notebook that uses FactSet Estimates in a pipeline to help get you started.
Data Holdout Period
Like other FactSet-sourced datasets, FactSet Estimates has a holdout period. In this case, FactSet Estimates has a trailing 1-year holdout. This means that the most recent year of data is not accessible in Research and the IDE. However, you are allowed to submit an algorithm that uses FactSet Estimates data to the contest, and all contest scoring is done using the entire, up-to-date dataset. Similarly, algorithms using FactSet Estimates data will be evaluated by Quantopian for funding using the full dataset.
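When choosing date ranges for Research or backtests, it can help to estimate where the holdout begins. The sketch below assumes the boundary is exactly one calendar year before the current date; the precise cutoff used by the platform is not stated here, so treat the result as approximate. `last_accessible_date` is a hypothetical helper.

```python
from datetime import date


def last_accessible_date(today, holdout_years=1):
    """Approximate the most recent date outside a trailing holdout."""
    try:
        return today.replace(year=today.year - holdout_years)
    except ValueError:
        # February 29 has no counterpart in a non-leap year; use February 28.
        return today.replace(year=today.year - holdout_years, day=28)


cutoff = last_accessible_date(date(2019, 6, 15))
leap_cutoff = last_accessible_date(date(2020, 2, 29))
print(cutoff, leap_cutoff)
```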
To learn more about FactSet Estimates on Quantopian see the newly published Data Reference.
Example
This example constructs and runs a pipeline that computes an earnings per share (EPS) surprise factor. The surprise factor is defined as the relative difference between the actual EPS and the consensus mean EPS estimate for the most recently published quarterly report (FQ0). Note that the example uses a slice of the PeriodicConsensus dataset family together with a slice of the Actuals dataset family to construct the surprise factor. The example should be run in Research.
from quantopian.pipeline import Pipeline
import quantopian.pipeline.data.factset.estimates as fe
from quantopian.pipeline.domain import US_EQUITIES
from quantopian.research import run_pipeline
# Slice the PeriodicConsensus and Actuals DataSetFamilies into DataSets. In this context,
# fq0_eps_cons is a DataSet containing consensus estimates data about EPS for the
# most recently reported fiscal quarter. fq0_eps_act is a DataSet containing the actual
# reported EPS for the most recently reported quarter.
fq0_eps_cons = fe.PeriodicConsensus.slice('EPS', 'qf', 0)
fq0_eps_act = fe.Actuals.slice('EPS', 'qf', 0)
# Get the latest mean consensus EPS estimate for the last reported quarter.
fq0_eps_cons_mean = fq0_eps_cons.mean.latest
# Get the EPS value from the last reported quarter.
fq0_eps_act_value = fq0_eps_act.actual_value.latest
# Define a surprise factor to be the relative difference between the estimated and
# reported EPS.
fq0_surprise = (fq0_eps_act_value - fq0_eps_cons_mean) / fq0_eps_cons_mean
# Add the surprise factor to the pipeline.
pipe = Pipeline(
    columns={
        'eps_surprise_factor': fq0_surprise,
    },
    domain=US_EQUITIES,
    screen=fq0_surprise.notnull(),
)
# Run the pipeline over a year and print the result.
df = run_pipeline(pipe, '2015-05-05', '2016-05-05')
print(df.head())
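The surprise formula used above can be checked by hand with a few numbers. The plain-Python sketch below applies the same expression to made-up values; returning NaN for a zero or missing consensus is a choice made for this sketch, not a description of how the pipeline handles those cases. It also shows that the sign of the result flips when the consensus estimate is negative.

```python
import math


def eps_surprise(actual, consensus_mean):
    """Relative EPS surprise: (actual - consensus) / consensus."""
    if math.isnan(actual) or math.isnan(consensus_mean) or consensus_mean == 0:
        # Sketch-only choice: treat undefined ratios as missing.
        return math.nan
    return (actual - consensus_mean) / consensus_mean


beat = eps_surprise(1.10, 1.00)            # reported above consensus
miss = eps_surprise(0.90, 1.00)            # reported below consensus
smaller_loss = eps_surprise(-0.40, -0.50)  # loss smaller than expected
undefined = eps_surprise(0.05, 0.0)
print(beat, miss, smaller_loss, undefined)
```

Note that `smaller_loss` comes out negative even though the company did better than expected, because the denominator is negative. If that matters for your signal, one option is to divide by the absolute value of the consensus instead.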
Footnotes
(1) These issues are most visible today in the FactSet Fundamentals dataset. Fundamentals has over 1000 "columns", but many of those columns are minor variations of one another. For example, every fundamental field has both a main column and an _asof_date-suffixed version. We also have separate _qf, _saf, and _af columns for quarterly, semi-annually, and annually reported versions of many metrics.