New Data: FactSet Estimates

The datasets described herein are proprietary to FactSet Research Systems, Inc. ("Factset") and may not be copied or distributed. The datasets made available to Quantopian by FactSet are not exhaustive of FactSet's data, products, software, and/or services.

FactSet Estimates

Today, we added Pipeline API support for FactSet Consensus Estimates and FactSet Actuals.

We believe that this release is an exciting opportunity for the Quantopian community. Using the new estimates data, you can create signals and strategies that were not previously possible. For example, you can:

  • Account for current analyst expectations of future earnings.
  • Examine how analyst projections of future earnings have changed over time.
  • Compute earnings surprises by comparing estimated values to company-provided actuals.

Algorithms that use FactSet Estimates data are eligible for entry into the Quantopian Contest. They are also eligible to be considered for allocations.


Background: What Are Estimates and Why Do They Matter?

Publicly traded companies issue quarterly, semi-annual, and annual financial reports. These reports provide investors with quantitative measures of a company's financial performance, including important metrics like Earnings per Share (EPS) and Cash Flow per Share (CFPS).

Investors want to know how companies might perform in the future, so they pay analysts to make estimates of companies' future performance. These estimates are usually tied to a particular fiscal quarter or year, allowing investors to compare estimates with the actual values reported by companies.

For large companies, there are usually many analysts making estimates at a given time for any given fiscal period. A common method for working with estimates data is to aggregate the estimates from individual analysts into a single consensus estimate. There are many ways to summarize per-analyst estimates into a consensus value, but commonly computed statistics include the mean, median, high, low, number of estimates, and standard deviation.
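To make the aggregation step concrete, here is a minimal sketch that computes those summary statistics from a list of per-analyst estimates. The function name, the dictionary keys, and the sample numbers are made up for illustration and are not FactSet column names. Using the sample (rather than population) standard deviation is also an assumption.

```python
from statistics import mean, median, stdev


def consensus_summary(estimates):
    """Summarize per-analyst estimates for one company, metric, and fiscal period."""
    if not estimates:
        raise ValueError("need at least one analyst estimate")
    return {
        "mean": mean(estimates),
        "median": median(estimates),
        "high": max(estimates),
        "low": min(estimates),
        "num_estimates": len(estimates),
        # Sample standard deviation is undefined for one estimate; report 0.0.
        "std_dev": stdev(estimates) if len(estimates) > 1 else 0.0,
    }


# Hypothetical EPS estimates from four analysts for the same fiscal quarter.
summary = consensus_summary([1.10, 1.20, 1.25, 1.05])
print(summary)
```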

When working with estimates data in a simulation (e.g. in a backtest or a pipeline), we usually think about estimate periods relative to the current simulation perspective. A common notation for talking about relative periods is to use "FQ1" for "next-to-be-announced quarter", "FQ2" for "two quarters out", and "FQ0" for "most-recently-announced quarter". Similarly, "FY1" means "next-to-be-announced year", and "FY0" means "most-recently-announced year".
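As a rough illustration of relative periods, the sketch below assigns an offset to each fiscal quarter of one hypothetical company as of a given simulation date. The quarter labels, announcement dates, and function name are all invented for this example, and the quarters are assumed to be listed in chronological order. Offset 0 corresponds to FQ0, offset 1 to FQ1, and so on.

```python
from datetime import date

# Hypothetical fiscal quarters for one company: (label, report announcement date).
quarters = [
    ("2015Q1", date(2015, 4, 28)),
    ("2015Q2", date(2015, 7, 28)),
    ("2015Q3", date(2015, 10, 27)),
    ("2015Q4", date(2016, 1, 26)),
]


def relative_offsets(quarters, as_of):
    """Map each quarter label to its offset from the most recently announced quarter."""
    announced = [i for i, (_, announced_on) in enumerate(quarters) if announced_on <= as_of]
    if not announced:
        raise ValueError("no quarter has been announced as of this date")
    fq0_index = announced[-1]
    return {label: i - fq0_index for i, (label, _) in enumerate(quarters)}


# As of mid-August 2015, the Q2 report is the most recent announcement.
offsets = relative_offsets(quarters, date(2015, 8, 15))
print(offsets)
```

As the simulation date moves past an announcement, every label shifts by one: the quarter that was FQ1 becomes FQ0.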


API Additions

This release includes a few additions to the Quantopian API.

We've added a new DataSetFamily class to the Pipeline API. A DataSetFamily is a collection of DataSet objects that all have the same columns. You can get a DataSet from a DataSetFamily by calling the family's .slice method. Our first two new dataset families are PeriodicConsensus and Actuals, both importable from quantopian.pipeline.data.factset.estimates.

New Module: quantopian.pipeline.data.factset.estimates

We've added a new estimates submodule to the existing factset module. The estimates submodule has two public attributes: PeriodicConsensus and Actuals.

The structure of quantopian.pipeline.data.factset is now:

- quantopian.pipeline.data.factset (module)  
  - EquityMetadata (DataSet)  
  - Fundamentals (DataSet)  
  - RBICSFocus (DataSet)  
  - estimates (module)  
    - PeriodicConsensus (DataSetFamily)  
    - Actuals (DataSetFamily)  

NOTE: This is the first time we've added a submodule to the top-level factset module. For more information on the reason behind this decision, see the notebook attached below.

New Concept: DataSet Families

Before today, when we added new data to the Pipeline API, we did so by adding one or more new DataSets. Each new DataSet was a simple collection of named "columns" and you could access columns by name with expressions like EquityPricing.close or RBICSFocus.l1_id.

Representing datasets as collections of named columns works well for simple tables with only a handful of columns, but for more complex datasets we need a more expressive model. As we worked on integrating FactSet Estimates, we found that the existing DataSet API has two important shortcomings that we wanted to address:

  1. DataSet doesn't provide a way to group related columns. (1)
  2. DataSet doesn't provide a way to programmatically select columns.

To solve these problems, we've introduced a new kind of object to the Pipeline API: DataSetFamily.

A DataSetFamily is like a collection of DataSets, where each member dataset has the same columns. Each member of a family is identified by a tuple of named attributes, which we call its coordinates. To select a member from a dataset family, you call the family's .slice method, passing the coordinates of the desired member.

For example, to select the dataset containing data for FQ1 earnings per share, you would write:

from quantopian.pipeline.data.factset.estimates import PeriodicConsensus

# PeriodicConsensus is a DataSetFamily, fq1_eps is a DataSet.  
fq1_eps = PeriodicConsensus.slice(item='EPS', freq='qf', offset=1)  

You can also pass slice coordinates positionally:

fq1_eps = PeriodicConsensus.slice('EPS', 'qf', 1)  

For more examples and details on the design of DataSetFamily, see the attached notebook.
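If it helps to picture how a family relates to its members, here is a toy model in plain Python. It is not Quantopian's implementation and does not import anything from quantopian; the class names ToyFamily and ToySlice and the 'median' column are hypothetical. It only mirrors the behavior described above: coordinates can be passed by keyword or by position, and every member shares the same columns.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToySlice:
    """One member of a toy family, identified by its coordinates."""
    family: str
    coordinates: tuple
    columns: tuple


class ToyFamily:
    """Toy stand-in for a dataset family. Not Quantopian's implementation."""
    coordinate_names = ("item", "freq", "offset")
    column_names = ("mean", "median")  # hypothetical shared columns

    @classmethod
    def slice(cls, *args, **kwargs):
        if len(args) > len(cls.coordinate_names):
            raise TypeError("too many positional coordinates")
        # Positional arguments fill coordinates in declaration order.
        coords = dict(zip(cls.coordinate_names, args))
        for name, value in kwargs.items():
            if name not in cls.coordinate_names:
                raise TypeError(f"unknown coordinate {name!r}")
            if name in coords:
                raise TypeError(f"coordinate {name!r} given twice")
            coords[name] = value
        missing = [n for n in cls.coordinate_names if n not in coords]
        if missing:
            raise TypeError(f"missing coordinates: {missing}")
        ordered = tuple(coords[n] for n in cls.coordinate_names)
        return ToySlice(cls.__name__, ordered, cls.column_names)


by_keyword = ToyFamily.slice(item="EPS", freq="qf", offset=1)
by_position = ToyFamily.slice("EPS", "qf", 1)
print(by_keyword == by_position)

# Coordinates are plain values, so members can be selected in a loop.
forward_quarters = {n: ToyFamily.slice("EPS", "qf", n) for n in (1, 2, 3)}
```

The last line shows the point of programmatic selection: with a flat DataSet you would have to spell out a separate column name for each quarter, whereas here the offset is just a loop variable.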

New Dataset Families: PeriodicConsensus and Actuals

Our first two dataset families are PeriodicConsensus and Actuals, both importable from quantopian.pipeline.data.factset.estimates. The PeriodicConsensus family provides aggregated consensus estimates from industry analysts. The Actuals family provides company-reported values for estimated fields, organized in a way that makes it easy to compare with consensus estimates.

PeriodicConsensus and Actuals are dataset families, so they need to be sliced in order to produce a usable dataset. Slices from both PeriodicConsensus and Actuals have three coordinates:

  • Item: The estimated/reported company metric, e.g. 'EPS', 'CFPS'. See the Data Reference for the full set of items you can choose from.
  • Frequency: The reporting frequency of the estimate or actual. Choices are 'qf' (quarterly), 'saf' (semi-annual), and 'af' (annual). Note that quarterly and annual reports have many more estimates than semi-annual ones.
  • Period Offset: The relative offset between the current simulation time and the estimated/reported period. A period offset of 1 means "next period to be reported". A period offset of 0 means "most recently reported period". PeriodicConsensus can be sliced with positive offsets (resulting in datasets containing forward-looking estimates) or with zero or negative offsets (resulting in historical estimates for already-announced periods). Actuals only supports slicing with offsets of less than or equal to 0.
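The coordinate rules above can be summarized as a small validation helper. This is only a sketch of the rules as stated in this post: the function name is hypothetical, the real .slice method may report invalid coordinates differently, and item names are deliberately not checked because the full list lives in the Data Reference.

```python
VALID_FREQS = ("qf", "saf", "af")  # quarterly, semi-annual, annual


def check_coordinates(family, item, freq, offset):
    """Return (item, freq, offset) if the slice request matches the rules above."""
    if family not in ("PeriodicConsensus", "Actuals"):
        raise ValueError(f"unknown family {family!r}")
    if freq not in VALID_FREQS:
        raise ValueError(f"freq must be one of {VALID_FREQS}, got {freq!r}")
    if isinstance(offset, bool) or not isinstance(offset, int):
        raise TypeError("offset must be an integer")
    if family == "Actuals" and offset > 0:
        # A period that has not been reported yet has no actual value.
        raise ValueError("Actuals only supports offsets <= 0")
    # Item names are not validated here; see the Data Reference for the full list.
    return (item, freq, offset)


# Forward-looking consensus for the next quarter to be reported: allowed.
print(check_coordinates("PeriodicConsensus", "EPS", "qf", 1))

# Actuals for a quarter that has not been reported yet: rejected.
try:
    check_coordinates("Actuals", "EPS", "qf", 1)
except ValueError as err:
    print(err)
```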

To learn more about PeriodicConsensus and Actuals, see the Data Reference.


Contest & Allocations

Consensus Estimates and Actuals are both available in backtesting, so you can use them in the contest. FactSet Estimates is the first dataset of its kind on Quantopian, so strategies built on it are likely to be uncorrelated with strategies that are already running in the contest. For this reason, we strongly encourage you to familiarize yourself with the dataset and to enter estimates-based strategies in the contest. To make that easier, we are increasing the limit on the number of contest entries to 5 per person, so you can make a new entry with FactSet Estimates without having to withdraw one of your existing entries. Algorithms that use FactSet Estimates are also eligible to be considered for an allocation.

Attached is an example notebook that uses FactSet Estimates in a pipeline to help get you started.

Data Holdout Period

Like other FactSet-sourced datasets, FactSet Estimates has a holdout period. In this case, FactSet Estimates has a trailing 1-year holdout. This means that the most recent year of data is not accessible in Research and the IDE. However, submitting an algorithm to the contest that uses FactSet estimates data is allowed and all contest scoring is done using the entire, up-to-date dataset. Similarly, algorithms using FactSet estimates data will be evaluated by Quantopian for funding using the full dataset.

To learn more about FactSet Estimates on Quantopian, see the newly published Data Reference.


Example

This example constructs and runs a pipeline that computes an earnings per share (EPS) surprise factor. The surprise factor is defined as the relative difference between the actual EPS and the mean consensus EPS estimate for the most recently reported fiscal quarter (FQ0), expressed as a fraction of the estimate. Note that the example combines a slice of the PeriodicConsensus family with a slice of the Actuals family to construct the surprise factor. The example should be run in Research.

from quantopian.pipeline import Pipeline  
import quantopian.pipeline.data.factset.estimates as fe  
from quantopian.pipeline.domain import US_EQUITIES  
from quantopian.research import run_pipeline

# Slice the PeriodicConsensus and Actuals DataSetFamilies into DataSets. In this context,  
# fq0_eps_cons is a DataSet containing consensus estimates data about EPS for the  
# most recently reported fiscal quarter. fq0_eps_act is a DataSet containing the actual  
# reported EPS for the most recently reported quarter.  
fq0_eps_cons = fe.PeriodicConsensus.slice('EPS', 'qf', 0)  
fq0_eps_act = fe.Actuals.slice('EPS', 'qf', 0)

# Get the latest mean consensus EPS estimate for the last reported quarter.  
fq0_eps_cons_mean = fq0_eps_cons.mean.latest

# Get the EPS value from the last reported quarter.  
fq0_eps_act_value = fq0_eps_act.actual_value.latest

# Define a surprise factor to be the relative difference between the estimated and  
# reported EPS.  
fq0_surprise = (fq0_eps_act_value - fq0_eps_cons_mean) / fq0_eps_cons_mean

# Add the surprise factor to the pipeline.  
pipe = Pipeline(  
    columns={  
        'eps_surprise_factor': fq0_surprise,  
    },  
    domain=US_EQUITIES,  
    screen=fq0_surprise.notnull(),  
)

# Run the pipeline over a year and print the result.  
df = run_pipeline(pipe, '2015-05-05', '2016-05-05')  
print(df.head())  
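For intuition about the numbers the surprise factor produces, here is the same arithmetic as a standalone function on plain floats, outside of Pipeline. The function name and sample values are hypothetical, and returning NaN for a zero or missing consensus is an arbitrary choice for this sketch, not a statement about how the pipeline expression behaves.

```python
import math


def eps_surprise(actual, consensus_mean):
    """Relative difference between reported EPS and the consensus mean."""
    if math.isnan(actual) or math.isnan(consensus_mean) or consensus_mean == 0:
        return math.nan  # arbitrary choice for this sketch
    return (actual - consensus_mean) / consensus_mean


beat = eps_surprise(1.10, 1.00)  # reported above consensus, about +0.10
miss = eps_surprise(0.90, 1.00)  # reported below consensus, about -0.10

# With a negative consensus the sign flips: a smaller loss than expected
# (-0.40 reported vs. -0.50 expected) comes out negative.
flipped = eps_surprise(-0.40, -0.50)
print(beat, miss, flipped)
```

The last case is worth keeping in mind when ranking on this factor. One possible workaround, which is not part of the example above, is to divide by the absolute value of the consensus mean so that the sign always follows the numerator.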

Footnotes

(1) These issues are most visible today in the FactSet Fundamentals dataset. Fundamentals has over 1000 "columns", but many of those columns are minor variations of one another. For example, every fundamental field has both a main column and an _asof_date-suffixed version. We also have separate _qf, _saf and _af columns for quarterly, semi-annually, and annually-reported versions of many metrics.

Disclaimer

The material on this website is provided for informational purposes only and does not constitute an offer to sell, a solicitation to buy, or a recommendation or endorsement for any security or strategy, nor does it constitute an offer to provide investment advisory services by Quantopian. In addition, the material offers no opinion with respect to the suitability of any security or specific investment. No information contained herein should be regarded as a suggestion to engage in or refrain from any investment-related course of action as none of Quantopian nor any of its affiliates is undertaking to provide investment advice, act as an adviser to any plan or entity subject to the Employee Retirement Income Security Act of 1974, as amended, individual retirement account or individual retirement annuity, or give advice in a fiduciary capacity with respect to the materials presented herein. If you are an individual retirement or other investor, contact your financial advisor or other fiduciary unrelated to Quantopian about whether any given investment idea, strategy, product or service described herein may be appropriate for your circumstances. All investments involve risk, including loss of principal. Quantopian makes no guarantees as to the accuracy or completeness of the views expressed in the website. The views are subject to change, and may have become unreliable for various reasons, including changes in market conditions or economic circumstances.

8 responses

The attached notebook has several example pipelines that use FactSet Estimates data in different ways. This notebook is a good follow-up to the notebook attached to the original post.

It seems like the earnings estimate on the first date of a pipeline run is incorrect. I got different estimates after changing the end date passed to run_pipeline.

@David: That appears to be a bug. Thank you for bringing it to our attention. We're investigating the issue and we will provide an update when we have more information.

Thanks @Jamie. Once you know more, could you provide an impact (severity) assessment for existing backtests? I.e., is the impact bad enough that we would need to rerun backtests on this dataset (once it's been fixed) to get an accurate picture of the strategy?

@Joakim it looks like the impact of the bug is that the first day in any pipeline execution 'chunk' is likely to have incorrect data. If you're unfamiliar with pipeline execution chunks, pipelines usually get executed in long chunks spanning many simulation dates in order to speed up computation times. In backtesting, the default is for pipelines to be executed in 6-month chunks (which is why you might see a backtest pause roughly every 6 months and then rapidly draw out the next 6 months of results). Because of this, I expect that roughly 1 simulation day out of every 6 months might change significantly when we issue a fix. I'd recommend re-running backtests because it's hard to say how much your simulations will change.
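To get a feel for which dates that covers, here is a rough sketch that lists the first session of each chunk over a date range. Everything about the chunking here is an assumption made for illustration: sessions are approximated as weekdays (holidays ignored), chunks are taken to be consecutive blocks of 126 sessions as a stand-in for "6 months", and the actual chunk boundaries used by the backtester may differ.

```python
from datetime import date, timedelta


def weekdays(start, end):
    """All Monday-to-Friday dates from start to end, inclusive (holidays ignored)."""
    days = []
    current = start
    while current <= end:
        if current.weekday() < 5:
            days.append(current)
        current += timedelta(days=1)
    return days


def chunk_first_days(sessions, chunk_size):
    """First session of each consecutive chunk of chunk_size sessions."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    return [sessions[i] for i in range(0, len(sessions), chunk_size)]


sessions = weekdays(date(2015, 1, 1), date(2015, 12, 31))
# 126 sessions is a rough stand-in for "6 months"; the real chunk size is an assumption.
suspect_days = chunk_first_days(sessions, 126)
print(suspect_days)
```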

We currently have a fix running in our staging environment, and it appears to resolve the issue. I'll post an update on this thread once Research and Backtesting are updated with the fix.

We pushed an update that fixes the vast majority of instances of the bug described above. There are still a couple of edge cases that aren't working, but they are primarily stocks with low liquidity that won't be in the QTU. We're working on fixing these issues and expect to push a second update soon, but in the meantime, I'd recommend re-running your previous backtests with the current version to see if they were materially impacted by the change.

Thanks Jamie, appreciated!

The last part of the fix has shipped, so all pipelines using estimates data should be deterministic. The result for any simulation date should be the same, regardless of the start and end dates of run_pipeline. Thanks again to @David for reporting the issue!
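If you want to check this property on your own pipelines, one approach is to run the same pipeline over two overlapping date ranges and compare the values for the keys they share. The helper below is a minimal sketch that works on plain dictionaries keyed by (date, asset); the function name and sample data are hypothetical. For a run_pipeline result, a mapping of this shape could be obtained from a single column with pandas' Series.to_dict(), which is an assumption about how you would wire it up rather than something tested here.

```python
import math


def mismatches(run_a, run_b):
    """Keys present in both runs whose values differ (NaN is treated as equal to NaN)."""
    def same(x, y):
        both_nan = (
            isinstance(x, float) and isinstance(y, float)
            and math.isnan(x) and math.isnan(y)
        )
        return both_nan or x == y

    shared = run_a.keys() & run_b.keys()
    return {key for key in shared if not same(run_a[key], run_b[key])}


# Hypothetical results from two runs with different start and end dates.
run_a = {
    ("2015-05-05", "AAA"): 0.10,
    ("2015-05-06", "AAA"): 0.10,
    ("2015-05-07", "AAA"): float("nan"),
}
run_b = {
    ("2015-05-06", "AAA"): 0.10,
    ("2015-05-07", "AAA"): float("nan"),
    ("2015-05-08", "AAA"): 0.12,
}

# An empty set means the overlapping dates agree between the two runs.
print(mismatches(run_a, run_b))
```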