Today, we announced the addition of global equity pricing and fundamentals data to Quantopian. With the addition of international equity data comes a new Pipeline API feature called domains. In this notebook, we introduce the concept of domains and explain how they work with examples.
Use the EquityPricing dataset instead of USEquityPricing for Pipelines that you want to run on non-US domains.
The Pipeline API is a tool that allows you to define computations over a universe of assets and a period of time. In the context of the quant equity workflow, you can use the Pipeline API to construct a tradable universe and to compute alpha factors.
The computations defined by the Pipeline API operate on two-dimensional tables of financial data. These tables contain a row for each trading day of a particular market, and they usually contain a column for every asset on that market. For example:
date | AAPL | ARCN | ... | MSFT | TSLA |
---|---|---|---|---|---|
2014-01-02 | 5.5 | 2.0 | ... | 5.5 | 4.23 |
2014-01-03 | 5.6 | 2.1 | ... | 5.6 | 4.18 |
... | 5.7 | 2.2 | ... | 5.7 | 3.96 |
2014-12-31 | 5.8 | 2.3 | ... | 5.8 | 4.25 |
Until today, Quantopian only provided US Equity data, so the rows of pipeline inputs were always implicitly aligned to the US trading calendar (specifically, the NYSE calendar), and the columns of pipeline inputs always corresponded to US equities.
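As a rough illustration of that layout, the sketch below builds an empty table from a calendar and a list of assets using only the Python standard library. The names build_input_table, sessions, and assets are hypothetical and are not part of the Pipeline API; the dates and tickers are borrowed from the example table above.

```python
from datetime import date


def build_input_table(sessions, assets, fill=None):
    """Return a {session: {asset: value}} mapping.

    There is one row per session and one column per asset.
    """
    return {session: {asset: fill for asset in assets} for session in sessions}


# Hypothetical inputs: two trading days and two tickers from the table above.
sessions = [date(2014, 1, 2), date(2014, 1, 3)]
assets = ["AAPL", "MSFT"]

table = build_input_table(sessions, assets)

# The calendar determines the row labels and the asset list determines the columns.
row_labels = list(table)
column_labels = list(table[sessions[0]])
```

Swapping in a different calendar or asset list changes the shape of the table without changing the code that fills it.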
With the addition of global markets to the Quantopian platform, we needed to extend the Pipeline API in two ways:
Domains are a new Pipeline API feature designed to satisfy both of these needs.
Domains are a new kind of object in the Pipeline API. You can pass a domain when constructing a pipeline to change the inputs that will be processed by the pipeline. Concretely, the domain of a pipeline controls three things:
There are currently 21 domains available for use on the Quantopian platform, corresponding to the 21 countries supported by this release. Each new domain has two components:
For the mathematically-inclined, the name "domain" refers to the mathematical concept of the domain of a function, which is the set of potential inputs to a function. Though domains currently only control
For more information about the design of domains, see the public design document on GitHub.
Domains are regular Python objects. The currently-supported domains are importable from quantopian.pipeline.domain.
Each country's domain is named **_EQUITIES, with ** replaced by the country's ISO 3166 country code.
from quantopian.pipeline.domain import (
    AT_EQUITIES,  # Austria
    AU_EQUITIES,  # Australia
    BE_EQUITIES,  # Belgium
    CA_EQUITIES,  # Canada
    CH_EQUITIES,  # Switzerland
    DE_EQUITIES,  # Germany
    DK_EQUITIES,  # Denmark
    ES_EQUITIES,  # Spain
    FI_EQUITIES,  # Finland
    FR_EQUITIES,  # France
    GB_EQUITIES,  # Great Britain
    HK_EQUITIES,  # Hong Kong
    IE_EQUITIES,  # Ireland
    IT_EQUITIES,  # Italy
    JP_EQUITIES,  # Japan
    NL_EQUITIES,  # Netherlands
    NO_EQUITIES,  # Norway
    NZ_EQUITIES,  # New Zealand
    PT_EQUITIES,  # Portugal
    SE_EQUITIES,  # Sweden
    US_EQUITIES,  # United States
)
# The string representation for each domain shows the ISO code for its country and for the exchange
# that defines its calendar.
US_EQUITIES
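The naming convention can be illustrated with a small helper. This is only a sketch: domain_name is a hypothetical function, not part of quantopian.pipeline.domain, and the country codes are the ones that appear in the import list above.

```python
# Country codes copied from the import list above.
COUNTRY_CODES = [
    "AT", "AU", "BE", "CA", "CH", "DE", "DK", "ES", "FI", "FR", "GB",
    "HK", "IE", "IT", "JP", "NL", "NO", "NZ", "PT", "SE", "US",
]


def domain_name(country_code):
    """Build the name of a country's equity domain from its country code."""
    return "{}_EQUITIES".format(country_code.upper())


# One name per supported country, in the same order as the import list.
domain_names = [domain_name(code) for code in COUNTRY_CODES]
```

The helper only produces strings; the domain objects themselves still have to be imported from quantopian.pipeline.domain.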
The domain of a pipeline defines the calendar and assets to use as input to each of its computations. A good way to see what this looks like is to run an empty pipeline on different domains.
To specify the domain of a pipeline, we pass domain as a named argument to the Pipeline constructor.
Here's what it looks like to run a Pipeline on the Canadian equity domain:
from quantopian.pipeline import Pipeline
from quantopian.pipeline.domain import CA_EQUITIES
from quantopian.research import run_pipeline
pipe_ca = Pipeline(columns={}, domain=CA_EQUITIES)
df_ca = run_pipeline(pipe_ca, '2015-01-01', '2016-01-01')
df_ca.head()
The large integer in each Equity's string representation is the security identifier (SID) for that equity. New international equities have long SIDs to avoid collisions with previously-existing US SIDs.
For comparison, here's what an empty Pipeline looks like with the US_EQUITIES domain:
from quantopian.pipeline import Pipeline
from quantopian.pipeline.domain import US_EQUITIES
from quantopian.research import run_pipeline
pipe_us = Pipeline(columns={}, domain=US_EQUITIES)
df_us = run_pipeline(pipe_us, '2015-01-01', '2016-01-01')
df_us.head()
The difference in equities is obvious. The calendar difference is a little more subtle - especially between Canada and the United States, which have very similar holiday schedules. To see the difference in trading calendars, let's focus on one asset in each market in the month of October.
df_ca.loc[(slice('2015-10-08', '2015-10-14'), 1178883868150594), :]
df_us.loc[(slice('2015-10-08', '2015-10-14'), 24), :]
October 12th, 2015 (Monday) appears in the output of the US_EQUITIES pipeline but not in the output of the CA_EQUITIES pipeline. This happens because Canadian markets have a holiday (Canadian Thanksgiving) on the second Monday of October.
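The holiday date can be checked with the standard library. The sketch below computes the second Monday of October for a given year; second_monday_of_october is a hypothetical helper written for this illustration and is not part of any trading-calendar library.

```python
from datetime import date, timedelta


def second_monday_of_october(year):
    """Return the date of the second Monday of October in the given year."""
    first = date(year, 10, 1)
    # date.weekday() is 0 for Monday, so this is the offset to the first Monday.
    days_to_first_monday = (7 - first.weekday()) % 7
    return first + timedelta(days=days_to_first_monday + 7)


thanksgiving_2015 = second_monday_of_october(2015)
```

For 2015 this gives October 12th, the session that is missing from the Canadian output above.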
So far we've only seen examples of running pipelines with empty outputs. Of course, most Pipeline API users don't just want empty outputs: they want to compute things!
If we add output columns to a pipeline with a domain, then the values of the outputs will be computed using data from that domain:
from quantopian.pipeline import Pipeline
from quantopian.pipeline.data import EquityPricing
from quantopian.pipeline.factors import Returns
from quantopian.pipeline.domain import CA_EQUITIES
from quantopian.research import run_pipeline
pipe_ca_with_data = Pipeline(
    {
        # 5-day returns over the most recent five Toronto Stock Exchange trading days.
        'returns_5d': Returns(window_length=5),
        'volume': EquityPricing.volume.latest,
    },
    domain=CA_EQUITIES,
    screen=EquityPricing.volume.latest > 0,
)
df_ca_with_data = run_pipeline(pipe_ca_with_data, '2015-01-01', '2016-01-01')
df_ca_with_data.head()
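To make the windowed computation concrete, here is a plain-Python sketch of a return over a trailing window. It assumes the return compares the last and first closes in a window of window_length rows; that is an assumption made for illustration, not a statement about how the Returns factor is implemented. The function window_return and the prices are made up.

```python
def window_return(closes, window_length):
    """Percent change between the first and last close in the trailing window.

    Assumption: the window contains `window_length` rows, one per trading day
    of the domain's calendar, ordered oldest to newest.
    """
    if window_length < 2:
        raise ValueError("window_length must be at least 2")
    if len(closes) < window_length:
        raise ValueError("not enough closes for the requested window")
    window = closes[-window_length:]
    return (window[-1] - window[0]) / window[0]


# Made-up closing prices for one asset, ordered oldest to newest.
closes = [9.0, 10.0, 11.0, 12.0, 13.0, 15.0]
returns_5d = window_return(closes, window_length=5)
```

Because each row is a trading day of the domain's calendar, the same definition can cover different calendar dates on different domains.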
To recap: the domain of a pipeline controls the inputs over which the Pipeline executes. The currently-supported domains are importable from quantopian.pipeline.domain. Domains are passed to Pipeline objects via the new optional domain parameter.
Sharp-eyed users familiar with the Pipeline API may have noticed another new API feature in the previous example: the EquityPricing dataset. Before today, users who wanted pricing data in their pipelines used USEquityPricing, which has open, high, low, close, and volume columns containing daily price/volume summaries. These columns extend naturally to any market for which we have daily pricing data, but the dataset had a US-specific name because we only supported US data when it was created.
We could have created separate datasets for every new market (e.g., CAEquityPricing, JPEquityPricing, etc.), but that would have required creating separate copies of every price-based Factor (like Returns in the example above), even though the business logic for each copy would have been the same. Having separate datasets for each market also would have made it hard to convert a Pipeline from one domain to another, a use-case we wanted to support well. The pattern of having datasets that naturally extend to multiple markets isn't unique to pricing data. The new FactSet Fundamentals dataset, for example, also generalizes naturally over many countries.
On the other hand, some pipeline expressions really only make sense to use on a particular market. Some data vendors only cover specific countries, for example, and some expressions like QTradableStocksUS have logic that's specific to a particular market. In these cases, it would be confusing at best to support all possible domains.
To solve these problems, we've updated the Pipeline API to distinguish between two kinds of pipeline datasets: generic and specialized.
As of today, there are two generic datasets in the Pipeline API:
quantopian.pipeline.data.EquityPricing
quantopian.pipeline.data.factset.Fundamentals
Currently, all other datasets, including USEquityPricing, Morningstar Fundamentals, premium datasets, and self-serve datasets, are specialized to the US_EQUITIES domain.
Generic datasets make it easy to define a Pipeline that can be reused across multiple domains.
from quantopian.pipeline import Pipeline
from quantopian.pipeline.data import factset, EquityPricing
from quantopian.pipeline.domain import JP_EQUITIES, HK_EQUITIES
from quantopian.research import run_pipeline
def make_pipeline(domain):
    columns = {
        'mcap': factset.Fundamentals.mkt_val.latest,
        'close': EquityPricing.close.latest,
    }
    return Pipeline(columns, domain=domain)
df_jp = run_pipeline(make_pipeline(JP_EQUITIES), '2015-01-15', '2016-01-15')
df_hk = run_pipeline(make_pipeline(HK_EQUITIES), '2015-01-15', '2016-01-15')
df_jp.head()
df_hk.head()
In the previous two examples, we defined pipelines with a generic dataset, and we explicitly passed a domain to the pipeline constructor. If we define a pipeline without providing a domain argument, the pipeline execution machinery will attempt to infer a domain from the contents of the pipeline. When a pipeline is defined using only generic datasets, the pipeline will default to the US_EQUITIES domain.
For example, the same pipeline we just ran will default to US_EQUITIES if we don't explicitly provide a domain.
from quantopian.pipeline import Pipeline
from quantopian.pipeline.data import factset, EquityPricing
from quantopian.research import run_pipeline
pipe = Pipeline(
    columns={
        'mcap': factset.Fundamentals.mkt_val.latest,
        'close': EquityPricing.close.latest,
    },
)
df = run_pipeline(pipe, '2015-01-01', '2016-01-01')
df.head()
In some cases, we may want to define a pipeline that uses a specialized dataset, like Psychsignal. Psychsignal is specialized to the US_EQUITIES domain because it only provides a signal for US equities (at least, it only provides it for US equities in the Quantopian integration).
If we define a pipeline with a specialized dataset, the pipeline object will infer its domain based on the domain of that specialized dataset. In practice, that means the pipeline will default to the US_EQUITIES domain, since the Pipeline API currently only has generic datasets and datasets specialized to the US_EQUITIES domain.
The example below runs a pipeline with a Psychsignal dataset. Since the pipeline infers the domain from the Psychsignal dataset, we don't need to provide a domain argument.
from quantopian.pipeline import Pipeline
from quantopian.pipeline.data.psychsignal import stocktwits
from quantopian.research import run_pipeline
pipe = Pipeline(
    columns={
        'bull_scored_messages': stocktwits.bull_scored_messages.latest,
    },
)
df = run_pipeline(pipe, '2015-01-01', '2016-01-01')
df.head()
We can add a generic dataset and our pipeline will still default to the domain of the specialized dataset.
from quantopian.pipeline import Pipeline
from quantopian.pipeline.data import factset
from quantopian.pipeline.data.psychsignal import stocktwits
from quantopian.research import run_pipeline
pipe = Pipeline(
    columns={
        'mcap': factset.Fundamentals.mkt_val.latest,
        'bull_scored_messages': stocktwits.bull_scored_messages.latest,
    },
)
df = run_pipeline(pipe, '2015-01-01', '2016-01-01')
df.head()
However, if we try to explicitly set the domain of the pipeline to CA_EQUITIES, we get an error.
# NOTE: This cell is expected to raise an error!
from quantopian.pipeline import Pipeline
from quantopian.pipeline.data import factset
from quantopian.pipeline.data.psychsignal import stocktwits
from quantopian.pipeline.domain import CA_EQUITIES
from quantopian.research import run_pipeline
pipe = Pipeline(
    columns={
        'mcap': factset.Fundamentals.mkt_val.latest,
        'bull_scored_messages': stocktwits.bull_scored_messages.latest,
    },
    domain=CA_EQUITIES,
)
df = run_pipeline(pipe, '2015-01-01', '2016-01-01')
df.head()
This error message indicates that we tried to provide a domain to our pipeline that is different from the inferred domain.
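The inference rules described in this section can be summarized with a toy model. The sketch below is not Quantopian's implementation: infer_domain, GENERIC, and the use of plain strings for domains are all hypothetical, and the handling of datasets specialized to different domains is an assumption, since the text only describes datasets specialized to US_EQUITIES.

```python
GENERIC = "GENERIC"  # hypothetical marker for datasets that work on any domain


def infer_domain(dataset_domains, explicit_domain=None, default="US_EQUITIES"):
    """Resolve a pipeline's domain from its datasets and an optional explicit domain."""
    specialized = {d for d in dataset_domains if d != GENERIC}
    if len(specialized) > 1:
        # Assumption: this case is not described in the text above.
        raise ValueError("datasets are specialized to conflicting domains")
    if specialized:
        inferred = next(iter(specialized))
        if explicit_domain is not None and explicit_domain != inferred:
            raise ValueError("explicit domain conflicts with the inferred domain")
        return inferred
    # Only generic datasets: use the explicit domain if given, else the default.
    return explicit_domain if explicit_domain is not None else default


# Generic datasets only, no explicit domain: falls back to the default.
only_generic = infer_domain([GENERIC, GENERIC])
# Generic datasets with an explicit domain: the explicit domain wins.
generic_on_japan = infer_domain([GENERIC, GENERIC], explicit_domain="JP_EQUITIES")
# A specialized dataset fixes the domain, even alongside a generic dataset.
with_specialized = infer_domain([GENERIC, "US_EQUITIES"])
```

Calling infer_domain([GENERIC, "US_EQUITIES"], explicit_domain="CA_EQUITIES") raises a ValueError in this model, mirroring the failing cell above.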
Note: In backtesting, the domain of a pipeline is always inferred from the trading calendar. Currently, only the US_EQUITIES domain is supported in the Pipeline API in backtesting.
In this notebook, we:
Currently, this notebook is the best reference material on domains. We are still working on writing official documentation. We will make an announcement in the forums when more reference material becomes available.