Introduction to the Quantopian Risk Model in Research

In this notebook, we introduce the Quantopian Risk Model and the new APIs for interacting with the risk model on the Quantopian Research Platform.

What is a Factor Risk Model?

Asset prices change for a wide variety of reasons. Some events that affect the price of a stock are specific to just that stock: when a company introduces an innovative new product or suffers a public scandal, for example, the effect on the overall market is mostly limited to changes in the price of that company's stock. In many cases, however, events that affect the price of a stock also affect the prices of other, similar stocks. Many asset prices would change in similar ways if the price of steel tripled overnight or if the US unemployment rate were cut in half over the course of a year.

The observation that many assets' prices are influenced by similar external events is one of the motivating ideas behind the use of Factor Risk Models in finance.

A factor risk model attempts to describe the returns of a large number of assets in terms of the returns of a small number of risk factors. The Fama-French model, for example, models the returns of each asset in terms of three factors: a "market" factor representing the returns of the market as a whole, a "size" factor representing the returns of small-cap stocks relative to large-cap stocks, and a "value" factor representing the returns of stocks with high book-to-market ratios relative to stocks with low book-to-market ratios.

Generally speaking, a Factor Risk Model consists of two artifacts:

  1. A set of factor returns, each of which is a timeseries of returns associated with a source of price movement shared by all assets of interest.
  2. A set of factor loadings that describes, for each asset of interest, how that asset's returns are driven by the factor returns from (1).
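
In symbols, one common way to write this decomposition is shown below. The notation is our own assumption for illustration and is not taken from the risk model's documentation:

```latex
r_{i,t} = \sum_{k=1}^{K} \beta_{i,k} \, f_{k,t} + \epsilon_{i,t}
```

Here $r_{i,t}$ is the return of asset $i$ on day $t$, $f_{k,t}$ is the return of factor $k$ on day $t$ (artifact 1), $\beta_{i,k}$ is the loading of asset $i$ on factor $k$ (artifact 2), and $\epsilon_{i,t}$ is the part of the asset's return that the factors do not explain. In practice the loadings may themselves vary over time.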

In the case of the Fama-French model, the factor returns are computed by taking the returns of portfolios of assets designed to capture the effect of each factor. For example, the market return is calculated from the return of a broad-market long-only portfolio, and the size return is calculated from a long-short portfolio that longs small-cap stocks and shorts large-cap stocks. Each asset's factor loadings are then calculated by running a multiple linear regression of the asset's returns against the factor returns.
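
As a rough illustration of the regression step, the sketch below estimates one asset's loadings from synthetic data with NumPy. The factor returns, the "true" loadings, and the noise levels are all made up for this example; nothing here is real Fama-French or Quantopian data.

```python
import numpy as np

rng = np.random.default_rng(0)
n_days = 500

# Hypothetical daily factor returns; columns stand in for market, size, value.
factor_returns = rng.normal(0.0, 0.01, size=(n_days, 3))

# Loadings we pretend the asset "really" has (invented for this illustration).
true_loadings = np.array([1.1, 0.4, -0.3])

# Asset returns = factor-driven part + asset-specific noise.
specific_noise = rng.normal(0.0, 0.002, size=n_days)
asset_returns = factor_returns @ true_loadings + specific_noise

# Multiple linear regression of asset returns on factor returns,
# with a column of ones so the fit includes an intercept.
design = np.column_stack([np.ones(n_days), factor_returns])
coefs, *_ = np.linalg.lstsq(design, asset_returns, rcond=None)
intercept, estimated_loadings = coefs[0], coefs[1:]

print("estimated loadings:", estimated_loadings)
```

With enough history and modest noise, the estimated loadings land close to the values used to generate the data.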

The Quantopian Risk Model

The Quantopian Risk Model defines 16 risk factors: 11 sector factors and 5 style factors.

Sector factors capture the returns associated with the aggregate performance of each sector of the US economy.

Style factors capture the returns associated with other common drivers of asset returns (e.g. size, value, and volatility).

What Can I Do with The Quantopian Risk Model Today?

Factor Risk Models are used for many purposes in quantitative finance. We have many ideas for tools we could build using the risk model, but for this release, we've focused primarily on one important use-case: Performance Attribution via the AlgorithmResult object in research. We've also built experimental support for working with the risk model outputs, both directly in research notebooks and in the Pipeline API.

A complete list of the new API features is as follows:

  1. Additions to the AlgorithmResult class:
    • New Method: create_perf_attrib_tear_sheet()
    • New Field: attributed_factor_returns
    • New Field: factor_exposures
  2. Additions to the Research API:
    • quantopian.research.experimental.get_factor_returns
    • quantopian.research.experimental.get_factor_loadings
  3. Additions to the Pipeline API:
    • quantopian.pipeline.experimental.risk_loading_pipeline
    • quantopian.pipeline.experimental.BasicMaterials
    • quantopian.pipeline.experimental.ConsumerCyclical
    • quantopian.pipeline.experimental.FinancialServices
    • quantopian.pipeline.experimental.RealEstate
    • quantopian.pipeline.experimental.ConsumerDefensive
    • quantopian.pipeline.experimental.HealthCare
    • quantopian.pipeline.experimental.Utilities
    • quantopian.pipeline.experimental.CommunicationServices
    • quantopian.pipeline.experimental.Energy
    • quantopian.pipeline.experimental.Industrials
    • quantopian.pipeline.experimental.Technology

Performance Attribution via the AlgorithmResult Class

The biggest change we're releasing today is a suite of enhancements to the AlgorithmResult class that make it easy for Quantopian users to use the risk model to break down and visualize the performance of their algorithms.

What is Performance Attribution?

When we're developing an algorithm, it's often helpful to have a sense of what drives the performance of that algorithm. For algorithms with a small number of positions, it can be manageable to simply look at the returns of each individual position. As the number of positions we hold grows, however, it becomes increasingly important to find ways of summarizing the performance of an algorithm in a way that preserves as much information as possible.

Performance attribution in PyFolio happens in three steps:

  1. Convert Positions to Factor Exposures: Each day, we calculate the algorithm's net factor exposures by multiplying the weight of each asset in the portfolio by that asset's factor loadings and then summing the exposure vectors across all assets.

(For the linear-algebraically inclined, this step multiplies the (factors x assets) factor loadings matrix by the (assets x 1) column-vector of portfolio weights each day. One potentially-useful way of thinking about this is that the risk model factor loadings define a change of basis that transforms "stock exposure space" to "factor exposure space", and this step applies that change of basis.)

  2. Convert Factor Exposures to Attributed Returns: The timeseries of factor exposures produced in step (1) gives us a measure of how "sensitive" the risk model expects our portfolio to be to the returns of each risk factor. The next step is to multiply these factor exposures at each time-step by the associated factor returns value at the same time-step, which gives us a measure of the algorithm returns at that time-step that we attribute to each factor exposure.
  3. Summarize Attributed Returns: The last step of performance attribution is to aggregate all the attributed factor returns into a single measure of the returns that are "explained" by the risk model. We call this aggregated returns measure your "Common Returns", and we call the remaining portion of your returns "Specific Returns".
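
The three steps above can be sketched for a single day with NumPy. This is a minimal illustration of the arithmetic, not PyFolio's implementation; the three factors, three assets, and every number below are hypothetical.

```python
import numpy as np

factors = ["technology", "size", "value"]

# (factors x assets) loadings matrix for three hypothetical assets.
loadings = np.array([
    [1.2, 0.0, 0.0],    # technology
    [3.0, 2.0, -0.5],   # size
    [-0.8, -0.9, 0.4],  # value
])

# (assets x 1) portfolio weights: long asset 0, short asset 1, long asset 2.
weights = np.array([0.5, -0.25, 0.25])

# Step 1: positions -> net factor exposures.
exposures = loadings @ weights

# Step 2: exposures -> attributed returns, using that day's factor returns.
factor_returns = np.array([0.01, -0.002, 0.004])
attributed_returns = exposures * factor_returns

# Step 3: aggregate into common returns; the remainder is specific returns.
total_return = 0.005  # hypothetical portfolio return for the day
common_return = attributed_returns.sum()
specific_return = total_return - common_return

for name, exposure, attributed in zip(factors, exposures, attributed_returns):
    print(name, exposure, attributed)
print("common:", common_return, "specific:", specific_return)
```

Repeating this calculation for every day of a backtest yields the timeseries of exposures, attributed returns, and common/specific returns discussed in the rest of this notebook.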

Running Performance Attribution: create_perf_attrib_tear_sheet

The easiest way to run performance attribution is to use the new create_perf_attrib_tear_sheet() method of BacktestResult (the object returned by the built-in get_backtest() function). create_perf_attrib_tear_sheet loads the necessary risk model data and passes it to PyFolio to calculate and plot your algorithm's common and specific returns, risk exposures, and returns attributed to common risk factors.

Let's load up a backtest from an updated version of the Optimize API announcement post.

In [13]:
bt = get_backtest('5a0317326279aa458c825cad')
100% Time: 0:00:04|###########################################################|
In [14]:
bt.create_perf_attrib_tear_sheet()
Summary Statistics
Annualized Specific Return     0.077859
Annualized Common Return      -0.039570
Annualized Total Return        0.035197
Specific Sharpe Ratio          1.155831

Exposures Summary
                        Average Risk Factor Exposure  Annualized Return  Cumulative Return
basic_materials                             0.014675           0.006514           0.025784
consumer_cyclical                          -0.019576          -0.012414          -0.047796
financial_services                         -0.010304          -0.006019          -0.023391
real_estate                                -0.000459           0.000285           0.001119
consumer_defensive                         -0.000909          -0.000237          -0.000928
health_care                                -0.000280           0.000829           0.003254
utilities                                   0.004343           0.000632           0.002479
communication_services                     -0.003713           0.000583           0.002287
energy                                     -0.007708          -0.003769          -0.014697
industrials                                -0.009999          -0.002147          -0.008393
technology                                 -0.003756          -0.000079          -0.000308
momentum                                    0.054961          -0.012318          -0.047433
size                                        0.125280          -0.004258          -0.016589
value                                      -0.077830          -0.005838          -0.022694
short_term_reversal                        -0.002814           0.000161           0.000630
volatility                                 -0.305543          -0.001932          -0.007553

The new performance attribution tearsheet has a few components:

Summary Tables

At the top of the tearsheet, a table of summary statistics shows the backtest's annualized total returns, common returns, and specific returns (more on what these metrics mean below), as well as a modified Sharpe Ratio that describes the return/volatility ratio of the backtest's specific returns. For this backtest, our annualized total return was about 3.5%: the risk model attributes roughly -4% to our common factor exposures, which was more than offset by a specific return of roughly 8%.

The next table provides a summary of our algorithm's factor exposures and attributed factor returns. Most of this algorithm's exposures are pretty small (at least on average), but the risk model thinks we have a moderate bias toward low-volatility stocks (expressed by our average volatility exposure of -0.3), as well as a small bias toward large-cap stocks (expressed by our average size exposure of 0.1).
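
This notebook does not spell out the formulas behind the summary statistics. As a hedged sketch, the functions below use two conventions that are common in practice and that we assume here: compounding daily returns and scaling to a 252-day year, and a Sharpe ratio computed as mean over sample standard deviation times the square root of 252. The function names are hypothetical helpers, and the five input values are the last five specific_returns values that appear in the attributed_factor_returns output later in this notebook.

```python
import math
import statistics

TRADING_DAYS_PER_YEAR = 252  # assumed annualization convention


def annualized_return(daily_returns):
    """Compound the daily returns, then scale the growth to one year."""
    growth = 1.0
    for r in daily_returns:
        growth *= 1.0 + r
    return growth ** (TRADING_DAYS_PER_YEAR / len(daily_returns)) - 1.0


def annualized_sharpe(daily_returns):
    """Mean daily return over sample standard deviation, annualized."""
    mean = statistics.mean(daily_returns)
    stdev = statistics.stdev(daily_returns)
    return mean / stdev * math.sqrt(TRADING_DAYS_PER_YEAR)


# Last five daily specific returns of the example backtest.
specific = [-0.001177, -0.000973, 0.000666, -0.000289, -0.000243]

print("annualized specific return:", annualized_return(specific))
print("specific Sharpe ratio:", annualized_sharpe(specific))
```

Five days is far too short a window for meaningful statistics; the point is only to show the shape of the calculation. The tear sheet computes its figures over the full backtest.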

Plots

Common vs. Specific Returns

The first plot shows a cumulative time series of the backtest's returns, along with the same returns decomposed into common and specific components (this is the output of step (3) in the outline above). The main purpose of this plot is to show whether the algorithm's returns are primarily driven by common returns or specific returns. In general:

  • If the blue line moves with the red line more than the green line, then the algorithm's returns are primarily driven by common factors.
  • If the blue line moves with the green line more than the red line, then the algorithm's returns are primarily driven by specific returns.

In this case, our total return looks like it's mostly driven by specific returns, which is consistent with the low average factor exposures observed in the summary tables.

Attributed Factor Returns

The second plot in the tearsheet shows the returns attributed by the risk model to each factor over time (this is the output of step (2) in the outline above). This plot is often fairly noisy, but it can be useful for spotting outliers and for seeing if one factor dominates your attributed returns. It's likely that we'll revise this plot in a future release to make it easier to read.

Factor Exposures

The third plot in the tearsheet shows the net factor exposures calculated in step (1) of the outline above. This is usually the best plot for spotting unexpected factor risk concentrations and/or spikes. For this algorithm, this plot confirms our earlier observation that we seem to have a consistent bias toward low-volatility stocks.

Direct Access to Performance Attribution Outputs

The performance attribution tearsheet aims to provide a useful default set of visualizations for analyzing an algorithm's risk exposures. While we expect to grow and improve the tearsheet over time, it would be impractical and counter-productive for us to try to include every possible visualization or aggregation of factor exposures and attributed returns, so we've also added new attributes to BacktestResult that allow users to work directly with data used by create_perf_attrib_tear_sheet.

  • BacktestResult.factor_exposures contains the data shown in the bottom plot of the tear sheet.
  • BacktestResult.attributed_factor_returns contains the data shown in the middle plot of the tear sheet.
In [18]:
bt.factor_exposures.tail()
Out[18]:
basic_materials consumer_cyclical financial_services real_estate consumer_defensive health_care utilities communication_services energy industrials technology momentum size value short_term_reversal volatility
dt
2010-12-27 00:00:00+00:00 -0.012393 -0.028433 -0.039666 -0.000756 -0.018518 -0.002671 0.009695 -0.015325 -0.020601 -0.025612 0.022452 0.062587 0.116255 -0.202439 0.158752 -0.162741
2010-12-28 00:00:00+00:00 -0.012352 -0.029155 -0.040630 -0.000763 -0.018692 -0.002936 0.009601 -0.015519 -0.020796 -0.024696 0.022682 0.062841 0.117254 -0.206023 0.123577 -0.165480
2010-12-29 00:00:00+00:00 -0.012371 -0.028762 -0.040200 -0.000761 -0.018751 -0.002683 0.009693 -0.015380 -0.020942 -0.023732 0.020952 0.046211 0.116248 -0.206252 0.195903 -0.163304
2010-12-30 00:00:00+00:00 -0.012613 -0.029273 -0.041429 -0.000764 -0.019416 -0.002796 0.009352 -0.015327 -0.021080 -0.021537 0.021114 0.075480 0.116150 -0.208661 0.262011 -0.163705
2010-12-31 00:00:00+00:00 -0.012290 -0.031668 -0.041019 -0.000763 -0.020152 -0.001826 0.009396 -0.015410 -0.020673 -0.022549 0.022702 0.040225 0.113686 -0.215119 0.262857 -0.154591
In [16]:
bt.attributed_factor_returns.tail()
Out[16]:
basic_materials consumer_cyclical financial_services real_estate consumer_defensive health_care utilities communication_services energy industrials technology momentum size value short_term_reversal volatility common_returns specific_returns total_returns
dt
2010-12-27 00:00:00+00:00 0.000003 0.000083 -0.000363 -0.000008 0.000069 0.000007 -0.000006 -0.000086 0.000073 -0.000037 0.000045 -0.000035 -0.000106 0.000024 -0.000069 -0.000028 -0.000432 -0.001177 -0.001609
2010-12-28 00:00:00+00:00 -0.000032 0.000070 -0.000000 -0.000002 -0.000026 -0.000000 0.000026 0.000047 -0.000080 -0.000007 -0.000018 -0.000048 0.000028 -0.000137 -0.000009 0.000237 0.000048 -0.000973 -0.000924
2010-12-29 00:00:00+00:00 -0.000045 -0.000077 0.000100 -0.000003 -0.000006 -0.000002 -0.000017 -0.000060 -0.000196 -0.000007 0.000017 -0.000033 0.000043 -0.000096 0.000204 0.000032 -0.000144 0.000666 0.000522
2010-12-30 00:00:00+00:00 0.000003 0.000039 0.000156 -0.000002 0.000007 0.000006 -0.000018 0.000007 -0.000034 0.000031 -0.000017 -0.000035 0.000011 -0.000036 0.000245 -0.000217 0.000145 -0.000289 -0.000144
2010-12-31 00:00:00+00:00 -0.000010 0.000059 -0.000103 0.000002 -0.000000 0.000002 0.000000 -0.000026 -0.000018 -0.000026 -0.000036 -0.000003 0.000350 -0.000220 0.000024 -0.000201 -0.000206 -0.000243 -0.000449
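
As a quick sanity check on how these columns relate, the sketch below re-enters the 2010-12-27 row from the output above by hand and confirms that the sixteen factor columns sum to common_returns and that common_returns plus specific_returns equals total_returns. The displayed values are rounded to six decimal places, so the comparison allows a small tolerance.

```python
# Factor columns for 2010-12-27, in the order shown in the output above.
attributed_2010_12_27 = [
    0.000003, 0.000083, -0.000363, -0.000008,   # basic_materials .. real_estate
    0.000069, 0.000007, -0.000006, -0.000086,   # consumer_defensive .. communication_services
    0.000073, -0.000037, 0.000045,              # energy, industrials, technology
    -0.000035, -0.000106, 0.000024, -0.000069, -0.000028,  # style factors
]
common_returns = -0.000432
specific_returns = -0.001177
total_returns = -0.001609

# Sixteen values rounded to 6 decimals can drift by a few units in the
# last digit, so compare with a tolerance instead of exact equality.
tolerance = 1e-5

summed_factors = sum(attributed_2010_12_27)
factors_match_common = abs(summed_factors - common_returns) < tolerance
parts_match_total = abs(common_returns + specific_returns - total_returns) < tolerance

print(factors_match_common, parts_match_total)
```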

We can use this data to build our own custom visualizations of the factor loadings/returns.

The default visualization of the factor exposures shows them as a timeseries. This is useful for seeing how the exposures change over time, but doesn't help much for seeing the overall distribution of exposures across the span of the algorithm. We can use seaborn to visualize the distribution of the factor exposures.

In [71]:
import pandas as pd
import seaborn as sns
In [72]:
def visualize_exposures_distribution(exposures):
    # Plot the exposures passed in, rather than reaching for the global `bt`.
    ax = sns.boxplot(data=exposures.dropna(), orient='h')
    ax.set_title('Distribution of Daily Factor Exposures')
    ax.set_xlabel('Daily Exposure')
    ax.set_ylabel('Factor')
    return ax
In [74]:
visualize_exposures_distribution(bt.factor_exposures);

This visualization gives us a better sense of the overall spread of our algorithm's exposures, at the cost of losing information about how the exposures evolve over time.

Additions to the Research API

If you want to dig deeper into the data that we use for performance attribution, we've added two new functions to the Research API under quantopian.research.experimental:

  • quantopian.research.experimental.get_factor_returns takes (start_date, end_date) and returns a DataFrame containing factor returns for the period between the date bounds.
  • quantopian.research.experimental.get_factor_loadings takes (start_date, end_date, sids) and returns a MultiIndex-ed DataFrame containing factor loadings for the requested sids between the date bounds. (NOTE: The signature of get_factor_loadings is likely to change to (assets, start_date, end_date) in the near future to match the signature of other API methods.)
In [75]:
from quantopian.research.experimental import get_factor_loadings, get_factor_returns

Fetch and Plot Factor Returns for 2014

In [77]:
rets = get_factor_returns(pd.Timestamp('2014'), pd.Timestamp('2015'))
rets.head()
Out[77]:
basic_materials consumer_cyclical financial_services real_estate consumer_defensive health_care utilities communication_services energy industrials technology momentum size value short_term_reversal volatility
2014-01-02 00:00:00+00:00 -0.007256 -0.004937 -0.004577 -0.001585 -0.010828 -0.005411 -0.015534 -0.011100 -0.014019 -0.012732 -0.009513 -0.001635 0.002372 0.000246 0.001379 0.004604
2014-01-03 00:00:00+00:00 -0.002073 -0.002857 0.005977 0.006191 -0.002119 0.001814 -0.002942 -0.002381 -0.003211 0.002424 -0.005226 -0.000541 -0.001552 -0.002140 0.000350 -0.000225
2014-01-06 00:00:00+00:00 -0.005684 -0.006031 0.000914 0.003787 -0.004482 -0.003802 0.000805 0.005114 0.000920 -0.005610 -0.001562 0.000875 0.001453 0.000770 -0.000679 -0.000830
2014-01-07 00:00:00+00:00 -0.001649 0.005613 0.000457 0.003144 0.005213 0.010176 0.009381 0.006106 0.007930 0.005058 0.008817 0.000531 -0.000773 -0.000575 0.000325 -0.000164
2014-01-08 00:00:00+00:00 0.005616 -0.001659 0.004108 -0.001880 -0.006836 0.008815 -0.005576 0.000000 -0.006956 -0.000968 0.000282 0.003344 0.001437 -0.000755 -0.001064 0.002605
In [112]:
ax = (((1 + rets).cumprod())
      .plot(title='Cumulative Factor Returns', figsize=(12.5, 10), colormap='Set3'));
ax.grid(False)
ax.legend(bbox_to_anchor=(1, 1), loc='upper left');
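
The (1 + rets).cumprod() expression in the cell above compounds daily returns into a cumulative growth curve. The plain-Python sketch below does the same thing by hand for the first three basic_materials values from the output above, to make the arithmetic explicit.

```python
from itertools import accumulate
from operator import mul

# First three daily basic_materials factor returns from the output above.
daily_returns = [-0.007256, -0.002073, -0.005684]

# Compounding: multiply together (1 + r) for each day so far.
growth = list(accumulate((1.0 + r for r in daily_returns), mul))

# Simply adding the returns ignores compounding and gives a slightly
# different answer.
simple_sum = 1.0 + sum(daily_returns)

print(growth)
print(simple_sum)
```

Each element of growth is the value of one unit invested in the factor at the start of the period, which is what the plot shows for every factor.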

Fetch and Plot Factor Loadings for AAPL and MCD

In [132]:
# NOTE: In a future revision we expect to update the API for `get_factor_loadings` so that
#       you can pass strings or Equity objects in addition to sids. For now, however, sids are required.

AAPL_sid = symbols('AAPL').sid  # Should be 24.
MCD_sid = symbols('MCD').sid    # Should be 4707.
loadings = get_factor_loadings(pd.Timestamp('2015'), pd.Timestamp('2016'), [AAPL_sid, MCD_sid])

Most assets have a sector loading of 0 for sectors outside their own.

In [133]:
loadings.head()
Out[133]:
basic_materials consumer_cyclical financial_services real_estate consumer_defensive health_care utilities communication_services energy industrials technology momentum size value short_term_reversal volatility
dates sid
2015-01-02 00:00:00+00:00 24 0.0 0.000000 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.172029 1.573244 3.856067 -0.789493 1.119136 -0.573814
4707 0.0 0.469046 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.000000 -0.308683 2.413958 -0.850918 -0.431775 -1.026782
2015-01-05 00:00:00+00:00 24 0.0 0.000000 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.176622 1.529635 3.848132 -0.786903 1.135419 -0.574704
4707 0.0 0.469515 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.000000 -0.306567 2.408457 -0.851235 -0.219465 -1.028207
2015-01-06 00:00:00+00:00 24 0.0 0.000000 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.166704 1.428461 3.842705 -0.772717 1.279989 -0.555600

AAPL Exposures over Time

In [134]:
nonzero_loadings = loadings.loc[:, (loadings != 0).any()]
(nonzero_loadings
 .xs(24, level=1)   # Extract just the rows for AAPL.
 .plot(grid=False, title='AAPL Risk Exposures')  # Plot them.
 .legend(bbox_to_anchor=(1, 1), loc='upper left'));

This plot shows that AAPL's largest loading coefficient is to size during this period, but it also carries significant positive exposure to the technology sector and to the momentum style factor, as well as smaller negative exposures to value and volatility.
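
The cell above relies on two pandas idioms: a boolean column mask to drop all-zero columns, and xs to pull one asset out of a (dates, sid) MultiIndex. The sketch below reproduces both on a small hand-built frame. The values are copied from a few cells of the loadings output above; the frame itself, and the name toy_loadings, are only stand-ins for the real get_factor_loadings result.

```python
import pandas as pd

index = pd.MultiIndex.from_product(
    [pd.to_datetime(["2015-01-02", "2015-01-05"]), [24, 4707]],
    names=["dates", "sid"],
)
toy_loadings = pd.DataFrame(
    {
        "real_estate": [0.0, 0.0, 0.0, 0.0],
        "consumer_cyclical": [0.0, 0.469046, 0.0, 0.469515],
        "technology": [1.172029, 0.0, 1.176622, 0.0],
        "size": [3.856067, 2.413958, 3.848132, 2.408457],
    },
    index=index,
)

# Keep only columns with at least one nonzero value in ANY row.
nonzero = toy_loadings.loc[:, (toy_loadings != 0).any()]

# Select the rows for sid 24 (AAPL); the sid level is dropped from the index.
aapl = nonzero.xs(24, level=1)

print(list(nonzero.columns))
print(aapl)
```

Note that the column mask is computed across all requested assets, so AAPL's frame still contains a consumer_cyclical column of zeros because MCD has a nonzero loading there.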

MCD Exposures Over Time

In [136]:
nonzero_loadings = loadings.loc[:, (loadings != 0).any()]
(nonzero_loadings
 .xs(4707, level=1)   # Extract just the rows for MCD.
 .plot(grid=False, title='MCD Risk Exposures')  # Plot them.
 .legend(bbox_to_anchor=(1, 1), loc='upper left'));

Like AAPL, MCD carries a significant size exposure, but its sector exposure (to consumer_cyclical) is significantly lower than AAPL's exposure to technology, suggesting that MCD's returns are less sensitive to the overall movement of its sector than AAPL's are.

Working With the Risk Model in Pipeline

As part of this release, we've built experimental support for fetching assets' risk loadings as Pipeline API Factors. Each risk factor has a new corresponding Pipeline Factor, which produces the desired risk loadings. These can be used for things like risk-factor-weighting other pipeline expressions.

(NOTE: Pipeline API support for the risk model is currently only available in research. We plan to enable support for risk model pipelines in the backtester in the near future.)

In [171]:
import quantopian.pipeline.experimental as exp
from quantopian.pipeline import Pipeline
from quantopian.research import run_pipeline
from operator import or_
from functools import reduce  # A builtin on Python 2; must be imported on Python 3.

loading_terms = {
    'basic_materials': exp.BasicMaterials(),
    'consumer_cyclical': exp.ConsumerCyclical(),
    'financial_services': exp.FinancialServices(),
    'real_estate': exp.RealEstate(),
    'consumer_defensive': exp.ConsumerDefensive(),
    'health_care': exp.HealthCare(),
    'utilities': exp.Utilities(),
    'communication_services': exp.CommunicationServices(),
    'energy': exp.Energy(),
    'industrials': exp.Industrials(),
    'technology': exp.Technology(),
    'momentum': exp.Momentum(),
    'short_term_reversal': exp.ShortTermReversal(),
    'size': exp.Size(),
    'value': exp.Value(),
    'volatility': exp.Volatility(),
}
loading_pipeline = Pipeline(
    loading_terms,
    # This is a fancy way of saying "keep all the rows where at least one factor loading is non-null."
    screen=reduce(or_, (t.notnull() for t in loading_terms.values())),
)
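
The reduce(or_, ...) idiom in the screen above folds a sequence of boolean masks together with the | operator. The sketch below shows the same fold on pandas Series, which stand in here for the Pipeline Filters returned by notnull(); the toy frame and its values are hypothetical.

```python
from functools import reduce
from operator import or_

import numpy as np
import pandas as pd

toy = pd.DataFrame(
    {
        "size": [1.2, np.nan, np.nan],
        "value": [np.nan, -0.5, np.nan],
        "volatility": [0.3, 0.1, np.nan],
    },
    index=["A", "B", "C"],
)

# One boolean mask per column: True where that column has a value.
masks = (toy[col].notnull() for col in toy.columns)

# Fold the masks together: mask_1 | mask_2 | mask_3.
keep = reduce(or_, masks)

# Rows where at least one loading is non-null survive the screen.
filtered = toy[keep]
print(filtered)
```

Row C is dropped because every one of its loadings is null, while rows A and B are kept even though each is missing one value.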
In [172]:
loadings = run_pipeline(loading_pipeline, '2015', '2016')
In [173]:
loadings.head()
Out[173]:
basic_materials communication_services consumer_cyclical consumer_defensive energy financial_services health_care industrials momentum real_estate short_term_reversal size technology utilities value volatility
2015-01-02 00:00:00+00:00 Equity(2 [ARNC]) 1.086220 0.0 0.0 0.0 0.0 0.000000 0.000000 0.0 1.218681 0.0 0.561370 1.244769 0.000000 0.0 0.535451 0.016220
Equity(21 [AAME]) 0.000000 0.0 0.0 0.0 0.0 0.143949 0.000000 0.0 -0.417821 0.0 0.821443 -0.269169 0.000000 0.0 -0.536246 -0.057939
Equity(24 [AAPL]) 0.000000 0.0 0.0 0.0 0.0 0.000000 0.000000 0.0 1.573244 0.0 1.119136 3.856067 1.172029 0.0 -0.789493 -0.573814
Equity(31 [ABAX]) 0.000000 0.0 0.0 0.0 0.0 0.000000 0.638225 0.0 1.381126 0.0 0.284801 -0.724529 0.000000 0.0 -0.828475 -0.005286
Equity(39 [DDC]) 0.570881 0.0 0.0 0.0 0.0 0.000000 0.000000 0.0 -0.174328 0.0 0.049159 -0.250996 0.000000 0.0 0.120844 0.986488

We expect the pattern of "get all the risk loadings together in one Pipeline" to be common enough that we've added a helper function to make it easier to do just that.

quantopian.pipeline.experimental.risk_loading_pipeline creates a pipeline that's equivalent to the one we manually defined in the cells above.

In [174]:
from quantopian.pipeline.experimental import risk_loading_pipeline
In [175]:
easier_loading_pipeline = risk_loading_pipeline()
easier_loadings = run_pipeline(easier_loading_pipeline, '2015', '2016')
In [176]:
easier_loadings.head()
Out[176]:
basic_materials communication_services consumer_cyclical consumer_defensive energy financial_services health_care industrials momentum real_estate short_term_reversal size technology utilities value volatility
2015-01-02 00:00:00+00:00 Equity(2 [ARNC]) 1.086220 0.0 0.0 0.0 0.0 0.000000 0.000000 0.0 1.218681 0.0 0.561370 1.244769 0.000000 0.0 0.535451 0.016220
Equity(21 [AAME]) 0.000000 0.0 0.0 0.0 0.0 0.143949 0.000000 0.0 -0.417821 0.0 0.821443 -0.269169 0.000000 0.0 -0.536246 -0.057939
Equity(24 [AAPL]) 0.000000 0.0 0.0 0.0 0.0 0.000000 0.000000 0.0 1.573244 0.0 1.119136 3.856067 1.172029 0.0 -0.789493 -0.573814
Equity(31 [ABAX]) 0.000000 0.0 0.0 0.0 0.0 0.000000 0.638225 0.0 1.381126 0.0 0.284801 -0.724529 0.000000 0.0 -0.828475 -0.005286
Equity(39 [DDC]) 0.570881 0.0 0.0 0.0 0.0 0.000000 0.000000 0.0 -0.174328 0.0 0.049159 -0.250996 0.000000 0.0 0.120844 0.986488
In [178]:
(loadings == easier_loadings).all().all()
Out[178]:
True

Review and Conclusions

In this notebook we introduced the Quantopian Risk Model and several new APIs for interacting with the risk model in research.

  • We discussed what a Factor Risk Model is and developed some intuitions for why factor modelling might be a reasonable way to describe the returns of assets.
  • We looked at new features of the AlgorithmResult object that enable Quantopian users to analyze and visualize the performance of their algorithms.
  • We looked at new Research API methods that allow you to fetch and inspect the outputs of the Quantopian Risk Model.
  • We briefly looked at how you can fetch risk model data in the Pipeline API.