
New in Pipeline: Column Slices and New Factor Methods

  • You can now extract single columns of certain Factors by indexing into them by asset. These "slices", or columns of data, can then be used as inputs to custom factors.
  • The methods pearsonr, spearmanr and linear_regression have been added to Factors. These methods allow you to compute correlations or run regressions between the columns of two different terms, enabling more generic operations than are currently supported by the RollingPearsonOfReturns, RollingSpearmanOfReturns and RollingLinearRegressionOfReturns built-ins. For more information on these methods, check out the docs: https://www.quantopian.com/help#quantopian_pipeline_factors_Factor_pearsonr.
In [1]:
from pandas import DataFrame, date_range
from quantopian.pipeline import CustomFactor, Pipeline
from quantopian.pipeline.data.builtin import USEquityPricing
from quantopian.pipeline.factors import AverageDollarVolume, Returns, SimpleMovingAverage
from quantopian.research import run_pipeline
In [2]:
# Simple slicing example -- first create a Returns factor, then extract the single column corresponding
# to AAPL. This creates a "Slice" object (called `returns_aapl` here) which is used as an input to a
# custom factor.
returns = Returns(window_length=30)
returns_aapl = returns[symbols(24)]

class UsesReturns(CustomFactor):
    window_length = 5
    inputs = [returns, returns_aapl]
    
    def compute(self, today, assets, out, returns, returns_aapl):
        # Print the shape of each input. Our AAPL slice `returns_aapl` should have a shape of (5, 1)
        # because our window length is 5, and by definition we only have 1 column.
        print 'Returns {0}:\n{1}'.format(returns.shape, returns)
        print '\n'
        print 'AAPL Returns Slice {0}:\n{1}'.format(returns_aapl.shape, returns_aapl)

pipe = Pipeline(columns={'uses_returns' : UsesReturns()})
run_pipeline(pipe, '2016-06-01', '2016-06-01');
Returns (5, 8357):
[[-0.03448435         nan -0.12088488 ...,         nan         nan
          nan]
 [-0.03963149 -0.26059322 -0.10609805 ...,         nan         nan
          nan]
 [-0.04623547 -0.27272727 -0.08027855 ...,         nan         nan
          nan]
 [-0.07064616 -0.21348315 -0.06004735 ...,         nan         nan
          nan]
 [-0.11625973 -0.16553288 -0.06051999 ...,         nan         nan
          nan]]


AAPL Returns Slice (5, 1):
[[-0.12088488]
 [-0.10609805]
 [-0.08027855]
 [-0.06004735]
 [-0.06051999]]
In [3]:
# Slices are unaffected by masking -- when a mask is passed to a custom factor, slice inputs are still
# computed in full, even if the asset with which the slice is associated would be filtered out.
adv = AverageDollarVolume(window_length=30)
# AAPL's dollar volume is above the 95th percentile, so this filter excludes it.
no_aapl = adv.percentile_between(90, 95)

pipe = Pipeline(columns={'uses_returns' : UsesReturns(mask=no_aapl)})
run_pipeline(pipe, '2016-06-01', '2016-06-01');
Returns (5, 418):
[[ 0.01299601  0.06749193 -0.0264214  ...,  0.10536366 -0.04043546
  -0.15353487]
 [ 0.01429063  0.03822854 -0.01255697 ...,  0.14916499 -0.06980273
  -0.15971564]
 [-0.0053064   0.02570455  0.00091307 ...,  0.13733548 -0.04381245
  -0.16817532]
 [ 0.0035515   0.03640877  0.00246258 ...,  0.14000736 -0.05374716
  -0.17451719]
 [-0.01437511  0.01483528  0.01176393 ...,  0.11232306 -0.05505279
  -0.18579744]]


AAPL Returns Slice (5, 1):
[[-0.12088488]
 [-0.10609805]
 [-0.08027855]
 [-0.06004735]
 [-0.06051999]]

Note:

This is not true when masking the actual Factor from which the Slice is taken. That is:

returns = Returns(window_length=30, mask=no_aapl)
returns_aapl = returns[symbols(24)]

will result in returns_aapl being all NaNs.
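For illustration, here is a minimal sketch of that caveat (not one of the original cells; it reuses the `no_aapl` filter defined above and assumes the same research environment):

masked_returns = Returns(window_length=30, mask=no_aapl)
masked_returns_aapl = masked_returns[symbols(24)]

class UsesMaskedSlice(CustomFactor):
    window_length = 5
    inputs = [masked_returns_aapl]

    def compute(self, today, assets, out, masked_returns_aapl):
        # Because AAPL is filtered out of the masked factor, this (5, 1) window
        # should contain only NaNs.
        print masked_returns_aapl

pipe = Pipeline(columns={'uses_masked_slice' : UsesMaskedSlice()})
run_pipeline(pipe, '2016-06-01', '2016-06-01');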

Returns can be an input?

One might have noticed that Returns is being used as an input to a custom factor. Previously this was not allowed for any factors, but is now allowed for a select few factors deemed safe for use as inputs. This includes Returns and any factors created from rank or zscore. The main reason that these factors can be used as inputs is that they are comparable across splits. Returns, rank and zscore produce normalized values, meaning that they can be meaningfully compared in any context.

Something like SimpleMovingAverage, on the other hand, cannot be used as an input because a lookback window may contain a split. If SimpleMovingAverage were used as an input, its computed output would not be adjusted for that split, unlike BoundColumn terms such as USEquityPricing.close, which are adjusted. Inputs need to be comparable across splits: if, for example, you wanted to compute correlations using AAPL's simple moving average in June of 2014 (when AAPL split 7-for-1), any calculation whose window overlapped the split would be distorted and meaningless. Because of this, most factors still cannot be used as inputs, including all custom factors.
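A quick way to check whether a term is considered safe to use as an input is its window_safe attribute, which is what the validation error further below inspects. A sketch of what to expect, given the rules just described (not one of the original cells; the True/False values are expectations, not captured output):

returns = Returns(window_length=30)
sma = SimpleMovingAverage(inputs=[USEquityPricing.close], window_length=30)

print returns.window_safe                # expected True: Returns is comparable across splits
print sma.window_safe                    # expected False: a raw moving average is not
print sma.zscore().window_safe           # expected True: rank/zscore normalizations are window safe
print USEquityPricing.close.window_safe  # expected True: BoundColumns are adjusted for splits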

In [4]:
# Here is AAPL's 30-day SMA around its 7-for-1 split on June 9, 2014. Note the discontinuity in the
# computed values; one can imagine that attempting to compute the correlation between this timeseries
# and another would produce nonsensical results.
sma = SimpleMovingAverage(inputs=[USEquityPricing.close], window_length=30)
pipe = Pipeline(columns={'sma' : sma})
results = run_pipeline(pipe, '2014-05-20', '2014-06-25')
results.sma.unstack()[symbols(24)]
Out[4]:
2014-05-20 00:00:00+00:00    561.877642
2014-05-21 00:00:00+00:00    564.676639
2014-05-22 00:00:00+00:00    567.535248
2014-05-23 00:00:00+00:00    570.200929
2014-05-27 00:00:00+00:00    573.319847
2014-05-28 00:00:00+00:00    576.953378
2014-05-29 00:00:00+00:00    580.460198
2014-05-30 00:00:00+00:00    584.469081
2014-06-02 00:00:00+00:00    588.365998
2014-06-03 00:00:00+00:00    591.913529
2014-06-04 00:00:00+00:00    595.557869
2014-06-05 00:00:00+00:00    599.427306
2014-06-06 00:00:00+00:00    603.611458
2014-06-09 00:00:00+00:00     86.617434
2014-06-10 00:00:00+00:00     87.032174
2014-06-11 00:00:00+00:00     87.360587
2014-06-12 00:00:00+00:00     87.684058
2014-06-13 00:00:00+00:00     87.965227
2014-06-16 00:00:00+00:00     88.206811
2014-06-17 00:00:00+00:00     88.473615
2014-06-18 00:00:00+00:00     88.696881
2014-06-19 00:00:00+00:00     88.954546
2014-06-20 00:00:00+00:00     89.210824
2014-06-23 00:00:00+00:00     89.448863
2014-06-24 00:00:00+00:00     89.687855
2014-06-25 00:00:00+00:00     89.874417
Name: Equity(24 [AAPL]), dtype: float64
In [5]:
# Attempting to add a SimpleMovingAverage factor as an input will fail with a `NonWindowSafeInput` error.
class UsesInvalidInput(CustomFactor):
    window_length = 5
    inputs = [sma]
    
    def compute(self, today, assets, out, sma):
        pass

# This will fail.
UsesInvalidInput()
---------------------------------------------------------------------------
NonWindowSafeInput                        Traceback (most recent call last)
<ipython-input-5-165f0caaf3f4> in <module>()
      8 
      9 # This will fail.
---> 10 UsesInvalidInput()

/build/src/qexec_repo/zipline_repo/zipline/pipeline/mixins.pyc in __new__(cls, inputs, outputs, window_length, mask, dtype, missing_value, ndim, **kwargs)
    117             missing_value=missing_value,
    118             ndim=ndim,
--> 119             **kwargs
    120         )
    121 

/build/src/qexec_repo/zipline_repo/zipline/pipeline/term.pyc in __new__(cls, inputs, outputs, window_length, mask, *args, **kwargs)
    397             mask=mask,
    398             window_length=window_length,
--> 399             *args, **kwargs
    400         )
    401 

/build/src/qexec_repo/zipline_repo/zipline/pipeline/term.pyc in __new__(cls, domain, dtype, missing_value, window_safe, ndim, *args, **kwargs)
    123                     ndim=ndim,
    124                     params=params,
--> 125                     *args, **kwargs
    126                 )
    127             return new_instance

/build/src/qexec_repo/zipline_repo/zipline/pipeline/term.pyc in _init(self, inputs, outputs, window_length, mask, *args, **kwargs)
    405         self.window_length = window_length
    406         self.mask = mask
--> 407         return super(ComputableTerm, self)._init(*args, **kwargs)
    408 
    409     @classmethod

/build/src/qexec_repo/zipline_repo/zipline/pipeline/term.pyc in _init(self, domain, dtype, missing_value, window_safe, ndim, params)
    261         # should set this flag to True.
    262         self._subclass_called_super_validate = False
--> 263         self._validate()
    264         assert self._subclass_called_super_validate, (
    265             "Term._validate() was not called.\n"

/build/src/qexec_repo/zipline_repo/zipline/pipeline/mixins.pyc in _validate(self)
     19     """
     20     def _validate(self):
---> 21         super(PositiveWindowLengthMixin, self)._validate()
     22         if not self.windowed:
     23             raise WindowLengthNotPositive(window_length=self.window_length)

/build/src/qexec_repo/zipline_repo/zipline/pipeline/mixins.pyc in _validate(self)
     64 
     65     def _validate(self):
---> 66         super(RestrictedDTypeMixin, self)._validate()
     67         assert self.ALLOWED_DTYPES is not NotSpecified, (
     68             "ALLOWED_DTYPES not supplied on subclass "

/build/src/qexec_repo/zipline_repo/zipline/pipeline/term.pyc in _validate(self)
    463             for child in self.inputs:
    464                 if not child.window_safe:
--> 465                     raise NonWindowSafeInput(parent=self, child=child)
    466 
    467     def _compute(self, inputs, dates, assets, mask):

NonWindowSafeInput: Can't compute windowed expression UsesInvalidInput((SimpleMovingAverage((USEquityPricing.close::float64,), window_length=30),), window_length=5) with windowed input SimpleMovingAverage((USEquityPricing.close::float64,), window_length=30).

Note:

Slices inherit this "window safe" property, so only Slices of Returns, rank and zscore can currently be used as inputs. This means that:

sma_slice = sma[symbols(24)]
UsesInvalidInput(inputs=[sma_slice])

will also fail.
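By contrast, a slice taken from a window-safe term should be accepted. A sketch of the positive case (not one of the original cells; it assumes, per the rules above, that the zscore transform of a factor is window safe):

sma_zscore_slice = sma.zscore()[symbols(24)]

class UsesWindowSafeSlice(CustomFactor):
    window_length = 5
    inputs = [sma_zscore_slice]

    def compute(self, today, assets, out, sma_zscore_slice):
        pass

# This should construct without raising `NonWindowSafeInput`.
UsesWindowSafeSlice()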

In [6]:
# Furthermore, Slices cannot be added to a pipeline. Attempts will fail with an
# `UnsupportedPipelineOutput` error.
sma_slice = sma[symbols(24)]
pipe = Pipeline(columns={'aapl_sma' : sma_slice})  # This will fail.
---------------------------------------------------------------------------
UnsupportedPipelineOutput                 Traceback (most recent call last)
<ipython-input-6-f344beff6f73> in <module>()
      2 # `UnsupportedPipelineOutput` error.
      3 sma_slice = sma[symbols(24)]
----> 4 pipe = Pipeline(columns={'aapl_sma' : sma_slice})  # This will fail.

/build/src/qexec_repo/zipline_repo/zipline/pipeline/pipeline.py in __init__(self, columns, screen)
     36         screen=optional(Filter),
     37     )
---> 38     def __init__(self, columns=None, screen=None):
     39         if columns is None:
     40             columns = {}

/build/src/qexec_repo/zipline_repo/zipline/pipeline/pipeline.pyc in __init__(self, columns, screen)
     42         validate_column = self.validate_column
     43         for column_name, term in columns.items():
---> 44             validate_column(column_name, term)
     45             if not isinstance(term, ComputableTerm):
     46                 raise TypeError(

/build/src/qexec_repo/zipline_repo/zipline/pipeline/pipeline.pyc in validate_column(column_name, term)
    190     def validate_column(column_name, term):
    191         if term.ndim == 1:
--> 192             raise UnsupportedPipelineOutput(column_name=column_name, term=term)

UnsupportedPipelineOutput: Cannot add column 'aapl_sma' with term Slice(SimpleMovingAverage, column=Equity(24 [AAPL])). Adding slices or single-column-output terms as pipeline columns is not currently supported.

Why can't Slices be added as pipeline columns?

The output of Slices would not fit the multi-index format of a normal pipeline output. Slices output a single value per day corresponding to the asset with which that Slice is associated, but the current infrastructure requires a value for every asset on every day. While a potential solution would be to fill in all other assets with missing values (e.g. NaN), this would detract from the benefits and ease-of-use of Slices.

Macroeconomic datasets suffer a similar dilemma in that they output a single value per day that is not associated with any particular asset (see here for details: https://www.quantopian.com/posts/upcoming-changes-to-quandl-datasets-in-pipeline-vix-vxv-etc-dot). Ideally we would like to add support for single-value-output terms such as Slices and VIX, but as of now there is unfortunately no good way to output these terms as pipeline columns.

In [7]:
# This is how a Slice of Returns for an arbitrary asset might look if it could be added as a pipeline
# column. Instead of a multi-index DataFrame with dates and assets, we would only need an index of dates
# with a single value per day.
DataFrame(['some value'] * 10, columns=['Returns'], index=date_range('2016-06-01','2016-06-10'))
Out[7]:
Returns
2016-06-01 some value
2016-06-02 some value
2016-06-03 some value
2016-06-04 some value
2016-06-05 some value
2016-06-06 some value
2016-06-07 some value
2016-06-08 some value
2016-06-09 some value
2016-06-10 some value

Correlations and regressions.

At the moment, there are quite a few restrictions on how Slices can be used. However, one highly valuable benefit of Slices is their use in computing correlations and regressions between factors. If a Slice is window safe (input safe), we can easily compute the correlation between it and the columns of another factor. For this purpose we have introduced three new Factor methods: pearsonr, spearmanr and linear_regression. The pearsonr method takes a target, a correlation_length and an optional mask. The target parameter can be another factor (i.e. an ordinary 2D factor), a Slice of another factor, or a BoundColumn term; a BoundColumn may come from a 1D dataset (such as VIX or other macroeconomic indicators) or from an ordinary 2D dataset (such as sentiment values). In any case, both the factor calling pearsonr and the target must be window safe, which for factors is currently limited to Returns and anything produced by rank or zscore.

The spearmanr method takes the same arguments as pearsonr, and the linear_regression method also takes the same arguments, except that it uses regression_length instead of correlation_length. These new methods are designed to be more flexible than the RollingPearsonOfReturns, RollingSpearmanOfReturns and RollingLinearRegressionOfReturns built-ins, which were released a few months ago.

In [8]:
# `Factor.pearsonr` takes a target term, in this case `returns_spy`, and uses it to compute rolling
# Pearson correlation coefficients with the columns of another factor, in this case `returns`. This
# example computes the correlation between each stock's 30-day returns and SPY's (a proxy for the
# market) over 100-day lookback windows. Using a mask with this method is recommended because
# computing over every asset is expensive.

# NOTE: This is equivalent to doing:
# returns_corr = RollingPearsonOfReturns(
#     target=symbols(8554),
#     returns_length=30,
#     correlation_length=100,
#     mask=adv.top(500),
# )
returns = Returns(window_length=30)
returns_spy = returns[symbols(8554)]  # Creates `Slice` object of SPY's returns.
returns_corr = returns.pearsonr(
    target=returns_spy,
    correlation_length=100,
    mask=adv.top(500),
)

pipe = Pipeline(columns={'returns_corr' : returns_corr})
results = run_pipeline(pipe, '2016-06-01', '2016-06-15')
results.returns_corr.unstack().dropna(axis=1)
Out[8]:
Equity(2 [AA]) Equity(24 [AAPL]) Equity(62 [ABT]) Equity(64 [ABX]) Equity(67 [ADSK]) Equity(114 [ADBE]) Equity(128 [ADM]) Equity(154 [AEM]) Equity(161 [AEP]) Equity(168 [AET]) ... Equity(47415 [SYF]) Equity(47740 [BABA]) Equity(47777 [CFG]) Equity(48220 [LC]) Equity(49073 [LABU]) Equity(49139 [FIT]) Equity(49141 [CPGX]) Equity(49229 [KHC]) Equity(49242 [PYPL]) Equity(49506 [HPE])
2016-06-01 00:00:00+00:00 0.870806 0.844277 0.844332 -0.076240 0.952584 0.791136 0.824777 0.040521 -0.048851 0.733583 ... 0.837766 0.943328 0.830941 0.725905 0.795765 0.676880 0.885049 0.462835 0.750269 0.849396
2016-06-02 00:00:00+00:00 0.869259 0.841273 0.843325 -0.081958 0.953874 0.791726 0.818449 0.036609 -0.047402 0.744172 ... 0.836353 0.944001 0.834047 0.726947 0.793942 0.679140 0.884397 0.448200 0.746027 0.853281
2016-06-03 00:00:00+00:00 0.866285 0.837374 0.840904 -0.093556 0.955670 0.792524 0.809119 0.028566 -0.045596 0.753224 ... 0.834526 0.943977 0.836252 0.728457 0.791526 0.681062 0.884590 0.426570 0.739840 0.856201
2016-06-06 00:00:00+00:00 0.864476 0.834125 0.838423 -0.105170 0.957746 0.794590 0.802273 0.018444 -0.042879 0.758889 ... 0.832890 0.943973 0.838828 0.728927 0.788017 0.679271 0.883863 0.407757 0.734428 0.857451
2016-06-07 00:00:00+00:00 0.861314 0.831105 0.836698 -0.121018 0.958207 0.797066 0.795624 0.007061 -0.043894 0.775045 ... 0.830495 0.943679 0.839777 0.730802 0.784785 0.677019 0.881460 0.398048 0.728202 0.856445
2016-06-08 00:00:00+00:00 0.856048 0.827773 0.835566 -0.146332 0.958591 0.800710 0.787171 -0.014554 -0.053825 0.776939 ... 0.824585 0.944734 0.842979 0.733573 0.779564 0.676547 0.880541 0.371411 0.719413 0.854148
2016-06-09 00:00:00+00:00 0.849377 0.824884 0.834310 -0.170394 0.957937 0.803939 0.779868 -0.036186 -0.044288 0.781367 ... 0.820405 0.944528 0.842462 0.727129 0.774302 0.672616 0.877662 0.353216 0.711469 0.852473
2016-06-10 00:00:00+00:00 0.841364 0.820427 0.838652 -0.196208 0.955604 0.810810 0.771827 -0.061391 -0.030619 0.796300 ... 0.816250 0.944565 0.841532 0.718745 0.770730 0.666611 0.874848 0.332147 0.702078 0.850543
2016-06-13 00:00:00+00:00 0.830829 0.813216 0.843084 -0.239855 0.953306 0.817126 0.760199 -0.125610 -0.017908 0.804302 ... 0.810686 0.945122 0.840421 0.710551 0.765084 0.657315 0.873268 0.305902 0.690646 0.847130
2016-06-14 00:00:00+00:00 0.823888 0.804748 0.846398 -0.269436 0.952724 0.828356 0.747921 -0.184543 -0.024911 0.816808 ... 0.805381 0.945748 0.839470 0.708131 0.763027 0.648091 0.875438 0.286932 0.679650 0.840990
2016-06-15 00:00:00+00:00 0.820800 0.797038 0.847253 -0.293385 0.951108 0.834268 0.736191 -0.243830 -0.030873 0.831576 ... 0.786394 0.946850 0.836946 0.705916 0.757158 0.639978 0.876768 0.279769 0.667610 0.836221

11 rows × 463 columns

In [9]:
# Similarly, we can compute linear regressions of the returns of every asset against the returns of a
# single asset. Notice that this method returns a multi-output factor, so we have to access each output
# as its own individual factor.

# NOTE: This is equivalent to doing:
# returns_regr = RollingLinearRegressionOfReturns(
#     target=symbols(8554),
#     returns_length=30,
#     regression_length=100,
#     mask=adv.top(500),
# )
returns_regr = returns.linear_regression(
    target=returns_spy,
    regression_length=100,
    mask=adv.top(500),
)

alpha = returns_regr.alpha
beta = returns_regr.beta
corr = returns_regr.r_value

pipe = Pipeline(columns={'alpha' : alpha, 'beta': beta, 'correlation': corr})
results = run_pipeline(pipe, '2016-06-01', '2016-06-15')

# Display the results of the `beta` factor, dropping any assets with NaNs (assets that were masked out
# on each date are filled with NaNs).
results.beta.unstack().dropna(axis=1)
Out[9]:
Equity(2 [AA]) Equity(24 [AAPL]) Equity(62 [ABT]) Equity(64 [ABX]) Equity(67 [ADSK]) Equity(114 [ADBE]) Equity(128 [ADM]) Equity(154 [AEM]) Equity(161 [AEP]) Equity(168 [AET]) ... Equity(47415 [SYF]) Equity(47740 [BABA]) Equity(47777 [CFG]) Equity(48220 [LC]) Equity(49073 [LABU]) Equity(49139 [FIT]) Equity(49141 [CPGX]) Equity(49229 [KHC]) Equity(49242 [PYPL]) Equity(49506 [HPE])
2016-06-01 00:00:00+00:00 2.822546 1.538415 1.361384 -0.257965 2.751743 1.307091 1.116314 0.061193 -0.033008 0.737835 ... 1.219688 2.203729 1.864672 2.834842 4.725914 2.864485 3.058443 0.315662 1.184111 2.583494
2016-06-02 00:00:00+00:00 2.842539 1.530201 1.371817 -0.280861 2.771097 1.317014 1.115913 0.055768 -0.032294 0.753848 ... 1.223216 2.218820 1.886737 2.873655 4.753663 2.895934 3.074463 0.305772 1.185181 2.614795
2016-06-03 00:00:00+00:00 2.857506 1.525798 1.381554 -0.325591 2.798127 1.330017 1.116469 0.044077 -0.031389 0.772870 ... 1.227815 2.233139 1.910289 2.916244 4.785552 2.932437 3.103371 0.292780 1.183265 2.648150
2016-06-06 00:00:00+00:00 2.875949 1.525388 1.393422 -0.369517 2.827375 1.346815 1.114232 0.028753 -0.029861 0.787975 ... 1.232459 2.247311 1.934616 2.960714 4.788997 2.948607 3.127131 0.281844 1.186060 2.674751
2016-06-07 00:00:00+00:00 2.865339 1.522487 1.405635 -0.427096 2.843714 1.361820 1.110874 0.011096 -0.030846 0.812372 ... 1.231794 2.255094 1.950791 3.002078 4.795821 2.956407 3.134928 0.277487 1.183718 2.684932
2016-06-08 00:00:00+00:00 2.872791 1.530166 1.429861 -0.523764 2.873889 1.389975 1.107660 -0.023183 -0.038494 0.828583 ... 1.225116 2.282619 1.988673 3.081642 4.821057 3.000111 3.176555 0.258772 1.186233 2.704703
2016-06-09 00:00:00+00:00 2.865279 1.533878 1.446998 -0.612616 2.888092 1.410378 1.098131 -0.057847 -0.031967 0.846832 ... 1.220728 2.294136 2.002674 3.101453 4.825618 3.006010 3.188533 0.246231 1.186262 2.717439
2016-06-10 00:00:00+00:00 2.857861 1.538998 1.473276 -0.712581 2.893924 1.442311 1.081182 -0.098492 -0.022403 0.876555 ... 1.219006 2.310308 2.017979 3.115343 4.844823 3.003376 3.210578 0.233011 1.189349 2.735162
2016-06-13 00:00:00+00:00 2.853570 1.542034 1.507784 -0.883723 2.907838 1.481313 1.050786 -0.198676 -0.013366 0.906127 ... 1.219615 2.339074 2.042163 3.148425 4.866814 2.996201 3.254875 0.217574 1.189160 2.756006
2016-06-14 00:00:00+00:00 2.875695 1.546081 1.543348 -1.016438 2.944067 1.536356 1.031587 -0.291166 -0.019030 0.942869 ... 1.226388 2.371012 2.072050 3.216531 4.939547 2.996763 3.327721 0.207996 1.190000 2.766042
2016-06-15 00:00:00+00:00 2.918390 1.544181 1.569343 -1.132260 2.967055 1.577446 1.016229 -0.383095 -0.024085 0.981454 ... 1.231747 2.404701 2.091555 3.272621 4.968336 3.000816 3.391956 0.207753 1.179595 2.782932

11 rows × 463 columns

In [14]:
# Here is another example, this time using two different terms. On each day, this computes the
# Spearman rank correlation between each stock's 30-day returns over the previous 100 days and the
# previous 100 days of VIX. Note that VIX behaves like a Slice in that each lookback window of VIX is
# a single column of data.
from quantopian.pipeline.data.quandl import yahoo_index_vix as vix
returns = Returns(window_length=30)
returns_vix_corr = returns.spearmanr(
    target=vix.close,
    correlation_length=100,
    mask=adv.top(500),
)

pipe = Pipeline(columns={'returns_vix_correlation' : returns_vix_corr})
results = run_pipeline(pipe, '2016-06-01', '2016-06-15')
results.returns_vix_correlation.unstack().dropna(axis=1)
Out[14]:
Equity(2 [AA]) Equity(24 [AAPL]) Equity(62 [ABT]) Equity(64 [ABX]) Equity(67 [ADSK]) Equity(114 [ADBE]) Equity(128 [ADM]) Equity(154 [AEM]) Equity(161 [AEP]) Equity(168 [AET]) ... Equity(47415 [SYF]) Equity(47740 [BABA]) Equity(47777 [CFG]) Equity(48220 [LC]) Equity(49073 [LABU]) Equity(49139 [FIT]) Equity(49141 [CPGX]) Equity(49229 [KHC]) Equity(49242 [PYPL]) Equity(49506 [HPE])
2016-06-01 00:00:00+00:00 -0.457485 -0.496682 -0.625174 0.363617 -0.634511 -0.851051 -0.650929 0.069722 0.413926 -0.369023 ... -0.829935 -0.739127 -0.749838 -0.194554 -0.819326 -0.802344 -0.703897 -0.453981 -0.329713 -0.529205
2016-06-02 00:00:00+00:00 -0.438607 -0.482868 -0.616593 0.380737 -0.621946 -0.853092 -0.640512 0.086824 0.425154 -0.384475 ... -0.823028 -0.727425 -0.744665 -0.177968 -0.818798 -0.798503 -0.691631 -0.440929 -0.305668 -0.533508
2016-06-03 00:00:00+00:00 -0.413442 -0.463313 -0.598485 0.413790 -0.603525 -0.851552 -0.635725 0.110983 0.432038 -0.408095 ... -0.813868 -0.709486 -0.734520 -0.164527 -0.817738 -0.791821 -0.674898 -0.434924 -0.274220 -0.534007
2016-06-06 00:00:00+00:00 -0.385400 -0.444747 -0.572766 0.433460 -0.581473 -0.849764 -0.625662 0.120440 0.428341 -0.416088 ... -0.798794 -0.688525 -0.719891 -0.144707 -0.809595 -0.778067 -0.651603 -0.428707 -0.241330 -0.526939
2016-06-07 00:00:00+00:00 -0.359987 -0.426649 -0.555178 0.462047 -0.559012 -0.847297 -0.616211 0.128097 0.430778 -0.439731 ... -0.785454 -0.670529 -0.704301 -0.131019 -0.803438 -0.764086 -0.628212 -0.421849 -0.209028 -0.516617
2016-06-08 00:00:00+00:00 -0.334202 -0.409451 -0.536084 0.495597 -0.539546 -0.844225 -0.603747 0.149315 0.440025 -0.432152 ... -0.773999 -0.655629 -0.693200 -0.109459 -0.798116 -0.752865 -0.608626 -0.408881 -0.177662 -0.502786
2016-06-09 00:00:00+00:00 -0.308424 -0.399215 -0.524510 0.534093 -0.519715 -0.843135 -0.590704 0.172346 0.435531 -0.439354 ... -0.764412 -0.640108 -0.680384 -0.082672 -0.791745 -0.740229 -0.589186 -0.395477 -0.144065 -0.489124
2016-06-10 00:00:00+00:00 -0.284848 -0.385131 -0.528632 0.574766 -0.503381 -0.845170 -0.576746 0.205476 0.422642 -0.457674 ... -0.755609 -0.628742 -0.672055 -0.060073 -0.786681 -0.730214 -0.573278 -0.379767 -0.116162 -0.472778
2016-06-13 00:00:00+00:00 -0.274982 -0.361345 -0.524264 0.601589 -0.498845 -0.844174 -0.559362 0.227523 0.423722 -0.440488 ... -0.754853 -0.624536 -0.673147 -0.046710 -0.780116 -0.729854 -0.569245 -0.347987 -0.098676 -0.449567
2016-06-14 00:00:00+00:00 -0.273734 -0.338086 -0.509532 0.614851 -0.498989 -0.843093 -0.551105 0.247247 0.435717 -0.428403 ... -0.755045 -0.620953 -0.674924 -0.045245 -0.775297 -0.728702 -0.568681 -0.325832 -0.093149 -0.423464
2016-06-15 00:00:00+00:00 -0.271124 -0.315637 -0.492472 0.626204 -0.498569 -0.834861 -0.533841 0.268364 0.449891 -0.427352 ... -0.753761 -0.608088 -0.676748 -0.041765 -0.763908 -0.728054 -0.566539 -0.309078 -0.078627 -0.397097

11 rows × 466 columns

In [15]:
# Finally, here is an example of passing an ordinary 2D term (a BoundColumn of sentiment data), instead
# of a Slice, as the target to the `pearsonr` method. In this case, correlations are computed
# asset-wise. That is, if our base factor is `Returns` and our target term is sentiment data, then for
# each asset we are calculating the correlation between that asset's returns over the past
# `correlation_length` days and that asset's sentiment data over the past `correlation_length` days.
from quantopian.pipeline.data.sentdex import sentiment
returns = Returns(window_length=30)
returns_sent_corr = returns.pearsonr(
    target=sentiment.sentiment_signal,
    correlation_length=100,
    mask=adv.top(500),
)

pipe = Pipeline(columns={'returns_sentiment_correlation' : returns_sent_corr})
results = run_pipeline(pipe, '2016-06-01', '2016-06-15')
results.returns_sentiment_correlation.unstack().dropna(axis=1)
Out[15]:
Equity(2 [AA]) Equity(24 [AAPL]) Equity(62 [ABT]) Equity(67 [ADSK]) Equity(114 [ADBE]) Equity(161 [AEP]) Equity(168 [AET]) Equity(185 [AFL]) Equity(216 [HES]) Equity(239 [AIG]) ... Equity(41451 [LNKD]) Equity(41636 [MPC]) Equity(42173 [DLPH]) Equity(42230 [TRIP]) Equity(42270 [KORS]) Equity(42950 [FB]) Equity(43694 [ABBV]) Equity(43721 [SCTY]) Equity(45815 [TWTR]) Equity(46631 [GOOG])
2016-06-01 00:00:00+00:00 0.291337 0.202496 -0.413017 0.701678 0.322049 -0.323925 0.357254 -0.402925 0.240862 0.246435 ... 0.018064 0.159438 -0.437199 0.236253 -0.310297 -0.168364 0.250074 -0.066304 0.029590 -0.137991
2016-06-02 00:00:00+00:00 0.286786 0.187694 -0.397502 0.696865 0.319329 -0.325441 0.366781 -0.431691 0.246501 0.234528 ... 0.013053 0.177712 -0.430550 0.249728 -0.318391 -0.182724 0.258265 -0.042759 0.035128 -0.129784
2016-06-03 00:00:00+00:00 0.272804 0.181973 -0.373058 0.691358 0.315117 -0.325406 0.379099 -0.463999 0.252540 0.226795 ... 0.009027 0.204795 -0.422506 0.260710 -0.326946 -0.179465 0.258852 -0.034806 0.029619 -0.126327
2016-06-06 00:00:00+00:00 0.274069 0.177869 -0.373701 0.685940 0.312473 -0.324475 0.384962 -0.441894 0.258009 0.225618 ... 0.007357 0.236717 -0.416457 0.276004 -0.335678 -0.156491 0.269392 -0.027299 0.017091 -0.120476
2016-06-07 00:00:00+00:00 0.278496 0.172766 -0.358666 0.679906 0.311035 -0.322775 0.409470 -0.423397 0.265927 0.214514 ... 0.005953 0.269623 -0.409199 0.296601 -0.344312 -0.163063 0.277552 -0.023194 0.009691 -0.124824
2016-06-08 00:00:00+00:00 0.282909 0.165738 -0.359044 0.670293 0.307181 -0.319873 0.407277 -0.401927 0.275172 0.208634 ... 0.012409 0.324404 -0.400134 0.302310 -0.354423 -0.188326 0.296770 -0.017851 0.030875 -0.151195
2016-06-09 00:00:00+00:00 0.285822 0.144488 -0.353076 0.660774 0.304995 -0.322822 0.416974 -0.382740 0.283332 0.194620 ... 0.017837 0.378520 -0.391771 0.306970 -0.365360 -0.205656 0.300134 -0.012744 0.009572 -0.165434
2016-06-10 00:00:00+00:00 0.289107 0.144483 -0.348939 0.648830 0.304923 -0.323114 0.426694 -0.366427 0.292457 0.181879 ... 0.026177 0.365693 -0.383167 0.309633 -0.387711 -0.183716 0.296851 -0.008243 0.028692 -0.163014
2016-06-13 00:00:00+00:00 0.293286 0.146566 -0.343251 0.635390 0.301722 -0.323786 0.429474 -0.345059 0.302541 0.174539 ... 0.040020 0.349735 -0.373335 0.320026 -0.410151 -0.163709 0.307435 -0.003935 0.030710 -0.171283
2016-06-14 00:00:00+00:00 0.296681 0.176095 -0.331942 0.619955 0.298768 -0.322673 0.432007 -0.330262 0.313537 0.167390 ... 0.051034 0.333073 -0.362269 0.323096 -0.433398 -0.193045 0.316726 0.000257 0.049541 -0.154553
2016-06-15 00:00:00+00:00 0.298046 0.182400 -0.320017 0.604646 0.295479 -0.321521 0.436935 -0.315809 0.323302 0.154359 ... 0.068212 0.314053 -0.350693 0.326174 -0.454165 -0.208928 0.320497 0.002554 0.028512 -0.161565

11 rows × 233 columns

Future considerations.

In the future we would like to consider supporting a number of additions, including:

  • Adding single-value-output terms as pipeline columns.
  • Allowing slices of custom factors to be used as inputs.
  • Allowing the creation of "custom macros", i.e. allowing custom factors to compute a one-dimensional output.