talib's function signatures are a bit awkward to use in Pipeline right now. There are three major issues with them that I'm aware of:
TA-Lib functions expect to be passed 1D arrays of data on which to perform rolling computations. Its authors expect a usage pattern like this:
prices = get_pricing('MSFT', fields=['high', 'low', 'close_price'], start_date='2014', end_date='2014-03')
prices.head()
from talib import WILLR
msft_willr = WILLR(prices.high.values, prices.low.values, prices.close_price.values, timeperiod=14)
msft_willr
The result is a 1-D array with 13 leading NaNs.
from matplotlib.pyplot import subplots
# Create a figure with a 2 x 1 stack of subplots.
figure, (top, bottom) = subplots(2, 1, sharex=True)
# Write our computed WILLR values into the top plot.
top.plot(prices.index, msft_willr, color='purple')
top.set_ylabel("Williams' %R")
top.set_title('MSFT Momentum')
# Tell pandas to write our DataFrame values into the bottom plot.
prices.plot(ax=bottom).set_ylabel('US Dollars')
If we want to use this in Pipeline, we have to jump through a few hoops: for each asset, we have to call the TA-Lib function on a full window of length timeperiod and then extract just the last entry. (I haven't looked much into the underlying TA-Lib implementations to see whether this is a significant performance hit. At the very least, we're allocating a larger output buffer than we need.)
from numpy import nan, isnan
from quantopian.pipeline import Pipeline, CustomFactor
from quantopian.pipeline.data.builtin import USEquityPricing as USEP
from quantopian.research import run_pipeline

def columnwise_anynan(array2d):
    # isnan will be broadcast over the array to produce a 2D array of bools.
    # array.any(axis=0) gives us a 1D array whose length is equal to the
    # number of columns in the array.
    return isnan(array2d).any(axis=0)

class WILLRFactor(CustomFactor):
    inputs = [USEP.high, USEP.low, USEP.close]
    window_length = 14

    def compute(self, today, assets, out, high, low, close):
        """
        Compute WILLR on each column of high, low, and close.
        """
        # Assume that a NaN in high implies a NaN in low or close.
        # If we had datasets from different sources, we'd probably
        # want to do something like:
        # columnwise_anynan(high) | columnwise_anynan(low) | columnwise_anynan(close).
        anynan = columnwise_anynan(high)
        # In general, it's bad practice to iterate over numpy arrays like this
        # in pure Python. Unfortunately, TA-Lib doesn't provide us with an API
        # to vectorize operations over 2D arrays, so we're stuck with doing this.
        # A nice improvement to Zipline would be to provide a module that does
        # this efficiently in Cython.
        for col_ix, have_nans in enumerate(anynan):
            # If we have NaNs in the input (e.g., because an asset didn't trade
            # for a full day, or because the asset hasn't existed for 14 days),
            # just forward the NaN.
            if have_nans:
                out[col_ix] = nan
                continue
            # Compute our actual WILLR value.
            # The [:, col_ix] syntax here tells NumPy to slice along the second
            # dimension. Just doing array[col_ix] would give us a row instead
            # of a column.
            results = WILLR(
                high[:, col_ix],
                low[:, col_ix],
                close[:, col_ix],
                timeperiod=self.window_length,
            )
            # results is a length-14 array containing 13 leading NaNs and then
            # the actual value we care about. Needless to say, this is less
            # efficient than it could be.
            out[col_ix] = results[-1]

willr = WILLRFactor()
p = Pipeline(
    columns={
        'willr': willr,
        'latest_close': USEP.close.latest,
        'latest_high': USEP.high.latest,
        'latest_low': USEP.low.latest,
    },
    screen=willr.notnan(),
)
result = run_pipeline(p, '2014', '2014-03')
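As a standalone illustration of the NaN-screening step above, here's a minimal sketch (using a made-up 2 x 3 array in place of a real pricing window) of how columnwise_anynan flags columns:

```python
import numpy as np

def columnwise_anynan(array2d):
    # np.isnan broadcasts over the whole array; .any(axis=0) collapses
    # each column to a single bool: True if the column contains any NaN.
    return np.isnan(array2d).any(axis=0)

# Made-up 2 x 3 window: the middle column has a missing value.
window = np.array([[1.0, np.nan, 3.0],
                   [4.0, 5.0,    6.0]])
print(columnwise_anynan(window))  # -> [False  True False]
```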
run_pipeline gives us a hierarchically-indexed DataFrame:
result
Note: The values output here will be shifted forward one day from the values produced via the get_pricing
method. This is because the values in Pipeline are date-labelled based on the best-known value as of the morning of the date. Thus, on day N, the best known open/high/close values are the values for day N - 1.
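To make that one-day shift concrete, here's a small sketch (with made-up values, using plain pandas rather than Quantopian's API) of the alignment Pipeline applies:

```python
import pandas as pd

# Made-up daily closes, labelled the way get_pricing would label them.
closes = pd.Series([40.0, 41.0, 42.0],
                   index=pd.date_range('2014-01-02', periods=3))

# Pipeline labels day N with the best-known value as of that morning,
# i.e. day N - 1's value, so shifting forward by one day reproduces
# the alignment of Pipeline output.
aligned = closes.shift(1)
print(aligned)
```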
Note: The values produced here are still off from what's produced by the alternative method. In most cases, the difference is small, but in some cases it's as much as 20%. I think this is happening because the formula for Williams %R is
(Highest High - Close)/(Highest High - Lowest Low) * -100
In the case that the numerator and the denominator are both small, this becomes very sensitive to small differences in floating-point rounding behavior. (Though even accounting for that, the differences seen below seem greater than I'd expect.)
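To see why a tight trading range amplifies rounding differences, here's a quick sketch (with made-up prices; willr_last is a hypothetical helper, not TA-Lib's implementation):

```python
def willr_last(highs, lows, close):
    # Williams %R: (Highest High - Close) / (Highest High - Lowest Low) * -100
    hh = max(highs)
    ll = min(lows)
    return (hh - close) / (hh - ll) * -100

# Made-up prices trading in a 3-cent range: numerator and denominator
# are both tiny, so a sub-cent perturbation of the close moves the
# result by several percentage points.
print(willr_last([10.02, 10.03], [10.00, 10.01], 10.02))
print(willr_last([10.02, 10.03], [10.00, 10.01], 10.021))
```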
MSFT = symbols('MSFT')
msft_result = result.xs(MSFT, level=1)
msft_result