
Using TA-Lib Functions in Pipelines

TA-Lib's function signatures are a bit awkward to use in Pipeline right now. There are three major issues that I'm aware of:

  1. TA-Lib's functions assume that they're receiving all the data at once, and they perform rolling computations over their inputs.
  2. TA-Lib's functions only work on one asset's worth of data at a time.
  3. Many TA-Lib functions don't robustly handle NaNs. Since Pipeline often provides columns with leading NaNs (e.g., for a company that has only existed for 5 days when you've requested a 10-day window), you have to manually ignore columns containing NaNs.

Normal TA-Lib Usage

TA-Lib functions expect to be passed 1D arrays of data on which to perform rolling computations. Its authors expect a usage pattern like this:

In [1]:
prices = get_pricing('MSFT', fields=['high', 'low', 'close_price'], start_date='2014', end_date='2014-03')
prices.head()
Out[1]:
high low close_price
2014-01-02 00:00:00+00:00 37.40 37.10 37.145
2014-01-03 00:00:00+00:00 37.22 36.60 36.920
2014-01-06 00:00:00+00:00 36.89 36.11 36.130
2014-01-07 00:00:00+00:00 36.49 36.21 36.403
2014-01-08 00:00:00+00:00 36.14 35.58 35.750

Compute a rolling WILLR on the entire price history.

The result will be a 1-D array with 13 leading NaNs.

In [2]:
from talib import WILLR

msft_willr = WILLR(prices.high.values, prices.low.values, prices.close_price.values, timeperiod=14)
msft_willr
Out[2]:
array([         nan,          nan,          nan,          nan,
                nan,          nan,          nan,          nan,
                nan,          nan,          nan,          nan,
                nan, -52.71119134, -44.9034749 , -25.34246575,
       -52.05479452, -43.83561644, -30.82191781, -23.63013699,
        -1.53374233, -44.94047619, -66.80161943, -87.44939271,
       -73.27935223, -57.08502024, -48.98785425, -32.38866397,
       -23.04347826, -16.95652174, -16.52173913, -24.34782609,
       -20.86956522, -10.86956522, -13.90977444, -24.81203008,
       -30.07518797, -33.08270677, -20.51282051,  -6.9124424 ])
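The leading NaNs come from the window arithmetic: with timeperiod=14, the first 13 positions don't have a full window behind them. A minimal pure-Python sketch of the same rolling computation makes this explicit. This is not TA-Lib's implementation; `rolling_willr` is a hypothetical helper, and returning NaN for a flat window (highest high equal to lowest low) is a choice made for this sketch that may differ from TA-Lib's behavior.

```python
import math


def rolling_willr(high, low, close, timeperiod=14):
    """Rolling Williams' %R over equal-length 1-D sequences (illustrative only)."""
    # Positions without a full window behind them stay NaN.
    out = [math.nan] * len(close)
    for end in range(timeperiod - 1, len(close)):
        start = end - timeperiod + 1
        highest = max(high[start:end + 1])
        lowest = min(low[start:end + 1])
        span = highest - lowest
        if span == 0:
            # Flat window: nothing to scale by. NaN is this sketch's choice.
            continue
        out[end] = (highest - close[end]) / span * -100.0
    return out


# The first five MSFT rows shown above, with a shorter window so that
# a few non-NaN values appear.
high = [37.40, 37.22, 36.89, 36.49, 36.14]
low = [37.10, 36.60, 36.11, 36.21, 35.58]
close = [37.145, 36.920, 36.130, 36.403, 35.750]

values = rolling_willr(high, low, close, timeperiod=3)
# values[0] and values[1] are NaN; values[2:] hold computed readings.
```

With timeperiod=3 the output has 2 leading NaNs, just as timeperiod=14 yields 13.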
In [3]:
from matplotlib.pyplot import subplots

# Create a figure with a 2 x 1 stack of subplots.
figure, (top, bottom) = subplots(2, 1, sharex=True)

# Write our computed WILLR values into the top plot.
top.plot(prices.index, msft_willr, color='purple')
top.set_ylabel("Williams' %R")
top.set_title('MSFT Momentum')

# Tell pandas to write our DataFrame values into the bottom plot.
prices.plot(ax=bottom).set_ylabel('US Dollars')
Out[3]:
<matplotlib.text.Text at 0x7f0b900b3350>

Computing a TA-Lib Function in Pipeline

If we want to use this in Pipeline, we have to jump through a few hoops:

  1. We have to call it roughly 8000 times, once for each asset column.
  2. We have to call it on a window of length timeperiod and then extract the last entry. (I haven't looked much into the underlying TA-Lib implementations to see whether this is a significant performance hit. At the very least, we're allocating a larger output buffer than we need.)
  3. We have to manually screen out columns containing NaNs if the TA-Lib function we want to use doesn't support them.
In [4]:
from numpy import nan, isnan

from quantopian.pipeline import Pipeline, CustomFactor
from quantopian.pipeline.data.builtin import USEquityPricing as USEP
from quantopian.research import run_pipeline

def columnwise_anynan(array2d):
    # isnan is applied elementwise, producing a 2D array of bools.
    # array.any(axis=0) reduces over the rows, giving us a 1D array whose
    # length is equal to the number of columns in the array.
    return isnan(array2d).any(axis=0)


class WILLRFactor(CustomFactor):
    inputs = [USEP.high, USEP.low, USEP.close]
    window_length = 14
    
    def compute(self, today, assets, out, high, low, close):
        """
        Compute WILLR on each column of high, low, and close.
        """
        # Assume that a nan in high implies a nan in low or close.
        # If we had datasets from different sources, we'd probably
        # want to do something like:
        # columnwise_anynan(high) | columnwise_anynan(low) | columnwise_anynan(close).
        anynan = columnwise_anynan(high)
        
        # In general, it's bad practice to iterate over NumPy arrays like this in pure
        # Python. Unfortunately, TA-Lib doesn't provide an API for vectorizing
        # operations over 2D arrays, so we're stuck with doing this.
        # A nice improvement to Zipline would be to provide a module that does this
        # efficiently in Cython.
        for col_ix, have_nans in enumerate(anynan):
            
            # If we have nans in the input (e.g., because an asset didn't trade for a 
            # full day, or because the asset hasn't existed for 14 days), just forward
            # the NaN.
            if have_nans:
                out[col_ix] = nan
                continue
            
            # Compute our actual WILLR value.
            # The [:, col_ix] syntax tells NumPy to slice along the second dimension,
            # i.e. to select a column. array[col_ix] alone would select a row instead.
            results = WILLR(
                high[:, col_ix], 
                low[:, col_ix], 
                close[:, col_ix], 
                timeperiod=self.window_length
            )
            
            # Results is a length 14 array containing 13 leading NaNs and then the actual value
            # we care about.  Needless to say, this is less efficient than it could be.
            out[col_ix] = results[-1]
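
The per-column loop above is needed only because TA-Lib operates on 1-D inputs. For an indicator as simple as Williams' %R, the final value of each window can be computed for every column at once with plain NumPy. The following is a rough sketch under stated assumptions, not a drop-in replacement for TA-Lib: `willr_latest` is a hypothetical helper, it checks all three inputs for NaNs, and it returns NaN for a flat window (highest high equal to lowest low), which may not match what TA-Lib does in that case.

```python
import numpy as np


def willr_latest(high, low, close):
    """Williams' %R for the last row of each column.

    Inputs are 2D arrays shaped (window_length, n_assets).
    """
    high = np.asarray(high, dtype=float)
    low = np.asarray(low, dtype=float)
    close = np.asarray(close, dtype=float)

    # Columns with a NaN anywhere in any input are forwarded as NaN.
    has_nan = (
        np.isnan(high).any(axis=0)
        | np.isnan(low).any(axis=0)
        | np.isnan(close).any(axis=0)
    )

    highest = high.max(axis=0)
    lowest = low.min(axis=0)
    price_range = highest - lowest

    out = np.full(high.shape[1], np.nan)
    # Only compute where the inputs are complete and the range is non-zero.
    ok = ~has_nan & (price_range > 0)
    out[ok] = (highest[ok] - close[-1, ok]) / price_range[ok] * -100.0
    return out


# Two hypothetical assets over a 3-row window; the second has a leading NaN.
high = np.array([[10.0, np.nan], [12.0, 5.0], [11.0, 6.0]])
low = np.array([[8.0, np.nan], [9.0, 4.0], [10.0, 5.0]])
close = np.array([[9.0, np.nan], [11.0, 4.5], [10.5, 5.5]])

latest = willr_latest(high, low, close)  # column 0 -> -37.5, column 1 -> nan
```

Inside a CustomFactor's compute method, the loop body could then plausibly shrink to a single assignment such as `out[:] = willr_latest(high, low, close)`, assuming the results agree closely enough with TA-Lib for your purposes.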
In [5]:
willr = WILLRFactor()

p = Pipeline(
    columns={
        'willr': willr, 
        'latest_close': USEP.close.latest, 
        'latest_high': USEP.high.latest, 
        'latest_low': USEP.low.latest
    }, 
    screen=willr.notnan(),
)

result = run_pipeline(p, '2014', '2014-03')

run_pipeline gives us a hierarchically indexed DataFrame, with dates on the first index level and assets on the second.

In [6]:
result
Out[6]:
latest_close latest_high latest_low willr
2014-01-02 00:00:00+00:00 Equity(2 [AA]) 10.630000 10.700000 10.610000 -9.722222
Equity(21 [AAME]) 4.110000 4.140000 4.110000 -10.344828
Equity(24 [AAPL]) 561.160000 561.290000 560.690000 -39.067055
Equity(25 [AA_PR]) 78.750000 78.750000 78.750000 -3.945111
Equity(31 [ABAX]) 40.020000 40.400000 40.010000 -7.169811
Equity(39 [DDC]) 14.350000 14.360000 14.350000 -8.333333
Equity(41 [ARCB]) 33.680000 33.830000 33.640000 -46.453901
Equity(52 [ABM]) 28.590000 29.040000 28.560000 -20.871185
Equity(53 [ABMD]) 26.720000 27.470000 26.700000 -76.000000
Equity(62 [ABT]) 38.340000 38.570000 38.270000 -10.087719
Equity(64 [ABX]) 17.620000 17.650000 17.610000 -2.343750
Equity(66 [AB]) 21.340000 21.580000 21.340000 -86.842105
Equity(67 [ADSK]) 50.320000 50.500000 50.265000 -3.585657
Equity(69 [ACAT]) 56.970000 57.190000 56.900000 -23.092699
Equity(70 [VBF]) 17.750000 17.860000 17.750000 -28.985507
Equity(76 [TAP]) 56.150000 56.490000 56.050000 -10.240964
Equity(84 [ACET]) 25.020000 25.240000 24.920000 -6.567164
Equity(86 [ACG]) 7.130000 7.180000 7.130000 -32.469210
Equity(88 [ACI]) 4.440000 4.490000 4.430000 -38.983051
Equity(99 [ACO]) 33.970000 34.420000 33.860000 -11.930586
Equity(100 [IEP]) 109.410000 112.000000 109.010000 -80.330969
Equity(106 [ACU]) 14.900000 14.900000 14.880000 -50.000000
Equity(110 [ACXM]) 36.980000 37.160000 36.930000 -59.531773
Equity(112 [ACY]) 17.180000 17.190000 17.180000 -80.670836
Equity(114 [ADBE]) 59.870000 59.915000 59.830000 -17.134831
Equity(117 [AEY]) 2.690000 2.750000 2.680000 -19.354839
Equity(122 [ADI]) 50.940000 51.200000 50.880000 -8.150470
Equity(128 [ADM]) 43.410000 43.910000 43.365000 -15.846995
Equity(149 [ADX]) 13.070000 13.100000 13.070000 -21.686747
Equity(153 [AE]) 68.540000 69.960000 68.230000 -13.946869
... ... ... ... ... ...
2014-03-03 00:00:00+00:00 Equity(46250 [ABIL]) 9.640000 9.690000 9.640000 -70.588235
Equity(46251 [ABIL_W]) 0.400000 0.420000 0.400000 -100.000000
Equity(46253 [CYHH_Z]) 0.060000 0.080000 0.060000 -52.631579
Equity(46259 [MHG]) 11.100000 11.150000 11.100000 -86.055992
Equity(46262 [NADL]) 9.060000 9.190000 9.050000 -20.634921
Equity(46270 [CLDN]) 11.380000 12.940000 11.350000 -45.961003
Equity(46271 [DRNA]) 37.070000 41.780000 36.650000 -88.875598
Equity(46272 [TRVN]) 7.990000 8.080000 7.990000 -46.698113
Equity(46281 [NWHM]) 13.670000 14.480000 13.610000 -32.530120
Equity(46282 [SNOW]) 13.450000 13.650000 13.380000 -47.104247
Equity(46283 [CARA]) 19.000000 21.220000 18.750000 -26.682692
Equity(46284 [MBUU]) 18.420000 18.450000 18.230000 -40.845070
Equity(46285 [RARE]) 56.000000 57.250000 55.910000 -30.773751
Equity(46286 [NM_PRG]) 24.800000 24.890000 24.800000 -36.697248
Equity(46303 [TFLO]) 50.086528 50.096527 50.086528 -75.000000
Equity(46304 [AKTX]) 6.450000 6.500000 6.440000 -99.462366
Equity(46307 [CBPX]) 19.500000 19.910000 19.490000 -13.311688
Equity(46308 [ASPX]) 24.700000 30.500000 24.370000 -51.647373
Equity(46309 [BIOC]) 9.410000 9.700000 9.410000 -28.712871
Equity(46310 [QURE]) 16.710000 18.010000 16.710000 -62.769231
Equity(46311 [GNCA]) 14.740000 16.351000 14.670000 -35.211268
Equity(46313 [EBIO]) 16.210000 18.446000 16.200000 -38.348624
Equity(46314 [EGLT]) 12.870000 13.550000 12.870000 -46.296296
Equity(46315 [RVNC]) 26.930000 30.500000 26.920000 -79.337232
Equity(46316 [CMFN]) 15.520000 15.990000 15.520000 -96.078431
Equity(46317 [LADR]) 16.990000 17.140000 16.980000 -71.666667
Equity(46326 [ARGS]) 10.250000 10.960000 10.200000 -23.986486
Equity(46328 [SZMK]) 12.410000 12.460000 12.370000 -1.607717
Equity(46345 [GPRK]) 7.647000 7.800000 7.647000 -34.347826
Equity(46346 [JPM_PRB]) 25.080000 25.150000 25.080000 -28.000000

290718 rows × 4 columns

We can extract just the MSFT values using DataFrame.xs.

Note: The values output here will be shifted forward one day from the values produced via the get_pricing method. This is because Pipeline labels each row with the date on whose morning those values were the best known. Thus, on day N, the best-known high/low/close values are the values for day N - 1.
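
The one-day shift can be illustrated with the MSFT closes from the get_pricing output earlier in this notebook. This is only a sketch of the labelling convention, not of how Pipeline is implemented; `pipeline_view` is a hypothetical name.

```python
from datetime import date

# Sessions and closes as returned by get_pricing above.
sessions = [
    date(2014, 1, 2),
    date(2014, 1, 3),
    date(2014, 1, 6),
    date(2014, 1, 7),
    date(2014, 1, 8),
]
closes = [37.145, 36.920, 36.130, 36.403, 35.750]

# Pipeline-style labelling: the value observed on session N - 1 is
# reported under the label of session N.
pipeline_view = dict(zip(sessions[1:], closes[:-1]))

for label, value in pipeline_view.items():
    print(label, value)
```

The close of 37.145 recorded for 2014-01-02 appears under 2014-01-03 in this view, which matches the latest_close column in the Pipeline output.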

Note: The values produced here still differ from those produced by the get_pricing approach. In most cases, the difference is small, but in some cases it's as much as 20%. I think this is happening because the formula for Williams' %R is

(Highest High - Close)/(Highest High - Lowest Low) * -100

In the case that the numerator and the denominator are both small, this becomes very sensitive to small differences in floating-point rounding behavior. (Though even accounting for that, the differences seen below seem greater than I'd expect.)
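
To get a feel for that sensitivity, here is a small worked example using the formula above. The prices are made up for illustration and are not taken from the MSFT data. It shows only how a fixed one-cent change in the close is amplified when the high/low range is narrow; it doesn't establish that this is the cause of the discrepancy.

```python
def willr(highest_high, lowest_low, close):
    """Williams' %R for a single window, following the formula above."""
    return (highest_high - close) / (highest_high - lowest_low) * -100.0


# Same one-cent change in the close, applied to two hypothetical windows.
# Wide window: the range is 10 dollars.
wide_shift = willr(40.00, 30.00, 35.01) - willr(40.00, 30.00, 35.00)

# Narrow window: the range is 10 cents.
narrow_shift = willr(10.00, 9.90, 9.96) - willr(10.00, 9.90, 9.95)

# The reading moves by roughly 0.1 points in the wide window
# and by roughly 10 points in the narrow one.
print(wide_shift, narrow_shift)
```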

In [7]:
MSFT = symbols('MSFT')

msft_result = result.xs(MSFT, level=1)
msft_result
Out[7]:
latest_close latest_high latest_low willr
2014-01-02 00:00:00+00:00 37.430 37.580000 37.360000 -42.439024
2014-01-03 00:00:00+00:00 37.145 37.400000 37.140000 -35.611511
2014-01-06 00:00:00+00:00 36.920 37.220000 36.900000 -51.094891
2014-01-07 00:00:00+00:00 36.130 36.890000 36.110000 -98.675497
2014-01-08 00:00:00+00:00 36.403 36.490000 36.400000 -80.596026
2014-01-09 00:00:00+00:00 35.750 36.140000 35.680000 -96.391753
2014-01-10 00:00:00+00:00 35.530 35.910000 35.510000 -99.052133
2014-01-13 00:00:00+00:00 36.040 36.150000 36.030000 -74.881517
2014-01-14 00:00:00+00:00 34.980 36.020000 34.940000 -98.507463
2014-01-15 00:00:00+00:00 35.770 35.880000 35.740000 -69.029851
2014-01-16 00:00:00+00:00 36.760 36.790000 36.710000 -32.089552
2014-01-17 00:00:00+00:00 36.880 37.000000 36.850000 -27.611940
2014-01-21 00:00:00+00:00 36.360 36.830000 36.350000 -46.212121
2014-01-22 00:00:00+00:00 36.160 36.820000 36.100000 -53.787879
2014-01-23 00:00:00+00:00 35.940 36.320000 35.930000 -59.349593
2014-01-24 00:00:00+00:00 36.057 36.140000 35.990000 -51.008772
2014-01-27 00:00:00+00:00 36.810 37.550000 36.760000 -28.352490
2014-01-28 00:00:00+00:00 36.030 36.890000 36.030000 -58.237548
2014-01-29 00:00:00+00:00 36.270 36.390000 36.240000 -49.042146
2014-01-30 00:00:00+00:00 36.650 36.880000 36.640000 -34.482759
2014-01-31 00:00:00+00:00 36.860 36.880000 36.770000 -26.436782
2014-02-03 00:00:00+00:00 37.840 37.890000 37.770000 -1.694915
2014-02-04 00:00:00+00:00 36.480 37.990000 36.430000 -67.111111
2014-02-05 00:00:00+00:00 36.340 37.189000 36.340000 -80.097087
2014-02-06 00:00:00+00:00 35.830 36.470000 35.800000 -98.630137
2014-02-07 00:00:00+00:00 36.180 36.250000 36.160000 -82.648402
2014-02-10 00:00:00+00:00 36.580 36.590000 36.530000 -64.383562
2014-02-11 00:00:00+00:00 36.780 36.800000 36.760000 -55.251142
2014-02-12 00:00:00+00:00 37.190 37.260000 37.160000 -36.529680
2014-02-13 00:00:00+00:00 37.460 37.600000 37.450000 -24.200913
2014-02-14 00:00:00+00:00 37.600 37.860000 37.550000 -17.808219
2014-02-18 00:00:00+00:00 37.330 37.498734 37.300223 -17.351598
2014-02-19 00:00:00+00:00 37.430 37.780000 37.420000 -15.579618
2014-02-20 00:00:00+00:00 37.510 37.750000 37.470000 -12.018563
2014-02-21 00:00:00+00:00 37.740 37.870000 37.740000 -5.563818
2014-02-24 00:00:00+00:00 37.980 38.350000 37.950000 -13.136756
2014-02-25 00:00:00+00:00 37.690 37.975000 37.670000 -23.433132
2014-02-26 00:00:00+00:00 37.550 37.850000 37.540000 -28.403797
2014-02-27 00:00:00+00:00 37.470 37.740000 37.450000 -35.783922
2014-02-28 00:00:00+00:00 37.870 37.890000 37.830000 -22.944994
2014-03-03 00:00:00+00:00 38.310 38.460000 38.240000 -7.600047