Question about a CustomFactor look-back period

posted May 3, 2018

Hey all, I've adapted the following code to test the alpha of an SMI (Stochastic Momentum Index) factor, but I'm unsure how to reason about the look-back period of the custom factor or of the individual metrics themselves.

class SMI(CustomFactor):  
    inputs = [USEquityPricing.close, USEquityPricing.high, USEquityPricing.low]  
    def compute(self, today, assets, out, close, high, low):  
        maxi = talib.MAX(high, timeperiod = 8)  
        mini = talib.MIN(low, timeperiod = 8)  
        center = (maxi + mini) * .5  # midpoint of the 8-bar high/low range
        c = (close[-1] - center)  
        H1 = talib.EMA(c, timeperiod = 3)  
        H2 = talib.EMA(H1, timeperiod = 3)  
        D = talib.EMA((maxi-mini), timeperiod = 8)  
        D1 = talib.EMA(D, timeperiod = 3)  
        D2 = .5*talib.EMA(D1, timeperiod = 3)  
        SMI = (H2/D2)  
        SMI_signal = talib.EMA(SMI, timeperiod = 3)

What I'd like to do is to follow the general guidelines posted here: https://www.youtube.com/watch?v=dDFewKqNDfU
with an initial look-back period of 8 days, followed by two 3-day EMA smoothing passes for each of D and H, and then an 8-day EMA of the SMI as the signal. If the data set starts on day 1, there shouldn't be any signal until day 19 (7 days until the first H, 2 more days until the first H1, 2 more days until the first H2, the same process for D so no new days there, then lastly 8 days until the first SMI signal).

I'm not confident my current code is doing this, however. I think each time period acts independently and pulls its own data, i.e. instead of starting on day 1 (current day -19), I'm just starting on current day -8, which would make the whole calculation incorrect. On that note, is there a way to give this code a look-back period of 100+ days and just pull the most recent SMI and SMI signal? The way this algebra works, the more data available in the look-back, the higher the resolution, i.e. 100 days of data >>> 19 days of data.

Thanks in advance! Let me know if you need any additional info.
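The day-19 estimate above can be checked with a little arithmetic. This is only a sketch: the helper names are hypothetical, and the warm-up lengths are assumptions about how a rolling max/min, an EMA and a DEMA behave when each is seeded from its first n inputs (which is how TA-Lib's defaults are generally understood to work).

```python
def rolling_extreme_lookback(n):
    # A rolling max/min over n bars first produces a value at index n - 1.
    return n - 1

def ema_lookback(n):
    # Assumption: an n-period EMA seeded from its first n inputs
    # first produces a value at index n - 1 of its own input.
    return n - 1

def dema_lookback(n):
    # Assumption: DEMA is built from an EMA of an EMA, so it warms up twice.
    return 2 * (n - 1)

def first_valid_bar(lookbacks):
    # Chained indicators add their warm-ups; +1 converts an index to a bar count.
    return sum(lookbacks) + 1

# 8-bar range, two chained 3-period EMAs, then an 8-period signal EMA.
chained = [rolling_extreme_lookback(8), ema_lookback(3), ema_lookback(3), ema_lookback(8)]
print(first_valid_bar(chained))  # 19
```

Because the numerator and denominator are smoothed in parallel rather than in sequence, only one of the two chains counts toward the total.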

17 responses

Joe Kiefner

May 9, 2018

Anyone? Let me know if more clarification is needed.

Vladimir

May 9, 2018

# Stochastic Momentum Index (SMI) Indicator  
# The Stochastic Momentum Index (SMI) was introduced by William Blau in 1993 

import talib  
# -------------------------------------------------  
stock, period, smooth, sig = symbol('SPY'), 8, 3, 8  
# -------------------------------------------------  
def initialize(context):  
    schedule_function(Indicator, date_rules.every_day(), time_rules.market_open(minutes = 65)) 

def Indicator(context, data):  
    bars = period + smooth + sig

    H = data.history(stock, 'high', bars, '1d')  
    L = data.history(stock, 'low', bars, '1d')  
    C = data.history(stock, 'close', bars, '1d')

    HH = talib.MAX(H, period)  
    LL = talib.MIN(L, period)  
    M = (HH + LL)*0.5  
    D = (C - M)  
    HL = HH - LL  
    Dema_D = talib.DEMA(D, smooth)  
    Dema_HL = talib.DEMA(HL, smooth)  
    SMI = 2*Dema_D/Dema_HL  
    SMI_signal = talib.EMA(SMI, sig)  

    record(SMI = SMI[-1], SMI_signal = SMI_signal[-1])

Joe Kiefner

May 10, 2018

Vladimir, thanks a ton! Very helpful. I didn't have to edit your version of the code much to get it to what I wanted, but I attached below nonetheless!

# Stochastic Momentum Index (SMI) Indicator  
# The Stochastic Momentum Index (SMI) was introduced by William Blau in 1993 

import talib  
# -----------------------------------------  
stock, period, smooth = symbol('UPRO'), 8, 3  
# -----------------------------------------  
def initialize(context):  
    schedule_function(Indicator, date_rules.every_day(), time_rules.market_open(minutes = 65)) 

def Indicator(context, data):  
    bars = 2*period + 2*smooth

    H = data.history(stock, 'high', bars, '1d')  
    L = data.history(stock, 'low', bars, '1d')  
    C = data.history(stock, 'close', bars, '1d')

    hh = talib.MAX(H, period)  
    ll = talib.MIN(L, period)  
    m = (hh + ll)*0.5  
    center = (C - m)  
    H1 = talib.DEMA(center, smooth) # DEMA = double exp  
    D = (hh - ll)  
    D1 = talib.DEMA(D, smooth) # DEMA = double exp  
    SMI = (H1/D1)  
    SMI_signal = talib.EMA(SMI, period)  

    record(SMI = SMI[-1], SMI_signal = SMI_signal[-1])

Joe Kiefner

May 16, 2018

I'm currently trying to make this into a custom factor. This is what I've got so far, but it isn't working...

class SMI(CustomFactor):

    # Pre-declare inputs and window_length  
    inputs = [USEquityPricing.close, USEquityPricing.high, USEquityPricing.low]  
    window_length = 100  
    def compute(self, today, assets, out, close, high, low):  
        table = pd.DataFrame(index=assets)  
        H = high.history(assets, 'high', 50, '1d')  
        L = low.history(assets, 'low', 50, '1d')  
        C = close.history(assets, 'close', 50, '1d')  
        hh = talib.MAX(H, 8)  
        ll = talib.MIN(L, 8)  
        m = (hh + ll)*0.5  
        center = (C - m)  
        H1 = talib.DEMA(center, 3) # DEMA = double exp  
        D = (hh - ll)  
        D1 = talib.DEMA(D, 3) # DEMA = double exp  
        SMI_signal = (H1/D1)  
        SMI = talib.EMA(SMI_signal, 8)  
        SMI_diff = (SMI_signal-SMI)  
        table ["SMI_diff"] = SMI_diff[-1]  
        out[:] = table.fillna(table.max()).mean(axis=1)

Error message is something along the lines of numpy not allowing the use of .history. What'd be the easiest way to work around the data.history deprecation and this error?

Joe Kiefner

May 16, 2018

This is the error code that I get. Updated custom factor is in attached notebook.

AttributeError: 'numpy.ndarray' object has no attribute 'history'

Any and all help is greatly appreciated!

Joe Kiefner

May 18, 2018

Still having trouble getting this set up as a custom factor.

I will attach the algorithm and backtest tonight to assist in getting assistance. Until then, and as always, all help is greatly appreciated!

Joe Kiefner

May 19, 2018

This is the working independent code as well as the custom factor which is commented out. The custom factor is what I'm having trouble getting running correctly.

If you can possibly help I would greatly appreciate it! Thanks in advance!

Dan Whitnable

May 20, 2018

There are two issues with the custom factor.

First, there is no reason to fetch any data using the 'history' method. That data is already passed to the compute function in the parameters high, low, and close. The columns are the securities and the rows are dates (the last row is yesterday's data).

# no need for this  
H = high.history(assets, 'high', 22, '1d')  
L = low.history(assets, 'low', 22, '1d')  
C = close.history(assets, 'close', 22, '1d')

# high, low, and close are 2D numpy arrays already with the data  
H = high  
L = low  
C = close

Second, the talib functions generally expect 1D arrays with data for a single security. The high, low, and close (or H, L, C) are 2D arrays with data for multiple securities. Unfortunately, one needs to iterate over each column of these arrays and then pass the column to the talib function. Take a look at these posts for incorporating talib methods into custom factors. https://www.quantopian.com/posts/using-ta-lib-functions-in-pipeline or https://www.quantopian.com/posts/having-difficulty-with-macd-and-custom-factors-in-the-notebook-pipeline.
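The column-by-column pattern described above can be sketched without talib. In this illustration `rolling_max_1d` is a hypothetical stand-in for any 1D-only indicator function, and `latest_per_asset` plays the role of the loop inside `compute`; neither name comes from the thread or from a library.

```python
import numpy as np

def rolling_max_1d(series, period):
    """Stand-in for a 1D-only indicator: NaN until `period` values exist."""
    out = np.full(len(series), np.nan)
    for i in range(period - 1, len(series)):
        out[i] = series[i - period + 1:i + 1].max()
    return out

def latest_per_asset(window, period):
    """Apply the 1D function to each column (asset) and keep its latest value."""
    n_assets = window.shape[1]
    out = np.full(n_assets, np.nan)
    for col_ix in range(n_assets):
        column = window[:, col_ix]      # 1D: one asset, all dates
        out[col_ix] = rolling_max_1d(column, period)[-1]
    return out

# 10 dates x 3 assets of made-up prices (rows are dates, columns are assets)
highs = np.arange(30, dtype=float).reshape(10, 3)
print(latest_per_asset(highs, 8))  # [27. 28. 29.]
```

Inside a custom factor the same loop would write each result to `out[col_ix]` instead of a local array.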

Joe Kiefner

May 20, 2018

Dan, first off I appreciate your feedback and direction!

I fiddled with the code quite a bit and I think I have the structure somewhat lined out but I must admit, I'm not exactly sure what needs '[:, col_ix]'-ing and what doesn't. I've attached an updated notebook with the associated error:

IndexError                                Traceback (most recent call last)  
<ipython-input-8-5e1955e5efa1> in <module>()  
     10 )  
     11  
---> 12 result = run_pipeline(p, '2014', '2014-03')

/build/src/qexec_repo/qexec/research/api.py in run_pipeline(pipeline, start_date, end_date, chunksize)
    479             start_date,  
    480             end_date,  
--> 481             chunksize  
    482         )  
    483 

/build/src/qexec_repo/qexec/research/_api.pyc in inner_run_pipeline(engine, equity_trading_days, pipeline, start_date, end_date, chunksize)
    862         adjusted_start_date,  
    863         adjusted_end_date,  
--> 864         chunksize=chunksize,  
    865     )  
    866 

/build/src/qexec_repo/zipline_repo/zipline/pipeline/engine.pyc in run_chunked_pipeline(self, pipeline, start_date, end_date, chunksize)
    328             chunksize,  
    329         )  
--> 330         chunks = [self.run_pipeline(pipeline, s, e) for s, e in ranges]  
    331  
    332         if len(chunks) == 1:

/build/src/qexec_repo/zipline_repo/zipline/pipeline/engine.pyc in run_pipeline(self, pipeline, start_date, end_date)
    309             dates,  
    310             assets,  
--> 311             initial_workspace,  
    312         )  
    313 

/build/src/qexec_repo/zipline_repo/zipline/pipeline/engine.pyc in compute_chunk(self, graph, dates, assets, initial_workspace)
    535                     mask_dates,  
    536                     assets,  
--> 537                     mask,  
    538                 )  
    539                 if term.ndim == 2:

/build/src/qexec_repo/zipline_repo/zipline/pipeline/mixins.pyc in _compute(self, windows, dates, assets, mask)
    212                 inputs = format_inputs(windows, inputs_mask)  
    213  
--> 214                 compute(date, masked_assets, out_row, *inputs, **params)  
    215                 out[idx][out_mask] = out_row  
    216         return out

<ipython-input-7-8dcf380e20fd> in compute(self, today, assets, out, high, low, close)  
     47             C = close[:, col_ix]  
     48  
---> 49             hh = talib.MAX(H[:, col_ix], 8)  
     50             ll = talib.MIN(L[:, col_ix], 8)  
     51 

IndexError: too many indices for array

Can you advise as to where I'm still going wrong?

Thanks as always in advance!

Ernesto Perez

May 22, 2018

Hi Joe,

You are correctly indexing high/low/close to get the data you need for each loop. Once you unpack the columns you need into H/L/C (using [:, col_ix] gives you a column from high/low/close), you have 1D arrays.

The error you are getting happens because you are using two indices with a 1D array. For example:

hh = talib.MAX(H[:, col_ix], 8)

Should be:

hh = talib.MAX(H, 8)

The same applies to L and C, and all results you get from talib functions (hh, ll, center, etc).
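The error can be reproduced with plain NumPy. This small sketch uses made-up data; only the variable names mirror the thread.

```python
import numpy as np

high = np.arange(12, dtype=float).reshape(4, 3)   # 4 dates x 3 assets
col_ix = 1

H = high[:, col_ix]        # slicing out one column leaves a 1D array
print(H.ndim, H.shape)     # 1 (4,)

try:
    H[:, col_ix]           # a second column index no longer applies
    raised = False
except IndexError as err:
    raised = True
    print(type(err).__name__)   # IndexError
```

Once a column has been unpacked, every later step (including anything returned by a 1D indicator function) is indexed with a single index such as `[-1]`.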

Joe Kiefner

May 22, 2018

Ernesto,

Thanks for the advice! I've made the necessary changes, but I was having trouble getting multiple results to output (SMI, SMI diff, SMI signal), so I tried to output just SMI, and nothing appears to be coming through. I'm sure it has something to do with the out[..] assignment, as that's not how I typically output information: 1) the whole col_ix loop throws a wrench into how I know to output values from a custom factor, and 2) I typically only output one factor from a custom factor.

Joe Kiefner

Jun 24, 2018

Can anyone assist with this? Can't quite seem to figure it out.

Dan Whitnable

Jun 24, 2018

Increase the window_length from 14 to 20. Another fix is to change the line in the custom factor


     SMI = talib.EMA(SMI_signal, 8)

to be

     SMI = talib.EMA(SMI_signal, 3)

There are only 3 non-NaN data points in SMI_signal to average. When it tries to average 8 (3 plus 5 NaNs), it always returns NaN. The fix is either to input more data points (i.e. increase the window_length) or to decrease the number of points being averaged (i.e. decrease the EMA length). I'm not sure whether this changes the intent of your factor, but it at least returns an output.
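The count of 3 usable points can be reproduced with NumPy stand-ins for the indicator functions. This is a sketch, not TA-Lib: `rolling_max`, `ema`, `dema` and `valid_counts` are hypothetical helpers, and seeding the EMA with the mean of its first n valid inputs is an assumption. Only the number of leading NaNs matters here, not the values.

```python
import numpy as np

def rolling_max(x, n):
    out = np.full(len(x), np.nan)
    for i in range(n - 1, len(x)):
        out[i] = x[i - n + 1:i + 1].max()
    return out

def ema(x, n):
    """EMA seeded with the mean of the first n non-NaN inputs (assumed seeding).
    Expects any NaNs in x to be leading NaNs only."""
    out = np.full(len(x), np.nan)
    valid = np.flatnonzero(~np.isnan(x))
    if len(valid) < n:
        return out                      # too little data: everything stays NaN
    start = valid[0]
    k = 2.0 / (n + 1)
    out[start + n - 1] = x[start:start + n].mean()
    for i in range(start + n, len(x)):
        out[i] = k * x[i] + (1 - k) * out[i - 1]
    return out

def dema(x, n):
    e1 = ema(x, n)
    return 2 * e1 - ema(e1, n)

def valid_counts(window_length, period=8, smooth=3, sig=8):
    """Return (non-NaN points before the signal EMA, non-NaN points after it)."""
    prices = np.linspace(100.0, 110.0, window_length)   # made-up data
    smoothed = dema(rolling_max(prices, period), smooth)
    signal = ema(smoothed, sig)
    return int((~np.isnan(smoothed)).sum()), int((~np.isnan(signal)).sum())

print(valid_counts(14))  # (3, 0)
print(valid_counts(19))  # (8, 1)
```

Under these assumptions a window of 14 leaves 3 points for an 8-period average, while 19 is the smallest window that yields one signal value, so 20 leaves a bar to spare.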

Blue Seahawk

Jun 24, 2018

Also try forward filling the nans.

Forward filling nans in pipeline custom factors
Examples from google search

That would look like this

def compute(self, today, assets, out, high, low, close):  
    high  = nanfill(high)  
    low   = nanfill(low)  
    close = nanfill(close)
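One thing worth checking with any fill helper is which axis it fills along. The `nanfill` helper posted later in this thread accumulates along axis 1, whereas a pipeline window has dates on axis 0 and assets on axis 1. As a separate sketch (the name `ffill_along_dates` is hypothetical), a forward fill down the date axis could look like this:

```python
import numpy as np

def ffill_along_dates(window):
    """Forward-fill NaNs down each column (axis 0 = dates, axis 1 = assets).
    Leading NaNs stay NaN because there is nothing earlier to copy."""
    window = np.asarray(window, dtype=float)
    mask = np.isnan(window)
    # For every cell, the row index of the latest non-NaN value at or above it.
    rows = np.where(~mask, np.arange(window.shape[0])[:, None], 0)
    np.maximum.accumulate(rows, axis=0, out=rows)
    cols = np.arange(window.shape[1])[None, :]
    return window[rows, cols]

# 3 dates x 2 assets of made-up prices
prices = np.array([[1.0, np.nan],
                   [np.nan, 2.0],
                   [3.0, np.nan]])
filled = ffill_along_dates(prices)
print(filled)
```

Here the gap in the first column is filled with 1.0, the trailing gap in the second column with 2.0, and the leading NaN in the second column is left alone.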

Joe Kiefner

Jun 27, 2018

If I change the 8 to a 3 it will not compute correctly, as I'm attempting to get an 8,3,3,8 EMA Stochastic MTM Oscillator as seen on Yahoo Finance Indicators.

Is the NaN output just because I'm only requesting one date? If I requested a range, would it return NaNs only for the first 8 or so days, with every day after that populated?

Vladimir

Jun 28, 2018

Joe,

Try this 8-3-8

from quantopian.algorithm import attach_pipeline, pipeline_output  
from quantopian.pipeline import factors, filters, classifiers  
from quantopian.pipeline.data.builtin import USEquityPricing  
from quantopian.pipeline.factors import CustomFactor  
from quantopian.pipeline import Pipeline  
import pandas as pd  
import numpy as np  
from numpy import isnan, nan  
import talib  
# -----------------------------------------------------------------------------------  
stocks, period, smooth, sig  = filters.StaticAssets(symbols('QQQ', 'TLT')), 8, 3, 8  
# -----------------------------------------------------------------------------------  
bars = period + smooth + sig 

def initialize(context):  
    schedule_function(Indicator, date_rules.every_day(), time_rules.market_close(minutes = 1))  
    m = stocks  
    smi = SMI_factor(mask = m)  
    pipe = Pipeline(columns = {'smi': smi}, screen = m & smi.notnull())  
    attach_pipeline(pipe, 'smi_set')

def Indicator(context, data):  
    stocks = pipeline_output('smi_set').index  
    smi_pipe = pipeline_output('smi_set').smi  
    smi_reg = np.zeros(len(stocks))

    for i, stock in enumerate(stocks):  
        H = data.history(stock, 'high',  bars + 1, '1d')  
        L = data.history(stock, 'low',   bars + 1, '1d')  
        C = data.history(stock, 'close', bars + 1, '1d')

        HH = talib.MAX(H, period)  
        LL = talib.MIN(L, period)  
        M = (HH + LL)*0.5  
        D = (C - M)  
        HL = HH - LL  
        Numer = talib.DEMA(D, smooth)  
        Denom = talib.DEMA(HL, smooth) 

        SMI = 2*Numer/Denom  
        SMI_sig = talib.EMA(SMI, sig)  
        SMI_diff = (SMI - SMI_sig)[-2]  
        smi_reg[i] = SMI_diff

    for i, stock in enumerate(stocks):  
        record(**{stocks[i].symbol + '_pipe': smi_pipe[i]})  
        record(**{stocks[i].symbol + '_reg': smi_reg[i]})

def columnwise_anynan(array2d):  
    return isnan(array2d).any(axis = 0)

def nanfill(arr):  
    mask = np.isnan(arr)  
    idx  = np.where(~mask,np.arange(mask.shape[1]),0)  
    np.maximum.accumulate(idx,axis=1, out=idx)  
    arr[mask] = arr[np.nonzero(mask)[0], idx[mask]]  
    return arr  

class SMI_factor(CustomFactor):  
    inputs = [USEquityPricing.high, USEquityPricing.low, USEquityPricing.close]  
    window_length = bars  
    def compute(self, today, assets, out, high, low, close):  
        anynan = columnwise_anynan(high)  
        for col_ix, have_nans in enumerate(anynan):  
            if have_nans:  
                out[col_ix] = nan  
                continue  
            H = high[:, col_ix]  
            L = low[:, col_ix]  
            C = close[:, col_ix]  
            HH = talib.MAX(H, period)  
            LL = talib.MIN(L, period)  
            M = (HH + LL)*0.5  
            D = (C - M)  
            HL = HH - LL  
            Numer = talib.DEMA(D, smooth)  
            Denom = talib.DEMA(HL, smooth)  
            SMI = 2*(Numer/Denom)  
            SMI_sig = talib.EMA(SMI, sig)  
            SMI_diff = (SMI - SMI_sig)  
            results = SMI_diff[-1]     # SMI[-1], SMI_sig[-1], SMI_diff[-1]  
            out[col_ix] = results

Joe Kiefner

Jun 29, 2018

Vladimir, it works flawlessly, thank you so very much!

There are some discrepancies between the Quantopian numbers and the numbers from Yahoo, but that is to be expected with differing look-back values.

Again, thank you so much!
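The look-back sensitivity mentioned above comes from the recursive nature of an EMA: its latest value depends on where the calculation was seeded. The sketch below uses made-up prices and a hypothetical `ema_last` helper (seeded with the mean of the first n values, which is an assumption) to compare the final 8-period EMA computed from 19 bars, from 100 bars, and from the full series.

```python
import numpy as np

def ema_last(x, n):
    """Final value of an n-period EMA seeded with the mean of the first n inputs."""
    k = 2.0 / (n + 1)
    value = x[:n].mean()
    for price in x[n:]:
        value = k * price + (1 - k) * value
    return value

# Made-up price path: a slow wave on top of a gentle trend
t = np.arange(300, dtype=float)
prices = 100.0 + 0.05 * t + 10.0 * np.sin(t / 7.0)

full = ema_last(prices, 8)              # seeded 300 bars back
short = ema_last(prices[-19:], 8)       # seeded 19 bars back
longer = ema_last(prices[-100:], 8)     # seeded 100 bars back

print(abs(short - full), abs(longer - full))
```

The influence of the seed shrinks by a factor of (1 - k) with every additional bar, so the 100-bar result should sit much closer to the full-history value than the 19-bar one. A charting site that starts its calculation further back can therefore show slightly different numbers for the same parameters.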
