[debug] What does "input lengths are different" mean?

Hi,
I'm kinda new to everything here, and after reading around I tried my luck with Python and Quantopian to see what I can make of it.

So, I'm trying to use ADX for which I have this line:

    CurrADX = make_ADX(context, data, security)

calling this function:

def make_ADX(context, data, stk):  
    H = data.history(stk, 'high', 2*context.ADXPeriod, '1d').dropna()  
    L = data.history(stk, 'low', 2*context.ADXPeriod, '1d').dropna()  
    C = data.history(stk, 'price', 2*context.ADXPeriod, '1d').dropna()  
    ta_ADX = talib.ADX(H, L, C, context.ADXPeriod)  
#    ta_nDI = talib.MINUS_DI(H, L, C, context.ADXPeriod)  
#    ta_pDI = talib.PLUS_DI(H, L, C, context.ADXPeriod)  
    ADX = ta_ADX[-1]  
#    nDI = ta_nDI[-1]  
#    pDI = ta_pDI[-1]  
    return ADX  

However, when the backtest was about 25% through, it raised an error on the line that calls the function:
"input lengths are different". Any idea what this means and how to avoid it?

Many thanks in advance!

5 responses

Also - I noticed that with a different time frame on the backtest I don't get this error...

Did you ever figure this out?

The error "input lengths are different" is raised when the lengths of the three inputs to the talib ADX function are not equal. The function needs the high, low, and price data to all have the same number of days of data.

How does this happen and why does it only give an error at certain times? The culprit is missing data. There can be instances where price data isn't available or valid and the algo needs to account for that. Adding the 'dropna' method attempted to address this but created its own problems. If there is ever any data missing in one of the three fetches then the 'dropna' method will simply delete that row. That series will now have fewer rows than the others and therefore the talib.ADX function will fail.
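To illustrate, here is a small standalone pandas example (the dates and prices are made up) showing how three independent dropna calls end up with mismatched lengths:

```python
import numpy as np
import pandas as pd

# Hypothetical five-day window where only the 'low' field has a missing value
dates = pd.date_range("2017-01-02", periods=5)
high = pd.Series([10.0, 10.5, 10.2, 10.8, 11.0], index=dates)
low = pd.Series([9.5, np.nan, 9.8, 10.1, 10.4], index=dates)
close = pd.Series([9.9, 10.3, 10.0, 10.5, 10.9], index=dates)

# Dropping NaNs from each series independently leaves them different lengths,
# which is exactly what triggers talib's "input lengths are different" error
h, l, c = high.dropna(), low.dropna(), close.dropna()
lengths = (len(h), len(l), len(c))
```

Here `lengths` comes out as (5, 4, 5): the low series lost a row the other two kept.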

So, the issue is what to do if there is ever missing data. The talib.ADX function doesn't like being passed NaN values so doing nothing is not an option. Maybe forward fill any missing data? Maybe don't calculate ADX if there are any NaNs?
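For the forward-fill option, pandas does most of the work (hypothetical series below). One caveat: a NaN on the very first row has no earlier value to copy forward, so it would survive the fill.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2017-01-02", periods=5)
low = pd.Series([9.5, np.nan, 9.8, np.nan, 10.4], index=dates)

# ffill copies the last valid value forward, keeping the series full length
low_filled = low.ffill()
```

After the fill, both NaN slots hold the previous day's value (9.5 and 9.8) and all three inputs stay the same length.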

However, if one simply wants to drop any rows with NaNs then one would need to drop that row for ALL the data if ANY field has a NaN. This is easiest to accomplish if all the data were in a single dataframe (rather than three separate series). This has the side benefit of speeding up the data fetch too. Something like this could work.

# fetch multiple data fields with a single call  
price_data_panel = data.history(context.MY_SECURITY, fields=["high", "low", "price"], bar_count=4, frequency="1d")  
# the result is a pandas panel. maybe turn it into a dataframe which is a bit easier to use  
price_data_df = price_data_panel.to_frame()  
# use dropna to delete rows where any value is NaN  
price_data_df.dropna(inplace=True)  
# now use the dataframe columns in the talib.ADX function  
ta_ADX = talib.ADX(price_data_df.high, price_data_df.low, price_data_df.price, 2)  

One issue this doesn't address is the situation where too many rows are dropped and the talib.ADX function doesn't have enough data. Save that for a later day...
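A simple guard for that case is to check the row count after the combined dropna before calling talib. The minimum-row formula below is a rough assumption about how much lookback talib.ADX needs, not its exact requirement:

```python
import numpy as np
import pandas as pd

ADX_PERIOD = 2
MIN_ROWS = 2 * ADX_PERIOD + 1  # rough guess at talib.ADX's required lookback

# Hypothetical four-day fetch with one bad 'high' value
dates = pd.date_range("2017-01-02", periods=4)
price_data_df = pd.DataFrame({
    "high":  [10.0, np.nan, 10.2, 10.8],
    "low":   [9.5,  9.7,    9.8,  10.1],
    "price": [9.9,  10.3,   10.0, 10.5],
}, index=dates)

# dropna removes the whole row when any field is NaN
clean = price_data_df.dropna()
have_enough_data = len(clean) >= MIN_ROWS
# only call talib.ADX when have_enough_data is True; otherwise skip this bar
```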

One might also consider doing this in a custom factor using pipeline rather than the data.history method. It will be faster and also easier to combine and weight with other signals.

Hope this helps. Good luck.

Thanks for the explanation Dan. It does help. For now I was just putting a try block around it to skip that part. I will try your solution.

A 'try block' may be the best solution. Simply do nothing, or rely on another signal, if the ADX calculation fails.
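A minimal sketch of that pattern follows. Here `adx_func` is a stand-in for talib.ADX (assumed to share its call signature), purely so the guard can be shown on its own:

```python
def latest_adx(adx_func, high, low, close, period):
    """Return the most recent ADX value, or None if the calculation fails.

    adx_func is a stand-in for talib.ADX (assumed to share its signature);
    returning None lets the caller do nothing, or fall back to another signal.
    """
    try:
        return adx_func(high, low, close, period)[-1]
    except Exception:
        return None
```

In the algo this would wrap the real talib.ADX call, so one bad bar no longer halts the whole backtest.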

As mentioned, dropping too many rows may leave the ADX method without enough data, and it will also error out. On a more strategic level, though, simply dropping rows of data can distort the calculation (and intent) of the ADX signal. The ADX signal tries to represent the daily up and down price movements. If a row of data is arbitrarily removed then it's really calculating the price movement across two days, not one.
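To see that concretely (hypothetical prices again): after dropping a NaN row, two adjacent rows in the cleaned series can be two calendar days apart, so the "daily" change across that gap is really a two-day move.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2017-01-02", periods=4)
close = pd.Series([10.0, np.nan, 12.0, 12.5], index=dates)

# after dropping the NaN row, the first "daily" step spans two days
clean = close.dropna()
gap_days = (clean.index[1] - clean.index[0]).days
```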