TA Lib different values for same data

Hi everyone,

I'm learning my way around the data, and I've run into something puzzling. I've attached a notebook with the latest MSFT pricing from Thursday-Friday. The ADX value I see in both Schwab and TradingView is 31.38, for 14 periods with 1-minute data. The notebook is close enough (32.52), though that's still off.

However, with the same data, running Python locally with ta-lib, I'm getting a vastly different answer. This is the same data loaded from a CSV file into a pandas dataframe:

                        open     high     low    close     volume  
2020-04-17 19:51:00  177.550  178.590  177.54  178.480  2416612.0  
2020-04-17 19:52:00  178.480  178.700  178.30  178.340   867279.0  
2020-04-17 19:53:00  178.370  178.420  178.12  178.190   892965.0  
2020-04-17 19:54:00  178.180  178.320  178.16  178.265   817863.0  
2020-04-17 19:55:00  178.255  178.750  178.24  178.385  1767261.0  
2020-04-17 19:56:00  178.380  178.449  177.82  177.840  1405437.0  
2020-04-17 19:57:00  177.840  178.080  177.80  177.990   914709.0  
2020-04-17 19:58:00  177.995  178.030  177.80  177.900   808404.0  
2020-04-17 19:59:00  177.910  178.390  177.89  178.310  1586913.0  
2020-04-17 20:00:00  178.310  178.740  178.07  178.600  2828277.0  

This returns a value of 36.46, a pretty large difference. I got some help with the above notebook to resample the data to a 5-minute interval, which returns a value of 21.42 in the notebook (very close to the "actual" value of 21.26); locally, however, I get a value of 26.08... and the values drift even further once I start resampling, varying by 20-50% in either direction.

For 14 periods, what's the rule of thumb for how much back data is needed? The only difference I can see is that locally I have 780 lines of data (corresponding to about two days of minute-level data), while the notebook has 2,340 lines... I'm not sure why there's such a difference.

Then finally, in the last part of the notebook, I'm not sure why I'm seeing NaN when resampling to 30 minutes.

I'm sure it's a misunderstanding on my part, or perhaps a library version difference somewhere, though I'm getting roughly the same results with the ta library. I'm seeing the same problems when trying to calculate MACD or a slow stochastic... it must be something silly.

Thanks for your help!

7 responses

@Nabeel

ADX, as a rule, requires at least twice as many bars of data as the ADX period.
https://school.stockcharts.com/doku.php?id=technical_indicators:average_directional_index_adx

You can use

bfill()  

to fill in the missing data.

prices_5m = data.resample('5T', label='right', closed='right').agg({'open': 'first', 'high': 'max', 'low': 'min', 'close': 'last'}).bfill()  
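
To make the resampling concrete, here is a self-contained sketch using the ten 1-minute bars from the table above (pure pandas; `label='right'` and `closed='right'` stamp each bar with its end time, which is how most charting platforms label intraday bars):

```python
import pandas as pd

# The ten 1-minute MSFT bars from the table above.
idx = pd.date_range("2020-04-17 19:51", periods=10, freq="min")
data = pd.DataFrame({
    "open":  [177.550, 178.480, 178.370, 178.180, 178.255,
              178.380, 177.840, 177.995, 177.910, 178.310],
    "high":  [178.590, 178.700, 178.420, 178.320, 178.750,
              178.449, 178.080, 178.030, 178.390, 178.740],
    "low":   [177.54, 178.30, 178.12, 178.16, 178.24,
              177.82, 177.80, 177.80, 177.89, 178.07],
    "close": [178.480, 178.340, 178.190, 178.265, 178.385,
              177.840, 177.990, 177.900, 178.310, 178.600],
}, index=idx)

# Aggregate to 5-minute bars, labelling each bar by its end time.
bars_5m = data.resample("5min", label="right", closed="right").agg(
    {"open": "first", "high": "max", "low": "min", "close": "last"}
).bfill()
print(bars_5m)
```

The ten rows collapse into two 5-minute bars (labelled 19:55 and 20:00); the `bfill()` only matters when a bin contains no trades at all.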

@Vladimir, thanks for the reply. I think that makes sense for the resampling.

For the minute-level data, does 14 periods mean 14 minutes? If so, that's about 56 entries. But it doesn't look like it requires about double that. Here's what I've been doing: I took tail(50) of the Quantopian get_pricing() data, which gives me a value of 36.90. Cool... then I take the last 50 rows from the data I downloaded from IEX and get the same result, 36.90. As a sanity check, this makes sense.

I know the value is supposed to be about 32, so with Quantopian I worked out the minimum dataset size needed to get that value, which happens to be about 100 rows for a value of 32.52. But when I run my dataset locally with 100 rows, I get 36.42, not matching what's on Q. Maybe that's out of scope since I'm getting different values than the notebook is giving me, but I'm a little lost now.

But I'm getting 36.92 consistently locally, down until I trim the data below the period * 3 threshold of 42 rows. It seems to settle around the correct value, and it looks like past 100 rows or so we're good for an ADX with a window length of 14. I just can't figure out why.

Using data.tail(rows):

Rows=45,  Q=38.02, local=38.02  
Rows=50,  Q=36.90, local=36.90  
Rows=55,  Q=35.37, local=35.37  
Rows=60,  Q=34.45, local=40.72   < this jumps up  
Rows=75,  Q=34.20, local=37.87  
Rows=100, Q=32.52, local=36.48  
Rows=150, Q=32.54, local=36.46  
Rows=300, Q=32.53, local=36.46  
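
For anyone who wants to reproduce this experiment outside a Quantopian notebook, here is a from-scratch sketch of Wilder's ADX in pandas. This is not TA-Lib's implementation (TA-Lib seeds the smoothing slightly differently, so expect small numerical differences), but it exhibits the same tail-length sensitivity: running it over `df.tail(rows)` for the row counts in the table above shifts the final value with the starting point.

```python
import pandas as pd

def wilder_smooth(x: pd.Series, n: int) -> pd.Series:
    # Wilder's moving average: an EMA with alpha = 1/n (a recursive filter).
    return x.ewm(alpha=1.0 / n, adjust=False, min_periods=n).mean()

def adx(high: pd.Series, low: pd.Series, close: pd.Series, n: int = 14) -> pd.Series:
    up, dn = high.diff(), -low.diff()
    plus_dm = ((up > dn) & (up > 0)) * up      # upward directional movement
    minus_dm = ((dn > up) & (dn > 0)) * dn     # downward directional movement
    # True range: the widest of the three candidate ranges per bar.
    tr = pd.concat([high - low,
                    (high - close.shift()).abs(),
                    (low - close.shift()).abs()], axis=1).max(axis=1)
    atr = wilder_smooth(tr, n)
    plus_di = 100 * wilder_smooth(plus_dm, n) / atr
    minus_di = 100 * wilder_smooth(minus_dm, n) / atr
    dx = 100 * (plus_di - minus_di).abs() / (plus_di + minus_di)
    return wilder_smooth(dx, n)                # ADX = smoothed DX

# Usage on a dataframe `df` with high/low/close columns:
# for rows in (45, 50, 60, 100, 300):
#     t = df.tail(rows)
#     print(rows, adx(t["high"], t["low"], t["close"]).iloc[-1])
```

Because every stage is a recursive smoother, the last value depends on all the bars fed in, not just the most recent 14.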

Unfortunately, since we can't download a CSV from Q, I can't validate every row, but from looking at head() and tail() it appears the data is the same.

I'm comparing the 100 rows between the Q data and the IEX/Polygon data... it's all the same, yet the two files yield different values. I'm so confused.

@Nabeel

But it doesn't look like it requires about double that.

For ADX with a window length (period) of 14 to produce its first value on any time frame, you need at least 14*2 = 28 bars of that time frame's data:
For the 1m time frame, 28 bars of 1m data.
For a 5m time frame resampled from 1m data, 5*28 = 140 bars of 1m data.
For a 30m time frame resampled from 1m data, 30*28 = 840 bars of 1m data.
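
The arithmetic above, as a quick sketch (period and time frames are just the values from this thread):

```python
period = 14  # ADX window length

# Minimum bars before ADX(period) emits its first value: 2 * period bars
# of the *target* time frame, i.e. 2 * period * timeframe one-minute bars.
for timeframe in (1, 5, 30):          # bar size in minutes
    bars = 2 * period                 # bars of that time frame
    minutes = bars * timeframe        # one-minute bars to fetch
    print(f"{timeframe:>2}m frame: {bars} bars = {minutes} minutes of data")
```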

You can easily test this in the IDE.
Just replace

bars = period*timeframe*2  

with

bars = period*timeframe*1.99  

That makes sense, and I see NaN reported with insufficient rows. But it doesn't explain the diverging values between two seemingly identical datasets, and I'm only seeing the "correct" value reported after about 100 rows.

I think I found the problem after plotting the data... it looks like there's a single "high" value that is much higher than the data around it. Ugh. Thanks for all of your help!
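
For anyone hitting the same thing: a quick way to screen for a lone bad tick is to compare each bar's high against a rolling median. This is a hypothetical helper, not anything from TA-Lib; the window and threshold `k` are arbitrary choices you would tune for your data.

```python
import pandas as pd

def find_spikes(high: pd.Series, window: int = 30, k: float = 5.0) -> pd.Series:
    """Return 'high' values more than k robust deviations away from the
    local rolling median (a crude bad-tick screen)."""
    med = high.rolling(window, center=True, min_periods=1).median()
    dev = (high - med).abs()
    # Median absolute deviation as a robust local spread estimate.
    mad = dev.rolling(window, center=True, min_periods=1).median()
    return high[dev > k * mad.clip(lower=0.01)]
```

One bad print in `high` inflates the true range and +DM for that bar, and because ADX smooths recursively the distortion lingers for many bars afterwards.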

@Nabeel

I'm only seeing the "correct" value reported after about 100 rows.

ADX uses Wilder's MA, which in digital-signal-processing terms is an infinite impulse response (IIR) filter.
Its value today depends substantially on the starting point of the data.
To let it mature in the IDE, just increase the number of data points by 4-8 times.
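
The IIR point is easy to demonstrate with pandas alone. Wilder's MA is `ewm(alpha=1/n, adjust=False)`: each output folds in the entire history, so where you truncate the input shifts the output. On a simple linear trend (hypothetical data), a 2n-bar warm-up still carries a visible seed error, while an 8n-bar warm-up has essentially converged:

```python
import numpy as np
import pandas as pd

n = 14
x = pd.Series(np.arange(500, dtype=float))   # synthetic trending series

def wilder_ma(s: pd.Series, n: int) -> pd.Series:
    # Recursive (IIR) filter: y[t] = (1 - 1/n) * y[t-1] + (1/n) * x[t]
    return s.ewm(alpha=1.0 / n, adjust=False).mean()

full   = wilder_ma(x, n).iloc[-1]                 # entire history
short  = wilder_ma(x.iloc[-2 * n:], n).iloc[-1]   # only 2n bars of warm-up
mature = wilder_ma(x.iloc[-8 * n:], n).iloc[-1]   # 8n bars of warm-up
print(full, short, mature)   # short is noticeably off; mature ~= full
```

The seed error decays by a factor of (1 - 1/n) per bar, so after 2n bars roughly 13% of it remains, while after 8n bars it is effectively gone, which is why increasing the data 4-8x makes the indicator "mature."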