My calculation for Hull Moving Average is wrong


posted Jul 19, 2018

Hello Quantopian community,
I want to calculate the Hull Moving Average for 21 periods on the 1 hour chart. The formula is:

1. Calculate a Weighted Moving Average with period n / 2 and multiply it by 2
2. Calculate a Weighted Moving Average for period n and subtract it from step 1
3. Calculate a Weighted Moving Average with period sqrt(n) using the data from step 2
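
The three steps above can be sketched in plain NumPy, which makes the integer periods and the warm-up NaNs explicit. This is only an illustration, not TA-Lib's implementation: the function names wma and hull_moving_average are hypothetical, and truncating n / 2 and sqrt(n) down to integers is an assumption (some implementations round to the nearest integer instead).

```python
import math
import numpy as np

def wma(values, period):
    """Linearly weighted moving average; the newest value in each window gets the largest weight."""
    values = np.asarray(values, dtype=float)
    weights = np.arange(1, period + 1, dtype=float)
    out = np.full(values.shape, np.nan)
    for i in range(period - 1, len(values)):
        window = values[i - period + 1:i + 1]
        # A window that still contains NaN produces NaN here.
        out[i] = np.dot(window, weights) / weights.sum()
    return out

def hull_moving_average(values, n):
    half = n // 2             # assumption: truncate n / 2 to an integer period
    root = int(math.sqrt(n))  # assumption: truncate sqrt(n) to an integer period
    raw = 2.0 * wma(values, half) - wma(values, n)  # steps 1 and 2
    return wma(raw, root)                           # step 3

# Usage: 30 synthetic closes; the last element is the most recent HMA value.
closes = np.arange(30.0)
print(hull_moving_average(closes, 21)[-1])
```

With n = 21 the first non-NaN value appears at index 23, so the input needs at least 24 bars before step 3 returns anything.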

from quantopian.algorithm import calendars  
import talib as ta  
import pandas as pd  
import math

def initialize(context):  
    context.asset = sid(24)  # Stock ID  
    set_benchmark(sid(24))  
    for hours_offset in range(1, 8):  
        schedule_function(hma_mfi_strategy,  
                          date_rules.every_day(),  
                          time_rules.market_open(hours=hours_offset),  
                          calendar=calendars.US_EQUITIES,  
                          half_days=True)


def hma_mfi_strategy(context, data):  
    moving_average = 21

    dataset = data.history(context.asset, 'close', 60 * moving_average, '1m')  
    close = dataset.resample('60min').last().dropna()  


    wma_ta = (ta.WMA(close, moving_average / 2 ) * 2) - ta.WMA(close, moving_average) # Step 1, 2  
    hull_moving_average = pd.DataFrame(ta.WMA(wma_ta, math.sqrt(moving_average)))  # step 3  
    print(hull_moving_average.dropna())

As a result I get a couple of NaNs and 0s.

Two questions:
1. Am I resampling the timeframe for 1hr candles correctly?
2. Where is my mistake in calculating the Hull Moving Average?

4 responses

Jordan Belfort

Jul 19, 2018

I managed to fix the resampling of the time frame and now the dataframe displays correct 1 hour candles:

def hma_mfi_strategy(context, data):  
    moving_average = 21

    dataset = data.history(context.asset, 'close', 60 * moving_average, '1m')  
    close = dataset.resample('60Min', base=30).last().dropna()

Output:

> close

Timestamp('2017-11-07 20:30:00+0000', tz='UTC'): 174.164  
Timestamp('2017-11-08 14:30:00+0000', tz='UTC'): 174.393  
Timestamp('2017-11-08 15:30:00+0000', tz='UTC'): 174.572  
Timestamp('2017-11-08 16:30:00+0000', tz='UTC'): 174.991  
Timestamp('2017-11-08 17:30:00+0000', tz='UTC'): 175.12  
Timestamp('2017-11-08 18:30:00+0000', tz='UTC'): 175.34  
Timestamp('2017-11-08 19:30:00+0000', tz='UTC'): 175.375  
Timestamp('2017-11-08 20:30:00+0000', tz='UTC'): 175.589  
...

Dan Whitnable

Jul 19, 2018

You may want to think about how to handle potential NaNs in the data. Applying the 'dropna' method after resampling is needed because the resample method inserts all the non-market times into the data as NaN. However, if there happens to be a NaN in the market-hours data, 'dropna' will drop that row too. The series can then end up shorter than the moving_average length, and the TA-Lib functions will return NaN for it. Maybe consider using 'price' instead of 'close' in the 'data.history' method. 'price' forward fills the close values, which reduces the chances of this happening.
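
To make the length problem concrete, here is a rough sketch of how many hourly bars the two-stage calculation consumes before it yields a value. It assumes that each WMA stage with period p returns NaN for its first p - 1 inputs and that the periods are truncated to integers (n // 2 and int(sqrt(n))). The helper names hma_warmup_bars and valid_hma_values are hypothetical.

```python
import math

def hma_warmup_bars(n):
    """Bars consumed before the first HMA value appears (assumes periods n // 2, n, int(sqrt(n)))."""
    root = int(math.sqrt(n))
    # The period-n WMA needs n - 1 warm-up bars, then the final WMA needs root - 1 more.
    return (n - 1) + (root - 1)

def valid_hma_values(bars_available, n):
    """Number of non-NaN HMA values that a series of this length can produce."""
    return max(0, bars_available - hma_warmup_bars(n))

print(hma_warmup_bars(21))       # warm-up length for a 21 period HMA
print(valid_hma_values(21, 21))  # exactly 21 hourly bars
print(valid_hma_values(24, 21))  # 24 hourly bars
```

Under these assumptions, a resampled series of exactly 21 bars produces no usable value, and every row removed by 'dropna' makes that worse, so requesting some extra history leaves a margin.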

Why are you resampling to 1 hour increments? It seems like a lot of information is being lost. Since the data is being averaged anyway, wouldn't one want all the data points one can get? At the very least, maybe don't resample with the 'last' method. With 60 minute increments, starting at 9:30 AM ET, one will lose the last (4:00 PM ET) data point. Maybe use 'mean' or 'median' instead?

Finally, what is the reason for turning the series returned by the TA-Lib WMA function into a dataframe? Wouldn't you simply want the last value of the series, a scalar value, as the Hull moving average?

Good work. The Hull moving average responds much faster than a simple moving average.

Jordan Belfort

Jul 19, 2018

Dan,
Since I'm a beginner to the Quantopian platform, your response has been incredibly helpful to me and made me rethink the structure of my algorithm.

I'm resampling to 1 hour increments because my algorithm has to run on the 1 hour time frame. This means that for the 21 period moving average I have to look back 21 hours; hence I request (after the changes you suggested)

data.history(context.asset, 'price', 60 * moving_average, '1m')

and then, resample it with

dataset.resample('60Min', base=30).last().dropna()

to look 21 periods (hours) back.

You are correct, and I eventually lose the last data point when the market closes. I don't understand though where I have to use 'mean' or 'median' as you suggested. Do you have any idea how I can correct this?

Again, thank you for the response.

Dan Whitnable

Jul 20, 2018

The resample method requires a "how" method. In the original this was the "last()" method, which simply returns the last value in the resample window.

dataset.resample('60Min', base=30).last().dropna()

This can be changed to the "mean()" or "median()" methods (which return the mean and median value of the sample window) like this:

dataset.resample('60Min', base=30).mean().dropna()

Take a look at the pandas docs for info on the 'resample' method (https://pandas.pydata.org/pandas-docs/version/0.18/generated/pandas.DataFrame.resample.html). The methods that can be applied, however, aren't documented well. Maybe look at this post, which has a list of the ones available: http://benalexkeen.com/resampling-time-series-data-with-pandas/
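
As a small illustration of the difference between "last()" and "mean()", here is a sketch on one synthetic session of minute bars. The prices are made up (1, 2, ..., 390), and it assumes a newer pandas (1.1 or later), where the bin anchor is written offset='30min' rather than the base=30 used in this thread.

```python
import numpy as np
import pandas as pd

# One synthetic session of minute bars, 14:31 to 21:00 UTC (9:31 AM to 4:00 PM ET in November).
index = pd.date_range('2017-11-08 14:31', '2017-11-08 21:00', freq='min', tz='UTC')
prices = pd.Series(np.arange(1.0, len(index) + 1), index=index)

# offset='30min' anchors the hourly bins at :30 past the hour (assumption: pandas >= 1.1).
bins = prices.resample('60min', offset='30min')
hourly_last = bins.last().dropna()   # last price seen in each bin
hourly_mean = bins.mean().dropna()   # average price over each bin

print(hourly_last)
print(hourly_mean)
```

Each bin is labeled by its start time, so the session yields seven rows labeled 14:30 through 20:30 UTC, and the final row covers only the 30 minutes up to the close.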
