Notebook

Wilder MA in notebook vs algorithm

In [39]:
# First get some imports we will be using out of the way
import numpy as np
import pandas as pd

# The basic pipline methods
from quantopian.pipeline import Pipeline
from quantopian.research import run_pipeline
    
# Datasets we will use
from quantopian.pipeline.data.builtin import USEquityPricing

Wilder's MA

Start with a function to calculate the Wilder EMA.

The formula is EMA = price today K + EMA yesterday (1-K)

(where K =1/N and N = the number of periods)

In [40]:
def WildersMA(closes, length):
    wma = np.zeros(len(closes))
    k = 1.0/length
    
    for i in range(0, len(closes)):
        if i < 1:
            wma[i] = closes[i]
        else:
            wma[i] = closes[i]*k + np.roll(wma, 1)[i]*(1-k)

    return wma            
In [41]:
# Now lets set up some parameters
# Define the start and end dates (which match the backtest)
start_date = '2019-01-18'
end_date = '2020-01-02'

# Set our exponential weighting constant used to set k in the function above
ma_length = 200
In [50]:
# Get daily pricing for SPY.
day_pricing = get_pricing('SPY', start_date=start_date, end_date=end_date, fields='close_price')
day_pricing.head(5)
Out[50]:
2019-01-18 00:00:00+00:00    261.469
2019-01-22 00:00:00+00:00    258.044
2019-01-23 00:00:00+00:00    258.417
2019-01-24 00:00:00+00:00    258.712
2019-01-25 00:00:00+00:00    260.881
Freq: C, Name: Equity(8554 [SPY]), dtype: float64
In [52]:
# Now calculate the WilderMA for these SPY prices and turn into a series
wma = WildersMA(day_pricing, ma_length)
wma = pd.Series(data=wma, index=day_pricing.index)
wma.head(5)
Out[52]:
2019-01-18 00:00:00+00:00    261.469000
2019-01-22 00:00:00+00:00    261.451875
2019-01-23 00:00:00+00:00    261.436701
2019-01-24 00:00:00+00:00    261.423077
2019-01-25 00:00:00+00:00    261.420367
Freq: C, dtype: float64
In [54]:
# Now lets put both our prices and the wma into a single dataframe
pricing_df_ver1 = pd.DataFrame()
pricing_df_ver1['price'] = day_pricing
pricing_df_ver1['wma'] = wma
pricing_df_ver1.head(5)
Out[54]:
price wma
2019-01-18 00:00:00+00:00 261.469 261.469000
2019-01-22 00:00:00+00:00 258.044 261.451875
2019-01-23 00:00:00+00:00 258.417 261.436701
2019-01-24 00:00:00+00:00 258.712 261.423077
2019-01-25 00:00:00+00:00 260.881 261.420367

Plotting

Let's plot the WilderMA of SPY vs the price of SPY

In [71]:
pricing_df_ver1.plot();

Does this look like the plot generated in the backtest?

Not really. The wma is much smoother. If you look closely some of the values aren't the same either. What's going on?

Let's start by looking at the calculation made in the backtest on the first day and compare it to the calculation here in the notebook.

The backtest executes these lines of code

hist = data.history(context.spy, 'close', context.ma_length + 40, '1d') myWMA = WildersMA(hist, context.ma_length) record(WMA = myWMA[-1])

The value for context.ma_length is the same as the ma_length here in the notebook which is 200. So, the history method fetches 200+40 days of previous pricing data. The algo then goes on to call the WildersMA method with this 240 days of pricing data ON THE VERY FIRST DAY of the backtest.

Difference 1

That's the first big difference. The backtest accounts for 240 days of history in calculating the first wma value while the notebook here only uses the starting value. The algo begins with a fixed lookback while the notebook has no lookback to begin with.

Difference 2

Now consider what happens on the second day of the backtest. It still passes the previous 240 days of history to wma function. However, it excluded that first day which WAS included in the first iteration. The lookback window is fixed and doesn't expand. In pandas terminology this is a 'Rolling' lookback window.

Consider now how the wma value is calculated on teh second day in the notebook. It looks at 2 days of values and, moreover the initial day doesn't change. In pandas terminology this is an 'Expanding' lookback window.

Difference 3

The notebook get_pricing method adjusts all the prices as of the end date. Effectively, the start_date prices are adjusted with future information (ie as of the end_date). The backtet doesn't have this ability. The prices it uses are the actual un-adjusted prices. This doesn't impact the plot much but it could if there had been large dividends or a stock split.

How can we get them to match?

The first thing to recon with is getting the same data in the notebook that the backtest has. One way to do this is by using the 'start_offset' parameter in the get_pricing method.

In [56]:
# Below will fetch the same data that the algorithm fetches on day one of the backtest 
spy_pricing = get_pricing('SPY', fields='close_price', 
                               start_date='2019-01-18', end_date='2019-01-18', 
                               start_offset=ma_length+40)
# Need to exclude the day of the backtest so clip the last row of data
backtest_pricing = spy_pricing.iloc[:-1]
backtest_pricing.head(5)
Out[56]:
2018-02-05 00:00:00+00:00    259.256
2018-02-06 00:00:00+00:00    264.173
2018-02-07 00:00:00+00:00    262.662
2018-02-08 00:00:00+00:00    252.917
2018-02-09 00:00:00+00:00    256.813
Freq: C, Name: Equity(8554 [SPY]), dtype: float64
In [57]:
backtest_pricing.tail(5)
Out[57]:
2019-01-11 00:00:00+00:00    258.91
2019-01-14 00:00:00+00:00    257.46
2019-01-15 00:00:00+00:00    260.38
2019-01-16 00:00:00+00:00    260.95
2019-01-17 00:00:00+00:00    262.98
Freq: C, Name: Equity(8554 [SPY]), dtype: float64
In [59]:
# One could go into the backtest and verify these are the first and last SPY prices 
# Now pas these prices to our wma function
notebook_wma = WildersMA(backtest_pricing, ma_length)
notebook_wma[-1]
Out[59]:
266.82641676542852
In [60]:
# Looking at the log of the algo, this is indeed the very first wma value recorded
In [19]:
# Let's create a small function which returns the wma value the algo would see on a given date
# Similar to what we just did above
def get_backtest_wma(backtest_date, ma_length):
    """
    Return the wma that the backtest would have calculated on a specific date
    Note the 40 offset adder was hardcoded in the backtest
    """
    spy_pricing = get_pricing('SPY', fields='close_price', 
                               start_date=backtest_date, end_date=backtest_date, 
                               start_offset=ma_length+40)
    # backtest prices do not include the current day so drop the last date
    backtest_pricing = spy_pricing.iloc[:-1]
    wma = WildersMA(backtest_pricing, ma_length)
    
    # return just the latest value which is also what the backtest does
    return wma[-1]

Use this function to loop through the backtest dates

Calculate the wma just like the backtest does on each date

Create a series which would be the wma values the backtest would create. First we need to create a list of dates which the backtest would run. There are several ways to do this but simple hack is to run pipeline and use those dates. Pipeline reterns values for trading days which are the same as backtest days.

In [17]:
# Make a pipline with no columns. The level 0 index will be the dates.
pipe = Pipeline(columns = {})
pipe_data = run_pipeline(pipe, start_date=start_date, end_date=end_date)
datelist = pipe_data.index.get_level_values(0).unique()
datelist

Pipeline Execution Time: 0.90 Seconds
Out[17]:
DatetimeIndex(['2019-01-18', '2019-01-22', '2019-01-23', '2019-01-24',
               '2019-01-25', '2019-01-28', '2019-01-29', '2019-01-30',
               '2019-01-31', '2019-02-01',
               ...
               '2019-12-18', '2019-12-19', '2019-12-20', '2019-12-23',
               '2019-12-24', '2019-12-26', '2019-12-27', '2019-12-30',
               '2019-12-31', '2020-01-02'],
              dtype='datetime64[ns, UTC]', length=241, freq='C')

Great. The above dates look like they are correct. They represent the dates that the backtester would run.

Now let's calculate the wma for each date (as the backtest would see) and add the values to a series so we can plot it later.

In [61]:
wma_series = pd.Series()

for backtest_date in datelist:
    wma_value = get_backtest_wma(backtest_date, ma_length)
    wma = pd.Series(data=wma_value, index=[backtest_date])
    wma_series = wma_series.append(wma)
    
wma_series.head(5)
Out[61]:
2019-01-18 00:00:00+00:00    266.826417
2019-01-22 00:00:00+00:00    268.300906
2019-01-23 00:00:00+00:00    267.820314
2019-01-24 00:00:00+00:00    264.871447
2019-01-25 00:00:00+00:00    266.035065
dtype: float64

Let's see how this plot looks

In [62]:
# Now lets put both our prices and the wma into a single dataframe
pricing_df_ver2 = pd.DataFrame()
pricing_df_ver2['price'] = day_pricing
pricing_df_ver2['wma'] = wma_series
pricing_df_ver2.head(5)
Out[62]:
price wma
2019-01-18 00:00:00+00:00 261.469 266.826417
2019-01-22 00:00:00+00:00 258.044 268.300906
2019-01-23 00:00:00+00:00 258.417 267.820314
2019-01-24 00:00:00+00:00 258.712 264.871447
2019-01-25 00:00:00+00:00 260.881 266.035065
In [70]:
pricing_df_ver2.plot();

Much better. The wma isn't so smooth now and looks much more like the algo.

However, there still is a bit of mis-match. The wma for June is above the price here. However, in the algo it looks like they are about equal?

Ah, we are still using the adjusted prices for spy. Let's see if we can get the prices which the backtester would have seen.

In [65]:
# Make a pipline to get SPY prices which the backtester would have seen.
# The get_pricing method adjusts prices to the last date which would be in the future in a backtest

spy_pipe = Pipeline(columns={'close_price': USEquityPricing.close.latest})
pipe_data = run_pipeline(spy_pipe, start_date=start_date, end_date=end_date)
spy_data = pipe_data.xs(symbols('SPY'), level=1)
spy_data.head(5)

Pipeline Execution Time: 0.53 Seconds
Out[65]:
close_price
2019-01-18 00:00:00+00:00 262.98
2019-01-22 00:00:00+00:00 266.42
2019-01-23 00:00:00+00:00 262.93
2019-01-24 00:00:00+00:00 263.31
2019-01-25 00:00:00+00:00 263.61
In [67]:
# Now lets re-plot our SPY vs mwa using SPY prices that the backtester would see
# Now lets put both our prices and the wma into a single dataframe
pricing_df_ver3 = pd.DataFrame()
pricing_df_ver3['price'] = spy_data.close_price
pricing_df_ver3['wma'] = wma_series
pricing_df_ver3.head(5)
Out[67]:
price wma
2019-01-18 00:00:00+00:00 262.98 266.826417
2019-01-22 00:00:00+00:00 266.42 268.300906
2019-01-23 00:00:00+00:00 262.93 267.820314
2019-01-24 00:00:00+00:00 263.31 264.871447
2019-01-25 00:00:00+00:00 263.61 266.035065
In [69]:
pricing_df_ver3.plot();

That looks pretty good.

Now the notebook and algo versions of WMA look the same.

The key issues were:

  1. The notebook had no initial lookback data while the algo did (240 days of additional lookback).
  2. The notebook used an 'expanding' window of lookback data (keeping the start date fixed) while the algo had a fixed length 'rolling window' of lookback.
  3. The notebook used SPY pricing which was adjusted as of the end_date. The algo used data which was adjusted as of each simulation day (similar to pipeline data).
In [ ]: