Python/pandas/math help, please -- Taper values of time series to reduce edge noise?

I need some help, since I don't know Python very well yet. I've got this custom factor that starts like this:

class MoneyMaker(CustomFactor):  
    inputs = [USEquityPricing.close]  
    def compute(self, today, assets, out, close):  
        asset_returns  = pd.DataFrame(close, columns=assets).pct_change()[1:]  
        market_returns = asset_returns[ sid(36486) ]  
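
For readers who are still getting used to the data types: inside compute, close arrives as a 2-D NumPy array with one row per day and one column per asset. The sketch below is a NumPy-only version of the same numbers that pd.DataFrame(close, columns=assets).pct_change()[1:] produces, assuming the prices contain no NaNs. The prices and the function name are made up for illustration. Dropping the first (NaN) row leaves window_length - 1 rows of returns, which matters once you start multiplying by a window of weights.

```python
import numpy as np


def simple_returns(close):
    """Day-over-day percentage change of a (days, assets) price array.

    Mirrors pd.DataFrame(close).pct_change()[1:] for arrays without NaNs:
    the first day has no previous price, so the result has one fewer row.
    """
    close = np.asarray(close, dtype=float)
    return close[1:] / close[:-1] - 1.0


# Hypothetical prices: 4 days (rows) x 2 assets (columns).
close = np.array([
    [100.0, 50.0],
    [110.0, 50.0],
    [99.0, 55.0],
    [99.0, 44.0],
])

returns = simple_returns(close)
print(returns.shape)  # (3, 2): one fewer row than the price window
```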

I think that when the edges of my lookback window truncate symmetrical movements, it erroneously skews my results. To illustrate, say an asset has a significant dip and then recovers. If the dip occurs just before the edge of my lookback window and the recovery occurs inside it, all I'm going to see is that the stock suddenly performed very strongly compared to the benchmark, which is misleading. In reality the stock is performing no better than it was a few days ago, when that symmetrical movement was in the middle of the lookback window.

I think a solution to this problem is to taper the edges of the lookback window for both asset_returns and market_returns, so that as data moves towards the edges of the lookback window it will be minimized, and therefore won't skew the overall picture.

Am I correct in my assumptions? Will this produce less jerky day-to-day linear regression results?

What's the best way to code this? Is there any benefit to sinusoidal tapering over a linear ramp?

Or is there an altogether better solution?
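
To make the linear-ramp versus sinusoidal question concrete, here's a small NumPy sketch that prints the weight each taper gives to the oldest day, the next day, and the middle of a 20-day window. It uses NumPy's built-in window functions as stand-ins: np.bartlett (a straight triangular ramp), np.hanning (a raised cosine) and np.kaiser with beta=3 (the beta used later in the thread). The Hann window starts with zero slope, so it suppresses the outermost few days more strongly than a ramp of the same length, while a Kaiser window with a small beta never reaches zero at the edges.

```python
import numpy as np

N = 20  # look-back length in days (illustrative)

windows = {
    "linear ramp (np.bartlett)": np.bartlett(N),
    "sinusoidal (np.hanning)": np.hanning(N),
    "kaiser, beta=3 (np.kaiser)": np.kaiser(N, 3.0),
}

for name, w in windows.items():
    # Weight on the oldest day, the day after it, and the middle of the window
    print(f"{name:28s} edge={w[0]:.3f} next={w[1]:.3f} middle={w[N // 2]:.3f}")
```

Which shape is "best" is a judgment call: the stronger the edge suppression, the fewer days effectively contribute to the estimate.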

11 responses

Here's an example, the average of the last 4 closes divided by the average of the first 4, that you can adapt in case it applies.

import numpy as np

class Momentum3(CustomFactor):  
    inputs = [USEquityPricing.close]  
    window_length = 21  
    def compute(self, today, assets, out, close):  
        wndw = 4  
        # Per asset: mean of the last wndw closes divided by mean of the first wndw
        out[:] = np.mean(close[-wndw:], axis=0) / np.mean(close[:wndw], axis=0)  
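
To see what that ratio does per asset, here's a NumPy-only sketch on made-up prices; the function name is hypothetical. Asset 1 is flat except for a one-day dip on the oldest day. Because the factor averages the first and last wndw closes instead of using single endpoints, that edge dip moves the ratio much less than a plain last/first price ratio would.

```python
import numpy as np


def edge_average_ratio(close, wndw=4):
    """Mean of the last `wndw` closes divided by mean of the first `wndw`, per column."""
    close = np.asarray(close, dtype=float)
    return np.mean(close[-wndw:], axis=0) / np.mean(close[:wndw], axis=0)


# Hypothetical 21-day price history (rows = days, columns = assets):
# asset 0 rises by 1 per day; asset 1 is flat at 50 apart from one dip.
days = np.arange(21, dtype=float)
close = np.column_stack([100.0 + days, np.full(21, 50.0)])
close[0, 1] = 45.0  # one-day dip on the oldest day of the window

ratio = edge_average_ratio(close)
print(ratio)                          # smoothed ratio per asset
print(close[-1] / close[0])           # plain last/first ratio for comparison
```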

I'm afraid my python skills aren't good enough to adapt it.

asset_returns = np.mean(asset_returns[ .... I'm totally lost  

I made an image to illustrate what I mean:

The first chart is normal returns values like what I'm working with. The second and third charts are potential examples of what I want to end up with. (Please ignore that in the third image the time axis got inadvertently mangled -- time remains linear in this scenario.)

Thanks, Grant. Seems a suitable hammer for this nail. I still need some hand-holding with the Python/pandas though.

This is what I'm trying:

from scipy import signal  
import numpy as np  
# etc...

class TaperedWindowRecursion(CustomFactor):  
    inputs = [USEquityPricing.close]  
    def compute(self, today, assets, out, close):  
        # Returns  
        asset_returns   = pd.DataFrame(close, columns=assets).pct_change()[1:]  
        market_returns  = asset_returns[ symbol('SPY') ]

        # Window function creates an ndarray with one weight per row of returns
        # (window_length - 1 rows, because pct_change()[1:] drops the first row)
        kaiser_beta     = 3  
        window_function = signal.kaiser( self.window_length - 1, kaiser_beta )

        # Multiply the returns by the kaiser signal - This is the part I don't know how to do  
        asset_returns   *= window_function # ??? how to do this?  
        market_returns  *= window_function # ??? how to do this?

        # Linear regression (ordinary least squares)  
        A = np.vstack([market_returns, np.ones(len(market_returns))]).T  
        m, b = np.linalg.lstsq(A, asset_returns)[0]

        # etc...  

How do I multiply my returns values by the values generated by the window function? (I don't yet have a good grasp of how to work with all the different datatypes in Python.)

@ Viridian Hawk -

Could you explain again what you are trying to do? I've read your original post several times, and I'm confused. What problem are you needing to solve with the window function?

I'm trying to reduce the effects of noise at the edges of the time series look-back window when running linear regressions. The "noise" I'm specifically worried about consists of dips and pops where, for whatever reason, a stock temporarily makes a dramatic idiosyncratic move up or down before mean reverting.

Put another way, I think we've all had the experience of looking at a 3-month chart of a stock and thinking "this has been performing pretty well!", then looking at the 1-year chart and thinking "no, actually this hasn't been performing well at all." Where you cut off the returns window can give wildly different impressions of a stock's performance. What I want to do is de-emphasize the data near the edges of the window, to help eliminate the effects of this "framing bias."

Also, unlike other datasets you might run a linear regression on, where values are independent of each other, with stock market returns I believe the order of the values holds significance, so truncating the data removes something that was meaningful to the remaining values nearest the cut.

Let's say, for the sake of simplicity, that my look-back window is 20 days and my daily gains are: -0.0909, 0.1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0. So it's basically just flat, with a symmetrical "dip" at the front of the dataset. (Keep in mind these values are daily returns written as fractions, so -0.0909 is a 9.09% loss; daily returns are what you use to calculate alpha and beta.)

On the first day my data would look like this: -0.0909, 0.1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 So it's basically flat returns.

On the second day my data would look like this: 0.1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0. Suddenly it shows significant positive returns.

On the third day, my data would look like this: 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0. So again, flat returns.

So a dip at the edge of the look-back window significantly changed the picture of the overall data through time. When calculating alpha and beta for the stock, the values are going to jump around quite a bit, not because of any change in the stock's performance, but because of the arbitrary length of my look-back window and which movements happen to be getting clipped.
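
A quick NumPy sketch reproduces the jump: slide a 20-day window over the made-up daily returns from the example and total the window each day. The compounded return is shown alongside the simple sum because a -9.09% day followed by a +10% day nets out to roughly zero, so the first day really is flat.

```python
import numpy as np

# Daily returns from the example: a dip and recovery on the first two days,
# then flat (24 days in total).
daily = np.array([-0.0909, 0.1] + [0.0] * 22)
L = 20  # look-back length in days

window_sums = []
for day in range(3):
    window = daily[day:day + L]               # the 20 returns visible on this day
    window_sums.append(window.sum())          # simple sum of returns in the window
    compounded = np.prod(1.0 + window) - 1.0  # growth over the whole window
    print(f"day {day + 1}: sum={window_sums[-1]:+.4f} compounded={compounded:+.4f}")
```

The windowed total goes from about 0 to about 0.1 and back to 0 purely because of which days fall inside the window.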

So to solve this, I want to multiply the values of my time series by a series of values produced by the window function you suggested. This will (I think) de-emphasize the truncated movements at the edges of the look-back window, thus creating more useful alpha and beta results. The results will be smoother through time as well, which will help reduce spurious edge-noise-induced rebalancing.

Let's say I use a triangle window function that returns the following values: 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.0, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1.

When applied to the first day in the example above, I would now have -0.00909, 0.02, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0. Slightly positive, but overall flat.

When applied to the second day in the example above, I would now have 0.01, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0. Essentially flat.

The alpha calculation on this ramped dataset will produce a value that from day to day moves smoothly instead of wildly jumping around.
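
Here's the same sliding window as a NumPy sketch, with the triangle weights from the example multiplied in element by element before summing. The weights are listed oldest day first, so the first weight lands on the day about to drop out of the window.

```python
import numpy as np

daily = np.array([-0.0909, 0.1] + [0.0] * 22)  # example daily returns
L = 20                                         # look-back length in days
# The triangle taper from the example: 0.1 ... 1.0, 1.0 ... 0.1 (20 weights)
taper = np.concatenate([np.arange(1, 11), np.arange(10, 0, -1)]) / 10.0

plain, tapered = [], []
for day in range(3):
    window = daily[day:day + L]
    plain.append(window.sum())              # unweighted total
    tapered.append((window * taper).sum())  # weight each day, then total

for day in range(3):
    print(f"day {day + 1}: plain={plain[day]:+.4f} tapered={tapered[day]:+.4f}")
```

With the taper, the largest day-to-day swing in the windowed total shrinks from about 0.1 to about 0.01.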

Basically the trouble I'm having is I just don't know the correct way to multiply the array produced by the window function (signal.kaiser( self.window_length, 3 )) with the asset_returns pandas dataframe (pd.DataFrame(close, columns=assets).pct_change()[1:])

To multiply the array produced by the window function (signal.kaiser( self.window_length, 3 )) with the asset_returns pandas DataFrame (pd.DataFrame(close, columns=assets).pct_change()[1:]), simply use the pandas DataFrame method 'multiply' with axis=0, so that the weights line up with the rows (days) rather than the columns (assets). Fortunately, you have the returns data in a DataFrame, so 'multiply' can be used directly. Doing this with numpy is a little less straightforward.

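        # [1:] drops the NaN first row so the regression gets no NaNs;
        # window_function then needs window_length - 1 entries
        asset_returns   = pd.DataFrame(close, columns=assets).pct_change()[1:]  
        market_returns  = asset_returns[ symbol('SPY') ]
        window_function = signal.kaiser(self.window_length - 1, 3)

        # axis=0 matches the weights to the rows (days), not the columns (assets)
        weighted_asset_returns = asset_returns.multiply(window_function, axis=0)  
        weighted_market_returns = market_returns.multiply(window_function, axis=0)  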
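
For comparison, here's the same row-wise weighting in plain NumPy, with made-up numbers. The catch is broadcasting: a 1-D weight vector multiplied against a (days, assets) array lines up with the last axis (assets) by default, so it has to be turned into a column first. The same default is why asset_returns *= window_function in the question would not weight by day. This sketch assumes the weight vector has exactly one entry per row of returns.

```python
import numpy as np

# Hypothetical returns: 5 days (rows) x 3 assets (columns).
returns = np.arange(15, dtype=float).reshape(5, 3) / 100.0
weights = np.kaiser(returns.shape[0], 3.0)  # one weight per day (row)

# weights has shape (5,); adding an axis makes it (5, 1), so NumPy
# broadcasts each day's weight across every asset in that row.
weighted = returns * weights[:, np.newaxis]

# A single return series (e.g. the market column) is 1-D, so plain
# element-wise multiplication already lines up day by day.
market = returns[:, 0]
weighted_market = market * weights
```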

See attached notebook.

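You can also apply a window directly with the pandas .rolling method and its win_type argument, which produces a weighted moving average of each return series.
I modified one cell of @Dan's notebook to show how.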

X=prices.pct_change()  
print(my_security[1])  
# Available win_type values are listed at:  
# https://pandas.pydata.org/pandas-docs/version/0.21/generated/pandas.DataFrame.rolling.html

X['triang_appl']  =X[my_security[1]].rolling(window=5,win_type='triang' ).mean()  
X['blackman_appl']=X[my_security[1]].rolling(window=5,win_type='blackman' ).mean()  
X['hamming_appl'] =X[my_security[1]].rolling(window=5,win_type='hamming' ).mean()  
X['parzen_appl']  =X[my_security[1]].rolling(window=5,win_type='parzen').mean()  
X['gaussian_appl']=X[my_security[1]].rolling(window=5,win_type='gaussian' ).mean(std=0.1)  
X['kaiser_appl']  =X[my_security[1]].rolling(window=5,win_type='kaiser' ).mean(beta=60.0)

appl_plus_win_types = [my_security[1], 'triang_appl','blackman_appl','hamming_appl','parzen_appl', 'gaussian_appl','kaiser_appl']  
X[appl_plus_win_types].plot()  
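
Note that .rolling(..., win_type=...) does something different from tapering the regression window: each output value is a weighted mean of the last window returns, i.e. a smoothed return series. Here's a NumPy-only sketch of that computation, assuming (as I understand pandas' weighted rolling mean) that the window weights are normalized by their sum. The helper names are hypothetical, and the triangular weights are written out by hand for odd lengths rather than taken from SciPy. The triangle is symmetric, so the order in which weights are applied doesn't matter here.

```python
import numpy as np


def triang_weights(m):
    """Triangular weights for an odd window length, e.g. m=5 -> [1/3, 2/3, 1, 2/3, 1/3].

    Hand-written for illustration; assumed to match the 'triang' shape for odd m.
    """
    if m % 2 == 0:
        raise ValueError("this sketch only handles odd window lengths")
    half = np.arange(1, (m + 1) // 2 + 1) * 2.0 / (m + 1)
    return np.concatenate([half, half[-2::-1]])


def windowed_rolling_mean(x, weights):
    """Weighted moving average: sum(w * x) / sum(w) over each trailing window.

    Positions without a full window are NaN, like the start of a rolling result.
    """
    x = np.asarray(x, dtype=float)
    m = len(weights)
    out = np.full(x.shape, np.nan)
    for end in range(m - 1, len(x)):
        window = x[end - m + 1:end + 1]
        out[end] = np.dot(weights, window) / weights.sum()
    return out


# Hypothetical daily returns for one stock.
returns = np.array([0.0, 0.01, -0.02, 0.03, 0.0, 0.01, -0.01])
smoothed = windowed_rolling_mean(returns, triang_weights(5))
```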

Wow, thanks so much, guys. I just dropped this into a couple of my algorithms, and I'm already seeing universally improved results. Looks like my concerns about edge noise were justified. Looking forward to tinkering around with this some more and seeing what else I can do with it. Cheers!

Here's the full CustomFactor, in case anybody reads this thread and wants it.

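from scipy import signal
import numpy as np
import pandas as pd
from quantopian.pipeline import CustomFactor
from quantopian.pipeline.data.builtin import USEquityPricing

class TaperedAlphaBeta(CustomFactor):  
    inputs = [USEquityPricing.close]  
    outputs = ['alpha','beta']  

    def compute(self, today, assets, out, close):  
        # Daily returns; [1:] drops the NaN first row, leaving window_length - 1 rows
        asset_returns  = pd.DataFrame(close, columns=assets).pct_change()[1:]  
        market_returns = asset_returns[ sid(8554) ]  # SPY as the benchmark
        # Kaiser taper with one weight per row of returns
        kaiser_beta     = 3  
        window_function = signal.kaiser( self.window_length-1, kaiser_beta )  
        # axis=0 applies each day's weight across that row
        weighted_asset_returns = asset_returns.multiply(window_function, axis=0)  
        weighted_market_returns = market_returns.multiply(window_function, axis=0)  
        # Regress weighted asset returns on [weighted market returns, 1]
        A = np.vstack([weighted_market_returns, np.ones(len(weighted_market_returns))]).T  
        beta, alpha = np.linalg.lstsq(A, weighted_asset_returns)[0]  
        out.alpha[:] = alpha  
        out.beta[:]  = beta  

# In the pipeline definition: each output comes back as its own factor
taperedAlpha, taperedBeta = TaperedAlphaBeta(window_length=160)  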
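
For anyone who wants to sanity-check the factor outside Quantopian, here's a NumPy-only sketch of the same computation on made-up prices. Assumptions: np.kaiser stands in for scipy's signal.kaiser (both implement the Kaiser window), column 0 of the synthetic prices plays the role of sid(8554), and the function names are hypothetical. The second function is a different, swapped-in technique, ordinary weighted least squares, offered only for comparison: the thread's version multiplies the returns by the window but leaves the intercept column at 1, which is not the same model as weighting each day's squared residual.

```python
import numpy as np


def tapered_alpha_beta(close, market_col, kaiser_beta=3.0):
    """NumPy-only sketch of TaperedAlphaBeta.compute (the thread's approach).

    close: (window_length, n_assets) prices; market_col: benchmark column index.
    """
    close = np.asarray(close, dtype=float)
    returns = close[1:] / close[:-1] - 1.0           # window_length - 1 rows
    w = np.kaiser(returns.shape[0], kaiser_beta)       # one weight per day
    y = returns * w[:, np.newaxis]                     # weighted asset returns
    x = returns[:, market_col] * w                     # weighted market returns
    A = np.column_stack([x, np.ones_like(x)])          # same design matrix as the factor
    beta, alpha = np.linalg.lstsq(A, y, rcond=None)[0]  # one (beta, alpha) per asset
    return alpha, beta


def wls_alpha_beta(close, market_col, kaiser_beta=3.0):
    """Alternative: weighted least squares with Kaiser weights.

    Every row of the design matrix (including the intercept column) and of
    the target is scaled by sqrt(weight), so each day's squared residual is
    weighted while alpha keeps the same units as the returns.
    """
    close = np.asarray(close, dtype=float)
    returns = close[1:] / close[:-1] - 1.0
    sw = np.sqrt(np.kaiser(returns.shape[0], kaiser_beta))
    market = returns[:, market_col]
    A = np.column_stack([market, np.ones_like(market)]) * sw[:, np.newaxis]
    beta, alpha = np.linalg.lstsq(A, returns * sw[:, np.newaxis], rcond=None)[0]
    return alpha, beta


# Hypothetical, noise-free data: column 0 is the "market", column 1 an asset
# built with alpha 0.001 and beta 1.5. 160 rows of prices -> 159 returns.
rng = np.random.default_rng(0)
mkt = rng.normal(0.0, 0.01, 159)
asset = 0.001 + 1.5 * mkt
prices = 100.0 * np.cumprod(1.0 + np.column_stack([mkt, asset]), axis=0)
prices = np.vstack([np.full(2, 100.0), prices])

a_t, b_t = tapered_alpha_beta(prices, market_col=0)
a_w, b_w = wls_alpha_beta(prices, market_col=0)
print("thread approach alpha/beta:", a_t, b_t)
print("weighted least squares alpha/beta:", a_w, b_w)
```

On this synthetic data the weighted-least-squares version recovers the alpha of 0.001 and beta of 1.5 built into the asset; the thread's version compares tapered returns against an unweighted intercept, so its alpha for the asset need not equal 0.001.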

I'm not exactly sure how you'd mask it. I tried mask=(StaticSids([8554]) or QTradableStocksUS()) but that gave me an error.