Z Score - moving average

Hello all,

I'm having some trouble here. In the code below, I'm calculating how many standard deviations SPY is from its 120-day mean. The final part that I'm trying to calculate is the moving average of "z." I coded something similar to this in thinkScript and I seem to get a good signal when the z-score crosses its own 7-day moving average. Any input? The only scripting language I work with is VBA, and the two couldn't be more different. Thanks in advance!

def initialize(context):
    context.security = symbol('SPY')


def handle_data(context, data):  
    mean = data[context.security].mavg(120)  
    current_price = data[context.security].price  
    sigma = data[context.security].stddev(120)  
    z = (current_price - mean) / sigma  
11 responses

@Jamie L.,

If you review this post: Fun with ZScore

You'll notice some ZScore code in there:

def HandleEntry(context, data):  
    closes   = history(DailyPeriods + 1, "1d", "close_price").resample("1w")  
    closes   = closes.dropna()  
    means    = closes.apply(talib.MA, timeperiod = WeeklyPeriods, matype = MAType.TRIMA)  
    sigmas   = closes.apply(talib.STDDEV, timeperiod = WeeklyPeriods)  
    zScores  = ((closes - means) / sigmas).iloc[-1]  

If you stop right before the .iloc[-1] and instead do this:

    zScores  = ((closes - means) / sigmas)  
    zScoresMA  = zScores.apply(talib.MA, timeperiod=7)  

What you'll have is both the raw zScores and a moving average of them.
Then you can loop your stock list and do your zScore comparison:

    for stock in data:
        if (zScores[stock].iloc[-1] > zScoresMA[stock].iloc[-1] and zScores[stock].iloc[-2] <= zScoresMA[stock].iloc[-2]):
            # recent zScore just crossed up over its own MA.
            pass  # entry logic goes here

Note that the MA here includes the latest zScore which you might want to trim off before you perform the zScores.apply(talib.MA...
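The same crossover test can be sketched with plain pandas rolling windows instead of talib. This is only an illustration on synthetic prices: the helper names `zscore_frame` and `crossed_up`, the 20-bar and 7-bar windows, and the use of `ddof=0` (population standard deviation) are assumptions, not part of the strategy posted above.

```python
import numpy as np
import pandas as pd

def zscore_frame(closes, window):
    """Rolling z-score of each column: (price - rolling mean) / rolling std."""
    means = closes.rolling(window).mean()
    sigmas = closes.rolling(window).std(ddof=0)  # population std (an assumption)
    return (closes - means) / sigmas

def crossed_up(z, z_ma):
    """True where z was at or below its MA on the previous bar and is above it now."""
    return (z > z_ma) & (z.shift(1) <= z_ma.shift(1))

# Synthetic random-walk prices standing in for history(); one column per security.
rng = np.random.default_rng(0)
dates = pd.date_range("2015-01-01", periods=60, freq="B")
closes = pd.DataFrame(
    {"SPY": 200 + rng.normal(0, 1, 60).cumsum(),
     "QQQ": 100 + rng.normal(0, 1, 60).cumsum()},
    index=dates,
)

z_scores = zscore_frame(closes, window=20)
z_scores_ma = z_scores.rolling(7).mean()
signals = crossed_up(z_scores, z_scores_ma)

for stock in closes.columns:
    if signals[stock].iloc[-1]:
        print(stock, "z-score just crossed above its own MA")
```

Comparisons against the NaN values in the warm-up period evaluate to False, so no signal can fire until both the z-score and its moving average have enough history.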

MT, this is great! Your example makes sense, I must have been over thinking this. I also didn't think to apply this to more than one security--I like the idea. I'll put the strat to work soon and repost if it's any good. Thanks again!

3/7/15 Update: altered strat to buy on 1.0 zscore cross with zscore MA > 1.0

You can play around with this strat if you like.

It uses a rather complex fundy selection algo, but you can swap in whatever you like there. The zscore calculation and the entry are separate in this strat. And it still uses a zMean threshold as a risk-off trigger.

MT

MT, thanks for all of your input on this. I'm running into some trouble and am probably in over my head. I think my first goal should be to simply record the two variables in question: z and zMA. I'd like to first visualize the crossover points, then figure out the trading logic.

When I use the below script, I'm unable to 'record' these variables because they are not returning as number values. Is this because they are currently in an array?

import talib  
import pandas

def initialize(context):

    symbols("SPY")

def handle_data(context, data):  
    pass

def HandleEntry(context, data):  
    closes   = history(130, "1d", "close_price")  
    closes   = closes.dropna()  
    means    = closes.apply(talib.MA, timeperiod = 120)  
    sigmas   = closes.apply(talib.STDDEV, timeperiod = 120)  
    z        = ((closes - means) / sigmas)  
    zMA      = z.apply(talib.MA, timeperiod=14)  


    record(zscore = z, zMA = zMA)  

How would I turn these two variables (z and zMA) back into number values so they can be charted?

Any guidance is much appreciated.

Thanks!
JL

@Jamie L., That question must be answered by your exploration of python. Python is easily the most misleading language out there. Return results from methods are never what they seem (initially). One looks at that equation:

    z = ((closes - means) / sigmas)  

and one thinks, "that z there is just a single number right?"

Your eyes will lie to you over and over while you learn what is truly being created and manipulated here. Below I take you through what is actually in that "z" object there. But to assist you (and myself) in understanding the objects in python I've recently been forcing myself to use this syntax with history:

    closeDeck   = history(15, "1d", "close_price").dropna(axis=1)  

as it reminds me that what comes back from history is way more complex than just a list of prices. It's a massive data object akin, in my mind, to a deck of cards. The word "dataframe" means nothing to me so I use the word "deck" as I visualize what's in that object "closeDeck" as a deck of cards, each card having a table of rows and columns in which the close prices are represented; [one card per SID (security) found in the data object that the history method used to build the closeDeck.]

If you want just one of the cards (containing the close prices for just one security) then you have to ask the deck for just that one card.

    closeDeck   = history(15, "1d", "close_price").dropna(axis=1)  
    spyCard      = closeDeck[context.SPY]  

But that one card is still not what you want, read on...

In addition, when you manipulate decks and ask them for things like means and stddevs, you may not get back just single numbers, one for each card (depending on what you asked the deck). No, what you'll generally get back is another stack of cards, and on each card is a series, in this case a two-column table of datetime and "value".

    sigmas = closeDeck.apply(talib.STDDEV, timeperiod = 14)  

will give you another deck of cards and on each card is a series of datetime, values with each value being the standard deviation up to that point in time as calculated by the talib.STDDEV.

Therefore, this call

    zScores    = ((closeDeck - means) / sigmas)  

if we look at the types used, looks more like this:

    dataframe = ((dataframe - dataframe) / dataframe)  
or  
    zScoreDeck = ((closeDeck - meanDeck) / sigmaDeck)  

And if you debug your code and stop and take a look at zScoreDeck after that equation you'll find that zScoreDeck is a deck of cards (a dataframe), with each card, one per security, containing a time Series of values, with ONLY THE LAST one being what you are thinking is the "Z" score.

Bottom line, in order to get this last value from that zScoreDeck, you need to ask both the deck for the card AND ask the card for its last value:

    lastSPYZScore = zScoreDeck[context.SPY].iloc[-1]

lastSPYZScore will be a single value that you can now plot.
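To make the deck, card, and value steps concrete, here is a small sketch in plain pandas outside Quantopian. The numbers are made up, and the string key "SPY" stands in for `context.SPY`; both are assumptions for illustration only.

```python
import pandas as pd

# A tiny hand-made "deck": one column (card) per security, one row per day.
dates = pd.date_range("2015-03-02", periods=5, freq="B")
z_score_deck = pd.DataFrame(
    {"SPY": [0.1, 0.4, -0.2, 0.8, 1.3], "QQQ": [0.0, -0.5, 0.3, 0.2, -0.1]},
    index=dates,
)

spy_card = z_score_deck["SPY"]         # one card: a Series indexed by date
last_spy_z = float(spy_card.iloc[-1])  # one number: the most recent value

print(type(z_score_deck).__name__)  # DataFrame
print(type(spy_card).__name__)      # Series
print(last_spy_z)                   # 1.3
```

Only the last of these three objects is a plain number, which is the kind of value a plotting call expects.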

Jamie, MarketTech's points are on. As a fellow TOS user/victim, I'll give a more specific example:

TOS:

def ma = ExpAverage(close, 10)

and then we say things like 'ma' to get the current value (a shorthand for ma[0]), and ma[1] for previous 1, ma[2] for second one back, etc.

In TOS, all or almost all variables are really array-of-double (double-precision floating point numbers)

In Q, most things are pandas data structures. You should read up on pandas, and play around with it in plain Python scripts offline (the IPython shell is very useful for this). Many of the things we would write as 'def ma = ...' in TOS are a pandas Series, which is still basically a fancy-pants array with funny-looking indexing. Indexing in pandas is worthy of .... well, look online, you'll find plenty of pages on the many ways to do it and get very confused and delighted. That '.iloc[-1]' is basically TOS 'ma[0]' (the current value), and '.iloc[-2]' is TOS 'ma[1]'.

Spend a few hours playing around with pandas offline, and you'll start to get the hang of it. The powerful thing you can do in pandas (and hence often in Q) is that you can do mathematical operations on whole arrays in one go, without a loop. E.g. data['this'] = 2 * data['that'] will get you a whole new array of values, 2X the value of 'that', in one go. 'zscores = (close - means) / sigmas' is an example of the same. So zscores is an array (OK, it's really a Series, but you get the idea....). Being an array, to get any one value at a particular point in time, you need to index into it. And it doesn't support the thinkScript shorthand of the name of the variable being the same as the most recent value. The most recent one is at .iloc[-1] (often.. but again, read up on the wonders of indexing).
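A quick offline sketch of both points, using made-up numbers: the whole-column arithmetic happens without a loop, and positional indexing from the end plays the role of the TOS offsets.

```python
import pandas as pd

data = pd.DataFrame({"that": [1.0, 2.0, 3.0, 4.0]})

# One vectorized operation produces a whole new column; no loop needed.
data["this"] = 2 * data["that"]

current = data["this"].iloc[-1]   # most recent value, like TOS 'ma' or 'ma[0]'
previous = data["this"].iloc[-2]  # one bar back, like TOS 'ma[1]'

print(current, previous)
```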

Best of luck.

Thanks guys, I think the fog is starting to clear. MT, I like your deck of cards analogy--I had no idea that my calcs were being stored this way. Python is not the most intuitive language, for sure. Below is the TOS script I'm running as a way to trigger long/short signals--I'm currently applying it to EUR/USD and it's generating decent long/short entry points.

input price = close;  
input length = 120;  
input z_length = 14;  
def average = Average(price, length);  
def stdev = standardDeviation(price, length);  
def z = (price - average)/stdev;  
def zma = average(z, z_length);

AddOrder(OrderType.BUY_AUTO, z crosses above zma, tickcolor = GetColor(0), arrowcolor = GetColor(0), name = "LongEntry");

AddOrder(OrderType.SELL_AUTO, z crosses below zma, tickcolor = GetColor(1), arrowcolor = GetColor(1), name = "ShortEntry");  

It's too bad TOS doesn't have a clean way to automate orders.. probably something to do with the TD takeover.

@Sol M., Regarding ZScore, absolutely. Single-security trading systems based on technical analysis are a fool's errand, no matter what TA you apply, how much, how varied or how tricksy. I've proven this to myself over millions of valid backtests over hundreds of strategy types (each having thousands of scenario variations).

I especially liked the image in the linked reference, "Weekly underwater equity curve"

http://www.tradestation.com/education/labs/analysis-concepts/contrarian-zscore

My, what a telling chart! What trader will allow a system to spend YEARS underwater? All the profits come in narrow spurts. If you happen to start your strategy (like the Q's Open is starting theirs) at a "really bad time", you could see nothing but red for months and months. Who would stick with such a system?

The only possible means of avoiding such a beating might be in using a system like Quantopian to trade a basket of securities/ETFs using tools like the ZScore. And I do mean might.

So I've finally figured this out (or so it appears). The indicator isn't the best, unfortunately--but I think it has some value when combined with another strategy. As the saying goes: "if you're doing what everyone else is doing, you're doing it wrong." I'm moving this strategy into Matlab as I think it's the easiest way for me to manipulate the code. In future editions, I'd like to remove the constants for my moving average lengths--I need a way to have this range adapt to volatility or convergence of the three moving averages.

Jamie,

You might note that Bollinger bands are basically a kind of z-score technique, in that they consist of (from http://en.wikipedia.org/wiki/Bollinger_Bands):

  • an N-period moving average (MA)
  • an upper band at K times an N-period standard deviation above the moving average (MA + Kσ)
  • a lower band at K times an N-period standard deviation below the moving average (MA − Kσ)

So, you could just re-cast the Bollinger bands in terms of z-scores.
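As a rough sketch of that re-casting: a price above the upper band (MA + Kσ) is the same condition as a z-score above K. The function name `bollinger_and_z`, the sample window, and the choice of population standard deviation are assumptions for illustration.

```python
from statistics import mean, pstdev

def bollinger_and_z(prices, k=2.0):
    """Return (upper band, lower band, z-score of the latest price) for one window."""
    ma = mean(prices)
    sigma = pstdev(prices)  # population standard deviation over the window
    upper = ma + k * sigma
    lower = ma - k * sigma
    z = (prices[-1] - ma) / sigma
    return upper, lower, z

# Made-up 10-period window whose last price spikes upward.
window = [10.0, 10.2, 9.9, 10.1, 10.0, 10.3, 9.8, 10.1, 10.0, 11.5]
upper, lower, z = bollinger_and_z(window, k=2.0)

above_upper_band = window[-1] > upper
z_above_k = z > 2.0
print(above_upper_band, z_above_k)  # True True
```

Dividing both sides of price > MA + Kσ by σ (for σ > 0) gives z > K, which is why the two tests agree.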

Also, if you look into control charts from the domain of statistical process control, or SPC (http://en.wikipedia.org/wiki/Control_chart), you'll see a similar kind of application of z-scores.

So, if Bollinger bands or SPC worked universally for trading, we'd all be fat, dumb, and happy. My sense is that it is a handy way of normalizing data, though, to compare security A to security B over a short time scale, for example.

The problem, I reckon, is that in addition to "fat tails" you have statistical distributions that are not stable in time.

Grant