Recursive Function calls for Price data series (... python help please)

Back to Community

edited

Hi python experts. I know that for many of you this will be a trivially (and probably laughably) easy question, but i am forced to admit that my own python skills are just not up to it, and so i need to reach out for your help, with all due humility on my part. Here's my problem...

In my own trading outside of Q, i make a lot of use of digital signal processing (DSP) methods in which recursive function calls are applied to price data series. I'm sure you know the sort of stuff; Ehler's books are full of these types of things. In general mathematical (not pythonic) notation, at each price data point P(i) for i = 0, 1, 2,... the result, Result(i) = some algebraic function of the previous result, Result(i-1). Including the initialization condition, this is easy for me to write and execute in most languages as something like the following pseudo-code:

function (PriceArray, parameter(s) e.g. Nperiod)
{ result[0] = Price[0] ... or some other function of initial price
define my_function coefficients here ... for example k = 2/(Nperiod +1) in the case of N-period EMA of Price.
loop for i = 1 to imax
result[i] = my_function (result[i-1]) ... for example result[i] = k*Price[i] + (1-k)*result[i-1] .... in the case of EMA.
}

Its easy for me to do in OTHER languages, but I am embarrassed to say that I have not been successful in implementing this type of recursive function here in the Q / python context. I'm not sure, but i think it requires 2 steps, namely a "class" function to handle the data array part, preceded by a "helper" function to define the actual user function itself. The following is my attempt (illustrated by the standard "hello world" example of an EMA), but i have not been able to get it to work. My aim here is NOT actually that i want to get an EMA (yes, i know its available in TAlib) but rather to understand the correct mechanics of "pythonizing" these sorts of functions. Please, without re-writing it all completely, can someone show me what corrections are required to make THE FOLLOWING = my attempted Q / python code snippet workable? (Apologies about the non-visibility of the indentation which is not showing up here).

# the "helper" function
recursionStarted = False
previousResult = 0.0
def my_recursive_function (priceArray, periodN):
global recursionStarted
global previousResult
if not recursionStarted:
initialResult = priceArray[0]
previousResult = initialResult
recursionStarted = True
# else if recursionStarted
k = 2.0 / (periodN + 1) # Using EMA here as an example only.
result = (k* priceArray[-1]) + (1-k)* previousResult
previousResult = result
return result

# the main "class" function as required in Q algos
class deltaClosePct_vsRecursivePrice (CustomFactor):
inputs= [USEquityPricing.close]
window_length = 252
global Nbars
Nbars = 21
def compute(self, today, assets, out, close):
functionVal = my_recursive_function(close, Nbars)
out[:] = 100*(close[-1]/functionVal[-1] -1) # calculating % difference between closing price and the "filter function" value (here EMA).

Thanks in anticipation, best regards, TonyM.

10 responses

Tony Morland

@Grant, i know you have done something like this. Can you give me a little guidance? Anyone else?

Grant Kiehne

Hi Tony -

I tried it outside of Pipeline. I’ll take a look at what you posted. You may not need to do the recursive stuff. One potential issue is that it will have a start-up period before reaching steady state, which you could avoid by grabbing a trailing window of data and operating on it.

Alan Coppola

@Tony,
Nice to see you back!

Even though the Python mavens say that the way you're using recursion in this application is slow (use arrays)...see @Grant above...here you go...
I've enclosed a notebook.

The main problem is that close is a window_length x len(assets) array of numbers for each day.
Your recursive function needs to work on a 1d-array. So, see cell [2] array_recursive_function, and check that it gives you what you want.

Then, when putting it into the pipeline, it needs to apply to a 1d-array, unless it is vectorized to apply over all the assets at once( out[:] = ...)
I didn't do that, you probably could.

I also needed to reset your global variables between each run of the recursive function on each window_length of data.
Finally, I restricted it to just a few assets, as it takes too long, and sometimes produces "Nan"'s. I didn't check any of the "answers" for this exercise, so buyer-beware.

Also, check out: https://www.quantopian.com/posts/pipeline-mask-equals-marketcap-top-500

alan

Grant Kiehne

Note that there is a built-in EWMA function in Pandas:

http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.ewm.html#pandas.DataFrame.ewm

If you are just needing to smooth price data, this should do the trick (although I'm not quite sure how to use Pandas functions within Pipeline, but there must be a way).

I don't see any advantage to trying to do things recursively in this context.

Tony Morland

Hi @Grant,
Thanks for your reply. However the point I'm trying to make (although evidently unsuccessfully so far) is that it is NOT specifically EMAs that I care about at all, but rather that
a) Digital Filters of various kinds are extremely useful in Signal Processing applications in general,
b) There exists an extensive body of Engineering literature regarding DSP functions and their filter characteristics, and many of these are also usefully applicable to price series analysis in general, not only low-pass filter smoothing (such as EMA),
c) Such recursive filters are very useful in practice and can not only accomplish smoothing but also remove phase lag (which is the bugbear of EMA's and most conventional smoothers), as well as facilitating all sorts of other calculations such as Hurst exponents, and various types of efficient adaptive filters.
d) Filters based on recursive functions are easy to write in other languages but I am finding the conversion to python to be non-trivial (for me, even though it is probably quite easy for a python expert).

Your comment that you: "don't see any advantage to trying to do things recursively in this context" is applicable in the special case of (simple) moving average filters that can also be defined non-recursively, but is not applicable to more general classes of filters where the output response depends not only on the input but also on previous values of output, i.e. inherently involves recursion.

Regarding your comment about potential for problems of the initial start-up before steady state, yes indeed that is true, but is easily taken care of simply by appropriate choice of the initialization value and is not really part of the "python" problem per se.

Hi @Alan good to see you again and thanks for your comments.
Yes, other "Python mavens" have also told me that the way that I want to use recursion is slow, and no doubt that is true, but at the moment I am still at the "baby steps" stage of trying to take something that I know works well in a non-python context and now get it to work (slowly or not) in a python context. In this way I want to effectively translate some useful ideas from elsewhere into python/Q and get them working here, even if I'm not yet doing it the most "efficient" way.

I'm taking a look at your notebook now and will return comment ASAP.

@Grant, @Alan,
In the meantime, to get this right away from the topic of EMA and onto to something more general, but nevertheless still using a simple concrete example, imagine you wanted to create a filter such as a simple 2-pole bandpass filter with transfer function which can be written to express the output as:

Output(N) = C1*(Price(N) - Price(N-2)) + C2*Output(N-1) + C3*Output(N-2)

where C1, C2, C3 are constants, Price(N) is the latest Price input, Price(N-2) is the price 2 bars earlier, and Output(N) is the output at latest bar (N) and this depends on earlier values of input Price and also on the previous values of Output from 1 and 2 bars earlier.
How would you do this in python / Q context?

Best regards,
TonyM.

Grant Kiehne

Hi Tony -

I'll see if I can cook up a Pipeline custom factor for:

Output(N) = C1*(Price(N) - Price(N-2)) + C2*Output(N-1) + C3*Output(N-2)

I think the easiest thing to do would be to roll through a trailing window of price deltas, every time the custom Pipeline factor is called. This means writing a loop, but it avoids having to deal with globals and drop/add of securities. It also avoids the start-up issue.

Grant Kiehne

Here's what I came up with (not sure it is correct yet, but it is a start). I've attached an Alphalens notebook, showing it doesn't barf.

       class recursive(CustomFactor):  
            inputs = [USEquityPricing.close]  
            window_length = 252  
            def compute(self, today, assets, out, close):  
                c1 = 1  
                c2 = 0.1  
                c3 = 0.1  
                p = close  
                p_lag = np.roll(p,2,axis=0)  
                delta_p = p - p_lag  
                delta_p = delta_p[2:,:]  
                output = np.zeros_like(delta_p)  
                for k in range(0,len(delta_p[:,0])):  
                    if k == 0:  
                        output[k,:] = c1*delta_p[k,:]  
                    elif k == 1:  
                        output[k,:] = c1*delta_p[k,:] + c2*output[k-1,:]  
                    else:  
                        output[k,:] = c1*delta_p[k,:] + c2*output[k-1,:] + c3*output[k-2,:]  
                out[:] = output[-1,:]

Tony Morland

Hi @Grant, thanks for your efforts on this, which I am just going through now and will get back to you about ASAP

@Grant, @Alan, I very much appreciate the fact that you both made the effort in doing this to help me, perhaps without fully understanding why I wanted it. I trust the following will adequately explain the practical application the example, as well as giving you an appetite for what else is possible.

There are many very useful digital filters that can be constructed, some of which have excellent characteristics that are potentially applicable to trading, including for example a) VERY short lags, thereby overcoming the usual problem inherent in long-period (i.e. very smooth) moving averages, b) minimizing phase distortion (which occurs with EMA's), and c) the ability to generate ZERO-lag smoothing of data that contains known cyclical components, e.g. for oscillators, etc. Electrical Engineering literature is rich with examples of all sorts of applications, some of which can be extended to trading. The example I have given:
Output(N) = C1*(Price(N) - Price(N-2)) + C2*Output(N-1) + C3*Output(N-2)
is in fact a band-pass filter (BPF) which has the following interesting characteristics: Sharp attenuation both of high frequency (= short-period, i.e. noise) components above the passband and also of low frequency (= long-period, i.e. trend or drift) components below the passband. The values of coefficients C1, C2, C3 depend of the chosen period for the filter, as well as its "Q-factor" = ratio of pass-band width to dominant period. In the frequency domain, the BPF works a lot better than simply concatenating a low-pass filter and a high pass filter, and has the added benefit of minimal distortion within the passband itself and sharp rolloff (attenuation) on either side. If there are dominant frequency (i.e. periodicity) components in the input data (price series) and the BPF is tuned to that periodicity, then with the BPF we potentially have an almost ideal oscillator (in the TA sense of that word) with no lag, and without the usual oscillator problems of saturating during trends. The problem of course is that for actual price data series the frequency spectrum is non-stationary and so the dominant frequency (if any) and phase drifts with time. However there are also a number of ways to construct digital filters, as well as some python scientific library tools to extract the dominant frequency information itself, and this can be fed as input the BPF. A few decades ago, John Ehlers made a name for himself (and plenty of money) with his Maximum Entropy Spectral Analysis (MESA) work, which was originally based on ideas from digital signal processing (DSP) as used in geophysical exploration. Ehlers also did a lot of work in other forms of DSP applied to trading, which is reasonably well documented in his books which are quite easy to understand although most of them do contain some typo errors in his formulas. Ehlers work is conceptually sound from an Engineering / Signal Processing perspective. Ehlers coded up his filters in TradeStation Easy Language, which is identical to that used nowadays in MultiCharts, fairly similar to AmiBroker's AFL array processing language, and also easy to translate into the syntax & structure of any C-type languages. An Internet search reveals that for several years now there have been quite a large number of people asking how to translate these codes into python, but (to date) almost no satisfactory answers. Hence my questions now about writing these types of functions in pythonic / Q form. There exists a wealth of interesting tools available, if one can correctly "pythonize" them. This is a potentially rewarding area!

Tony Morland

Hi @Alan,
In your notebook, line as shown below
myPipe=run_pipeline(pipe, start_date=start, end_date=end).dropna()

i notice the .dropna() that you have added at the end in order to omit any NaNs.
As a separate question (unrelated to this filter example and in a different context), in another of my algos I have been trying to replace NaNs with some default value (e.g. zero) but I keep getting error messages as apparently there is something I don't understand about the required syntax. Please can you show me, in your notebook example, how you would replace NaNs with some default value, rather than omitting those entries? Thanks in advance, TonyM.

Grant Kiehne

Hi Tony -

Note that Scipy has a entire library devoted to this topic:

https://docs.scipy.org/doc/scipy/reference/signal.html

Specifically, see:

https://docs.scipy.org/doc/scipy/reference/generated/scipy.signal.lfilter.html

This can be used to do the same thing I did above, since:

The filter function is implemented as a direct II transposed structure. This means that the filter implements:

a[0]*y[n] = b[0]*x[n] + b[1]*x[n-1] + ... + b[M]*x[n-M]
- a[1]*y[n-1] - ... - a[N]*y[n-N]

So, as an exercise, you could try applying scipy.signal.lfilter and compare it to what I posted above. You should be able to get the same result, and then the question would be which is more computationally efficient. My guess is that scipy.signal.lfilter will be faster.

You've successfully submitted a support ticket.

Our support team will be in touch soon.