Price of Security 'n' Days Ago from Within Pipeline

Hello,
I am fairly new to algorithmic trading and Python, but I am not new to trading and coding in general. I have a personal program in C++ that calculates the RMSE (Root Mean Square Error) between a graph of actual historical prices and a modified Moving Average. Each week I look for the stock with the smallest RMSE to invest in. Now that you have an idea of my process, I am trying to use the Pipeline to screen for stocks with a low RMSE. To use the RMSE equation, I need the actual security prices from 'n' days ago. I can't seem to find any way within Quantopian to get 'USEquityPricing.close' at exactly 'n' days ago, so that I can use a for loop to cycle through all 'n's and sum the terms of the RMSE. If there is a method for this, I hope to use it in the future to screen for the greatest percentage gainers and for stocks hitting new 52-week highs. If anyone can help me with this issue, I would greatly appreciate it.
Thank You!

9 responses

James,

You can get any data that Quantopian provides (technical, fundamental, etc.) for any date by using factors. There are some built-in factors, but it's also rather simple to write a custom factor if none of the built-in ones do what you need.

I've attached a notebook showing a custom factor which takes the past n days of close prices as an input and then outputs the close price from the nth day. You could also do other calculations inside the custom factor if you wish.

You alluded to using a for loop to cycle through all the close prices n days ago and do some calculation. While that's possible in Python, it is more "Pythonic" to perform calculations over an entire array in one statement. The output of a pipeline is actually a pandas DataFrame object, which gives you a number of built-in methods that make array handling very easy. I've given some examples in the attached notebook.
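
To make the array-at-a-time idea concrete outside of Quantopian, here is a minimal NumPy-only sketch. It assumes the window of closes is a 2-D array with one row per day (oldest first) and one column per asset, which is the shape a custom factor's compute receives. The function names, the sample data, and the use of a plain simple moving average (standing in for the "modified Moving Average" mentioned above) are all assumptions for illustration, and NaN handling is left out.

```python
import numpy as np


def close_n_days_ago(close):
    """Return the oldest row of the window: one close per asset."""
    # close has shape (window_length, n_assets), oldest day first,
    # so row 0 is the close from the start of the window.
    return close[0]


def rmse_vs_moving_average(close, ma_window):
    """RMSE between each asset's closes and its simple moving average."""
    # Cumulative sums give every rolling mean without a Python loop.
    csum = np.cumsum(close, axis=0)
    csum = np.vstack([np.zeros((1, close.shape[1])), csum])
    moving_avg = (csum[ma_window:] - csum[:-ma_window]) / ma_window
    # The first full average lines up with day index ma_window - 1.
    aligned_close = close[ma_window - 1:]
    return np.sqrt(np.mean((aligned_close - moving_avg) ** 2, axis=0))


# Hypothetical data: 4 days, 2 assets (a rising one and a flat one).
sample_close = np.array([[1.0, 10.0],
                         [2.0, 10.0],
                         [3.0, 10.0],
                         [4.0, 10.0]])

oldest_close = close_n_days_ago(sample_close)
rmse = rmse_vs_moving_average(sample_close, ma_window=2)
```

Inside a custom factor, the same expressions would be assigned to `out[:]`, and the lowest-RMSE stocks could then be picked from the resulting pipeline column.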

Good luck,

Dan

Thank you very much Dan!
I've implemented the process of your Custom Factor class in my 52-week-high algorithm, but I am getting errors when the Pipeline integrates the column and screen data. I get an "AttributeError: 'int' object has no attribute 'ndim'" error when compiling. I know that the "new52WeekHigh" data is in an unacceptable format for the Pipeline, and the solution may be as simple as defining my own factor to process the new52WeekHigh data. If that is the case, I'm not sure how to create a factor that calls another factor (calling 'CloseOnN' from within 'new52WeekHigh'). I may be making a stupid mistake or missing something very fundamental, but I do not fully understand how the Pipeline works. By the way, I know what you mean by performing the calculation over a single array after population instead of using a loop, and I plan to simplify the code later by doing just that.
Again, thank you very much,
James

from quantopian.pipeline import Pipeline, CustomFactor  
from quantopian.algorithm import attach_pipeline, pipeline_output  
from quantopian.pipeline.data.builtin import USEquityPricing  
from quantopian.pipeline.factors import SimpleMovingAverage, AverageDollarVolume, Latest

class CloseOnN(CustomFactor):
    inputs = [USEquityPricing.close]
    window_length = 1

    def compute(self, today, assets, out, close):
        out[:] = close[0]

def initialize(context):

    recentHigh = CloseOnN(window_length = 252)
    pastHigh = CloseOnN(window_length = 7)
    new52WeekHigh = 0

    index = 251

    while index > 7:
        if (CloseOnN(window_length = index)) > pastHigh:
            pastHigh = CloseOnN(window_length = index)
            index = index -1
    while (index > 0) and (index < 8):
        if (CloseOnN(window_length = index)) > recentHigh:
            recentHigh = CloseOnN(window_length = index)
            index = index -1

    if recentHigh > pastHigh:
        new52WeekHigh = 1
    else:
        new52WeekHigh = 0


    volume = Latest(inputs=[USEquityPricing.volume])
    price = Latest(inputs=[USEquityPricing.close],window_length=1)
    pipe_screen = ((price > 1.0) & (price < 20.0) & (new52WeekHigh > 0))

    pipe_columns = {'volume':volume, 'price':price, 'new52WeekHigh':new52WeekHigh}

    pipe = Pipeline(columns=pipe_columns,screen=pipe_screen)
    attach_pipeline(pipe, 'test')


def before_trading_start(context, data):
    output = pipeline_output('test')

    context.my_securities = output.sort('volume', ascending=False).iloc[:50]
    print len(context.my_securities)

    context.security_list = context.my_securities.index

    log.info("\n" + str(context.my_securities.head(5)))

James,

See the attached notebook for some ideas on how you may want to proceed. Rather than implement a "CloseOnN" to get a particular price, it looks like what you are looking for is the max price in a given timeframe. It would be best to implement a custom factor that does just that - returns the highest high over a given period.

You are making it harder than it is by trying to loop and use if statements. I'd suggest using the built-in methods which Python, NumPy, and pandas provide. The only difficulty there is keeping straight what kind of objects you are working with. Generally, the standard Python features work with all of these objects. Inside the compute function of custom factors (and classifiers and filters) the objects are NumPy arrays, so you can use any of the NumPy methods. Pipelines return a pandas DataFrame, so you can use all the cool pandas methods. One can also cast between Python structures (lists, sets, etc.), NumPy arrays, and pandas DataFrames using built-in methods if needed.

After looking at the notebook, notice that you can't directly compare pipeline factors to make filters (in the current Quantopian implementation). However, this can easily be done after the pipeline output is fetched. What you may want to ultimately do is make a single custom factor to check for highest high (or maybe a custom filter since it will be a binary output)?

If one would like to show the "previous high", then the MaxHigh factor would need to be modified a bit. See @Alex Z's comment below. This is helpful if one wants to check whether today's high is greater than the previous high. If today's data is included in the range, then the maximum always includes today's high, so today's high can never exceed it and the comparison tells you nothing (not good).
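
As a rough illustration of doing the comparison after the pipeline output is fetched, here is a small pandas sketch. The DataFrame below is a hand-made stand-in for what `pipeline_output` might return; the column names `high_today` and `prev_high`, the ticker labels, and the numbers are all hypothetical. It uses `sort_values`, which is the method in current pandas; the older `sort` call seen earlier in this thread has since been removed from pandas.

```python
import pandas as pd

# Hypothetical stand-in for the DataFrame returned by pipeline_output(...).
output = pd.DataFrame(
    {
        "price": [5.0, 12.0, 25.0, 8.0],
        "volume": [1000.0, 5000.0, 9000.0, 3000.0],
        "high_today": [5.2, 12.5, 25.5, 8.1],
        "prev_high": [5.0, 13.0, 24.0, 8.0],
    },
    index=["AAA", "BBB", "CCC", "DDD"],
)

# Column-wise comparisons return boolean Series, one value per security.
is_new_high = output["high_today"] > output["prev_high"]
in_price_range = (output["price"] > 1.0) & (output["price"] < 20.0)

# Keep rows passing both tests, then take the 50 most heavily traded.
candidates = (
    output[is_new_high & in_price_range]
    .sort_values("volume", ascending=False)
    .iloc[:50]
)
```

No loop or if statement is needed: the boolean Series act as a row mask, and the securities to trade are simply `candidates.index`.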

Good luck,

Dan

How about 'n' days ahead?

Why would you need 'n' days ahead? How would that benefit you?

The benefit of using data from 'n' days ahead is that it provides a target for training a model.

Ahh... So couldn't you simply use calendar data since data fields from 'n' days ahead would theoretically all be null?

And for previous nth day I assume it should be like this...

import numpy as np

class MaxHigh(CustomFactor):
    inputs = [USEquityPricing.high]
    window_length = 2
    def compute(self, today, assets, out, high):
        # Drop the last (most recent) row so the max covers only the earlier days.
        out[:] = np.nanmax(high[0:-1], axis=0)
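
To see what the `[0:-1]` slice does, here is a NumPy-only sketch that can be run without Quantopian. It assumes the window rows run from oldest to most recent, as in a custom factor's compute; the helper name and the sample highs are made up for illustration.

```python
import numpy as np


def previous_max_high(high):
    """Max high per asset over every row except the most recent one."""
    # high has shape (window_length, n_assets); the last row is the most
    # recent day, so high[:-1] leaves it out. nanmax skips missing values.
    return np.nanmax(high[:-1], axis=0)


# Hypothetical window: 4 days, 2 assets, with one missing value.
sample_high = np.array([[10.0, 7.0],
                        [12.0, np.nan],
                        [11.0, 9.0],
                        [13.0, 8.0]])

prev_high = previous_max_high(sample_high)
latest_high = sample_high[-1]
broke_out = latest_high > prev_high
```

With `window_length = 2` the slice leaves a single row, so the factor simply returns the previous day's high; a longer window gives the highest high over the earlier days.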