Creating a Factor using Historical Data

Hi,

I notice that all custom and built-in factors are set up in the pipeline from the algorithm's initialize function. My understanding is that the pipeline output is updated every day and that the factor values correspond to the appropriate run date.

I would like to create a factor using historical price data. The price data can be accessed only through the 'data' object, which is not available in initialize, when the pipeline is created. What are my options? How do I go about creating a factor based on historical data (for example, 6-month returns as of 3 months ago)?

4 responses

Is this what you are looking for?

from quantopian.algorithm import attach_pipeline, pipeline_output  
from quantopian.pipeline import Pipeline, filters  
# ---------------------------------------------------------------  
STK_SET, MOM, AGO, N = filters.Q500US(), 126, 63, 10  
# ---------------------------------------------------------------  
def initialize(context):  
    schedule_function(trade, date_rules.every_day(), time_rules.market_open(minutes = 65))  
    attach_pipeline(Pipeline(screen = STK_SET), 'pipe')

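def trade(context, data):  
    stocks = pipeline_output('pipe').index  
    # MOM + AGO + 1 daily closes: row 0 is the oldest, row -1 the most recent  
    C = data.history(stocks, 'price', MOM + AGO + 1, '1d')  
    # MOM-day return ending AGO trading days ago  
    R = C.iloc[-AGO-1] / C.iloc[0] - 1.0  
    R = R.dropna()  
    R.sort_values(ascending = False, inplace = True)  
    # Sorted descending: tail(N) holds the N lowest returns, head(N) the N highest  
    long_secs = R.tail(N)  
    short_secs = R.head(N)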

    print(long_secs, short_secs)  
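
The indexing in trade can be checked on a toy example. This is only a minimal sketch: a plain Python list stands in for one column of the price frame, and the small MOM and AGO values and the prices are made up for illustration.

```python
# Illustrative values only; the post above uses MOM = 126 and AGO = 63.
MOM, AGO = 3, 2

# MOM + AGO + 1 daily closes for one stock, oldest first (made-up prices).
closes = [100.0, 102.0, 104.0, 110.0, 120.0, 130.0]
assert len(closes) == MOM + AGO + 1

start = closes[0]          # close MOM + AGO bars before the latest bar
end = closes[-AGO - 1]     # close AGO bars before the latest bar
lagged_return = end / start - 1.0

# Position of the end row minus position of the start row (row 0).
bars_apart = (len(closes) - AGO - 1) - 0
print(lagged_return, bars_apart)
```

The two rows are exactly MOM bars apart, so the result is the MOM-day return as it stood AGO days before the latest bar.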

That works, and it is more or less what I did. What I really wanted was something I could integrate into the factors and the pipeline. I want to be able to define a custom factor, say "SixMonthReturn".

Example:

base_universe = QTradableStocksUS()
book_to_market = Fundamentals.book_value_per_share.latest
sixmonthreturn = SixMonthReturn(inputs=[data], window_length=360,...)

fil = (book_to_market >= 5) & (sixmonthreturn > .10)

return Pipeline(
    columns={'btm': book_to_market,
             '6mret': sixmonthreturn},
    screen=fil)

Pipelines are generally defined in the 'initialize' method. Defining a pipeline simply sets up the columns one wants in the dataframe that is returned when the pipeline is run. It's typically done in 'initialize' because it only needs to be done once, before the pipeline is actually run. The key point is that this only defines the pipeline.

The statement "The price data can be accessed only through the 'data' object" isn't entirely true. The same price data (and more) can be accessed through pipelines. The exceptions are minute data and the current day's data: pipelines only have access to daily data, and the latest data available is from the previous trading day. So unless one needs minute data and/or current-day data, it's all available using pipelines.

Below is a simple custom factor to get previous values of an input factor. The window length determines how many trading days to 'look back'.

from quantopian.pipeline import CustomFactor

# Create a custom factor which takes another factor as its input.  
# The output will be that factor's value n days ago.  
# n is set by making the window length equal to n+1.

class Factor_N_Days_Ago(CustomFactor):  
    def compute(self, today, assets, out, input_factor):  
        # Row 0 is the oldest row in the window, i.e. the value n days ago  
        out[:] = input_factor[0]
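
To see why a window length of n+1 gives the value from n days ago, here is a rough stand-alone sketch of the call that compute receives. Nested lists stand in for the 2-D array (one row per day, oldest first, one column per asset) passed as input_factor. The standalone compute function, fake_window, and the asset labels are made up for illustration and are not part of the Quantopian API.

```python
def compute(today, assets, out, input_factor):
    # Same body as Factor_N_Days_Ago.compute, minus `self`.
    out[:] = input_factor[0]

n = 2                       # look back 2 days
window_length = n + 1       # so the window holds 3 rows

# One row per day (oldest first), one column per asset; values are made up.
fake_window = [
    [0.01, 0.05],   # 2 days before the latest row (row 0)
    [0.02, 0.06],   # 1 day before the latest row
    [0.03, 0.07],   # latest row
]
assert len(fake_window) == window_length

out = [None, None]          # one output slot per asset
compute(None, ['A', 'B'], out, fake_window)
print(out)
```

Because the window only ever holds the last n+1 rows, row 0 is always the value from n rows back, whatever the run date.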

To get the desired '6 month returns' but from 3 months ago one could do something like this.

from quantopian.pipeline.factors import Returns

# Create our base factor. In this case 6 month returns (126 trading days is about 6 months)  
returns_6_month = Returns(window_length=126, mask=base_universe)

# This is what our returns factor was 3 months ago (63 trading days is about 3 months)  
days_ago = 63  
returns_6_month_3_months_ago = Factor_N_Days_Ago(inputs=[returns_6_month], window_length=days_ago+1, mask=base_universe)
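
As a sanity check on the composition, the sketch below uses plain Python to confirm that lagging a rolling return by days_ago rows matches computing the return directly from the older prices. It assumes a rolling return compares the first and last rows of its window, which is my understanding of how Returns behaves. rolling_return and lag are hypothetical helpers, and the prices and the small window sizes are made up.

```python
def rolling_return(prices, window_length):
    """Return of the last row over the first row of each trailing window."""
    out = []
    for t in range(len(prices)):
        first = t - window_length + 1
        out.append(prices[t] / prices[first] - 1.0 if first >= 0 else None)
    return out

def lag(series, n):
    """Value of the series n rows earlier (None where there is no history)."""
    return [series[t - n] if t - n >= 0 else None for t in range(len(series))]

# Made-up daily closes for one asset, oldest first.
prices = [100.0, 101.0, 103.0, 102.0, 105.0, 108.0, 107.0, 110.0, 112.0, 111.0]
window_length, days_ago = 4, 3

# Composition: rolling return first, then look back days_ago rows.
lagged = lag(rolling_return(prices, window_length), days_ago)

# Direct computation for the latest row: the window ends days_ago rows back.
t = len(prices) - 1
end = t - days_ago
start = end - (window_length - 1)
direct = prices[end] / prices[start] - 1.0
print(lagged[-1], direct)
```

Under that assumption, a window of 126 rows spans 125 daily changes, so the pipeline version may differ by one day from the 126-day change computed with data.history earlier in the thread.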

See the attached notebook. Good luck.

Thanks, I did not realize there was a factor called Returns that was accessible from the pipeline. This solves my problem.