get lagged output of Pipeline Custom Factor?

I'm trying to figure out how to control the point in time for which a Pipeline Custom Factor returns its output. The typical use case is for a factor to return today's value. Is there a way to lag the output by calling/instantiating the factor with a parameter? For example, what if I want yesterday's value? Or N days ago?

As a specific example, say I wanted the value of AdvancedMomentum N trading days ago--how would I do it?

        import numpy as np
        from quantopian.pipeline import CustomFactor
        from quantopian.pipeline.data import USEquityPricing
        from quantopian.pipeline.factors import Returns

        class AdvancedMomentum(CustomFactor):
            inputs = (USEquityPricing.close, Returns(window_length=126))
            window_length = 252

            def compute(self, today, assets, out, prices, returns):
                # (12-month-ex-1-month return minus 1-month return), scaled by the
                # volatility of 6-month returns
                am = np.divide(
                    (prices[-21] - prices[-252]) / prices[-252] -
                    (prices[-1] - prices[-21]) / prices[-21],
                    np.nanstd(returns, axis=0)
                )
                out[:] = preprocess(am)  # preprocess() is a normalization helper defined elsewhere

Hi Grant,

Interesting question. For this factor, I wonder if this would do the trick for N = 5 days ago?

        class AdvancedMomentum(CustomFactor):
            N = 5 + 1
            inputs = (USEquityPricing.close, Returns(window_length=126 + (N/2)))
            window_length = 252 + N

            def compute(self, today, assets, out, prices, returns):
                N = self.N  # the class attribute isn't in scope inside compute
                am = np.divide(
                    (prices[-21 - N] - prices[-252 - N]) / prices[-252 - N] -
                    (prices[-1 - N] - prices[-21 - N]) / prices[-21 - N],
                    np.nanstd(returns, axis=0)
                )
                out[:] = preprocess(am)

@Grant. To get the value of a factor n days ago use a simple CustomFactor like this

class Factor_N_Days_Ago(CustomFactor):
    def compute(self, today, assets, out, input_factor):
        # input_factor holds window_length rows of the wrapped factor's output;
        # row 0 is the oldest, so with window_length = n + 1 it is the value n days ago
        out[:] = input_factor[0]

Then create an instance of this, passing the factor you want n days ago as the input, with a window_length of n+1.

    advanced_momentum_n_days_ago = Factor_N_Days_Ago([advanced_momentum], window_length=days_ago+1)

The one thing you need to ensure is that the original factor is 'window safe'. If it is, then put the following line into its class definition.

    window_safe = True

See the attached notebook.
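In make_pipeline, the wiring would look roughly like this (a sketch, not the notebook itself; the column names and days_ago value are just illustrative, and AdvancedMomentum is assumed to have window_safe = True in its class definition):

    from quantopian.pipeline import Pipeline

    def make_pipeline():
        advanced_momentum = AdvancedMomentum()   # window_safe = True set in its class
        days_ago = 5                             # illustrative
        am_lagged = Factor_N_Days_Ago(
            inputs=[advanced_momentum],
            window_length=days_ago + 1,
        )
        return Pipeline(columns={
            'am_today': advanced_momentum,
            'am_lagged': am_lagged,
        })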

Thanks Dan -

The one thing you need to ensure is that the original factor is 'window safe'.

I'm not sure how to check/test for window safeness. I searched the Q help page for the topic and didn't find anything; I gathered what I could from Jamie's post (https://www.quantopian.com/posts/how-to-make-factors-be-the-input-of-customfactor-calculation#58f7921d92b39e5b66f9b473). I guess the idea is that the factor needs to have the same normalization versus time on a per-security basis, which I think one would want anyway for an alpha factor, right? I thought all of the Pipeline inputs were corrected for such issues (e.g. splits) anyway?

Also, I don't understand what this does:

window_safe = True  

Is it just to ensure that the author did a head-scratch to determine window safeness? Or is it more than that?

'window_safe' is a zipline/pipeline term. I've never come across it in other quant circles. If a factor is 'window_safe' it means its value will be the same when calculated over various 'windows' or timeframes. Really this means its value won't be impacted by stock splits so it's 'safe' to use whether a split is applied or not.

As an example, a '10_day_moving_average_price' factor is not window_safe. If a 2:1 split occurs all the values will be cut in half. However, a '10_day_return' factor is. It's just a ratio which will remain the same even if the prices are halved.
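A bare-bones sketch of that distinction (illustrative class names, not from the notebook):

    import numpy as np
    from quantopian.pipeline import CustomFactor
    from quantopian.pipeline.data import USEquityPricing

    class MovingAvgPrice10(CustomFactor):
        # NOT window_safe: a 2:1 split would halve every value in the window
        inputs = [USEquityPricing.close]
        window_length = 10

        def compute(self, today, assets, out, close):
            out[:] = np.nanmean(close, axis=0)

    class Return10(CustomFactor):
        # window_safe: a ratio of prices is unchanged by a split
        inputs = [USEquityPricing.close]
        window_length = 10
        window_safe = True

        def compute(self, today, assets, out, close):
            out[:] = close[-1] / close[0] - 1.0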

It's up to the author of a factor to determine if a factor is 'window_safe'. The window_safe flag is just used in pipeline to throw an error if it finds it's using a factor in an 'un-safe' way and therefore may be giving incorrect results. Setting this to True simply instructs pipeline not to throw an error. It should be noted that using a factor with unadjusted prices may be OK; it just depends upon the situation.

So, yes. It's just to ensure the author did a 'head-scratch'.

Very helpful, thanks Dan! I've wondered about 'window_safe' for some time too.

Thanks Dan -

I'm still a bit confused about the need for window safeness. Outside of Pipeline, in an algo that is run in the IDE (the "backtester"), when OHLCV minute bar data are retrieved, they are corrected for splits as of the current simulation time, right? Is this also not true for Pipeline data when running an algo in the IDE?

Or perhaps the code you shared above effectively dials back the current simulation time, so that one has to worry about splits?

Grant, I sometimes find the whole split, window_safe, thing a bit of a head scratcher.

Concrete examples can therefore be a help. Below is a pipeline output for AAPL at the time of their 7:1 split on 6-9-2014. The three columns are price (standard close), price 2 days ago (a typical standard custom factor), and then the price factor used in the n-days-ago factor.

              price    price_2_day_ago   price_factor_2_day_ago  
2014-06-03  628.50000   633.000000        633.00000  
2014-06-04  637.54000   628.500000        628.50000  
2014-06-05  644.82000   637.540000        637.54000  
2014-06-06  647.35000   644.820000        644.82000  
2014-06-09  92.22613    92.480421         647.35000  
2014-06-10  93.70000    92.226130         92.22613  
2014-06-11  94.25000    93.700000         93.70000  
2014-06-12  93.87000    94.250000         94.25000  
2014-06-13  92.26000    93.870000         93.87000

Notice the 'price_2_day_ago' and 'price_factor_2_day_ago' are the same as long as there aren't any splits between n-days-ago and the current simulation day. The 'price_2_day_ago' factor uses adjusted prices, however, 'price_factor_2_day_ago' is the actual (unadjusted) factor value 2 days ago. It's the 'real' value of the factor 2 days ago. This may or may not be what you want. It's not right or wrong. Just understand how to use it. 'window_safe' is just a flag of caution.

Not to beat this to death, but here's an example. If one wanted to know if yesterday's price was greater than the price 2 days ago, then definitely use the adjusted values (ie column 2 above). However, if one wanted to check if a stock is priced over $100, then adjusted prices aren't correct. Column 3 above would maybe be more correct.

Take a look at the attached notebook. The 'Factor_N_Days_Ago' code shared above works exactly as it implies. It returns the value of a factor as it would have been seen on that day. It doesn't 'adjust' it for any subsequent splits; it's exactly 'as-it-was'. Whether one needs to worry about splits depends upon how it is used (eg the stock-over-$100 example above), or more generally, how the factor is calculated and whether it's impacted by splits. If a factor isn't impacted by splits (eg ratios or counts such as returns or up/down days) then the factor can be labeled as 'window_safe'.
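To make the two use cases concrete, here is a rough sketch (the ClosePrice factor is illustrative, and declaring it window_safe is exactly the kind of author's judgment call described above; Factor_N_Days_Ago and the imports are as in the snippets earlier in the thread):

    class ClosePrice(CustomFactor):
        # latest close; declared window_safe so it can feed Factor_N_Days_Ago,
        # which is what produces the unadjusted 'as-it-was' values in column 3
        inputs = [USEquityPricing.close]
        window_length = 1
        window_safe = True

        def compute(self, today, assets, out, close):
            out[:] = close[-1]

    # Dollar-threshold check: the as-it-was (unadjusted) value is the one you want
    price_2_days_ago_as_seen = Factor_N_Days_Ago(inputs=[ClosePrice()], window_length=3)
    over_100_two_days_ago = price_2_days_ago_as_seen > 100

    class YesterdayAbove2DaysAgo(CustomFactor):
        # price comparison across days: the split-adjusted window is the one you want
        inputs = [USEquityPricing.close]
        window_length = 2

        def compute(self, today, assets, out, close):
            out[:] = close[-1] > close[-2]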

oops, forgot to attach the notebook...

Hi Dan -

You are using a notebook, but do things work the same way in the backtester? On the help page, it clearly states:

When your algorithm calls for historical equity price or volume data, it is adjusted for splits, mergers, and dividends as of the current simulation date. In other words, if your algorithm asks for a historical window of prices, and there is a split in the middle of that window, the first part of that window will be adjusted for the split. This adjustment is done so that your algorithm can do meaningful calculations using the values in the window.

So, does this only apply to non-Pipelined data? Or maybe it applies to Pipelined data, but only when running a backtest?

Or perhaps Pipeline, when queried for lagged data/factors, as you illustrated, is effectively shifting the simulation date?

Sorry, still confused what's going on...

Grant,

One of the redeeming features of the Pipeline API is that it uses the same execution engine and has the same user-facing API in both research and backtesting. The only differences between the two environments are the way that you actually run/execute the pipeline, and the shape of the output. Regardless of environment, when a pipeline is computed for day N, it is computed over a set of tabular input data (described as a 2-dimensional M*N matrix in the CustomFactor lesson of the Pipeline Tutorial). Per-share data fields in the input data will always be adjusted as of day N.

In response to the meaning of the simulation date: in backtesting, the 'simulation date' is the current date of the zipline engine. In research, the 'simulation date' is the date in run_pipeline. The simulation date in research is the first level of the index in the output dataframe of run_pipeline.

In general, I find it helpful to play around with an example, like the AAPL split in 2014. You should play around with printing out the inputs to a CustomFactor, the output of run_pipeline, etc. to help visualize how pipeline works.
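For instance, something along these lines in research prints the window actually handed to compute around the split (the dates and window length are just for illustration):

    from quantopian.pipeline import Pipeline, CustomFactor
    from quantopian.pipeline.data import USEquityPricing
    from quantopian.research import run_pipeline

    class PrintInputs(CustomFactor):
        inputs = [USEquityPricing.close]
        window_length = 5

        def compute(self, today, assets, out, close):
            print(today)         # the date this computation is 'as of'
            print(close[:, :3])  # the adjusted price window fed to compute (first 3 assets)
            out[:] = close[-1]

    result = run_pipeline(Pipeline(columns={'close': PrintInputs()}),
                          start_date='2014-06-03', end_date='2014-06-13')
    print(result.head())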


So, I guess I'm concluding that Dan's example above effectively shifts the simulation date when computing a lagged Pipeline custom factor, so that one needs to worry about the factor being "window safe" (or it'll suffer from erroneous jumps, if one is expecting adjusted data). If I understand correctly, as of the effective simulation date, the data are adjusted for splits and dividends back to the time of Adam and Eve (or the Big Bang, if you prefer).

It's a bit confusing, since the "simulation date" of a backtest is the current date:

get_datetime(timezone)  

Returns the current algorithm time. By default this is set to UTC, and you can pass an optional parameter to change the timezone.

But if one lags a Pipeline factor in a backtest, then the data used by the factor are a trailing window of adjusted data, as of the effective simulation date, and not the get_datetime date (if I'm following correctly...).

Backing up one step. The ONLY time 'window_safe' (and likewise the majority of the above discussion) is an issue is when using pipeline AND when factors are used as inputs to other factors.

When using any of the built in datasets as inputs, the input data is dutifully adjusted each day to be the data one would have seen on that day. It's really sort of a three step process. First the data is fetched, then it's adjusted, then it is fed to the compute method of any factors.

Adjusting the data is not a trivial process. As an example, prices are divided by the split ratios (ie a 2:1 split will divide the price by 2), however volumes are multiplied by the split ratio (ie a 2:1 split will multiply the volume by 2). Zipline takes care of adjusting the data BEFORE feeding the data into any pipeline factors. The data is always adjusted as of the simulation date.
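In plain numpy terms, the adjustment works roughly like this (made-up numbers, 2:1 split before the third bar):

    import numpy as np

    raw_prices  = np.array([100.0, 102.0, 50.5, 51.0])
    raw_volumes = np.array([1000.0, 1100.0, 2100.0, 2000.0])
    ratio = 2.0  # 2:1 split

    adj_prices = raw_prices.copy()
    adj_volumes = raw_volumes.copy()
    adj_prices[:2] /= ratio    # pre-split prices are divided by the split ratio
    adj_volumes[:2] *= ratio   # pre-split volumes are multiplied by the split ratio

    print(adj_prices)   # [ 50.   51.   50.5  51. ]  -- comparable across the split
    print(adj_volumes)  # [ 2000.  2200.  2100.  2000.]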

So far so good...

Now we get to using factors as inputs to other factors. While the built in datasets are dutifully adjusted for splits and dividends, zipline/pipeline doesn't have a clue how to adjust an arbitrary input (such as a custom factor). Does it multiply? Does it divide? Does it do some other funky math? So, it doesn't do anything. (Well I suppose technically it does something. It throws an error if window_safe isn't set to True). In any case, it's really now a two step process. Fetch all the data and send the data to the compute method of the factors. No intermediate adjusting. It doesn't try to 'adjust' it because it doesn't know how to. It just thinks of it as raw data. That's what's happening in the notebook example above.

@Grant, I wouldn't think in terms of 'lagging' a pipeline factor or 'effective simulation' dates. That's complicating it too much. First, one concept which may not be apparent is that a factor ALWAYS has the same output(s) when run on a specific day. One could run the factor standalone in a notebook, or in the IDE, or as an input to another factor, and it will ALWAYS output the same value(s) for a specific day. It's like a static dictionary. On this date this is the factor output. Period.

Now, when using factors as inputs there are just two steps. First, the factor output is computed. (Remember the output is a fixed series of dates and associated outputs: one date, one output.) Second, this output is passed to the compute method of the other factor. No adjusting is done. 'Factor_N_Days_Ago' is really only taking the fixed output from the input factor (which is a series of dates and the associated output) and 'looking up' the output from 2 days ago. It's not any more complicated than that.

Hope that helps?

Thanks Dan & Jamie -

I think the key concept I was missing is that, as Dan says, his Factor_N_Days_Ago is looking up the factor output from N days back. I was incorrectly thinking of it as applying the factor to lagged data adjusted up to the current get_datetime() date of the simulation.

Regarding the use of window_safe = True, I'm still not completely clear whether Pipeline performs its own test for window safeness if I don't set it (making the flag a kind of override), or whether it is always required when passing the output of one factor to another, with an error resulting otherwise.

I searched on the help page for a description of the window_safe flag but found nothing. Generally is there documentation on this business of passing the output of one factor to another? Seems like a basic tool.

Based on some Google searches, Dan's comment above, and some testing on a factor that outputs Returns (which should be window safe), I conclude that setting window_safe = True flags Pipeline to ignore the NonWindowSafeInput "error" and to keep calm and carry on.
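In other words, my mental model is something like this (just a sketch with an illustrative factor):

    class TrailingReturn(CustomFactor):
        # a ratio of prices, so arguably safe to mark window_safe
        inputs = [USEquityPricing.close]
        window_length = 20

        def compute(self, today, assets, out, close):
            out[:] = close[-1] / close[0] - 1.0

    # Without window_safe = True on TrailingReturn, constructing this raises NonWindowSafeInput:
    #     lagged = Factor_N_Days_Ago(inputs=[TrailingReturn()], window_length=3)
    # Adding window_safe = True to TrailingReturn makes the same line acceptable to Pipeline.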

Another question is, when Factor_N_Days_Ago is instantiated with a mask, does it return values using the current value of the mask, or the mask N days ago? For example, since the composition of QTradableStocksUS changes with time, if Factor_N_Days_Ago is looking up prior values of the factor, then the list of securities would change with N, due to changes in QTradableStocksUS. This is what I'd expect, since one is effectively storing and then looking up prior values of the factor.

Hi Dan & Jamie -

Is there any way to write a factor similar to the Factor_N_Days_Ago example above, but that returns the factor values for a trailing window of N days? It seems as though Pipeline custom factors are designed to just return a vector, so maybe I would need to iterate over the days_ago parameter to access a trailing window (versus a single lagged value). Or somehow use out.<output_name> for each lagged value, and then cobble everything back together?
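Something like the following multi-output sketch is what I have in mind with out.<output_name> (hypothetical and untested):

    class TrailingFactorValues(CustomFactor):
        # expose the wrapped factor's value today, 1 day ago, and 2 days ago
        outputs = ['lag_0', 'lag_1', 'lag_2']
        window_length = 3

        def compute(self, today, assets, out, input_factor):
            out.lag_0[:] = input_factor[-1]  # most recent value
            out.lag_1[:] = input_factor[-2]
            out.lag_2[:] = input_factor[-3]

    # usage (the wrapped factor would still need to be window_safe):
    #     trailing = TrailingFactorValues(inputs=[advanced_momentum], window_length=3)
    #     one_day_ago = trailing.lag_1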

Dan,

I've tried using this version (copied from ipynb) in my strategy, and the simpler version on another post (for getting price 2 days ago), and I cannot get either to build -- I'm getting CustomFactor unknown -- so while it may be the same backtest, etc -- I simply cannot get this code to work in any way I try.

Would be nice if I could simply do:
USEquityPricing.close.iloc[-255]
or .shift(255)

Unfortunately I'm stuck with this -- just need to get it to run : )

Zach

Hi Zach,

Thanks for the feedback. You're right that a built-in function to get values from N days ago would be helpful here. I've added a +1 to an internal ticket tracking this feature request.

In the meantime, would you be able to share your implementation and the corresponding error message? Maybe we can help get it running.

Jamie,

Thanks for the quick reply! I actually figured out the momentum issue -- unfortunately I was missing an import, haha. Seems in the notebook format I missed it; I was looking in the wrong cells. I am still stuck, however. I'm trying to translate a deprecated strategy into the current build -- the primary issue was the fundamentals (it used Morningstar), but I replaced those with the new native filters in the universe.
Unfortunately I'm still stuck on the method of building the pipeline -- it was done kind of as a pure DF in the deprecated example, and I've had trouble reconciling with the rest of the logic.

Here's the 'new' version. I'll comment again with the original.

(I also tried simply replacing the deprecated fundamental logic with current style, still not running)

I don't know what happened -- I'm getting some weird indent / formatting error here -- don't understand why my indents are aligned -- just checked them

Maybe some issue with the commented out old code -- I tried sharing my code, but I can't because I can't get a backtest to run. Tried sharing with the collaborate feature?

If that was unsuccessful, here is the link to the thread from the old strategy I'm trying to translate -- and get_fundamentals was the primary issue.
https://www.quantopian.com/posts/value-momentum-strategy

Any help would be greatly appreciated : )

Hi Zach,

I took a look and it seems the error is coming from this line: temp = context.fund_filt.copy(). Specifically, the issue is that fund_filt is not an attribute stored on context, so the algorithm is raising an exception.

That said, I think your best bet is to re-write the strategy in pipeline. The version that you are working from is using a very old version of the Quantopian API, and can be done much more simply in pipeline. I've attached a version of something that I think is pretty close to the original post you were working from, but with most of the logic moved into pipeline. Before iterating on the strategy any further, I'd highly recommend going through the Getting Started Tutorial and Pipeline Tutorial to get a better understanding of how the Pipeline API works, as it is Quantopian's core API.

I realized the same thing last night (after we spoke) -- did you see the version since I added make_pipeline ? Regardless I'm stuck on the same thing.

I guess the issue I'm having is that I'm still not getting how to use the calc_return function -- I realize the context.output.copy() is the issue, but that's me trying to replicate the original, where they're just copying the dataframe. It seems they're simply calculating momentum, which I can do more simply inline like so:
close_1yr_ago = yesterday_close / (Returns(window_length=252) + 1.0)
mom_calc = (yesterday_close - close_1yr_ago) / close_1yr_ago

I understand pipeline, I just didn't understand how to interact with the pipeline with a custom function like this (calc_return) -- but I don't think I need to necessarily. I can simply filter the pipeline with masks or screens, and sort it accordingly -- kind of how I did in the Ver2 I shared first?
It seems Ver2 didn't share -- I'll try again.

I guess my question is why it's performing so differently from the strategy I was replicating? Am I missing something here?

Zach

Is there a reason my backtests aren't working (loading...) when I import them here? Sorry -- I'll try sharing/collaborating with you.

I had another question -- I was trying to calculate a smoothness factor used in momentum, like the number of up days vs number of down days in say 1yr.
I suppose I could use :

    secdf = context.output  
    secdf['pct_chg'] = secdf['Yesterday Close'].pct_change()  
    secdf['id'] = np.where((secdf['pct_chg'] < 0),1,-1)  

Is there any way to do this IN the pipeline, rather than 'to' the pipe after it's returned?
make_pipeline():

close_1d = yesterday_close / (Returns(window_length=2) + 1.0)  
ret_1d = (yesterday_close - close_1d) / close_1d
idm = np.where(ret_1d < 0,1,-1).cumsum()
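i.e., a custom factor along these lines is roughly what I'm after (untested sketch):

    import numpy as np
    from quantopian.pipeline import CustomFactor
    from quantopian.pipeline.data import USEquityPricing

    class UpMinusDownDays(CustomFactor):
        # count of up days minus down days over the trailing year
        inputs = [USEquityPricing.close]
        window_length = 252

        def compute(self, today, assets, out, close):
            daily_change = np.diff(close, axis=0) / close[:-1]
            up = np.nansum(daily_change > 0, axis=0)
            down = np.nansum(daily_change < 0, axis=0)
            out[:] = up - down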

...