Using Pipeline with CustomFactor

Greetings,

I created the simple algo below to test my understanding of how the new Pipeline API works. The algorithm computes Parkinson volatility over the last window_length of data and rebalances into the 10 equities with the lowest volatility values. I am attempting to combine this factor with market cap: take the largest 2000 names by market cap first, compute the volatility metric for each of those 2000, and rank from high to low. Of that ranking, I only want to use the 10 names with the lowest volatility.

A few of my specific questions:

  1. What is the main purpose of the mask argument in the rank method?
  2. What is the difference between screening and masking?
  3. How do I reduce the population to the top market cap names, then to the bottom volatility names?

I can't seem to get my head around combining these factors and I'm hoping the community can give me some pointers.
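For reference, the Parkinson estimator itself only needs the high/low range of each bar. A minimal plain-NumPy sketch of the math (independent of the Pipeline API; the function name is my own):

```python
import numpy as np

def parkinson_volatility(highs, lows):
    """Parkinson (1980) range-based volatility estimator.

    highs, lows: daily high/low prices over the lookback window.
    Returns the per-period volatility (not annualized).
    """
    log_hl = np.log(np.asarray(highs, dtype=float) / np.asarray(lows, dtype=float))
    return float(np.sqrt(np.mean(log_hl ** 2) / (4.0 * np.log(2.0))))
```

A CustomFactor wrapping this would receive the high and low windows as its inputs and emit one value per security.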


Jason,
These are great questions!

1. What is the main purpose of the mask argument in the rank method?
The mask argument lets you exclude certain securities when you are ranking the universe. For example, if you wanted to rank the top 2000 companies by market cap by your parkinson_vol_factor, you would set mask=top_2000 when calling rank. This removes all other securities from your rank and limits the rank to the top 2000 companies. It does not remove any securities from your universe.
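A toy illustration of that behavior in plain pandas (not the actual Pipeline API; tickers and numbers are made up): ranking with a mask produces ranks only for the masked names, while the unmasked names stay in the table with no rank.

```python
import pandas as pd

# Hypothetical factor values for five securities
vol = pd.Series({'A': 0.30, 'B': 0.10, 'C': 0.20, 'D': 0.05, 'E': 0.40})
cap = pd.Series({'A': 500, 'B': 400, 'C': 300, 'D': 200, 'E': 100})

top_3 = cap.rank(ascending=False) <= 3   # stands in for top_2000
masked_rank = vol.where(top_3).rank()    # rank only within the mask

# All five names are still present; D and E simply have no rank (NaN)
```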

2. What is the difference between screening and masking?
Screening removes securities from the output of your pipeline. In your example, if you set_screen(top_2000), the resulting output will ONLY include the top 2000 companies by market cap.

Masking is used in rank, top, bottom and percentile_between to limit the number of securities those functions are applied to. It doesn't remove anything from your output.
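A small pandas analogy of the distinction (again, not the actual Pipeline internals): a screen drops rows from the output, while a mask leaves the rows in place and only limits which ones a computation sees.

```python
import pandas as pd

vol = pd.Series({'A': 0.30, 'B': 0.10, 'C': 0.20, 'D': 0.05})
keep = pd.Series({'A': True, 'B': True, 'C': False, 'D': False})

screened = vol[keep]                  # "screen": C and D disappear from the output
masked_rank = vol.where(keep).rank()  # "mask": C and D remain, but are unranked
```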

3. How do I reduce the population to the top market cap names, then to the bottom volatility names?

Take a look at the attached backtest. I used rank with a mask to get the top_2000 companies ranked by the parkinson_vol_factor. Then I set a screen to the top_2000 to remove everything else from the output.
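The whole flow can be sketched outside the Pipeline API with toy data (all names and sizes here are invented; top_50 stands in for top_2000):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
universe = pd.DataFrame({
    'market_cap': rng.uniform(1e8, 1e11, size=100),
    'parkinson_vol': rng.uniform(0.05, 0.60, size=100),
})

top_50 = universe['market_cap'].rank(ascending=False) <= 50  # the "mask"
vol_rank = universe['parkinson_vol'].where(top_50).rank()    # rank within the mask

output = universe[top_50].copy()   # the "screen": drop everything else from the output
output['vol_rank'] = vol_rank     # pandas aligns on index; masked-out rows drop away
lowest_vol_10 = output.nsmallest(10, 'vol_rank')
```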

Let me know if you have any questions.


Great explanation (as usual). Clear to me now, thanks.

Is it possible to write a custom factor with masking capabilities? For example, if I want to calculate the cross-sectional mean using only the masked securities. Thanks!

Hi Shiv,
It is now possible to write a custom factor with masking.
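For the cross-sectional-mean case Shiv describes, the idea (in plain pandas, with made-up values) is simply to restrict the mean to the masked names:

```python
import pandas as pd

vals = pd.Series({'A': 1.0, 'B': 2.0, 'C': 3.0, 'D': 10.0})
mask = pd.Series({'A': True, 'B': True, 'C': True, 'D': False})

masked_mean = vals[mask].mean()        # mean over A, B, C only; D is ignored
demeaned = (vals - masked_mean)[mask]  # e.g. demean within the masked subset
```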

KR

@Karen - did the way that screen vs mask works change since this was written?
As far as I can see, the screen is being applied at the beginning of the pipeline, not the end, which isn't how I read your statement 2 above "Screening removes securities from the output of your pipeline".

If I set up a screen (based on your mean reversion example) with something like:

# Imports from the Quantopian environment (assumed):
from quantopian.algorithm import attach_pipeline, pipeline_output
from quantopian.pipeline import Pipeline
from quantopian.pipeline.factors import AverageDollarVolume, Returns

def initialize(context):
    pipe = Pipeline()
    pipe = attach_pipeline(pipe, name='factors')
    dollar_volume_1 = AverageDollarVolume(window_length=1)
    pipe.add(dollar_volume_1, 'dollar_volume')
    high_dollar_volume = dollar_volume_1.percentile_between(95, 100)
    # returns_lookback is defined elsewhere in the original algo
    recent_returns = Returns(window_length=returns_lookback)
    pipe.add(recent_returns, 'recent_returns')
    pipe.set_screen(high_dollar_volume)

and then do something like this:

def before_trading_start(context, data):
    results = pipeline_output('factors').dropna()
    # Series.order() is the old pandas name for sort_values()
    ranks = results['recent_returns'].rank().order()
    log.info(len(ranks))

Then len(ranks) was about 430; if I comment out the pipe.set_screen, it's over 7000. If the screen were being applied at the end, I'd think it should make no difference?

I can see definite advantages to doing the screen at the start of the pipeline - it saves passing the screen with a mask= parameter to each and every factor - but it seems counter to the examples and the docs (which seem to say "The other important operation supported by pipelines is screening out unwanted rows of our final results."). I can also imagine wanting the screen to be at the end ... but mostly I just want to be clear exactly how it works.

Why did you choose volatility? I have been experimenting with your code. I have tried both high and low volatility. I know that with low IV, options are cheaper, but that doesn't matter with stocks. If anyone can give me a theory in layman's terms behind this algorithm, that would be great.

This is what Quantdog has so far. She's purring just a bit, but no alpha, and no bounce in recession if you run the backtest from 2008 to Present.

There's a difference between IV (implied volatility) and historic or statistical volatility. IV is the volatility input to an options pricing model that matches the model price to the observed market price. As you mention, it is strictly related to options pricing. A historical or statistical volatility metric (which I am studying in this algo) measures the historic volatility of a stock and can be computed in many different ways (most simply, as the standard deviation of log returns).

My intent here was to basically fade stock volatility. For example, if there is a period of high volatility (which usually means stocks have decreased), buy them. If there is a period of low volatility (which usually means stocks have increased), sell them.

Historical volatility is often used as an input to option pricing models as a way to get a baseline price before calibration (which is an entirely different topic). So in short, historic volatility (based on stock prices) is different from implied volatility (based on options models).
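As a concrete version of "the standard deviation of log returns" mentioned above (a minimal sketch; the annualization factor of 252 trading days is an assumption, and the function name is my own):

```python
import numpy as np

def close_to_close_vol(closes, annualization=252):
    """Historical volatility: sample std dev of daily log returns, annualized."""
    log_returns = np.diff(np.log(np.asarray(closes, dtype=float)))
    return float(np.std(log_returns, ddof=1) * np.sqrt(annualization))
```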

Thanks Jason, I was a little confused, but it makes sense as a std dev. Overthinking! I was also confused by why you chose the bottom 10, which is the lowest in volatility.
Does this mean the stocks, according to your hypothesis, have already increased in price?

My intent was to buy the highest-volatility stocks, anticipating a reversal. Note this is not sophisticated, just an experiment. With this line of code:

context.long_list = ranked_2000.sort(['parkinson_vol_factor_rank'], ascending=False).iloc[:10]

I sort the ranked stocks in descending order and take the top 10. In other words, the stocks with the highest ranked volatility.
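That line uses the old pandas DataFrame.sort method, which was later replaced by sort_values. With toy ranks (names and values made up), the equivalent operation looks like:

```python
import pandas as pd

ranked = pd.DataFrame(
    {'parkinson_vol_factor_rank': range(1, 21)},
    index=[f'S{i}' for i in range(1, 21)],
)

# sort_values is the modern replacement for the deprecated DataFrame.sort
long_list = ranked.sort_values('parkinson_vol_factor_rank', ascending=False).iloc[:10]
```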