limit pipeline factor to specific list of stocks?

Is it possible to limit a pipeline "alpha" factor to a specific list of stocks? If so, please provide an example. Thanks, Grant

8 responses

On https://www.quantopian.com/posts/using-a-specific-list-of-securities-in-pipeline , Dan Whitable provides the example:

from quantopian.pipeline.filters import StaticAssets

aapl = StaticAssets(symbols('AAPL'))  
aapl_ibm = StaticAssets(symbols('AAPL', 'IBM'))  

So, could this be used on a factor-by-factor basis, so that the factors could then be combined? For example, say I had:

spy = StaticAssets(symbols('SPY'))  
qqq = StaticAssets(symbols('QQQ'))  

Could I then define custom factors, and limit their output to the ETFs, e.g.:

spy_factor  
qqq_factor  

And then be able to combine them with other factors, e.g. ones that are applied to Q1500US()?

Also, is there any way to select specific securities within a custom factor? For example, say that I wanted to define a factor that provided a signal for SPY based on a limited set of securities. Is this possible?

Grant,

You asked "is there any way to select specific securities within a custom factor". Technically, yes. I've attached a notebook with a custom factor and filter which base an asset's output on other assets. Really, anything that can be coded can be done in a factor. The limitation is that inputs can only be datasets (so not other factor outputs). There is even access to the current or backtest date ('today' in the compute function parameters), so one could output different values based on the day of the week, for instance.
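
A minimal sketch of basing one asset's output on other assets, written in plain Python rather than against the Quantopian API (the attached notebook is not reproduced here). The function name `basket_signal_for_target`, the ticker labels, and the prices are all made up for illustration; a real custom factor would receive asset IDs and NumPy arrays in its compute function.

```python
# Plain-Python emulation of a factor's compute step; not the Quantopian API.
NAN = float("nan")

def basket_signal_for_target(assets, close, target, basket):
    """Return one value per asset: NaN everywhere except `target`,
    which gets the mean window return of the `basket` assets."""
    col = {asset: i for i, asset in enumerate(assets)}
    # Window return of each basket member: last close / first close - 1.
    returns = [close[-1][col[a]] / close[0][col[a]] - 1.0 for a in basket]
    signal = sum(returns) / len(returns)
    return [signal if asset == target else NAN for asset in assets]

assets = ["AAPL", "IBM", "SPY"]   # hypothetical column labels
close = [                          # rows = days, columns = assets
    [100.0, 50.0, 200.0],
    [110.0, 55.0, 202.0],
]
out = basket_signal_for_target(assets, close, target="SPY", basket=["AAPL", "IBM"])
```

Only the SPY slot of `out` carries a number; the other slots are NaN, so a not-NaN screen would leave SPY as the only row.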

Personally, though, I have started using the pipeline strictly as a data source. The pipeline simply returns a big dataframe with columns for all the data I need to make algorithm decisions. I have even started to return the values of filters as True/False columns. All the logic goes in code which then references this pipeline dataframe. I have found it cleaner to separate the data (defined by the pipeline) from the logic (defined in my code). One thing this does is circumvent the limitation that factors cannot reference other factors.
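
To illustrate that separation, here is a rough sketch using a plain dict in place of the pipeline dataframe. The column names (`momentum`, `in_universe`, `is_etf`), the values, and the `pick_longs` helper are hypothetical; on the platform the same lookups would be done against the dataframe the pipeline returns.

```python
# Hypothetical pipeline output: one row per asset, one entry per column.
# Filters are returned as plain True/False columns.
pipe_output = {
    "AAPL": {"momentum": 0.12, "in_universe": True, "is_etf": False},
    "IBM": {"momentum": -0.03, "in_universe": True, "is_etf": False},
    "SPY": {"momentum": 0.05, "in_universe": False, "is_etf": True},
}

def pick_longs(rows, min_momentum=0.0):
    """Trading logic lives here, outside the pipeline definition."""
    return sorted(
        asset
        for asset, row in rows.items()
        if (row["in_universe"] or row["is_etf"]) and row["momentum"] > min_momentum
    )

longs = pick_longs(pipe_output)
```

Because the filter results are just columns, the rule that combines them can change without touching the pipeline definition.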

Thanks Dan -

I'll dig into it when I get the chance, and see if I can post a realistic example. To date, I've been using pipeline only to filter lists of stocks by a few simple criteria (e.g. market cap, exchange, Q1500US(), etc.), with all of the decision making within the old-school part of the algo. I'm gonna finally take the plunge and attempt to write pipeline factors that can plug into the workflow that is under development.

Grant

Hi Dan,

Here's an example. If I understand correctly, this limits the factor computation to stocks within the Q500US:

optrev = OptRev(mask=Q500US())  

Then, the NaNs are filtered out for the stocks that are not in the Q500US with the screen:

my_pipe = Pipeline(  
        columns={  
            'optrev' : optrev,  
        }, screen = Q500US()  
    )  

Exactly. I was surprised by how a mask actually works. In the compute function above, both 'assets' and 'close' will only contain entries for securities passing the mask. I thought that maybe the values would be NaN or something, but no. The 'close' array in this case has only 500 columns and not the 'un-masked' 8000 or so. I believe that translates to the 'm' in your factor. That's a big computational saving, especially in your example.
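
A rough emulation of the masking behaviour described above, in plain Python and not the actual pipeline engine. `run_masked`, `window_return`, and the sample prices are hypothetical; the point is only that the compute step sees fewer columns instead of NaN-filled ones.

```python
def run_masked(assets, close, mask, compute):
    """Hand `compute` only the assets passing `mask`;
    the rest are absent from its inputs, not NaN."""
    keep = [i for i, asset in enumerate(assets) if mask(asset)]
    masked_assets = [assets[i] for i in keep]
    masked_close = [[row[i] for i in keep] for row in close]
    return masked_assets, compute(masked_assets, masked_close)

def window_return(assets, close):
    # One output per column that survived the mask.
    return [close[-1][j] / close[0][j] - 1.0 for j in range(len(assets))]

assets = ["AAPL", "IBM", "SPY", "QQQ"]
close = [
    [100.0, 50.0, 200.0, 80.0],
    [105.0, 49.0, 202.0, 84.0],
]
etfs = {"SPY", "QQQ"}
kept, values = run_masked(assets, close, lambda a: a in etfs, window_return)
```

Here `window_return` is called with two columns rather than four, which is where the computation saving comes from.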

One trick I've done to ensure a pipe returns only the results from a factor AND also no NaNs is something like this:

my_pipe = Pipeline(  
        columns={  
            'optrev' : optrev,  
        }, screen = optrev.notnan()  
    )
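
The effect of that screen can be sketched in plain Python as follows. This is only an emulation with made-up values, and `screen_notnan` is a hypothetical helper, not a pipeline call.

```python
import math

def screen_notnan(column):
    """Keep only assets whose factor value is not NaN."""
    return {asset: v for asset, v in column.items() if not math.isnan(v)}

# Hypothetical factor output: NaN for assets the factor did not score.
optrev_values = {"AAPL": 0.4, "IBM": float("nan"), "SPY": float("nan"), "MSFT": -0.1}
result = screen_notnan(optrev_values)
```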

Thanks Dan -

In the end, I'd like to sort out how to apply factors 1 to N to universe A, factors N+1 to M to universe B, etc., and then combine all of the factors. It would be an extension of the workflow (e.g. so that one could deal with stocks with one set of factors, and ETFs with another).

Grant, I do like where you're heading (if I understand correctly): use different factors on different subsets of securities. As you noted, the obvious subsets are stocks, ETFs, etc. One potential benefit of using custom factors in this way is code re-use, as below:

optrev_stocks = OptRev(mask=Q500US())  
optrev_etfs = OptRev(mask=my_etf_filter) 

Use the same optimizing engine but on a different universe of securities.
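
As a plain-Python sketch of running one computation on two universes and then combining the results: `rank_within` is a hypothetical stand-in for the real factor (it is not OptRev), and the raw values and universe sets are made up.

```python
def rank_within(values, universe):
    """Rank assets inside `universe` from 0 (lowest) to 1 (highest)."""
    members = sorted((a for a in values if a in universe), key=values.get)
    n = len(members)
    return {a: (i / (n - 1) if n > 1 else 0.5) for i, a in enumerate(members)}

raw = {"AAPL": 0.3, "IBM": -0.1, "MSFT": 0.1, "SPY": 0.02, "QQQ": 0.04}
stocks = {"AAPL", "IBM", "MSFT"}   # stand-in for a stock universe
etfs = {"SPY", "QQQ"}              # stand-in for an ETF filter

stock_scores = rank_within(raw, stocks)   # same code, stock universe
etf_scores = rank_within(raw, etfs)       # same code, ETF universe
combined = {**stock_scores, **etf_scores}
```

Each asset is scored only against its own universe, and because the universes do not overlap the two results merge without conflicts.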

What I've been interested in is perhaps an extension of this. I see most algorithms looking for ONE successful factor, or combination of factors, to predict returns or some other portfolio goal. The key is that this factor needs a relatively high information value (ability to predict) AND also needs to generate enough signals to be efficient. A factor that is correct 55% of the time and generates a signal every day will probably do better than one that is 80% correct but only generates a signal once a quarter. This latter factor is often disregarded because it doesn't backtest well.
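
A back-of-the-envelope version of that comparison, under assumptions that are mine and not from the post: every signal is one equal-sized bet that either wins or loses, there are about 252 trading days a year, and the quarterly factor fires four times a year.

```python
def expected_net_wins(hit_rate, signals_per_year):
    """Expected (wins - losses) per year if every signal is one equal-sized bet."""
    return signals_per_year * (2.0 * hit_rate - 1.0)

daily = expected_net_wins(0.55, 252)     # assumes ~252 trading days a year
quarterly = expected_net_wins(0.80, 4)   # one signal per quarter
```

Under these assumptions the daily factor nets roughly 25 winning bets a year against about 2.4 for the quarterly one. The sketch ignores payoff size, trading costs, and correlation between signals.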

What I've been looking at is a workflow that allows for combining a number of 'sparse' factors into a coherent strategy. As a simple example, combine factors such as 'trade before earnings announcement' plus 'trade on huge twitter spike' plus 'trade before a holiday'. None of those individually will happen very often but, when they do, they are pretty strong predictors. Combining many small factors such as these could generate enough 'signals' to actually trade while maintaining enough diversification to limit exposure. Additionally, the many 'un-related' factors potentially create a more robust algo by not relying on a single set of conditions.
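
One way such a combination might look, sketched in plain Python. The factor names, the +1/-1 signal convention, and the `combine_sparse` helper are hypothetical; summing is just one simple choice of combination rule.

```python
def combine_sparse(signals_by_factor):
    """Sum the signals of whichever factors fired for each asset.
    A factor that did not fire for an asset simply has no entry."""
    combined = {}
    for signals in signals_by_factor.values():
        for asset, value in signals.items():
            combined[asset] = combined.get(asset, 0.0) + value
    return combined

# Hypothetical signals for a single day; most factors are silent most days.
today = {
    "pre_earnings": {"AAPL": 1.0},
    "twitter_spike": {"AAPL": 1.0, "TSLA": -1.0},
    "pre_holiday": {},
}
combined = combine_sparse(today)
```

Assets that no factor flagged never appear in `combined`, so the strategy only trades when at least one sparse factor fires.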

Yeah, I'm a little perplexed by the generality of the long-short workflow. It seems that one would want niche, odd-ball, never-thought-of-that factors, which almost by definition are not gonna work across a single vast generic universe of stocks. Of course, there's a loss of diversification, risk of over-fitting, less scalability, etc., but I have to wonder how many factors are out there that would work on the broad Q500US/Q1500US universes, for example?