One thing to consider is the original purpose of the pipeline construct. Its primary job is to reduce data fetch time by minimizing the number of redundant database calls. Simply creating multiple instances of the pipeline object could therefore add multiple database fetches (exactly what pipeline tries to optimize away).
One approach is to create a single pipeline definition containing all the factors and classifiers for every pipeline. Add the screens for each pipeline as columns but NOT as a screen (i.e., return all rows/securities). Then simply create a separate dataframe for each logical pipeline. The net effect is a single pipeline (for optimization) that yields separate filtered dataframes as outputs. Maybe code it something like this...
# These imports assume the standard Quantopian algorithm environment
from quantopian.pipeline import Pipeline
from quantopian.pipeline.data.builtin import USEquityPricing
from quantopian.pipeline.factors import AverageDollarVolume, RSI, SimpleMovingAverage

def make_pipeline():
    """
    Create factors and classifiers for all pipes.
    Add each final pipe screen as a column to the definition. Don't set a screen.
    """
    mean_close_200 = SimpleMovingAverage(inputs=[USEquityPricing.close], window_length=200)
    latest_close = USEquityPricing.close.latest
    dollar_volume = AverageDollarVolume(window_length=30)
    rsi = RSI()

    # The per-pipe screens, added as boolean columns rather than as a screen
    a_filter = rsi < 30
    b_filter = rsi > 70

    return Pipeline(
        columns={
            'mean_close_200': mean_close_200,
            'latest_close': latest_close,
            'dollar_volume': dollar_volume,
            'rsi': rsi,
            'a_filter': a_filter,
            'b_filter': b_filter,
        },
    )
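For pipeline_output('my_pipeline') below to return anything, the pipeline has to be attached once in initialize. A minimal sketch, assuming the standard Quantopian API (attach_pipeline and pipeline_output live in quantopian.algorithm, and 'my_pipeline' is just the handle referenced later):

from quantopian.algorithm import attach_pipeline, pipeline_output

def initialize(context):
    # Attach the single combined pipeline under the name used by
    # pipeline_output() in before_trading_start
    attach_pipeline(make_pipeline(), 'my_pipeline')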
def before_trading_start(context, data):
    """
    Called every day before market open.
    """
    # List the columns you want for 'pipe a' and apply its filter
    output_pipe_a = (pipeline_output('my_pipeline')
                     [['mean_close_200', 'rsi', 'a_filter']]
                     .copy()
                     .query('a_filter'))

    # List the columns you want for 'pipe b' and apply its filter
    output_pipe_b = (pipeline_output('my_pipeline')
                     [['latest_close', 'rsi', 'b_filter']]
                     .copy()
                     .query('b_filter'))
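The .query('a_filter') call simply keeps the rows where that boolean column is True. If you'd rather use plain boolean indexing, this is an equivalent sketch of the same filtering step:

# Equivalent boolean-mask version (plain pandas, no query string)
df = pipeline_output('my_pipeline')
output_pipe_a = df.loc[df['a_filter'], ['mean_close_200', 'rsi', 'a_filter']].copy()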