Multiple Pipelines Available In Algorithms

As you might've noticed in the new risk API announcement, we've added the ability to use multiple pipelines in algorithms. One example of that is to use the risk loading pipeline along with another pipeline that you define, as seen in the attached algorithm.

Multiple pipelines can easily lead to a slowdown in your algorithm, because the pipeline machinery can optimize your data fetching within a single pipeline, but does not optimize data fetching across separate pipelines. In general, it's better to use a single pipeline. Some anti-patterns are putting each of your terms into its own pipeline, or having shared terms across multiple pipelines.

However, there are a few use cases where multiple pipelines do make sense, such as when you have disjoint sets of computations that you'd like to run and think about separately (for example, one pipeline for your risk loadings and another for your alpha factors).
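For reference, a minimal sketch of this pattern might look like the following (the alpha factor below is a placeholder of my own, not the attached algorithm):

from quantopian.algorithm import attach_pipeline, pipeline_output
from quantopian.pipeline import Pipeline
from quantopian.pipeline.experimental import risk_loading_pipeline
from quantopian.pipeline.factors import Returns
from quantopian.pipeline.filters import QTradableStocksUS

def make_alpha_pipeline():
    universe = QTradableStocksUS()
    # Placeholder alpha factor: trailing 6-month returns.
    alpha = Returns(window_length=126, mask=universe)
    return Pipeline(columns={'alpha': alpha}, screen=universe)

def initialize(context):
    # Two pipelines: one for alpha factors, one for the risk model loadings.
    attach_pipeline(make_alpha_pipeline(), 'alpha_pipeline')
    attach_pipeline(risk_loading_pipeline(), 'risk_pipeline')

def before_trading_start(context, data):
    context.alpha = pipeline_output('alpha_pipeline')['alpha']
    context.risk_loadings = pipeline_output('risk_pipeline')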

17 responses

Hi @Abhijeet,

This is potentially a huge improvement. Can we have a Pipeline that executes only once, on the first day, when the algorithm is initialized? This would be very helpful for CustomFactors with a multi-year window_length that only need to be run once. Further, it would be nice to use schedule_function to schedule a Pipeline run for those same long-running CustomFactors, so the factor could be updated weekly or monthly.

Best Regards,
Doug

** crickets **

Doug, what type of long-running factor are you looking to run? A lot of the time, the bottleneck on speed is in loading the data into pipeline. Pipeline is efficient in that it only loads data points once, even if they're needed on multiple days. That means if you have a factor with 2 years of lookback that you run once per month, pipeline will still only load those 2 years of data once, so computing the pipeline less frequently won't actually save time. Now, if your computation is the bottleneck, then you can downsample your term. For example, you can do something like

my_factor = MyCustomFactor().downsample('month_start')  

Does this help?

Here's a simple example of the close price of AAPL being downsampled.
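For anyone reading without the attachment, a rough sketch of that kind of research notebook might look like this (the dates are placeholders):

from quantopian.pipeline import Pipeline
from quantopian.pipeline.data import USEquityPricing
from quantopian.pipeline.filters import StaticAssets
from quantopian.research import run_pipeline, symbols

# Latest close price, recomputed only at the start of each month.
monthly_close = USEquityPricing.close.latest.downsample('month_start')

pipe = Pipeline(
    columns={'monthly_close': monthly_close},
    screen=StaticAssets([symbols('AAPL')]),
)

# The downsampled column holds the month-start value on every day of that month.
result = run_pipeline(pipe, '2016-01-04', '2016-06-30')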

Hi @Jamie,

One representative long-running CustomFactor pulls 15 years of Fundamentals data, digests it, and generates 365 values for each asset. I can easily do this in a Research Notebook, write the output to a CSV, and load it with local_csv() in an algo. Contest rules, however, prohibit local_csv().

I've tested this CustomFactor in the algo, generating the single value needed for each asset each day, but the algo times out after the first 10 or 20 days of backtesting. Besides, I have no need to repeat this calculation daily; weekly or monthly would be useful but not absolutely necessary.

So no, the suggestion you offer, while appreciated, is unhelpful.

Best Regards,
Doug

This is super useful, thank you! I have code in some algorithms that uses calculations on some ETFs to determine market conditions, and I don't want those ETFs to be in my universe. Now it's easy to put my universe into one pipeline and the market condition calculations into a separate pipeline. It looks like this is what you're doing with your risk pipeline.

I was able to do everything I needed to do in a single pipeline, but this makes the code much simpler.
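For reference, a minimal sketch of that split might look like this (the specific ETFs and the moving-average factor are placeholders of my own):

from quantopian.algorithm import attach_pipeline, pipeline_output
from quantopian.pipeline import Pipeline
from quantopian.pipeline.data import USEquityPricing
from quantopian.pipeline.factors import SimpleMovingAverage
from quantopian.pipeline.filters import QTradableStocksUS, StaticAssets

def initialize(context):
    # Hypothetical ETFs used only to gauge market conditions (symbols() is an
    # algorithm built-in).
    market_etfs = StaticAssets(symbols('SPY', 'TLT'))

    universe_pipe = Pipeline(
        columns={'close': USEquityPricing.close.latest},
        screen=QTradableStocksUS(),
    )
    conditions_pipe = Pipeline(
        columns={
            'close': USEquityPricing.close.latest,
            'sma_50': SimpleMovingAverage(inputs=[USEquityPricing.close],
                                          window_length=50),
        },
        screen=market_etfs,
    )

    attach_pipeline(universe_pipe, 'universe')
    attach_pipeline(conditions_pipe, 'market_conditions')

def before_trading_start(context, data):
    context.universe = pipeline_output('universe')
    context.market_conditions = pipeline_output('market_conditions')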

Hi @Jamie,

Does Q agree that scheduling an alternative Pipeline is useful and has merit, either at initialization only or periodically? Do you think this feature request would be easy to implement?

Thank you,
Doug

Hi Doug,

Can you provide a bit more information on what you're trying to do? If I had to guess, it sounds like you're building up training data for some sort of model (based on the 15 years of data).

In general, it's not an easy task to schedule a pipeline to run on certain days only. That said, I'm still not sure I understand why the pipeline is timing out. Do you know roughly how long the pipeline takes to compute in research? Without knowing more details about the code, it's tough for me to pinpoint the issue. Based on the fact that your backtest was able to run for the first 10-20 days, my hunch is that it's the computation that's expensive, and I'm wondering if there's a way to make it more efficient so that it can run in a backtest. Would you be willing to share your code either here, or privately with our support team (Help -> Contact Support)?

Hi Jamie,

Thank you. If I understand correctly, pulling 15 years of the same data on each cycle is not a resource constraint. I'll rewrite an efficient daily calculation instead of processing all 365 days in one pass as I had done in research.

Best Regards,
Doug

Jamie, thanks a lot.

Thank you for this!

Hi Jamie,

I am trying to create two monthly pipelines. One pipeline pulls Citibank's EV/EBITDA, market cap, and industry. Then I want to use that info to create another pipeline that filters out every stock in the same industry whose market cap is more than 10% above or below Citi's. However, I don't know how to do this. Would you mind suggesting how to approach this?

Thanks,

Thanh

Best to use a single pipeline with a factor for the Citibank EV/EBITDA. Then base a filter off that. Not sure why you would want two separate pipelines. Keep it simple. (As a benefit, one pipeline will typically run faster too)
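A rough sketch of one way to read that advice follows; as an assumption on my part, the comparison against Citi is done here on the pipeline output in before_trading_start rather than inside the pipeline itself:

from quantopian.algorithm import attach_pipeline, pipeline_output
from quantopian.pipeline import Pipeline
from quantopian.pipeline.data import Fundamentals

def make_pipeline():
    return Pipeline(
        columns={
            'ev_ebitda': Fundamentals.ev_to_ebitda.latest,
            'market_cap': Fundamentals.market_cap.latest,
            'industry': Fundamentals.morningstar_industry_code.latest,
        }
    )

def initialize(context):
    context.citi = symbol('C')  # symbol() is an algorithm built-in
    attach_pipeline(make_pipeline(), 'fundamentals')

def before_trading_start(context, data):
    out = pipeline_output('fundamentals')
    citi = out.loc[context.citi]
    same_industry = out['industry'] == citi['industry']
    # Keep peers whose market cap is within 10% of Citi's.
    similar_size = (out['market_cap'] - citi['market_cap']).abs() <= 0.1 * citi['market_cap']
    context.peers = out[same_industry & similar_size].index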

Hi Dan, I tried to get all the info for Citibank using this code:

from quantopian.pipeline import Pipeline
from quantopian.pipeline.data import Fundamentals

def make_pipeline(context):
    symbol = Fundamentals.primary_symbol.latest
    symbol_filter = symbol.eq('C')

    market_cap = Fundamentals.market_cap.latest
    industry = Fundamentals.morningstar_industry_code.latest
    EV_EBITDA = Fundamentals.ev_to_ebitda.latest

    pipe = Pipeline(
        columns={
            'EV/EBITDA': EV_EBITDA,
            'market_cap': market_cap,
            'industry': industry,
        },
        screen=symbol_filter)
    return pipe

I tried to find a way to call Citibank's EV/EBITDA as a factor but just could not. Can you suggest a way to do that? Thank you very much!

Abhijeet and Jamie,

I have a few contest entries that use the risk pipeline and have been working fine in backtests and in the contest. I started them live trading with paper money so I can periodically check on their performance. I've found previously that I learn a lot more when live trading an algorithm (especially when it's my own money, but live trading support was ended...). Unfortunately, I received the following error on February 21st.

ValueError: Request for risk model data ending with 2018-02-21 could not be processed. Data is available up to 2018-02-16.  
There was a runtime error on line 230.  

Is this a known issue that the risk pipeline can only be used in backtesting? I'll try to make live versions again and see what happens. But as I mentioned, I learn a lot more by looking at the semi-real time output of an algorithm than from a backtest. So I would think it would be in Q's best interest to allow developers access to live trading their contest entries (with paper money).

Hi @Stephen,

This is a current known limitation, where live servers launch before the risk model data is available. I've made a note of it in our bug tracker, and it's on our list to get to.

Hi Everyone,

Thank you for implementing this (and for everything else on Q - truly grateful!). I'm quite a newbie with both Q and Python and would really appreciate any help with the two struggles below:

  1. Is it possible to plot data from the Risk Pipeline in the Custom Data graph? If it is, I would like to plot rolling Alpha, Beta, Sharpe, and Volatility. Would you be able to provide some sample code, or point me to the relevant lesson or lecture, please?

  2. I'm trying to write an algo that filters out stocks based on Fundamental data (fundamentals_pipe), then to buy/sell AND HOLD those positions based ONLY on Technical indicators (technical_pipe), even if they are no longer part of the fundamentals_pipe. Essentially a Value + Momentum strategy.

For example, filter out high and low p/e stocks, then buy and hold the low p/e stocks that are trading above a certain moving average (e.g. SMA50), even if they are no longer part of the 'low p/e stocks' filter.

What's the most efficient way to implement this? Using two pipes as described above, or using a 'for' or 'while' loop in my Rebalance function? Any sample code would really be appreciated. I'm happy to share my current code I'm struggling with. I'm getting a lot of time-outs, likely because my code is not very efficient.

All the best,
Joakim