How to speed up pipeline calculations

Hello,

In the interest of getting up to speed with the new pipeline API, I've attempted to implement my own version of the Piotroski score, which I've attached. I've wrapped the CustomFactor in code that rebalances yearly, and I believe the code that gets the pipeline output, 'output = pipeline_output('piotroski')', is only being called once. However, the logging statements in the compute method of the custom factor show that compute is being called many times, which I believe is causing a significant slowdown when the algorithm runs. Could someone please help with this? Is there a way to speed this algorithm up?

Regards,
Mark

Scott Sanderson

@Mark

Your compute function is always run once for every day of your backtest. The idea here is that each day you receive a trailing window of data ending on that day, and your compute function's job is to produce one value per asset each time it's called.
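To make the once-per-day behavior concrete, here is a toy sketch in plain Python and numpy. The TrailingMean class and the run_factor loop are hypothetical stand-ins written for illustration, not the Quantopian or zipline classes, and the real engine loads and batches data differently. Treat it only as a model of what compute sees on each call.

```python
import numpy as np


class TrailingMean:
    """Hypothetical stand-in for a CustomFactor (not the real zipline class)."""

    window_length = 3

    def compute(self, today, assets, out, values):
        # values has shape (window_length, len(assets)), oldest row first.
        out[:] = values.mean(axis=0)


def run_factor(factor, data, dates, assets):
    """Toy engine: call compute once for every day that has a full window."""
    wl = factor.window_length
    results = {}
    for i in range(wl - 1, len(dates)):
        window = data[i - wl + 1 : i + 1]  # trailing rows ending on day i
        out = np.empty(len(assets))
        factor.compute(dates[i], assets, out, window)
        results[dates[i]] = out
    return results


# Five simulated days of data for two assets.
dates = ["d0", "d1", "d2", "d3", "d4"]
assets = ["A", "B"]
data = np.arange(10, dtype=float).reshape(5, 2)

results = run_factor(TrailingMean(), data, dates, assets)
# compute ran three times: once each for d2, d3 and d4.
```

Even if you only read the output once a year, the engine still needs one row of factor values per day, which is why the logging fires so often.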

Looking at your compute, I doubt that's a significant bottleneck for performance. The more likely issue is that you're loading 200 trailing rows of fundamentals data for every field you're using, but you only need a single row for many of them. I'd suggest moving the expressions that need long trailing windows into separate factors, computing that part of the score separately, and then adding the pieces together.
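A rough back-of-the-envelope sketch of why the window length matters: the number of values pulled from the database each day scales with window_length times the number of assets, summed over every field. The universe size, field count, and the split between long-window and latest-only fields below are made-up numbers for illustration, not measurements of Mark's algorithm.

```python
def cells_loaded(window_lengths, n_assets):
    """Values fetched per day when each field uses the given window length."""
    return sum(wl * n_assets for wl in window_lengths)


N_ASSETS = 8000  # hypothetical universe size
N_FIELDS = 9     # hypothetical number of fundamentals fields in the score

# One monolithic factor: every field loaded with a 200-row window.
monolithic = cells_loaded([200] * N_FIELDS, N_ASSETS)

# Split factors: suppose only 3 fields need the long window and the
# other 6 only need their latest value (window length 1).
split = cells_loaded([200] * 3 + [1] * 6, N_ASSETS)

print(monolithic, split, round(monolithic / split, 2))
```

Under these assumptions the split version reads roughly a third as much data per day; the actual saving depends on how many fields really need a trailing history.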

In the longer term, your example makes me think that we need a way to take a Filter and convert it back into a Factor by setting True to 1 and False to 0. This would let you write something like:

piotroski_net_income = (morningstar.income_statement.net_income.latest > 0).as_factor()  
piotroski_cash_flow = (morningstar.cash_flow_statement.operating_cash_flow.latest > 0).as_factor()  
...

piotroski_score = (piotroski_net_income + piotroski_cash_flow + ...)
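The same idea can be sketched with plain numpy arrays: a comparison produces a boolean mask, and casting it to float maps True to 1.0 and False to 0.0. The arrays below are hypothetical latest values for four assets; this mimics what the proposed as_factor() would do rather than calling any pipeline API.

```python
import numpy as np

# Hypothetical latest values for four assets (one entry per asset).
net_income = np.array([120.0, -15.0, 40.0, 0.0])
operating_cash_flow = np.array([150.0, 10.0, -5.0, 30.0])

# Each comparison is a boolean "filter"; astype(float) maps
# True -> 1.0 and False -> 0.0, like the proposed as_factor().
piotroski_net_income = (net_income > 0).astype(float)
piotroski_cash_flow = (operating_cash_flow > 0).astype(float)

# Summing the 0/1 components gives a partial score per asset.
partial_score = piotroski_net_income + piotroski_cash_flow
print(partial_score)
```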

Disclaimer

The material on this website is provided for informational purposes only and does not constitute an offer to sell, a solicitation to buy, or a recommendation or endorsement for any security or strategy, nor does it constitute an offer to provide investment advisory services by Quantopian. In addition, the material offers no opinion with respect to the suitability of any security or specific investment. No information contained herein should be regarded as a suggestion to engage in or refrain from any investment-related course of action as none of Quantopian nor any of its affiliates is undertaking to provide investment advice, act as an adviser to any plan or entity subject to the Employee Retirement Income Security Act of 1974, as amended, individual retirement account or individual retirement annuity, or give advice in a fiduciary capacity with respect to the materials presented herein. If you are an individual retirement or other investor, contact your financial advisor or other fiduciary unrelated to Quantopian about whether any given investment idea, strategy, product or service described herein may be appropriate for your circumstances. All investments involve risk, including loss of principal. Quantopian makes no guarantees as to the accuracy or completeness of the views expressed in the website. The views are subject to change, and may have become unreliable for various reasons, including changes in market conditions or economic circumstances.

Scott Sanderson

Separate note: I think you have the date convention for the trailing windows backwards. The first row is always the oldest row, and the last row is always the newest. This means, for example, that "roa this year - roa last year" should be roa[-1] - roa[0], which I think is the inverse of what you have.
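A small numpy illustration of the oldest-first convention. The ROA numbers are invented, and the sketch assumes the first row of the window holds last year's value.

```python
import numpy as np

# Hypothetical trailing window for two assets: row 0 is the oldest
# observation, row -1 the newest.
roa = np.array([
    [0.05, 0.10],  # last year (oldest)
    [0.06, 0.09],
    [0.08, 0.07],  # this year (newest)
])

change = roa[-1] - roa[0]  # "roa this year - roa last year"
improved = change > 0      # True where ROA went up
print(change, improved)
```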

Scott Sanderson

I put together a short notebook that shows how I'd go about computing the Piotroski score. It's still a fair bit slower than I'd like it to be, but the bottleneck at this point is almost entirely network I/O to our database.

The bad news here is that there isn't much that you can do as a user to speed this up.
The good news is that this should get faster for free over time as we figure out how to optimize the fundamentals database for pipeline usage patterns.

Hope this is helpful,
-Scott

Mark Segal

Hi Scott,

Thanks a lot for taking a look at this. Your comments are very helpful. I've updated my implementation with your code, if anyone is interested.

Regards,
Mark

Phil Hoffer

Hi Mark,

It could be my confusion, but I'm wondering whether the comparisons to the previous year's values (e.g. ROA) have a bug.

The code defines ...

OLDEST = -1  
NEWEST = 0

and then awards a point to the Piotroski score if the Return on Assets (ROA) now is greater than ROA a year ago.
However, the code actually awards a point if ROA a year ago (index 0) is greater than current ROA (index -1):

        out[:] = (roa[NEWEST] > roa[OLDEST])

Or am I confused about the indexing of the 'roa' array?

Thanks in advance,
Phil

Mark Segal

Hi Phil,

Thanks for pointing this out. The code above is taken from Scott's notebook. I think you are correct that there is a discrepancy: Scott notes that the newest row is the last row and the oldest is the first, so I believe the code should be:

OLDEST = 0  
NEWEST = -1

Making this change certainly improves the backtest.
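For reference, here is a sketch of the ROA-improvement component with the corrected constants. It is written as a plain function mirroring the body of a CustomFactor.compute method so it runs outside Quantopian; the function name roa_improved and the sample numbers are hypothetical. It also shows that the original OLDEST = -1, NEWEST = 0 constants produce the reversed result.

```python
import numpy as np

# Corrected index constants: pipeline windows are ordered oldest-first.
OLDEST = 0
NEWEST = -1


def roa_improved(out, roa):
    """Hypothetical ROA-improvement component of the Piotroski score.

    roa has shape (window_length, n_assets), oldest row first.
    """
    # One point (1.0) where current ROA exceeds the oldest ROA in the window.
    out[:] = roa[NEWEST] > roa[OLDEST]


roa = np.array([
    [0.05, 0.10],  # oldest row
    [0.08, 0.07],  # newest row
])
out = np.empty(2)
roa_improved(out, roa)

# The original constants (OLDEST = -1, NEWEST = 0) compare the other way round.
buggy = roa[0] > roa[-1]
print(out, buggy)
```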
