Fundamentals fundamentally broken

Quantopian at present does not appear to offer reasonable access to past values for fundamental data that spans a couple of years. I keep getting timeout errors as I'm trying to implement the Quantitative Value algorithm. On the verge of giving up.

There are a number of possible fixes for my situation:

  1. A date argument for get_fundamentals.
  2. More efficient access of data points via pipeline. I don't need all data between t0 and t1 to get the values of a factor at t0 and t1.
  3. Support for multiple pipelines in an algorithm, so that I might at least spread the algorithm's execution over multiple days.
  4. Generally faster access to fundamental data; right now this is abysmally slow.
  5. An extended initialization period for an algorithm, so that I can cache some of the values I need for algorithm warm-up and then roll in new values as the algorithm continues to execute. This will of course take up quite a bit of memory in the context object, but it will let an algorithm start executing in live/paper trading right away.
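
To make item 5 concrete, here is a minimal sketch of the warm-up-then-roll idea, assuming a fixed-length cache that is filled once and then updated one value per period. `RollingCache` and the sample values are hypothetical; none of this is a Quantopian API.

```python
from collections import deque


class RollingCache:
    """Fixed-length history of one fundamental value per period (hypothetical helper)."""

    def __init__(self, maxlen):
        # A deque with maxlen drops its oldest entry automatically on append.
        self._values = deque(maxlen=maxlen)

    def warm_up(self, history):
        # One-off, expensive load done during an extended initialization period.
        self._values.extend(history)

    def roll(self, new_value):
        # Cheap incremental update on each later period.
        self._values.append(new_value)

    def values(self):
        # Snapshot of the cached window, oldest first.
        return list(self._values)


cache = RollingCache(maxlen=4)
cache.warm_up([10.0, 11.0, 12.0, 13.0])  # made-up quarterly values
cache.roll(14.0)                          # 10.0 falls out of the window
window = cache.values()
```

The memory cost mentioned in item 5 is bounded by `maxlen` times the number of cached series, which is the tradeoff against reloading the full lookback every day.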

If anyone might be able to offer suggestions, I would love to be able to move forward.

Sunil

17 responses

Hi Sunil,

I understand where you're coming from. Right now, loading long windows of fundamental data in pipeline is slow. In addition, even if you want only the values for t0 and t1, you have to get the entire window of data. Improving fundamental data load speed is on our to-do list, as is the ability to specify dates for input data to a pipeline factor. Right now, I'm afraid there's no good workaround for making extended lookbacks on fundamental data; it's simply a limitation that is on our list to fix.
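
As a rough illustration of the t0/t1 point, the sketch below assumes a factor-style function that receives the whole lookback window (rows are days, oldest first; columns are assets) and reads only the first and last rows. The function name and the numbers are made up for the example.

```python
def change_between_endpoints(window):
    """Per-asset change from the first row (t0) to the last row (t1) of the window."""
    first_row, last_row = window[0], window[-1]
    return [last - first for first, last in zip(first_row, last_row)]


# Hypothetical window: 4 days x 2 assets. Every row has to be loaded,
# even though only the two endpoint rows are read.
window = [
    [100.0, 50.0],  # t0
    [101.0, 49.0],
    [103.0, 47.0],
    [110.0, 45.0],  # t1
]
changes = change_between_endpoints(window)
rows_loaded = len(window)
rows_used = 2
```

With a lookback of a couple of years, `rows_loaded` grows to several hundred while `rows_used` stays at 2, which is where the load time goes.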

Disclaimer

The material on this website is provided for informational purposes only and does not constitute an offer to sell, a solicitation to buy, or a recommendation or endorsement for any security or strategy, nor does it constitute an offer to provide investment advisory services by Quantopian. In addition, the material offers no opinion with respect to the suitability of any security or specific investment. No information contained herein should be regarded as a suggestion to engage in or refrain from any investment-related course of action as none of Quantopian nor any of its affiliates is undertaking to provide investment advice, act as an adviser to any plan or entity subject to the Employee Retirement Income Security Act of 1974, as amended, individual retirement account or individual retirement annuity, or give advice in a fiduciary capacity with respect to the materials presented herein. If you are an individual retirement or other investor, contact your financial advisor or other fiduciary unrelated to Quantopian about whether any given investment idea, strategy, product or service described herein may be appropriate for your circumstances. All investments involve risk, including loss of principal. Quantopian makes no guarantees as to the accuracy or completeness of the views expressed in the website. The views are subject to change, and may have become unreliable for various reasons, including changes in market conditions or economic circumstances.

Jamie, thanks for the quick response.

Could we at least get a date argument for get_fundamentals? That should not be too hard.

Sunil

Hi Sunil,

Unfortunately we're not planning on adding any functionality to get_fundamentals. Instead, the plan is to add the date lookup to pipeline factors (at least a relative date lookup). The long-term plan is for fundamentals data to exist solely in pipeline. Sorry for the inconvenience.

Hi Jamie,

I understand where you're coming from. You don't want to build up an API that you think isn't very good, which in this case is get_fundamentals. I don't think you fully appreciate what I hear when you say this, and I hope you don't take this the wrong way. Unless you wish to continue, this will be the last message I post on the subject.

What I see here is a promise to do something great at some point in the future with the pipelines API that will solve my problems. Only this is a big task, and you don't know when it will get prioritized. In the meantime, there's what seems to be a relatively small change possible right now that you will not make, because it is extending an API that Quantopian has decided isn't good. This leaves me unable to proceed, with no reason to believe that Quantopian will deliver a platform where it is possible to build sophisticated value based strategies.

Unfortunately I am not convinced of the ultimate utility of the pipeline API thus far. There are some nice things you can do with the API relating to computing factors over variable windows for a few factors. Once you start building up an algorithm involving a couple of dozen factors, there isn't any obvious way to understand the performance and behavior of the pipeline. The API doesn't support any form of profiling. DSLs are great in so far as they allow more succinct expression of solutions to problems in the domain. However, they can lead to code that is very inefficient, if the execution model behind the DSL isn't transparent. get_fundamentals on the other hand results in a program that is easy to understand, evaluate, and optimize.

I hope Quantopian has considered these tradeoffs in the product decisions that have thus far been made. I know you guys have a small team, and for that you've delivered something quite amazing. Hopefully that will continue to be the case.

Sunil

It's really a pity that there is no date lookup in get_fundamentals. I hope you will change your mind and decide to add it. The point outlined by Sunil is very reasonable and I totally agree with his last comment.

The first time I complained about the lack of an efficient way to access historical fundamental data was in this post:
https://www.quantopian.com/posts/period-ending-date-and-historical-fundamental-data-quarterly-slash-yearly
dated Jan 16, 2015, where you said the feature was "*definitely on our roadmap.*"

Don't get me wrong... I appreciate your work a lot, and it would be great if you could update us a little more about your roadmap or the upcoming features... a lot of my algos rely on get_fundamentals, and hearing now that it will be deprecated is quite annoying for me.

Hey guys,

Unfortunately adding a date field to get_fundamentals isn't all that trivial. There's a lot that would need to go into such a feature. Instead, our efforts will be focused on improving the load time of fundamentals data and eventually allowing a way to specify dates (or relative dates, at least) in a pipeline factor.

Regarding get_fundamentals vs pipeline, I want to point out that pipeline is strictly greater than get_fundamentals in functionality. There's nothing that get_fundamentals can do which pipeline cannot. Conversely, pipeline has functionality that get_fundamentals does not (e.g. lookback windows). Instead of having two ways to get fundamental data, each having their own pros and cons, we're focusing our efforts on pipeline.

Sunil, with regards to execution, I'm afraid I have to disagree. On a factor level, you can profile using a factor tear sheet. On an entire pipeline level, you can work in the research environment to perform an analysis. One of the nice features of the Pipeline API is that it is the exact same in research and the IDE. That means you can directly copy and paste pipeline construction from one environment to the other. All you have to do in research is use run_pipeline to specify dates over which you want to run it.

For execution efficiency, when a pipeline is run, there's a lot done under the hood to make things run as efficiently as possible. If you're interested in seeing how things are done, you can always have a look at the code in the zipline repo. An easier way to get a look into what's going on behind the scenes is with Pipeline.show_graph() (example here). Showing your pipeline graph will give you a sense of what the execution will look like.

Hi Jamie,

Once again, thanks for engaging, it's good to feel like we're being heard. The factor tear sheets and pipeline graphs look interesting. I'll try them out. The link to the zipline repo you have above leads me to a 404, which presumably means I need specific authorization to view that content.

Can you offer us a timeline when we might expect to see some improvements in dealing with fundamental data?

Sunil

Hi Sunil,

My apologies, I fixed the zipline link above. For convenience, here it is again: https://github.com/quantopian/zipline.

Unfortunately, I don't have a timeline for you. My best guess is on the order of a few months, but we're not yet sure what will be involved so there's no upper bound yet. It has certainly moved up the list since January (when we were focusing on Quantopian 2) but I just don't have a good sense of a timeframe yet. Sorry about that.

Jamie,

Just to give you a sense of the types of issues I'm running into with pipelines, please see the attached notebook. This is just one factor; my algorithm involves combining about a dozen such factors. Please run this pipeline to see how much time just one of these factors takes; that should make it concrete why working with fundamental factors is so frustrating. If you have suggestions for improving performance, I would love to have them.

Also note that there are some things that I can't quite figure out how to do with factors. For instance, how do you take the min of the values within a factor? I've tried to use the function min in the second pipeline in the notebook, and that doesn't work. Do I have to write another factor to take the min of two factors? I would never have figured out how to do NaN value filtering on my own; I got a hint from the pipeline graphing example.

Even with the graph, I don't have a sense for the steps that are the most computationally intensive. It would be good to have some numbers attached to each of those boxes in the pipeline graph.
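
The kind of per-box numbers described above could look something like the following sketch. It does not hook into the pipeline engine; it assumes the stages are available as ordinary Python callables, and the stage names and the `run_with_timings` helper are hypothetical.

```python
import time


def run_with_timings(stages, data):
    """Run each (name, function) stage in order and record its wall-clock time."""
    timings = {}
    for name, stage in stages:
        start = time.perf_counter()
        data = stage(data)
        timings[name] = time.perf_counter() - start
    return data, timings


# Hypothetical stages standing in for the boxes in a pipeline graph.
stages = [
    ("load", lambda xs: [float(x) for x in xs]),
    ("drop_nan", lambda xs: [x for x in xs if x == x]),  # NaN is never equal to itself
    ("rank", lambda xs: sorted(xs, reverse=True)),
]

result, timings = run_with_timings(stages, ["3", "nan", "1", "2"])
slowest = max(timings, key=timings.get)  # name of the most expensive stage
```

Attaching `timings[name]` to each node of the graph would show at a glance which step dominates the run.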

Sunil

Hi Sunil,

I went through your notebook and I can see why you're frustrated by the speed. Unfortunately I don't have any suggestions for improving the performance right now. I like your suggestion for reports on the slow points of a pipeline; I'll pass that along internally.

Regarding min, unfortunately there's no built-in function to do it with factors yet. In the meantime, I added in a bit of a hack solution to get the minimum by combining the two factors (courtesy of Nathan Wolfe).
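
The notebook with the workaround is not reproduced in this thread. One way to get an elementwise minimum using only addition, subtraction, absolute value, and division is the identity min(a, b) = (a + b - |a - b|) / 2. Whether this is the approach used in the notebook is an assumption; the sketch below shows the identity on plain Python lists with made-up values.

```python
def elementwise_min(a, b):
    """Minimum of two equal-length sequences via (a + b - |a - b|) / 2."""
    return [(x + y - abs(x - y)) / 2 for x, y in zip(a, b)]


# Hypothetical factor outputs, one value per asset.
factor_a = [1.5, 4.0, -2.0]
factor_b = [2.5, 3.0, -2.0]
combined = elementwise_min(factor_a, factor_b)
```

Note that a NaN in either input propagates to the result, so NaN filtering still has to happen separately.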

Hi Jamie,

I'm working on an algo that uses get_fundamentals but skimming over your comments above, it sounds like it'll be deprecated, with the functionality moved to pipeline. Correct? What can be done in pipeline now? For example, could the code I've shared below be ported to pipeline? If so, is there an example?

Thanks,

Grant

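    # Top context.n_stocks securities by market cap, via the get_fundamentals query API.
    # SQLAlchemy-style filters need `!= None` here, not `is not None`.
    fundamental_df = get_fundamentals(
        query(
            fundamentals.valuation.market_cap
        )
        .filter(fundamentals.company_reference.primary_exchange_id != None)
        .filter(fundamentals.valuation.market_cap != None)
        .order_by(fundamentals.valuation.market_cap.desc())
        .limit(context.n_stocks)
    )
    # The columns of the returned DataFrame are the securities; iterating yields them.
    context.stocks_current = [stock for stock in fundamental_df]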
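
Read as plain Python, the query above keeps securities with a non-null exchange and market cap, sorts by market cap in descending order, and takes the first n. A minimal sketch of that selection on made-up data follows; the symbols, values, and the `top_by_market_cap` helper are hypothetical, and this is not the pipeline port itself.

```python
def top_by_market_cap(rows, n_stocks):
    """Return the n_stocks symbols with the largest non-null market cap."""
    eligible = [
        row for row in rows
        if row["primary_exchange_id"] is not None and row["market_cap"] is not None
    ]
    eligible.sort(key=lambda row: row["market_cap"], reverse=True)
    return [row["symbol"] for row in eligible[:n_stocks]]


# Made-up universe: BBB has no exchange and CCC has no market cap, so both are dropped.
rows = [
    {"symbol": "AAA", "primary_exchange_id": "NYS", "market_cap": 5.0e9},
    {"symbol": "BBB", "primary_exchange_id": None, "market_cap": 9.0e9},
    {"symbol": "CCC", "primary_exchange_id": "NAS", "market_cap": None},
    {"symbol": "DDD", "primary_exchange_id": "NAS", "market_cap": 7.0e9},
    {"symbol": "EEE", "primary_exchange_id": "NYS", "market_cap": 1.0e9},
]
stocks_current = top_by_market_cap(rows, n_stocks=2)
```

Any port to another API needs to reproduce the same three steps: filter, sort descending, limit.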

Hi Grant,

The get_fundamentals deprecation isn't on the list for tomorrow; I was simply implying that it will happen eventually. That being said, I'd strongly suggest that you try to make the switch to pipeline. To help get you started, I wrote up the example you shared in this notebook. One of the nice things about pipeline is that you can essentially copy a pipeline over to the IDE directly from research. The only difference is how it's run. I'll post a barebones algo to highlight this in a moment.

Here's the backtest. I realized I left a couple extra import statements in the notebook by accident (CustomFactors, SimpleMovingAverage) - they're removed from this example.

Thanks Jamie - saved me a lot of work! --Grant

Hi Jamie,

Back in June, you said that "our efforts will be focused on improving the load time of fundamentals data and eventually allowing a way to specify dates (or relative dates, at least) in a pipeline factor".

Are you able to give any sort of progress update on these efforts?

Thanks,
Richard

Hi Richard,

In terms of responsiveness, we took a 2 step approach. We wanted to try to get the existing architecture as fast as possible -- squeeze as much performance as we can. So we made incremental improvements to the fundamentals-pipeline performance on the scale of 10 - 30% (iirc) over the course of the summer. The "low hanging fruit", so to speak.

The improvements were measurable, but likely not perceptible to a casual observer without a timer.

We are in the design process of a more 'fundamental' re-architecture of how we store and deliver this data, which will use lessons we've learned from our data partner program and from delivering our market data. However, the project is in its early stages, and it would be foolish for me to project a timeline.

Hope that helps.

Thanks
Josh

OK, thanks Josh, I appreciate the response. Please post on this thread when there is some meaningful progress on this.
Cheers,
Richard