Historical fundamental financial data

Hello
I joined a few weeks ago and the system is amazing for price data and backtesting. However, once I started to use historical fundamental financial data (income statement, balance sheet, etc.), the system became really complex and limited.
For example, I want to use the following data:
1. Change in the latest quarter vs. the same quarter of the prior year (any data)
2. Change in fiscal year vs. prior year (any data)
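To pin down what the two comparisons mean, here is a minimal sketch in plain Python. The numbers are made up for illustration and `pct_change` is a hypothetical helper, not part of any Quantopian API.

```python
def pct_change(current, prior):
    """Relative change from prior to current; None when it cannot be computed."""
    if current is None or prior is None or prior == 0:
        return None
    return (current - prior) / abs(prior)

# Hypothetical quarterly values, oldest first (illustrative numbers only)
quarterly = [100.0, 110.0, 120.0, 130.0, 125.0]

# 1. Latest quarter vs. the same quarter one year (four quarters) earlier
yoy_quarter = pct_change(quarterly[-1], quarterly[-5])

# 2. Fiscal year vs. prior fiscal year (hypothetical full-year totals)
fiscal_years = {2015: 400.0, 2016: 460.0}
yoy_fiscal = pct_change(fiscal_years[2016], fiscal_years[2015])
```

The hard part on the platform is not this arithmetic but getting the right quarterly or fiscal-year values in the first place.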

For 1), the best algo I have found is Jamie's, even though it is not ideal if you want to go back 24 months.

  • Also, in Jamie's algo some data entries may not pertain to a 10-Q statement. I checked the data and it seems that other filings such as 8-K and 6-K always match the 10-Q statement. Are you just duplicating the data across all filings? In other words, does it not matter which filing is retrieved, because the data for the same period ending date will always be the same?
  • Also, I haven't found any 10-K statements. Do you have plans to introduce such data? In that case the algo would definitely produce wrong data, as a 10-K would contain yearly data.
  • The algo has to be called twice (once for values and once for dates), since the out parameter can only have one dtype. If different dtypes were supported per out parameter, the algo would be more efficient, since only one pass would be needed. Are you planning to support different dtypes for multiple out parameters?
  • And last: I have read several posts going back to 2015 saying that better support for historical fundamental data would be provided. Is something in the pipeline? I checked QC and they have neat classes for accessing historical fundamental data (see QC classes and QC Sample algo). Something like that would be cool!

For 2) I found no solution. Is there any?

11 responses

I'm glad you like it! I think I can help with a few of your points.

  • I think that code you got from Jamie is the solution to both of your questions. You can extend his code to do 24 months (or longer), and adjust it to do quarterly and annual lookups.
  • I'm not sure of the interaction of the different form filings. We're sharing the data that Morningstar gives us, and they're trying to provide the best-available. Can you share a sample notebook with a specific data question, maybe, that I can look more closely at?
  • We don't currently have plans to support multiple dtypes for one pipeline term. However, in Jamie's example, the speed bottleneck should be data retrieval. Even if you reference the same data in two pipeline factors, pipeline is smart enough to only load the data into memory once. The efficiency gain of running through the for loops once would be minimal. Also, if you just want the values, you don't have to get the last 4 quarters of dates in your pipeline.
  • You're looking for the text output of 10-K statements? We don't have that at this time; it's the type of data, and analysis of that, that we'd like to add in the future.
  • We don't have a major fundamental rewrite planned. We finished a recent rewrite (with big improvements) last year.

As Dan said, those are not currently available, although meanwhile I think everyone would do well to be cognizant of the fact that Morningstar is adding fundamentals. There's code here that can be used to find out what's being added and a list of some that were new at that time.

Carlo, I took a look at Jamie's code you mentioned and it's outside of anything I've been working with but just in case it might help at all, this should provide a ratio of the most recent 3 months (mean) over the previous year's same time frame. Replace operating_income with any fundamental. For most recent trailing year vs the year before it, that would be 504 instead of 315 and 252 in place of 63. While nanfill isn't essential, I find it helps.

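import numpy as np
from quantopian.pipeline import CustomFactor

class Trend(CustomFactor):
    window_length = 315     # 1 1/4 years of trading days (252 + 63)
    def compute(self, today, assets, out, zoo):
        zoo = nanfill(zoo)
        # Ratio of the mean over the most recent 63 rows (about 3 months)
        # to the mean over the first 63 rows of the window (a year earlier)
        out[:] = np.mean(zoo[-63:], axis=0) / np.mean(zoo[:63], axis=0)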

calling that in pipeline ....

    Trend(inputs=[Fundamentals.operating_income], mask=universe).zscore()

def nanfill(_in):    # From https://stackoverflow.com/questions/41190852/most-efficient-way-to-forward-fill-nan-values-in-numpy-array  
    # Includes a way to count nans on webpage at  
    #   https://www.quantopian.com/posts/forward-filling-nans-in-pipeline  
    #return _in            # uncomment to not run the code below  
    mask = np.isnan(_in)  
    idx  = np.where(~mask,np.arange(mask.shape[1]),0)  
    np.maximum.accumulate(idx,axis=1, out=idx)  
    _in[mask] = _in[np.nonzero(mask)[0], idx[mask]]  
    return _in  
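
As written, nanfill fills along axis 1, that is, within each row. CustomFactor inputs are laid out as (window_length, assets), so filling down the time axis (axis 0) may be what is wanted instead. A minimal sketch of that variant, assuming that layout; `ffill_time_axis` is a hypothetical name, and unlike nanfill it returns a new array rather than modifying the input:

```python
import numpy as np

def ffill_time_axis(values):
    """Forward-fill NaNs down each column (axis 0 = time) and return a new array."""
    values = np.asarray(values, dtype=float)
    mask = np.isnan(values)
    # Row index of each non-NaN cell, 0 where the cell is NaN
    idx = np.where(~mask, np.arange(values.shape[0])[:, None], 0)
    # Running maximum down each column = row of the latest non-NaN value so far
    np.maximum.accumulate(idx, axis=0, out=idx)
    cols = np.arange(values.shape[1])[None, :]
    return values[idx, cols]
```

Leading NaNs in a column stay NaN, since there is nothing earlier to fill them from.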

Thank you Dan and Blue

Some things helped me, but some are still unresolved.

@Dan: "Can you share a sample notebook with a specific data question, maybe, that I can look more closely at?"
I attached a notebook which gives the form type of the data. However, as you will note, the reported data does not belong to a Form 10-Q, which holds quarterly data. So my question still remains: can we rely on this fundamental data being "quarterly" data, or do we need to filter it? The algo interprets any form as quarterly data. But is this correct?

"I think that code you got from Jamie is the solution to both of your questions. You can extend his code to do 24 months (or longer), and adjust it to do quarterly and annual lookups."

Yes, but here annual data is always TTM, whereas I was looking for "fiscal" year vs. prior year (any data). This is different from TTM. In this case I first need to retrieve when the fiscal year ends for each company and work backwards from there.

Yes, I'm sure Dan understood the fiscal year thing. My code on the other hand was just trailing in the sense like trailing twelve months, TTM as you mentioned.

There's a fundamental value fiscal_year_end that returns an integer, looks like 1-12. Maybe this can help nudge toward the goal ...

class Trend(CustomFactor):
    # window_length is fixed before compute() runs, so it must already be long
    # enough to reach back over the fiscal year; 315 is only a placeholder
    window_length = 315     # 1 1/4 years
    inputs = [Fundamentals.operating_income, Fundamentals.fiscal_year_end]  # oi, fye
    def compute(self, today, assets, out, oi, fye):
        month_now = today.month  # `today` is the date being computed
        # TODO: use month_now and fye to work out the offsets that should replace 63
        out[:] = np.mean(oi[-63:], axis=0) / np.mean(oi[:63], axis=0)

calling that in pipeline ....

    Trend(mask=universe).zscore()  
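
One piece of the "do math" step can be sketched in plain Python: turning the fiscal year-end month into the most recent fiscal year-end date. This assumes fiscal_year_end is a month number 1-12 and that the fiscal year closes on the last calendar day of that month, which will not hold for every company. `last_fiscal_year_end` is a hypothetical helper.

```python
import calendar
from datetime import date

def last_fiscal_year_end(today, fye_month):
    """Most recent fiscal year-end on or before `today`.

    Assumes the fiscal year closes on the last calendar day of fye_month (1-12).
    """
    year = today.year
    candidate = date(year, fye_month, calendar.monthrange(year, fye_month)[1])
    if candidate > today:
        # This year's fiscal year has not closed yet, so step back one year
        year -= 1
        candidate = date(year, fye_month, calendar.monthrange(year, fye_month)[1])
    return candidate
```

Keep in mind that the numbers for that fiscal year only show up some time after this date, once the company has filed.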

Hello Blue

Thanks. I did not see the fiscal_year_end data. With this information it is possible to construct a fiscal year comparison, as I can retrieve the current backtest day and work from there (it might not be easy, especially given the amount of data to load).

The only remaining question is whether the data within a quarterly reporting period is always the same, regardless of the form type.
Say a company files a 10-Q and a few weeks later files another form (not a 10-Q). The algo will pick up the other form (data entry), as it does not differentiate between filings and only counts filing dates. Is it ensured that this data does not change and will be filled with the 10-Q data from the previous reporting? If not, the algo might not work correctly, as it will pick up data that does not pertain to quarterly financial reporting. It would be worse if 10-K data were filed, as that report would contain yearly data but would be interpreted by the algo as quarterly data and distort the results. I haven't found any 10-K and hope there is no intention to include them from Morningstar. If there is, we must include a filter or be advised by Q.

Hello Blue

Do not try to work with window_length = 315 (etc.). I tested it and the results are not accurate. Dan's algo works, but working with fixed date spans (as in Constantino's example, see Notebook) gives wrong results. The gap between quarterly results varies from 20 to 88 days. I ran a comparison between '2017-01-01' and '2017-02-01' for 'NVDA', 'MSFT', 'IBM', 'C', 'AAL' and the results differ. Dan's algo was always correct.

Above I said "do math to determine numbers to replace 315 and 63". Thanks for the feedback.
If there's any chance you might be willing to attach an example backtest, there's a lot of brainpower out there and it could be interesting. You just have to use the Run Full Backtest button first to have it show up on the list.

Hello Blue

It is not really about replacing the 63 etc., because every company reports a little differently, meaning the filing dates follow no rule. In my test the time span between filings varied from 22 to over 80 days.

I attached 2 Notebooks (the 2nd in a 2nd post): one with Dan's algo and one with "fixed" date spans (like the one you posted, or Constantino's). The Notebook shows IBM's reporting, and you will see that on days like the 27th of January it picks up wrong entries, some doubled and some skipped. That will not happen with Dan's algo, as it looks for distinct reporting dates.
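
For readers without the notebooks, the idea of looking for distinct reporting dates rather than fixed day offsets can be sketched with plain NumPy. This is not the actual code from the algo discussed here, only an illustration for a single asset. It assumes you have a daily window of values together with the matching as-of dates (for example from an `_asof_date` field); `last_distinct_reports` is a hypothetical helper.

```python
import numpy as np

def last_distinct_reports(values, asof_dates, n):
    """Return the n most recent distinct (asof_date, value) pairs, oldest first.

    values, asof_dates: 1-D sequences over the lookback window, one entry per day.
    """
    values = np.asarray(values, dtype=float)
    asof_dates = np.asarray(asof_dates, dtype='datetime64[D]')
    valid = ~np.isnat(asof_dates)
    # np.unique sorts the dates and gives the index of each first occurrence
    dates, first_idx = np.unique(asof_dates[valid], return_index=True)
    return dates[-n:], values[valid][first_idx][-n:]

# Illustrative daily window: each report repeats until the next one arrives
asof = ['2016-03-31'] * 3 + ['2016-06-30'] * 2 + ['2016-09-30'] * 4
vals = [1.0] * 3 + [2.0] * 2 + [3.0] * 4
dates, reports = last_distinct_reports(vals, asof, 2)
```

Because each report is counted once no matter how many days it stays current, a 22-day or an 88-day gap between filings makes no difference to which quarter is picked.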

Here is the 2nd notebook.

Nice puzzle.

Anyone?

Carlo, does Fundamentals.fiscal_year_end turn out to not be useful after all?

You've probably seen all of these already, just want to be sure you (and any observers) are familiar with this type of search that I use a lot:
https://www.google.com/search?q="BusinessDaysSincePreviousEvent"+site:quantopian.com
https://www.google.com/search?q="PreviousAsOf"+site:quantopian.com

No, I did not use Fundamentals.fiscal_year_end, but I think I will use it for the fiscal year comparison, which I will need later on (in combination with Dan's algo). You just have to load enough data (window length) in the pipeline. So if the backtest day is 1 Jan 2017, you need to load data back to around Mar 2015 to be sure to get a year-on-year comparison.
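
As a rough sizing of that window, the weekdays between the two dates can be counted with NumPy. This is only an approximation, since it ignores exchange holidays; `lookback_trading_days` is a hypothetical helper.

```python
import numpy as np

def lookback_trading_days(start, end):
    """Approximate trading days from start (inclusive) to end (exclusive).

    Counts weekdays only; exchange holidays are ignored, so this slightly overestimates.
    """
    return int(np.busday_count(start, end))

# Window needed to reach from 1 Jan 2017 back to the start of Mar 2015
window_length = lookback_trading_days('2015-03-01', '2017-01-01')
```

A small overestimate is harmless here, since the goal is only to be sure enough history is loaded.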

I assume that BusinessDaysSincePreviousEvent gives the number of days from the backtest day to the last "quarterly" filing date. I am not sure about BusinessDaysSincePreviousEarnings: it should give me the same, but when testing it, it looks more like the days since the last "yearly filing" date, and even that is way off. I honestly do not know what it is supposed to represent. I could not find any explicit documentation.