fundamental data - discrepancy / questions

I have been sifting through the fundamental data quite a bit over the last week and have collected a few questions.

  1. There are stock symbols with ebit and ebitda as None. Under what conditions can I expect ebit/ebitda to be None? I can think of just one: when there are multiple classes of shares, all except one could be expected to have None. Are there others?

  2. I see "nan" for some values, including morningstar_sector_code. For example, if I pick up all the stocks in the fundamental database, as I did, and take a union of all the sector codes, I get this: 'morningstar_sector_code': [nan, 101.0, 102.0, 103.0, 104.0, 205.0, 206.0, 207.0, 308.0, 309.0, 310.0, 311.0]. What's throwing the nan, and why?

  3. If I am trying to build an index based on a matching filter, how can I eliminate:
    3.1 Stocks that are different share classes of the same company? For example, I don't want to see both BRK.A and BRK.B. Is there a simple way to filter these out?
    3.2 What symbols should be eliminated? I find symbols with ebit/ebitda as None or NaN in some cases. Is there a class of symbols that can reasonably be expected not to have ebit/ebitda, operating margins, ebitda_margin, etc.?

  1. Here is some data on NaN checks across the database. I've looked at some of them, and they look like regular stocks, such as CEL or AGM. I don't understand why their ebit/ebitda is NaN; it can be negative, positive, or 0, right?
    2003-01-02: NAN Check: {'LIFE': ['operation_margin', 'ebit', 'ev_to_ebitda', 'ebitda_margin', 'ebitda', 'ebit_margin', 'net_margin'], 'CEL': ['operation_margin', 'ebit', 'ev_to_ebitda', 'ebitda_margin', 'ebitda'], 'ATYT': ['ev_to_ebitda'], 'SSPI': ['ev_to_ebitda'], 'AGE': ['operation_margin', 'ebit', 'ev_to_ebitda', 'ebitda_margin', 'ebitda', 'ebit_margin'], 'AGM': ['ebit', 'ebit_margin'], 'CSNT': ['ebit', 'ebit_margin'], 'EPEX': ['ev_to_ebitda'], 'AGY': ['ebit', 'ebitda'], 'AGT': ['ev_to_ebitda'], 'AGR': ['ev_to_ebitda'], 'AGP': ['ev_to_ebitda'], 'KYO': ['ebit', 'ebitda'], 'MROE': ['ebit', 'ev_to_ebitda', 'ebit_margin'], 'CTZN': ['ebit', 'ebit_margin'], 'TINY': ['ebit', 'ebit_margin'], 'PTNR': ['ebit', 'ebitda'], 'SPI': ['ebit', 'ebitda'], 'SPM': ['revenue_growth', 'operation_margin', 'ebit', 'ebitda_margin', 'ebitda', 'ebit_margin', 'net_margin'], 'CRBC': ['ebit', 'ebit_margin'], 'GS': ['ebit', 'ebit_margin'], 'HAND': ['ev_to_ebitda'], 'BYH': ['operation_margin', 'ebit', 'ev_to_ebitda', 'ebitda_margin', 'ebitda', 'ebit...

  2. Here is some data on None checks. I can see why avg5_yrs_roic could be None if the stock wasn't public long enough at that point to calculate 5-year averages. But what about "equity_per_share_growth"? Also, SPG below has, according to Yahoo, been public since the early 1990s, so there should be plenty of history to calculate both of the fields that are None below.
    2003-01-02: None Check: {'MSSN': ['equity_per_share_growth', 'avg5_yrs_roic'], 'SPP': ['equity_per_share_growth', 'avg5_yrs_roic'], 'CTZN': ['equity_per_share_growth', 'avg5_yrs_roic'], 'SPW': ['equity_per_share_growth', 'avg5_yrs_roic'], 'SPH': ['equity_per_share_growth', 'avg5_yrs_roic'], 'SPI': ['equity_per_share_growth', 'avg5_yrs_roic'], 'SPM': ['equity_per_share_growth', 'avg5_yrs_roic'], 'SPN': ['equity_per_share_growth', 'avg5_yrs_roic'], 'SPA': ['equity_per_share_growth', 'avg5_yrs_roic'], 'SPC': ['equity_per_share_growth', 'avg5_yrs_roic'], 'SPF': ['equity_per_share_growth', 'avg5_yrs_roic'], 'SPG': ['equity_per_share_growth', 'avg5_yrs_roic'], 'TISA': ['equity_per_share_growth', 'avg5_yrs_roic'], 'ARTN_A': ['equity_per_share_growth', 'avg5_yrs_roic'], 'REMX': ['equity_per_share_growth', 'avg5_yrs_roic'], 'CGPI': ['equity_per_share_growth', 'avg5_yrs_roic'], 'PQE': ['equity_per_share_growth', 'avg5_yrs_roic'], 'JNY': ['equity_per_share_growth', 'avg5_yrs_roic'], 'JNC': ['equity_per_share_growth', 'avg5_yrs_roic...
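
For anyone who wants to reproduce these two checks, here is a minimal sketch of how None and NaN can be told apart per symbol. It assumes the fundamentals snapshot is available as a plain dict of {symbol: {field: value}}; that layout, the helper name split_missing, and the sample values are all made up for illustration and are not the platform's API.

```python
import math


def split_missing(snapshot):
    """Return two dicts mapping symbol -> list of fields that are None / NaN.

    `snapshot` is assumed to look like {symbol: {field_name: value}}.
    """
    none_fields, nan_fields = {}, {}
    for symbol, fields in snapshot.items():
        for name, value in fields.items():
            if value is None:
                # Field was never delivered for this symbol.
                none_fields.setdefault(symbol, []).append(name)
            elif isinstance(value, float) and math.isnan(value):
                # Field exists but holds a floating-point NaN.
                nan_fields.setdefault(symbol, []).append(name)
    return none_fields, nan_fields


# Made-up sample values, only to show the shape of the result.
sample = {
    "AAA": {"ebit": 1.2e9, "ebitda": 1.5e9},
    "BBB": {"ebit": float("nan"), "ebitda": 3.0e8},
    "CCC": {"ebit": None, "ebitda": float("nan")},
}
none_check, nan_check = split_missing(sample)
print("None Check:", none_check)
print("NAN Check:", nan_check)
```

Keeping the two cases separate matters because `value != None` is True for NaN, so a None-only filter lets NaN rows through.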

thanks,
Sarvi

4 responses

Another data point in question:
If you track the sum of ebitda across all stocks with ebitda != None, the data goes wacky between 2009-07-28 and 2009-08-07:
'ebitda': -67559743899 'ebitda': -62492012992 'ebitda': -7997037458 'ebitda': 3357820427 'ebitda': 7364032472 'ebitda': 15032184336 'ebitda': 21544790420 'ebitda': 23172369250 'ebitda': 31460591345

Any ideas why this happens? Is this accurate? What can be done about it?
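
One way to narrow this down is to look at which single symbol drags the total down on each day. The sketch below assumes the ebitda values have been collected into a pandas DataFrame with dates as rows and symbols as columns; the symbols and numbers are invented for illustration and are not the real data.

```python
import pandas as pd

# Hypothetical ebitda table: one row per date, one column per symbol.
# None becomes NaN in a float column and is skipped by sum/min below.
ebitda = pd.DataFrame(
    {
        "AAA": [5.0e9, 5.1e9, 5.2e9],
        "BBB": [2.0e9, 2.0e9, 2.1e9],
        "CCC": [-7.0e10, -1.0e9, None],
    },
    index=pd.to_datetime(["2009-07-28", "2009-07-29", "2009-07-30"]),
)

report = pd.DataFrame(
    {
        "total": ebitda.sum(axis=1),           # daily sum, missing values skipped
        "worst_symbol": ebitda.idxmin(axis=1),  # symbol with the lowest ebitda
        "worst_value": ebitda.min(axis=1),      # that symbol's ebitda
    }
)
print(report)
```

If one symbol accounts for most of a negative total, that symbol is the first place to look for a data problem.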

Hi Sarvi,

I'll dig into these today/tomorrow. I don't have quick, easy answers. There's an element of debugging we need to do to determine if it is a problem with the source data, an expected level of dirtiness with the source data or a problem with our processing of the data itself.

Thanks for all the detail.

Josh

Hi Sarvi

I did a bunch of digging over the past couple of days. Here's what I've found.

In terms of the stocks without EBIT or EBITDA data, there are two primary reasons for this:
1) Sometimes Morningstar doesn't provide any data for particular companies for particular metrics. This is likely by design in most circumstances, as there appear to be companies that don't report one or the other metric, or it isn't included in their filings for some reason. This is the less common reason, but I saw it for a few tickers.
2) Currently, we only process and display metrics that are delivered to Quantopian in USD. Other currencies are filtered out in this first iteration of the fundamentals data. Expanding to incorporate data in other currencies is on our roadmap.

With respect to "nan" values for Morningstar sector code, I saw those occur only for a handful of companies at any one time (never more than 6 on a day, from what I saw in the sample I took). I examined the actual sids for the most recent set and they were sids high in the range. I suspect that these companies were not yet manually assigned a sector code by Morningstar.

In terms of stocks with multiple share classes, there is not a single identifier that indicates a share class is part of a multiple share class security. That said, there are fields that can help you derive this more explicitly, like fundamentals.share_class_reference.is_primary_share, a boolean. Used in combination with fundamentals.company_reference.primary_symbol, you could identify and remove any with multiple share classes.
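
As a rough sketch of that idea, assume the two fields have already been pulled into a pandas DataFrame with one row per share class. The column names mirror the fields mentioned above, but the table layout and the tickers are hypothetical.

```python
import pandas as pd

# Hypothetical share-class table; FOO has two classes, BAR has one.
shares = pd.DataFrame(
    {
        "symbol": ["FOO_A", "FOO_B", "BAR"],
        "primary_symbol": ["FOO_A", "FOO_A", "BAR"],
        "is_primary_share": [True, False, True],
    }
)

# Option 1: keep one row per company, namely its primary share class.
primary_only = shares[shares["is_primary_share"]]

# Option 2: drop every company that has more than one share class.
classes_per_company = shares.groupby("primary_symbol")["symbol"].transform("count")
single_class_only = shares[classes_per_company == 1]

print(primary_only["symbol"].tolist())
print(single_class_only["symbol"].tolist())
```

Option 1 keeps the company in the basket exactly once, while option 2 removes it altogether, so the choice depends on what the index is meant to represent.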

The last point... I'll have to dig into it a bit more. I found ticker PBR in that time frame with a large negative EBITDA. I suspect that we are not properly filtering out that stock despite its non-USD currency, but that is a hypothesis.

Thanks for all this feedback. It is very helpful to get all of this.

Thanks
Josh

Thanks Josh for looking into this.

In a general sense, I am looking for a robust way to eliminate outliers from a basket of stocks without losing too many stocks.

  1. Zero filters: Stocks(6271), share_class_status=['A', 'D'] # all stocks have only "A" or "D", Active or Deactive

  2. share_class_status in ['A']: Stocks(3325), which leaves close to half the stocks in the database as "Deactive", i.e. not listed

  3. share_class_status in ['D']: I did a rough scan of this symbol list, and they all seem discontinued or delisted, but it contains 'BUD'.
    Do I take this status to be as of today's date, or as of the query date during the backtest, which was back around 2002?

  4. marketcap!=None: Stocks(5811)

  5. marketcap!=None and enterprise_value!=None: Stocks(5811)

  6. marketcap!=None and enterprise_value!=None and total_revenue!=None: Stocks(5389)

  7. marketcap!=None and enterprise_value!=None and total_revenue!=None and ebitda!=None: Stocks(5079)

  8. marketcap!=None and enterprise_value!=None and total_revenue!=None and ebitda!=None and ev_to_ebitda!=None: Stocks(3765)

  9. share_class_status in ['A'] and marketcap!=None: Stocks(3090)

  10. share_class_status in ['A'] and marketcap!=None and enterprise_value!=None and ebitda!=None: Stocks(2810)

  11. share_class_status in ['A'] and marketcap!=None and enterprise_value!=None and ebitda!=None and total_revenue!=None: Stocks(2808)

The shrinking counts from 2, 9, 10, 11 are worrying.

And all this does not include "NaN" checks, i.e. data with NaNs.
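
For reference, a minimal sketch of how these survivor counts could be produced with None and NaN handled in one pass. It assumes the fundamentals are in a pandas DataFrame with one row per stock; the column names and values here are invented for illustration.

```python
import pandas as pd

# Hypothetical universe; None and NaN both end up as missing values.
universe = pd.DataFrame(
    {
        "share_class_status": ["A", "A", "D", "A"],
        "market_cap": [1.0e9, None, 5.0e8, 2.0e9],
        "enterprise_value": [1.2e9, 3.0e8, None, 2.5e9],
        "ebitda": [1.0e8, 2.0e7, 1.0e7, float("nan")],
    },
    index=["S1", "S2", "S3", "S4"],
)

# Each filter is applied on top of the previous ones.
filters = [
    ("share_class_status == 'A'", universe["share_class_status"] == "A"),
    ("market_cap present", universe["market_cap"].notna()),
    ("enterprise_value present", universe["enterprise_value"].notna()),
    ("ebitda present", universe["ebitda"].notna()),
]

mask = pd.Series(True, index=universe.index)
counts = []
for label, condition in filters:
    mask = mask & condition
    counts.append((label, int(mask.sum())))

for label, count in counts:
    print(f"{label}: Stocks({count})")
```

Because `notna()` treats None and NaN alike, the counts already include the NaN check that the `!= None` comparisons miss.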

On further investigation I see fundamentals.share_class_reference.share_class_status, which can have values of "A", "D", "I", "O".
Checking for "A" eliminates a lot of noise.

I am looking to see whether eliminating stocks that fall more than 3 standard deviations from the mean for things like ebitda or total_revenue will eliminate further noise.
I have posted another question on how to do this in pandas, and I would appreciate pointers on it.
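
A minimal sketch of the 3 standard deviation idea, assuming the values for one field are in a pandas Series indexed by symbol. The helper name within_n_std and the sample numbers are made up for illustration.

```python
import pandas as pd


def within_n_std(values, n=3.0):
    """Boolean mask: True where a value lies within n sample std devs of the mean.

    NaN entries compare as False, so they are dropped along with the outliers.
    """
    mean = values.mean()  # NaN is skipped by default
    std = values.std()    # sample standard deviation (ddof=1)
    return (values - mean).abs() <= n * std


# Twenty ordinary values, one extreme value, and one missing value.
ebitda = pd.Series(
    [float(v) for v in range(90, 110)] + [1_000_000.0, float("nan")],
    index=[f"S{i}" for i in range(20)] + ["OUT", "MISSING"],
)

keep = within_n_std(ebitda)
filtered = ebitda[keep]
print(len(ebitda), "->", len(filtered))
```

One caveat with this approach: a single extreme value inflates both the mean and the standard deviation, so in a small basket an obvious outlier can still fall inside the 3 standard deviation band.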

Thanks,
Sarvi