history API: NaN in result dataframe

Hi,

I've noticed a lot of NaN values in the dataframes returned by history() for some securities. sum and other aggregating functions return NaN on this data. Can you suggest how to work around it?

PS: Look at the log of attached backtest for the details.

Regards,
Ed

6 responses

Should I use the ffill and bfill functions described here: http://pandas.pydata.org/pandas-docs/dev/missing_data.html? Can you suggest any other methods?
Would it make sense to fill NaNs before returning the dataframe from history()?

Yes, I think with history() we do not deal with missing values for you by default. You can either drop those with .dropna() or fill them with the functions you found.
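
To make the two options concrete, here is a minimal pandas sketch. The `prices` frame is a hypothetical stand-in for what history() returns (rows are bars, columns are securities); the column names and values are made up for illustration and do not come from the attached backtest.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for a frame returned by history():
# one row per bar, one column per security.
prices = pd.DataFrame(
    {
        "A": [np.nan, 10.0, 10.5, 11.0],   # leading NaN
        "B": [20.0, np.nan, 21.0, 21.5],   # NaN inside the data
    },
    index=pd.date_range("2003-04-08", periods=4, freq="D"),
)

# Option 1: drop every bar in which at least one security is missing.
dropped = prices.dropna()

# Option 2: carry the last known price forward.
# A leading NaN has nothing before it, so it stays NaN.
filled = prices.ffill()

# Option 3: leave the data alone and ask the aggregation to skip NaNs.
col_sums = prices.sum(skipna=True)

print(dropped)
print(filled)
print(col_sums)
```

Which option fits depends on the algorithm: dropping shortens the window, while forward filling keeps its length but repeats stale prices.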

Disclaimer

The material on this website is provided for informational purposes only and does not constitute an offer to sell, a solicitation to buy, or a recommendation or endorsement for any security or strategy, nor does it constitute an offer to provide investment advisory services by Quantopian. In addition, the material offers no opinion with respect to the suitability of any security or specific investment. No information contained herein should be regarded as a suggestion to engage in or refrain from any investment-related course of action as none of Quantopian nor any of its affiliates is undertaking to provide investment advice, act as an adviser to any plan or entity subject to the Employee Retirement Income Security Act of 1974, as amended, individual retirement account or individual retirement annuity, or give advice in a fiduciary capacity with respect to the materials presented herein. If you are an individual retirement or other investor, contact your financial advisor or other fiduciary unrelated to Quantopian about whether any given investment idea, strategy, product or service described herein may be appropriate for your circumstances. All investments involve risk, including loss of principal. Quantopian makes no guarantees as to the accuracy or completeness of the views expressed in the website. The views are subject to change, and may have become unreliable for various reasons, including changes in market conditions or economic circumstances.

Ed:

By default we should be forward filling, but the current design does allow for 'leading' NaNs, i.e. NaNs at the beginning of the frame when no data is available for the security on the first bars.

Are you seeing leading nans, or nans intermingled throughout the data?

Hi Eddie,

I'm seeing all kinds of nans :)
Here are some examples:
Leading nans:
2003-04-14 handle_data:27 DEBUG 2003-03-18 21:00:00+00:00 NaN
2003-03-19 21:00:00+00:00 NaN
2003-03-20 21:00:00+00:00 NaN
2003-03-21 21:00:00+00:00 NaN
2003-03-24 21:00:00+00:00 NaN
2003-03-25 21:00:00+00:00 NaN
2003-03-26 21:00:00+00:00 NaN
2003-03-27 21:00:00+00:00 NaN
2003-03-28 21:00:00+00:00 NaN
2003-03-31 21:00:00+00:00 NaN
2003-04-01 21:00:00+00:00 NaN
2003-04-02 21:00:00+00:00 NaN
2003-04-03 21:00:00+00:00 NaN
2003-04-04 21:00:00+00:00 NaN
2003-04-07 20:00:00+00:00 NaN
2003-04-08 20:00:00+00:00 NaN
2003-04-09 20:00:00+00:00 NaN
2003-04-10 20:00:00+00:00 NaN
2003-04-11 20:00:00+00:00 11.078
2003-04-14 13:31:00+00:00 11.078

Trailing nans:
2003-04-15 handle_data:27 DEBUG 2003-03-19 21:00:00+00:00 22.340
2003-03-20 21:00:00+00:00 22.295
2003-03-21 21:00:00+00:00 22.850
2003-03-24 21:00:00+00:00 21.970
2003-03-25 21:00:00+00:00 22.320
2003-03-26 21:00:00+00:00 22.330
2003-03-27 21:00:00+00:00 22.005
2003-03-28 21:00:00+00:00 21.900
2003-03-31 21:00:00+00:00 21.465
2003-04-01 21:00:00+00:00 21.805
2003-04-02 21:00:00+00:00 22.325
2003-04-03 21:00:00+00:00 22.290
2003-04-04 21:00:00+00:00 22.555
2003-04-07 20:00:00+00:00 23.045
2003-04-08 20:00:00+00:00 23.000
2003-04-09 20:00:00+00:00 23.000
2003-04-10 20:00:00+00:00 22.850
2003-04-11 20:00:00+00:00 22.890
2003-04-14 20:00:00+00:00 NaN
2003-04-15 13:31:00+00:00 NaN

And nans inside the data:
2003-04-15 handle_data:27 DEBUG 2003-03-19 21:00:00+00:00 82.13
2003-03-20 21:00:00+00:00 82.23
2003-03-21 21:00:00+00:00 82.05
2003-03-24 21:00:00+00:00 82.20
2003-03-25 21:00:00+00:00 82.22
2003-03-26 21:00:00+00:00 82.30
2003-03-27 21:00:00+00:00 82.32
2003-03-28 21:00:00+00:00 82.43
2003-03-31 21:00:00+00:00 82.48
2003-04-01 21:00:00+00:00 82.36
2003-04-02 21:00:00+00:00 82.27
2003-04-03 21:00:00+00:00 82.32
2003-04-04 21:00:00+00:00 82.30
2003-04-07 20:00:00+00:00 82.22
2003-04-08 20:00:00+00:00 82.33
2003-04-09 20:00:00+00:00 82.38
2003-04-10 20:00:00+00:00 82.27
2003-04-11 20:00:00+00:00 82.23
2003-04-14 20:00:00+00:00 NaN
2003-04-15 13:31:00+00:00 82.19

I ended up using interpolate().bfill().ffill() and it helped a lot, but I'm still seeing some nans even with that.
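
For reference, one case that this chain cannot repair is a column with no valid values at all in the window: there is nothing to interpolate from or to copy. The sketch below illustrates that with made-up data; the column names are hypothetical, and this is only one possible explanation for the remaining NaNs, not a confirmed diagnosis. Note also that interpolate() and bfill() use later bars to fill earlier ones, which introduces look-ahead into a backtest.

```python
import numpy as np
import pandas as pd

# Made-up frame with three kinds of columns (names are hypothetical).
frame = pd.DataFrame(
    {
        "leading": [np.nan, np.nan, 11.0, 11.5],   # no data on the first bars
        "interior": [82.0, np.nan, 82.5, 82.0],    # a gap inside the data
        "empty": [np.nan] * 4,                     # no data in the whole window
    }
)

# The chain from the post: linear interpolation, then back fill, then forward fill.
cleaned = frame.interpolate().bfill().ffill()

# Columns that are still entirely NaN after filling.
still_missing = cleaned.columns[cleaned.isna().all()].tolist()

# One way to proceed: keep only the columns that have data.
usable = cleaned.dropna(axis=1, how="all")

print(cleaned)
print("still missing:", still_missing)
```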

Regards,
Ed

Interesting. The first two cases are possible, depending on when the security starts and stops trading, or if the security doesn't trade at the beginning of the window.

The last case is not expected, though, if the forward filling isn't disabled.
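
To tell the three cases apart programmatically, a small helper along these lines may be useful. `classify_nans` is a hypothetical name, not part of the history API, and counting an all-NaN series as "leading" is an arbitrary choice made for this sketch.

```python
import numpy as np
import pandas as pd

def classify_nans(series):
    """Count leading, interior and trailing NaNs in one price series."""
    is_nan = series.isna().values
    if is_nan.all():
        # Arbitrary convention for this sketch: an empty window is all "leading".
        return {"leading": len(is_nan), "interior": 0, "trailing": 0}
    valid_positions = np.flatnonzero(~is_nan)
    first, last = valid_positions[0], valid_positions[-1]
    return {
        "leading": int(first),                             # NaNs before the first price
        "interior": int(is_nan[first:last + 1].sum()),     # NaNs between valid prices
        "trailing": int(len(is_nan) - 1 - last),           # NaNs after the last price
    }

# Last four bars of the third example in this thread.
sample = pd.Series([82.27, 82.23, np.nan, 82.19])
print(classify_nans(sample))
```

Running it per column, for example with `hdata.apply(classify_nans)`, would show which securities have the unexpected interior gaps.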

If you would, could you share the security/sid/universe that you are using and the line with the history API call as well?

History API is called this way: hdata = history(20, frequency="1d", field='close_price')
List of sids in the algorithm is:
sid(12915),
sid(21769),
sid(24705),
sid(23134),
sid(23118),
sid(23911)

The output I showed was for SHY (sid 23911). The timestamp of the NaN in the middle of the data is 2003-04-14 20:00:00+00:00.
Here is one more line from the log output with the timestamps:
2003-04-15 handle_data:26 DEBUG 2003-04-15 13:31:00+00:00: Security(23911 [SHY]) 2003-04-14 20:00:00+00:00 nan

PS: You might want to see my example code attached to the first message in this topic for further details.