Help with Unknown Error in Notebook Environment

Hi there,
When I pull fundamental data in the notebook environment, some of the values come back as NaN (not a number). Is there a way around this issue without simply using the .fillna('') function? Could this issue be caused by the .latest expression that I am using when pulling fundamental data in the pipeline? My notebook is attached. I would greatly appreciate any and all feedback!

Justin

6 responses

NaN means no value exists for that date/symbol/factor, and is the expected pipeline result. See pandas isnull, pandas notnull, and numpy nan_to_num for ways to handle it. Also, pandas mean and std ignore NaNs by default, whereas NumPy's mean and std do not, so NumPy provides nanmean and nanstd for that purpose. For further control over how NaNs are processed, you can use NumPy masked arrays, but that's typically unnecessary.
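As a rough illustration of those options, here is a minimal sketch on a small made-up Series. The name `depreciation` and its values are hypothetical stand-ins for one column of pipeline output, not data from the attached notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for one column of pipeline output.
depreciation = pd.Series([1.5, np.nan, 3.0, np.nan], name="Depreciation")

# Boolean mask: True where the value is missing.
missing_mask = depreciation.isnull()

# Keep only the rows that actually have a value.
present = depreciation[depreciation.notnull()]

# pandas skips NaN by default; plain NumPy propagates it.
pandas_mean = depreciation.mean()
numpy_mean = np.mean(depreciation.values)

# NaN-aware NumPy variant, and NaN replaced by 0.0.
numpy_nanmean = np.nanmean(depreciation.values)
zero_filled = np.nan_to_num(depreciation.values)
```

Filtering with notnull drops the rows entirely, while nan_to_num keeps them and substitutes zeros, which can distort averages. Which one is appropriate depends on what the missing value means for the factor.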

Use the attached Notebook as an example...

Maybe broken in our version of pandas?

print len(results.Depreciation.notnull())
523511
print len(results.Depreciation.isnull())
523511

Or, since nan is always not equal to nan:
print len( results[results.Depreciation != results.Depreciation ] ) # number of nans
454785

print len( results[results.Depreciation == results.Depreciation ] ) # not nan
68726

print len( results[results.Depreciation != results.Depreciation ] ) + len( results[results.Depreciation == results.Depreciation ] ) # total
523511
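A note on the first two numbers: the matching 523511 is probably not a pandas bug. notnull() and isnull() each return a boolean Series with one entry per row, so len() reports the total row count in both cases. Summing the mask counts the True entries instead. A small sketch with made-up values (the Series below is hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical column with 2 real values and 3 NaNs.
values = pd.Series([1.0, np.nan, np.nan, 4.0, np.nan])

mask = values.notnull()

# len() of a boolean mask is just the number of rows.
mask_length = len(mask)

# Summing the mask counts the True entries.
non_null_count = int(mask.sum())
null_count = int(values.isnull().sum())

# Series.count() also returns the number of non-NaN entries.
non_null_via_count = int(values.count())

# NaN != NaN, so self-comparison gives the same NaN count.
self_compare_nulls = int((values != values).sum())
```

Here mask_length is 5 while non_null_count is 2 and null_count is 3, which mirrors the 523511 / 68726 / 454785 split above.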

Doug,
I greatly appreciate the help! I figured out how to use .notnull in order to get rid of the NaN. Do either of you know if there is a way to use Fundamental data without the .latest attached at the end? I'm not sure which numbers the .latest is pulling given that the numbers shown in my chart and the ones on Morningstar are not the same.

Justin,

.latest is the most recent value posted by Morningstar as of each date in the pipeline output. So, from your notebook, start_date, end_date = '2016-01-01','2016-12-31' means that the first index date is 2016-01-01 (actually the first trading day) and has the latest Morningstar data as of that date. The last index date has the latest data as of 2016-12-31.
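The "latest as of each date" behaviour can be pictured with plain pandas. This is only an analogy, not the Quantopian pipeline API, and the report dates and values below are made up for illustration:

```python
import pandas as pd

# Hypothetical report dates and reported values for one company.
reports = pd.Series(
    [100.0, 120.0],
    index=pd.to_datetime(["2016-02-15", "2016-05-16"]),
)

# Hypothetical pipeline dates we want a value for.
days = pd.to_datetime(
    ["2016-01-04", "2016-02-15", "2016-03-01", "2016-05-16", "2016-06-01"]
)

# For each day, carry forward the most recent report on or before it.
latest_as_of = reports.reindex(days, method="ffill")
```

In this sketch the first day has no earlier report, so it comes out as NaN, and every later day shows whichever report was most recent at that point. That may also explain a mismatch with the Morningstar site, which shows what is current today rather than what was known on each historical date.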

To pull prior data on a given date in your pipeline, you will need to write a CustomFactor. See the tutorials and the API Reference for how to write your CustomFactor.
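To give a feel for what such a CustomFactor would compute, here is a sketch in plain NumPy rather than the Quantopian API. It assumes, as I recall from the pipeline docs, that a factor with a window_length receives its input as a 2-D array of shape (window_length, number of assets) with rows ordered oldest to newest. The helper name `oldest_row` and the sample numbers are hypothetical.

```python
import numpy as np


def oldest_row(values):
    """Return the oldest row of a (window_length, n_assets) array.

    Inside a CustomFactor's compute method this would correspond to
    something like `out[:] = values[0]` (assumed, not verified here).
    """
    return values[0]


# Hypothetical 3-day window for 2 assets; rows run oldest to newest.
window = np.array([
    [10.0, np.nan],
    [11.0, 5.0],
    [12.0, 6.0],
])

# Value from the start of the window, i.e. the "prior" data.
prior = oldest_row(window)

# The last row is what .latest would correspond to in this picture.
latest = window[-1]
```

Note that the prior row can itself contain NaN if nothing had been reported yet for an asset at that point, so the NaN handling discussed earlier still applies.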

Doug,
this really helps me a lot! Thanks for your input!