Pipeline returning NaN

Hi,
When I use a pipeline to get:
Goodwill = morningstar.balance_sheet.goodwill.latest
I see that many of the returned values are NaN.

Is there a way to replace these NaNs with 0s?
Thanks.

4 responses

Joe,

The output of a pipeline is a Pandas dataframe, so you can use any of the Pandas methods to manipulate it. In your case, take a look at the 'fillna' method (see the documentation at http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.fillna.html ).

I've attached a notebook with an example.
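
For readers without access to the notebook, here is a minimal sketch of the 'fillna' idea. It assumes a small hand-built dataframe standing in for the pipeline output; the column name 'goodwill' and the tickers are hypothetical, not taken from the original notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for a pipeline output: one row per security,
# with NaN wherever no goodwill value was reported.
results = pd.DataFrame(
    {"goodwill": [1.5e9, np.nan, 3.2e8, np.nan]},
    index=["AAA", "BBB", "CCC", "DDD"],
)

# Replace NaN with 0 in the goodwill column only, leaving any other
# columns untouched.
results["goodwill"] = results["goodwill"].fillna(0)

print(results)
```

Filling a single column rather than the whole dataframe is a deliberate choice here: a 0 may be a sensible default for goodwill but misleading for other fields.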

Good luck,

Dan

Dan,
Thank you for your answer.
What I got from your answer (I hope I got it right) is that replacing NaNs has to be done after the pipeline is constructed, at which point the output is a dataframe.
I did this in "before_trading_start" and it worked fine.
See the attached notebook.
However, when I create a new parameter (such as NonCashTangibiliesAssets), how can I sort the results by it?

Thanks.

Joe,

I'd suggest looking over the Pandas documentation for all the methods available (some are very cool). There is one called 'sort' which should do what you are asking about ( http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.sort.html ). Note that newer Pandas versions have replaced 'sort' with 'sort_values'.

You noted that these calculations are done after the construction of the pipeline. Good catch. Everyone has their own approach, but I view the pipeline as simply the data fetcher and not the place where the algorithm logic lives. Creating the pipeline simply gets all the data into one convenient place. The algorithm code then takes over and makes use of that data. By using filters and masks, however, a certain amount of 'pre-processing' can be done when setting up the pipeline, either to improve performance or to handle some of the base logic (e.g., filtering for a tradable stock universe).

One note about your NonCashTangibiliesAssets statement. That statement creates a new Pandas series from the results dataframe. Again, everyone has their own approach but I like to keep all my data in one place as much as possible. You may consider simply adding a new column to the results dataframe rather than making a new series. I've attached a notebook showing how to add a new calculated column and then sort by that new column. One can then take the top n securities of that sorted dataframe and use them in subsequent trading rules.
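
For readers without access to the notebook, the approach can be sketched roughly as follows. This assumes a hand-built dataframe in place of the real pipeline output; the column names, tickers, and the formula for the calculated column are all hypothetical illustrations, since the actual definition of NonCashTangibiliesAssets is not shown in this thread. It uses 'sort_values', which is the name of the sorting method in current Pandas versions.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the pipeline results dataframe.
results = pd.DataFrame(
    {
        "total_assets": [100.0, 250.0, 80.0, 400.0],
        "cash": [10.0, 50.0, 5.0, 100.0],
        "goodwill": [20.0, np.nan, 15.0, np.nan],
    },
    index=["AAA", "BBB", "CCC", "DDD"],
)

# Treat missing goodwill as zero before using it in a calculation.
results["goodwill"] = results["goodwill"].fillna(0)

# Add the calculated value as a new column instead of a separate series.
# The formula below is an assumption made for illustration only.
results["non_cash_tangible_assets"] = (
    results["total_assets"] - results["cash"] - results["goodwill"]
)

# Sort by the new column, largest first, and keep the top n rows.
n = 2
ranked = results.sort_values("non_cash_tangible_assets", ascending=False)
top_n = ranked.head(n)

print(top_n.index.tolist())
```

Keeping the calculated value as a column means the sort carries every other field along with it, so the top n rows are ready for later trading rules without any re-joining.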

Hope that helps.

Dan

Dan,
Thank you very much.