Access Data in Pipeline Output Pandas Dataframe

I apologize in advance if this is a basic question, but I am having trouble accessing the data in the dataframe that is returned from my Pipeline. As I understand, the dataframe that is returned by the Pipeline will have the assets that pass my screen as the rows, and any factors I applied as columns. How would I access one cell in that dataframe? I thought it would be something along the lines of:

val = context.output.iloc[etf]['sma_10']

where the etf object is pulled from the context.portfolio.positions dictionary in a for-each loop. However, when I call it like this, I get this error:

TypeError: cannot do positional indexing on class 'pandas.indexes.base.Index' with these indexers [Equity(9458 [SGY])] of type 'zipline.assets._assets.Equity'

I can do some hack workaround by figuring out the position of the asset I am looking for in the frame, and then using that, but I feel like something like this must surely be built in to the pandas library. Please let me know if this question needs any clarification. Thanks!

7 responses

Take a look at this post https://www.quantopian.com/posts/keyerror-when-i-try-to-get-column-data-from-pipeline-output

Generally the fastest way to read a single value from a dataframe is to use the '.get_value' method (http://pandas.pydata.org/pandas-docs/version/0.19.2/generated/pandas.DataFrame.get_value.html). Assuming the dataframe has a column labeled 'price', something like this should work:

aapl = symbols('AAPL')  
aapl_price = pipeline_output_df.get_value(aapl, 'price')

Here's a notebook showing this in action...
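
For context, the TypeError in the original question arises because '.iloc' accepts only integer positions, while the pipeline output is indexed by asset objects. Below is a minimal sketch of label-based access. It uses plain strings as stand-ins for Equity objects, and the column values are invented for illustration. Note that '.get_value' was deprecated in pandas 0.21 and removed in pandas 1.0; '.at' is the equivalent scalar accessor and is what this sketch uses.

```python
import pandas as pd

# Stand-in for a pipeline output: index = assets, columns = factors.
# Strings replace Equity objects and the numbers are invented.
output = pd.DataFrame(
    {"sma_10": [10.5, 20.25], "price": [11.0, 19.5]},
    index=["SGY", "AAPL"],
)

# Label-based scalar lookup: (row label, column label).
sma_at = output.at["SGY", "sma_10"]

# Equivalent label-based lookup with .loc (more general, slightly slower).
sma_loc = output.loc["SGY", "sma_10"]

# Positional lookup needs an integer row position, which is why
# passing an asset object to .iloc raised the TypeError.
sma_iloc = output.iloc[0]["sma_10"]
```

In an algorithm, the row label would be the asset itself, for example the key taken from context.portfolio.positions.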

Hi Dan,

Thanks for the quick response, worked like a charm! There is one slight problem using '.get_value' when iterating through the portfolio positions dictionary, however. It seems that the asset keys in this dictionary are formatted differently than in the pipeline output:

The asset in the pipeline output looks like:

Equity(42247 [MEMP])

and the positions dictionary looks like:

Equity(42247, symbol=u'MEMP', asset_name=u'AMPLIFY ENERGY CORP', exchange=u'NASDAQ', start_date=Timestamp('2011-12-09 00:00:00+0000', tz='UTC'), end_date=Timestamp('2017-05-05 00:00:00+0000', tz='UTC'), first_traded=None, auto_close_date=Timestamp('2017-05-10 00:00:00+0000', tz='UTC'), exchange_full=u'NASDAQ GLOBAL MARKET')

So it is unable to key using the second. Is there any quick conversion that lets me use the second key like the first one? Apologies if this doesn't make sense.

Hmm, not sure why there is a problem. It would help if you attached a backtest. The following should both work...

    for stock in context.portfolio.positions:  
        price = context.output.get_value(stock, 'latest_close')  


    for stock in context.output.index:  
        price = context.output.get_value(stock, 'latest_close')  

Attached is some code with this in action.
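
On the two different-looking keys mentioned above: my understanding (an assumption, not verified against the zipline source here) is that the short and long forms are just the str() and repr() of the same kind of object, and that assets hash and compare by their sid. If so, either one works as a dataframe label. The sketch below illustrates the idea with a hypothetical FakeEquity class, which is not the real zipline class, and an invented price.

```python
import pandas as pd


class FakeEquity:
    """Hypothetical stand-in for zipline's Equity; not the real class."""

    def __init__(self, sid, symbol, asset_name):
        self.sid = sid
        self.symbol = symbol
        self.asset_name = asset_name

    # Assumption: identity is determined by sid alone.
    def __hash__(self):
        return hash(self.sid)

    def __eq__(self, other):
        return isinstance(other, FakeEquity) and self.sid == other.sid

    def __str__(self):
        return "Equity(%d [%s])" % (self.sid, self.symbol)

    def __repr__(self):
        return "Equity(%d, symbol=%r, asset_name=%r)" % (
            self.sid, self.symbol, self.asset_name)


# Two separate instances describing the same security.
in_pipeline = FakeEquity(42247, "MEMP", "AMPLIFY ENERGY CORP")
in_positions = FakeEquity(42247, "MEMP", "AMPLIFY ENERGY CORP")

output = pd.DataFrame({"latest_close": [12.0]}, index=[in_pipeline])

# The lookup succeeds because the two objects are equal and hash alike,
# even though str() and repr() print them differently.
price = output.at[in_positions, "latest_close"]
```

If a lookup still fails with matching assets, the more likely cause is that the asset is simply absent from the dataframe's index.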

Hi Dan,

Attached is a backtest. This one succeeded because I commented out the lines that were causing the error (75-80). The line in question is line 77 (and, in turn, line 79, since it does the same operation). Thanks for your patience and all your help!

The problem is you are setting a screen in your pipeline.

    pipe = Pipeline(
        screen=(wr <= -80),
        columns={
            'W%R': wr,
            'R1': r1,
            'S1': s1,
        },
    )

The pipeline therefore returns only the securities where wr <= -80 on each day. Some of the currently held positions (i.e. those in context.portfolio.positions) don't pass that filter and are therefore not in the current pipeline output. As a result, the '.get_value' call below fails:

    for etf in context.portfolio.positions:  
        curr_price = data.current(etf, 'price')  
        if curr_price >= context.output.get_value(etf, 'R1'):

because 'etf' is not in the index of the current pipeline output (presumably because its wr is greater than -80).

The 'get_value(stock, column)' method can read a value from the pipeline dataframe only if 'stock' is in the index.
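
One way to guard against this is to check index membership before the lookup, or to reindex the output to the held positions so that missing assets show up as NaN. The sketch below is a minimal illustration with strings as stand-ins for assets and invented 'R1' values; it uses '.at', which replaces '.get_value' in newer pandas versions.

```python
import pandas as pd

# Stand-in pipeline output; only assets that passed the screen are present.
output = pd.DataFrame({"R1": [52.0, 31.5]}, index=["AAA", "BBB"])

# Stand-in for context.portfolio.positions; "CCC" was screened out today.
positions = ["AAA", "CCC"]

# Option 1: skip positions that are not in today's output.
resistance = {}
for etf in positions:
    if etf not in output.index:
        continue  # no pipeline row for this asset today
    resistance[etf] = output.at[etf, "R1"]

# Option 2: align the output to the positions; missing rows become NaN.
aligned = output.reindex(positions)
missing = aligned.index[aligned["R1"].isna()].tolist()
```

Which option fits depends on whether a held asset that fell out of the screen should be ignored or handled explicitly, for example by closing it.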

Hmmm, so there must be an error somewhere else in the algorithm. It is supposed to buy and sell on the same day, so nothing should be in the portfolio overnight, meaning the portfolio should always be a subset of the pipeline output. Thanks a lot, Dan!