Create MultiIndex DataFrame from pipeline output in IDE

This has probably been asked before, but I can't find a post about it.

I want to create a DataFrame containing several days of pipeline output, with the date as index level 0 and the equity as index level 1. Since the set of equities can change every day, I can't set the level 1 index in advance.

I need something like the following. Any suggestions?

    if not 'full_data' in context:  
        columns           = [context.output.columns.values]  
        multi_index       = pd.MultiIndex(levels=[[],[]], labels=[[],[]])  
        context.full_data = pd.DataFrame(columns=columns, index=multi_index)  

    date         = get_datetime('US/Eastern')  
    equity_index = context.output.index.values  
    context.full_data.loc[date, equity_index] = context.output.values  

Right now I get the following:

KeyError: '[Equity(2 [ARNC]) Equity(24 [AAPL]) Equity(31 [ABAX]) ...,\n Equity(50758 [OKTA]) Equity(50763 [SNDR]) Equity(50782 [UPL])] not in index'

1 response

It seems that assigning to row labels that are not yet in the MultiIndex, as in context.full_data.loc[date, equity_index] = context.output.values, is not supported at the moment.

https://github.com/pandas-dev/pandas/issues/15959

This works:

    # Label today's pipeline output with (date, equity) pairs.
    date          = get_datetime('US/Eastern')
    partial_index = pd.MultiIndex.from_product([[date], list(context.output.index)])
    context.partial_data = pd.DataFrame(context.output.values,
                                        columns=context.output.columns,
                                        index=partial_index)

    # Start the accumulated frame on the first day, then append each later day.
    if 'full_data' not in context:
        context.full_data = context.partial_data
    else:
        context.full_data = pd.concat([context.full_data, context.partial_data])
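
For anyone who wants to try the idea outside the IDE, here is a minimal sketch using plain pandas. It swaps in a slightly different technique: pd.concat with the keys argument, which adds the date as the outer index level in one step. The tickers, dates, the 'factor' column and the daily_outputs dict are all made up for illustration and stand in for context.output on two different days.

```python
import pandas as pd

# Hypothetical stand-ins for two days of pipeline output.
# The set of equities differs from day to day, as in the question.
day1 = pd.DataFrame({'factor': [1.0, 2.0]}, index=['AAPL', 'ARNC'])
day2 = pd.DataFrame({'factor': [3.0, 4.0, 5.0]}, index=['AAPL', 'OKTA', 'SNDR'])

# Hypothetical container: one frame per date.
daily_outputs = {
    pd.Timestamp('2017-06-01'): day1,
    pd.Timestamp('2017-06-02'): day2,
}

# keys= puts each date in front of that day's equity index,
# producing a two-level (date, equity) MultiIndex.
dates = sorted(daily_outputs)
full_data = pd.concat([daily_outputs[d] for d in dates], keys=dates)
full_data.index.names = ['date', 'equity']

# Selecting on level 0 returns that day's rows, indexed by equity.
one_day = full_data.loc[pd.Timestamp('2017-06-02')]
print(full_data)
```

Collecting the daily frames and concatenating once at the end also avoids re-copying the accumulated frame on every bar, though inside an algorithm the incremental concat above may be the only practical option.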