Bug in Pipeline/Equity MultiIndex selection?

I'm attaching a notebook that shows what I believe is a bug (but correct me if I'm wrong).

It's a pipeline result in the form of a DataFrame. The index is a MultiIndex of (date, Equity), and each row holds fundamental data for one security on one date. Normally you can select a specific row by passing the full MultiIndex "coordinates" to .loc, or take every value of one index level by passing : in its place. See the notebook for details. The problem is that selecting an Equity by its sid fails in some cases.
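A minimal sketch of the normal .loc behaviour described above, assuming a small synthetic frame: plain ticker strings stand in for Equity objects and pe_ratio is a made-up column, so this only illustrates the expected behaviour and does not reproduce the reported bug.

```python
import pandas as pd

# Hypothetical stand-in for a pipeline result: a (date, security) MultiIndex.
# Ticker strings replace Equity objects; "pe_ratio" is a made-up column.
dates = pd.to_datetime(["2011-01-03", "2011-01-04"])
index = pd.MultiIndex.from_product([dates, ["AAPL", "IBM"]])
results = pd.DataFrame({"pe_ratio": [10.0, 12.0, 11.0, 13.0]}, index=index)

# Full (date, security) coordinates pick out a single cell
value = results.loc[(pd.Timestamp("2011-01-04"), "IBM"), "pe_ratio"]

# slice(None) plays the role of ':' on the date level,
# keeping every date for one security
ibm_rows = results.loc[(slice(None), "IBM"), :]
```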


Hi Joao

I also ran into a similar problem recently while testing the implementation of a CustomFactor in Pipeline. Here are some tips from what I've learned so far:

Here, results is the DataFrame returned by the pipeline.

Let's say we want the values of all factors for a single equity, sym, at every date. We would do the following:

Variables:

sym = symbols('AAPL', symbol_reference_date=dt_start)  # dt_start is defined in the next example

Operation:

results.loc[(slice(None), slice(sym, sym)), :]
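The same selection on a synthetic frame, as a sketch: the string "AAPL" stands in for the Equity that symbols() would return, and the pd.IndexSlice form is simply an equivalent pandas spelling, not something from the post.

```python
import pandas as pd

# Hypothetical stand-in for a pipeline result (ticker strings instead of Equity objects)
dates = pd.to_datetime(["2011-01-03", "2011-01-04"])
index = pd.MultiIndex.from_product([dates, ["AAPL", "IBM"]])
results = pd.DataFrame({"pe_ratio": [10.0, 12.0, 11.0, 13.0]}, index=index)

sym = "AAPL"  # stands in for symbols('AAPL') in the research environment

# slice(sym, sym) is an inclusive label slice covering exactly one security
by_slice = results.loc[(slice(None), slice(sym, sym)), :]

# pd.IndexSlice expresses the same selection with ':' syntax
idx = pd.IndexSlice
by_index_slice = results.loc[idx[:, sym:sym], :]
```

Slicing a MultiIndex this way expects a sorted index; the synthetic one above is sorted by construction.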

If we want the values of all factors on a particular day, we would do the following:

Variables:

import datetime as dt
dt_one_day = dt.timedelta(days=1)
dt_six_months = dt.timedelta(weeks=4*6)
dt_start = dt.datetime.today() - (dt_six_months + dt_one_day)

Operation:

results.loc[(slice(dt_start, dt_start + dt_one_day), slice(sym, sym)), :]
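One detail worth checking is that pandas label slices include both endpoints. The post's dt_start comes from datetime.today() and so carries a time of day, which keeps its one-day window from covering two midnight date labels; with a midnight start, both ends count. A sketch on a synthetic, timezone-naive frame (if the real index carries a timezone, the bounds would need a matching one):

```python
import datetime as dt
import pandas as pd

# Hypothetical stand-in for a pipeline result over three dates
dates = pd.to_datetime(["2011-01-03", "2011-01-04", "2011-01-05"])
index = pd.MultiIndex.from_product([dates, ["AAPL", "IBM"]])
results = pd.DataFrame({"pe_ratio": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]}, index=index)
sym = "AAPL"

day = dt.datetime(2011, 1, 4)  # midnight, like the synthetic date labels
one_day = dt.timedelta(days=1)

# Both endpoints are included, so this window also picks up 2011-01-05
two_dates = results.loc[(slice(day, day + one_day), slice(sym, sym)), :]

# Using the same label at both ends selects exactly one date
one_date = results.loc[(slice(day, day), slice(sym, sym)), :]
```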

Many thanks

Adam

Ahh, interesting, I didn't know that .loc could be used like that. Yes, that does exactly what I was looking for, thank you.

By the way, using the sid directly on the slicer also works.

result.loc[(slice(None), slice(62, 62)), :]
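A sketch of the sid form, assuming a synthetic frame whose security level holds plain integers (24 and 62 are illustrative values only). It shows the slice mechanics; whether an int matches real Equity labels depends on how Equity objects compare with ints, which the post reports works.

```python
import pandas as pd

# Hypothetical frame with integer sids in place of Equity objects
dates = pd.to_datetime(["2011-01-03", "2011-01-04"])
index = pd.MultiIndex.from_product([dates, [24, 62]])
result = pd.DataFrame({"pe_ratio": [10.0, 12.0, 11.0, 13.0]}, index=index)

by_sid = result.loc[(slice(None), slice(62, 62)), :]

# Every remaining row belongs to sid 62
remaining_sids = by_sid.index.get_level_values(1).unique().tolist()
```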

Another way to easily get a slice for a single day is to use the '.xs' method (https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.xs.html)

my_date = '2011-01-03'  
result.xs(my_date)

This returns a DataFrame indexed only by security (the date level is dropped), which looks exactly like the DataFrame returned by pipeline_output in an algorithm.
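A sketch of the .xs call on a synthetic frame (ticker strings stand in for Equity objects). It passes a pd.Timestamp as the key; the post's string key relies on pandas parsing the date for you.

```python
import pandas as pd

# Hypothetical stand-in for a pipeline result
dates = pd.to_datetime(["2011-01-03", "2011-01-04"])
index = pd.MultiIndex.from_product([dates, ["AAPL", "IBM"]])
result = pd.DataFrame({"pe_ratio": [10.0, 12.0, 11.0, 13.0]}, index=index)

# xs selects on level 0 by default and drops that level,
# leaving a frame indexed only by security
one_day = result.xs(pd.Timestamp("2011-01-03"))
```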

You can use the same method to get all dates for a single security by passing level=1, which selects on the Equity level of the index.

my_security = symbols('IBM')  
result.xs(my_security, level=1)
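The same idea on the security level, again with a synthetic frame where "AAPL" stands in for the Equity returned by symbols(). The date level survives as the only index level.

```python
import pandas as pd

# Hypothetical stand-in for a pipeline result
dates = pd.to_datetime(["2011-01-03", "2011-01-04"])
index = pd.MultiIndex.from_product([dates, ["AAPL", "IBM"]])
result = pd.DataFrame({"pe_ratio": [10.0, 12.0, 11.0, 13.0]}, index=index)

# level=1 selects on the security level and drops it, leaving one row per date
aapl = result.xs("AAPL", level=1)
```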

Dan: Excellent, thanks for sharing!