Quantopian's community platform is shutting down. Please read this post for more information and download your code.
Back to Community
Pandas multi index slicing - Question

How can I slice or select rows on a multi-index data frame by position? It can be done by 'Loc' if you know the labels, but how about
selecting by position?

Let's say we want the last 2 rows for every symbol in our universe, as you can see below 'iloc' returns the last 2 rows in the whole dataframe not
the last 2 rows for each symbol.

9 responses

The apply method to the rescue. Use this in conjunction with group_by to 'apply' a function to each security and not the entire dataframe. It's actually a two step process. First, create the groups one wants. In this case, groups of securities which are in our level 1 index. Next, apply a function to each group using the apply method. The function can either be an actual function name or an inline lambda function. The following will first create a group for each security. Then each group is passed to the function supplied in apply. In this case, the function simply selects the last two rows of each group. Finally, the apply method concantenates the output from each group back into a single dataframe.

last_2_rows_of_each_security = df.groupby(level=1, group_keys=False).apply(lambda group: group.iloc[-2:])

Setting 'group_keys=False' simply keeps the apply method from adding a column with the security to the output.

In the original notebook the multi-index levels were swapped and the index sorted. While there is nothing wrong with this, generally don't swap levels or sort the index unless there is a compelling reason. It becomes incompatible with other pipeline dataframes and can be confusing.

Disclaimer

The material on this website is provided for informational purposes only and does not constitute an offer to sell, a solicitation to buy, or a recommendation or endorsement for any security or strategy, nor does it constitute an offer to provide investment advisory services by Quantopian. In addition, the material offers no opinion with respect to the suitability of any security or specific investment. No information contained herein should be regarded as a suggestion to engage in or refrain from any investment-related course of action as none of Quantopian nor any of its affiliates is undertaking to provide investment advice, act as an adviser to any plan or entity subject to the Employee Retirement Income Security Act of 1974, as amended, individual retirement account or individual retirement annuity, or give advice in a fiduciary capacity with respect to the materials presented herein. If you are an individual retirement or other investor, contact your financial advisor or other fiduciary unrelated to Quantopian about whether any given investment idea, strategy, product or service described herein may be appropriate for your circumstances. All investments involve risk, including loss of principal. Quantopian makes no guarantees as to the accuracy or completeness of the views expressed in the website. The views are subject to change, and may have become unreliable for various reasons, including changes in market conditions or economic circumstances.

Here's a notebook showing the above code in action.

Many thanks Dan!

Another question @Dan, what if I want to get the dataframe of the symbols where the last Close is above a moving average (or a price)? the line below rises the error "IndexingError: Unalignable boolean Series provided as indexer (index of the boolean Series and of the indexed object do not match)."

df.loc[df.groupby(level=0, group_keys=False)['close'].apply(lambda x: x.iloc[-1]) > 20]

@ Youness AJ Could you provide an example of what the original dataframe looks like What is the index and what are the columns. That would help.

@Dan

I should have called this out in my original response, but one can't average or compare prices returned in a dataframe. Prices are adjusted 'as of' each date and won't be comparable if there is a dividend or split. The way to compare prices across dates is inside the pipeline. So, in this specific case, if one is looking for the 5 day mean price then use the built in SimpleMovingAverage factor. Additionally, one can easily compare things in the pipeline definition. In this case, any stocks where the close is greater than the 5 day mean. Like this

    ma5 = SimpleMovingAverage(inputs=[USEquityPricing.close], window_length=5)  
    close_above_ma5 = close > ma5

There are times however where one wants to select specific rows of a dataframe based on a condition. There are of course a lot of ways to do this but I will typically fall back to the very flexible query method (https://pandas.pydata.org/pandas-docs/version/0.18/generated/pandas.DataFrame.query.html). Something like this

    close_above_ma5 = df.query('close > ma5')

Check out the attached notebook.

Thanks @Dan! probably a did not clarify more: I'm trying to drop all the symbols where the last close < the last MA5. So my dataframe will contain only the data for the symbols were the last close > last MA5. In the example above I should get a dataframe with all rows for the stock "RACE" (since my condition is applied to the last value only), not just 4 rows where Close >Ma5. I'm trying to find a solution for the multi index dataframe/ outside the Pipeline.

Ah... that's a little trickier but, again, the query method to the rescue.

First, get the last date of the dataframe. Next get any rows where "close > last MA5 and date==last date". Get the values of index level 1. Those will be the stocks where the above condition is True. There may be more than one. Finally, select all rows where the symbol is in this list.

One really nice thing about the query method is one can use indexes just like columns. Since we named our level 1 index 'symbol' it makes it easy. Another nice thing about the query method is one can access variables by prefixing them with @ (otherwise it would think they were supposed to be a column name). So, I believe this should do it

# Get all rows of stocks where the last close of the dataframe > last ma5  
# First, find the last date in the dataframe  
last_day = df.index.get_level_values(level=0).max()

# Get a dataframe where the date is the last date and the close is greater than ma5  
close_below_ma5_on_last_day = df.query('(date == @last_day) and (close > ma5)')

# Get the values of index level 1 which will be the stocks meeting our criteria  
stocks_where_close_below_ma5_on_last_day = close_below_ma5_on_last_day.index.get_level_values(level=1)

# Finally, select those rows where symbol is in our list of stocks  
my_result_df = df.query('symbol in @stocks_where_close_below_ma5_on_last_day')

See if that's what you were looking for.