Quantopian's community platform is shutting down. Please read this post for more information and download your code.
Back to Community
Python Code Help

I posted this to stackoverflow but didn't get any answers. Here's a link to the question: https://stackoverflow.com/questions/58193178/how-to-find-values-in-row-in-pandas-after-filtering-out-certain-data-and-then

Here's the question:

I have a df with 8 currencies (CAD, AUD, NOK, SEK, NZD, EUR, GBP, JPY) and 2 different data points: 12M PR (12 month price return) and 1Y ZS (1 year z score).

df = {'Date': ('2015-10-6', '2015-10-7'),  
      'CAD 12M PR': (-7.4, -4.9),  
      'AUD 12M PR': (-2.3, -1.6),  
      'NOK 12M PR': (2.6, 6.4),  
      'SEK 12M PR': (6.7, 8.6),  
      'NZD 12M PR': (3.1, 2.9),  
      'EUR 12M PR': (2.2, 4.7),  
      'GBP 12M PR': (-3.6, -2.5),  
      'JPY 12M PR': (13.8, 15.7),  
      'CAD 12M ZS': (3.1, 2.5),  
      'AUD 12M ZS': (-1.7, 3.0),  
      'NOK 12M ZS': (2.0, 1.8),  
      'SEK 12M ZS': (2.6, 2.6),  
      'NZD 12M ZS': (-4.5, -5.6),  
      'EUR 12M ZS': (-3.8, -3.7),  
      'GBP 12M ZS': (-1.6, -2.7),  
      'JPY 12M ZS': (3.0, 2.1)  
     }  

I want to first apply a filter by retrieving the lowest 4 currencies by 12M PR along with those currencies respective 1Y ZS. After the first filter, df2 should look like below:

df2 = {'Date': ('2015-10-6', '2015-10-7'),  
      'CAD 12M PR': (-7.4, -4.9),  
      'AUD 12M PR': (-2.3, -1.6),  
      'EUR 12M PR': (2.2, 4.7),  
      'GBP 12M PR': (-3.6, -2.5),  
      'CAD 12M ZS': (3.1, 2.5),  
      'AUD 12M ZS': (-1.7, 3.0),  
      'EUR 12M ZS': (-3.8, -3.7),  
      'GBP 12M ZS': (-1.6, -2.7),  
     }  

After the filter has been applied, I want to retrieve the 2 currencies (from list that's been filtered) that have the lowest 1Y ZS. Of the filtered list above, AUD and EUR have the lowest 1Y ZS for both dates (but this could change). After this sort by 1y ZS, df3 should look like below:

df3 = {'Date': ('2015-10-6', '2015-10-7'),  
      'AUD 12M PR': (-2.3, -1.6),  
      'EUR 12M PR': (2.2, 4.7),  
      'AUD 12M ZS': (-1.7, 3.0),  
      'EUR 12M ZS': (-3.8, -3.7),  
     }  

So the first filter finds the 4 currencies with the lowest 12M PR, and the 2nd sort finds the 2 currencies with the lowest 1Y ZS of the 4 currencies with the lowest 12M PR. But I don't know how to go from df to df3.

I can get the 4 currencies with the lowest 12M PR (df2) using the below code:

Short = {  
              'Short 1':  
             df[['CAD 12M PR', 'AUD 12M PR', 'NOK 12M PR', 'SEK 12M PR', 'NZD 12M PR', 'EUR 12M PR', 'GBP 12M PR', 'JPY 12M PR']].T.apply(lambda x: x.nsmallest(1).idxmax()).str[0:3],  
             'Short 2':  
             df[['CAD 12M PR', 'AUD 12M PR', 'NOK 12M PR', 'SEK 12M PR', 'NZD 12M PR', 'EUR 12M PR', 'GBP 12M PR', 'JPY 12M PR']].T.apply(lambda x: x.nsmallest(2).idxmax()).str[0:3],  
             'Short 3':  
             df[['CAD 12M PR', 'AUD 12M PR', 'NOK 12M PR', 'SEK 12M PR', 'NZD 12M PR', 'EUR 12M PR', 'GBP 12M PR', 'JPY 12M PR']].T.apply(lambda x: x.nsmallest(3).idxmax()).str[0:3],  
             'Short 4':  
             df[['CAD 12M PR', 'AUD 12M PR', 'NOK 12M PR', 'SEK 12M PR', 'NZD 12M PR', 'EUR 12M PR', 'GBP 12M PR', 'JPY 12M PR']].T.apply(lambda x: x.nsmallest(4).idxmax()).str[0:3],  
             'Short 1 12M PR':  
             df[['CAD 12M PR', 'AUD 12M PR', 'NOK 12M PR', 'SEK 12M PR', 'NZD 12M PR', 'EUR 12M PR', 'GBP 12M PR', 'JPY 12M PR']].apply(lambda row: row.nsmallest(1).values[-1],axis=1),  
             'Short 2 12M PR':  
             df[['CAD 12M PR', 'AUD 12M PR', 'NOK 12M PR', 'SEK 12M PR', 'NZD 12M PR', 'EUR 12M PR', 'GBP 12M PR', 'JPY 12M PR']].apply(lambda row: row.nsmallest(2).values[-1],axis=1),  
             'Short 3 12M PR':  
             df[['CAD 12M PR', 'AUD 12M PR', 'NOK 12M PR', 'SEK 12M PR', 'NZD 12M PR', 'EUR 12M PR', 'GBP 12M PR', 'JPY 12M PR']].apply(lambda row: row.nsmallest(3).values[-1],axis=1),  
             'Short 4 12M PR':  
             df[['CAD 12M PR', 'AUD 12M PR', 'NOK 12M PR', 'SEK 12M PR', 'NZD 12M PR', 'EUR 12M PR', 'GBP 12M PR', 'JPY 12M PR']].apply(lambda row: row.nsmallest(4).values[-1],axis=1),  
                }  

Once I have the 4 currencies with the lowest 12M PR (df2), I don't know how to apply the last sort based on the filtered list (df2) to arrive at df3.

2 responses

The difficulty here is the original dataframe isn't in a 'nice' format. If it were more nicely formed this could be done in two lines

df2 = df.loc[df.groupby('date').PR.nsmallest(4).index.droplevel(0)]  
df3 = df2.loc[df2.groupby('date').ZS.nsmallest(2).index.droplevel(0)]

See the attached notebook. Most of the beginning cells are simply re-structuring the data to make it easier to parse.

One assumption I made was to find the currencies with the 4 smallest PR values by date and then find the currencies with the 2 smallest ZS again by date. In other words for each date find the 4 currencies with the smallest PR values, then from those, find the 2 currencies with the smallest ZS values.

This may be a wrong assumption because the results for df2 and df3 are not the same as the results expected in the original post.

Disclaimer

The material on this website is provided for informational purposes only and does not constitute an offer to sell, a solicitation to buy, or a recommendation or endorsement for any security or strategy, nor does it constitute an offer to provide investment advisory services by Quantopian. In addition, the material offers no opinion with respect to the suitability of any security or specific investment. No information contained herein should be regarded as a suggestion to engage in or refrain from any investment-related course of action as none of Quantopian nor any of its affiliates is undertaking to provide investment advice, act as an adviser to any plan or entity subject to the Employee Retirement Income Security Act of 1974, as amended, individual retirement account or individual retirement annuity, or give advice in a fiduciary capacity with respect to the materials presented herein. If you are an individual retirement or other investor, contact your financial advisor or other fiduciary unrelated to Quantopian about whether any given investment idea, strategy, product or service described herein may be appropriate for your circumstances. All investments involve risk, including loss of principal. Quantopian makes no guarantees as to the accuracy or completeness of the views expressed in the website. The views are subject to change, and may have become unreliable for various reasons, including changes in market conditions or economic circumstances.

Wow, this is a much more elegant and simple solution than the one below. Thanks so much for your help! Really appreciate it!

import pandas as pd


df = {  
    "Date": ("2015-10-6", "2015-10-7"),  
    "CAD 12M PR": (-7.4, -4.9),  
    "AUD 12M PR": (-2.3, -1.6),  
    "NOK 12M PR": (2.6, 6.4),  
    "SEK 12M PR": (6.7, 8.6),  
    "NZD 12M PR": (3.1, 2.9),  
    "EUR 12M PR": (2.2, 4.7),  
    "GBP 12M PR": (-3.6, -2.5),  
    "JPY 12M PR": (13.8, 15.7),  
    "CAD 12M ZS": (3.1, 2.5),  
    "AUD 12M ZS": (-1.7, 3.0),  
    "NOK 12M ZS": (2.0, 1.8),  
    "SEK 12M ZS": (2.6, 2.6),  
    "NZD 12M ZS": (-4.5, -5.6),  
    "EUR 12M ZS": (-3.8, -3.7),  
    "GBP 12M ZS": (-1.6, -2.7),  
    "JPY 12M ZS": (3.0, 2.1),  
}

df2 = {  
    "Date": ("2015-10-6", "2015-10-7"),  
    "CAD 12M PR": (-7.4, -4.9),  
    "AUD 12M PR": (-2.3, -1.6),  
    "EUR 12M PR": (2.2, 4.7),  
    "GBP 12M PR": (-3.6, -2.5),  
    "CAD 12M ZS": (3.1, 2.5),  
    "AUD 12M ZS": (-1.7, 3.0),  
    "EUR 12M ZS": (-3.8, -3.7),  
    "GBP 12M ZS": (-1.6, -2.7),  
}


df3 = {  
    "Date": ("2015-10-6", "2015-10-7"),  
    "AUD 12M PR": (-2.3, -1.6),  
    "EUR 12M PR": (2.2, 4.7),  
    "AUD 12M ZS": (-1.7, 3.0),  
    "EUR 12M ZS": (-3.8, -3.7),  
}

pd_df = pd.DataFrame(df)

# setup  
n_PR = 4  
n_ZS = 2  
target_date = "2015-10-6"

# only look at target date data for now  
pd_target_date = pd_df.loc[pd_df["Date"] == target_date]

# separate 12M PR and 12M ZS  
pd_PR_df = pd_target_date.filter(regex=".*12M PR")  
pd_ZS_df = pd_target_date.filter(regex=".*12M ZS")

# get the smallest n values for PR and ZS  
pd_PR_df = pd_PR_df.transpose().nsmallest(n=n_PR, columns=0).transpose()

# get the country names of those that passed the first filter  
# 3 is hard coded for 3-letter symbol for currency  
lowest_countries = [x[:3] for x in pd_PR_df.columns]

# get the lowest countries' ZS  
regex_str = "(" + ".*|".join(lowest_countries) + ".*)"  
pd_ZS_df = pd_ZS_df.filter(regex=regex_str)

# aggregate results back to original data frame and sanity check  
pd_df2_test = pd_df[pd_PR_df.columns].join(pd_df[pd_ZS_df.columns], how="outer")  
pd_df2 = pd.DataFrame(df2)  
pd_df2 = pd_df2.drop(columns=["Date"])  
# absurd assert to make sure they match, this was significnatly more complicated than it should have been, there is probably a better way  
assert set(pd_df2_test.columns) == set(pd_df2.columns) and all(  
    [  
        len(pd_df2[pd_df2[col] == pd_df2_test[col]]) == len(pd_df2[col])  
        for col in pd_df2.columns  
    ]  
), "DataFrames did not match"


# second filter  
pd_ZS_df = pd_ZS_df.transpose().nsmallest(n=n_ZS, columns=0).transpose()

lowest_countries = [x[:3] for x in pd_ZS_df.columns]

# get the lowest countries' PR  
regex_str = "(" + ".*|".join(lowest_countries) + ".*)"  
pd_PR_df = pd_PR_df.filter(regex=regex_str)

# aggregate results back to original data frame and sanity check  
pd_df3_test = pd_df[pd_PR_df.columns].join(pd_df[pd_ZS_df.columns], how="outer")  
pd_df3 = pd.DataFrame(df3)  
pd_df3 = pd_df3.drop(columns=["Date"])  
# absurd assert to make sure they match, this was significnatly more complicated than it should have been, there is probably a better way  
assert set(pd_df3_test.columns) == set(pd_df3.columns) and all(  
    [  
        len(pd_df3[pd_df3[col] == pd_df3_test[col]]) == len(pd_df3[col])  
        for col in pd_df3.columns  
    ]  
), "DataFrames did not match"

final_result = pd_df3_test  
print(final_result)