Quantopian Pipeline Order Universe By Market Cap Desc

Hey guys,

I'm trying to use the S&P500 as my universe and sort it by the companies with the highest market cap in a descending order. How can I do this with the pipeline API? I see I can do this using get_fundamentals, but is there a way to do it with the pipeline API? I couldn't find an order_by or sort_by method in the documentation.

Currently I'm using Q500US as my universe. Here's what I have in my make_pipeline function:

def make_pipeline(context):  
    rolling_correlations = RollingLinearRegressionOfReturns(  
        target = context.regression_target,  
        returns_length = context.regression_returns_length,  
        regression_length = context.regression_length,  
        mask = context.base_universe  
    )  

    beta = rolling_correlations.beta  
    should_buy_beta =  beta < context.beta_threshold  
    securities_to_trade = (context.base_universe & should_buy_beta)

    return Pipeline(  
        columns = {  
            'beta': beta,  
            'should_buy_beta': should_buy_beta,  
            'securities_to_trade': securities_to_trade  
        },  
        screen = (securities_to_trade)  
    )

Thanks for the help,
Thomas

4 responses

The pipeline code you have is simply the definition of the pipeline. It defines the columns to be returned in the dataframe (i.e. the data) and a screen, which filters the rows down to a subset of securities (the dataframe index holds the securities). The dataframe output by the pipeline is sorted by security SID number. You can't specify a sort order in the pipeline definition.

However, you can certainly sort the dataframe once it's generated. You will need to add a column for market cap so you have something to sort on. See https://www.quantopian.com/help#built-in-factors and look at the very end of the list of factors for 'MarketCap'.

# Market capitalization is used often enough that it's included as a built-in factor.
# It can also be found under the Morningstar fundamentals.
# Make sure you import the class before using it.

import quantopian.pipeline.factors as Factors

def make_pipeline(context):  
    rolling_correlations = RollingLinearRegressionOfReturns(  
        target = context.regression_target,  
        returns_length = context.regression_returns_length,  
        regression_length = context.regression_length,  
        mask = context.base_universe  
    )  

    beta = rolling_correlations.beta  
    should_buy_beta =  beta < context.beta_threshold  
    securities_to_trade = (context.base_universe & should_buy_beta)

    market_cap = Factors.MarketCap()

    return Pipeline(  
        columns = {  
            'beta': beta,  
            'should_buy_beta': should_buy_beta,  
            'securities_to_trade': securities_to_trade,
            'market_cap': market_cap  
        },  
        screen = (securities_to_trade)  
    )

When you get the pipeline data (typically in the 'before_trading_start' method) it will now have a column called 'market_cap' which you can sort on.

todays_data = pipeline_output('my_pipeline')  
todays_data.sort_values('market_cap', ascending=False, inplace=True) 

The pipeline output will now be in the dataframe 'todays_data' and it will be sorted by market cap.
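
For anyone who wants to try the sort outside Quantopian, here is a rough sketch on a small hand-made dataframe standing in for the pipeline output. The tickers and numbers are invented for illustration; only the sort_values call mirrors the code above.

```python
import pandas as pd

# Hypothetical stand-in for pipeline_output('my_pipeline').
# Tickers and values are made up; a real pipeline output is indexed by security.
todays_data = pd.DataFrame(
    {
        'beta': [0.8, 0.5, 0.9],
        'market_cap': [2.0e11, 9.0e11, 5.0e10],
    },
    index=['AAA', 'BBB', 'CCC'],
)

# Sort rows by market cap, largest first.
# The index (the securities) moves together with its row.
sorted_data = todays_data.sort_values('market_cap', ascending=False)

print(sorted_data.index.tolist())
```

Because the index travels with the rows, anything you later read from the index comes out in market cap order as well.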

However, you may not really want a sorted list but rather you just want the top 20 (or some other number) of the largest market cap stocks. If that's what you really want, then that CAN be done in the pipeline definition. Simply use the '.top' method to screen the output (documented here https://www.quantopian.com/help#quantopian_pipeline_filters_Filter scroll up the page a bit).

import quantopian.pipeline.factors as Factors

def make_pipeline(context):  
    rolling_correlations = RollingLinearRegressionOfReturns(  
        target = context.regression_target,  
        returns_length = context.regression_returns_length,  
        regression_length = context.regression_length,  
        mask = context.base_universe  
    )  

    beta = rolling_correlations.beta  
    should_buy_beta =  beta < context.beta_threshold  
    securities_to_trade = (context.base_universe & should_buy_beta)

    market_cap = Factors.MarketCap()  
    return Pipeline(  
        columns = {  
            'beta': beta,  
            'should_buy_beta': should_buy_beta,  
            'securities_to_trade': securities_to_trade,
            'market_cap': market_cap  
        },  
        # Set the screen to however many stocks you want returned. Make sure to include the mask  
        screen = (market_cap.top(20, mask=securities_to_trade))  
    )

Notice the screen is set to just the top 20 stocks by market cap. Make sure you set the mask to your 'securities_to_trade'. This ensures that the '.top' method will only look at those specific stocks and only return the top 20 WITHIN THOSE STOCKS. Note that it may return fewer than 20 if there are fewer than 20 stocks in 'securities_to_trade'.
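
To see why the mask matters, here is a plain pandas analogy of "top N within the mask" versus "top N overall, then filter". This is not the pipeline API, just a sketch on invented data; the nlargest calls are my stand-in for what '.top' does, not how Quantopian implements it.

```python
import pandas as pd

# Invented data: market cap per ticker, plus a boolean column
# standing in for the securities_to_trade filter.
data = pd.DataFrame(
    {
        'market_cap': [900.0, 700.0, 500.0, 300.0, 100.0],
        'securities_to_trade': [False, True, False, True, True],
    },
    index=['AAA', 'BBB', 'CCC', 'DDD', 'EEE'],
)
n = 2

# Analogue of market_cap.top(n, mask=securities_to_trade):
# restrict to the masked rows first, then take the n largest.
masked_top = data[data['securities_to_trade']].nlargest(n, 'market_cap')

# Analogue of ranking without the mask and filtering afterwards:
# take the n largest overall, then drop rows that fail the filter.
unmasked_top = data.nlargest(n, 'market_cap')
unmasked_top = unmasked_top[unmasked_top['securities_to_trade']]

print(masked_top.index.tolist())
print(unmasked_top.index.tolist())
```

With the mask applied first you get a full set of n tradable names. Without it, the largest names overall can crowd out the ones you actually want, and you end up with fewer rows.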

Hey Dan,

Thanks so much for your reply! Just a quick question for clarification. Because I only want to use stocks that have a beta less than my beta threshold and are in the Q500US universe, I've added a securities_to_trade column and I'm setting context.security_list like this:

context.output = pipeline_output('my_pipeline')  
context.output.sort_values('market_cap', ascending=False, inplace=True)  

# These are the securities that we are interested in trading each day.  
context.security_list = context.output[context.output['securities_to_trade']].index 

Is this correct? Or is this redundant and I should be able to just do something like this:

context.security_list = context.output.index

Thanks so much for your help

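The extra filter is redundant. The screen already limits the output to rows where securities_to_trade is True, so you can simply use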

context.securities = context.output.index

Technically it's not a list, it's a pandas Index object. Most of the time you can use it wherever you would use a list. If you want a plain vanilla Python list, use

context.security_list = context.output.index.tolist()

Both generally work. It just depends on where and how you are using it.
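
A small sketch of the difference, using an invented dataframe in place of context.output (the tickers are made up):

```python
import pandas as pd

# Invented stand-in for context.output.
output = pd.DataFrame(
    {'market_cap': [3.0, 2.0, 1.0]},
    index=['AAA', 'BBB', 'CCC'],
)

as_index = output.index           # pandas Index object
as_list = output.index.tolist()   # plain Python list

# Both support iteration, len() and membership tests.
print('BBB' in as_index, 'BBB' in as_list)
print(len(as_index), len(as_list))

# They differ when you try to change them: a list can be appended to
# in place, while Index.append returns a new Index and leaves the
# original untouched.
as_list.append('DDD')
extended_index = as_index.append(pd.Index(['DDD']))
```

So if you only loop over the securities or check membership, the Index is fine. If you need to modify the collection in place, convert it to a list first.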

Thanks Dan!