How to get size of universe

edited Jun 17, 2017

I have the following code in one of my algo's pipelines:
mkt_cap_filter = morningstar.valuation.market_cap.latest >= 5000000000
universe = Q1500US() & mkt_cap_filter

I have a lame (Python newbie) question about the universe: how do I get the size of the universe we are trading? The universe is an object, not a Python list or array, so how do I find out how many equities it contains? Most of the examples I have seen in the forums apply some ranking and then take the top N and bottom N, or the top and bottom percentiles. I want to know the breadth of the universe in terms of number of equities, and I haven't been able to find that in the forum. Does anyone have any pointers, or know where the universe Python class lives so I can check whether it exposes a method for this?

4 responses

Dan Whitnable

Jun 18, 2017

The pipeline output is a pandas dataframe object. The rows are the securities and the columns are the factors, classifiers, or filters one sets in the pipeline definition. Therefore you can use any of the many ways of counting the rows in the dataframe, and that count equals the number of securities. This is the method I prefer (but there are others):

def before_trading_start(context, data):
    # Get the data from the pipeline
    context.output = pipeline_output('my_pipeline')

    # Get the size of the universe (i.e. the number of securities).
    # Each row is a security, so the length of the index is the number of securities.
    universe_size = len(context.output.index)
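For anyone wondering what the "others" look like, here is a minimal sketch that runs outside Quantopian. The dataframe is a made-up stand-in for what pipeline_output returns (the tickers and the market_cap column are assumptions for illustration only); the three counting expressions are standard pandas.

```python
import pandas as pd

# Hypothetical stand-in for pipeline_output('my_pipeline'): one row per security.
output = pd.DataFrame(
    {"market_cap": [7.5e9, 6.1e9, 9.8e9]},
    index=["AAA", "BBB", "CCC"],  # made-up tickers
)

# Three equivalent ways to count the rows (i.e. the securities).
size_from_index = len(output.index)
size_from_len = len(output)
size_from_shape = output.shape[0]
```

All three give the same number; which one to use is mostly a matter of taste.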

Leo M

Jun 18, 2017

Hi Dan, thanks for your response. Is there a way to get the universe size before generating the pipeline output? My goal is to alter the size of the pipeline output based on some filters I have in the pipeline creation.

def make_pipeline(context):
    mkt_cap_filter = morningstar.valuation.market_cap.latest >= 5000000000
    universe = Q1500US() & mkt_cap_filter
    # Here I want to determine how many equities are in the universe object.
    # My goal is to set the NUM_LONG and NUM_SHORT that go into my pipeline
    # output based on the size of the universe at this point.

Dan Whitnable

Jun 18, 2017

You could write a small CustomFactor to accomplish this. Something like this:

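from quantopian.pipeline import CustomFactor
from quantopian.pipeline.data.builtin import USEquityPricing

class UniverseSize(CustomFactor):
    inputs = [USEquityPricing.close]
    window_length = 1

    def compute(self, today, assets, out, securities):
        # The last argument receives the close-price window; it is unused here.
        # Every asset gets the same value: the number of assets passed in.
        out[:] = len(assets)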

That would result in a factor with the same value for all securities, and that value would be the number of assets passed to the factor. If no mask is set when instantiating the factor, this would be all assets in the Quantopian database. If a mask is supplied, it would be the subset defined by the mask.
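To make the mask behaviour concrete, here is a rough simulation in plain NumPy that runs outside Quantopian. It is only a sketch: compute_universe_size is a hypothetical helper that copies the body of compute, and the asset ids and mask are made up. It assumes, as described above, that only the assets passing the mask are handed to compute.

```python
import numpy as np

def compute_universe_size(assets, out):
    """Hypothetical helper mimicking the body of UniverseSize.compute."""
    out[:] = len(assets)

all_assets = np.array([101, 102, 103, 104, 105])   # made-up asset ids
mask = np.array([True, False, True, True, False])  # made-up filter result

# No mask: the factor sees every asset.
out_all = np.empty(len(all_assets))
compute_universe_size(all_assets, out_all)

# With a mask: only the assets that pass the filter are passed in.
masked_assets = all_assets[mask]
out_masked = np.empty(len(masked_assets))
compute_universe_size(masked_assets, out_masked)
```

Without the mask every asset gets the value 5; with it, the three surviving assets each get 3. In the pipeline itself the equivalent would be something like UniverseSize(mask=universe).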

You may be better off putting all your logic into a single custom factor where you can use this length within if statements etc to branch depending upon size.

However, as a philosophical point, I'd argue that maybe this type of logic isn't appropriate inside the pipeline definition. The fundamental purpose of the pipeline is to fetch data, and the pipeline definition simply specifies what data to fetch. The fact that you can do some simple logic within the definition doesn't mean you must.

I'm a proponent of separating the data from the logic. Define all the data (i.e. the columns) you need within the pipeline definition. Fetch that data each day in 'before_trading_start'. The data is returned in a neat, powerful pandas dataframe by the 'pipeline_output' function. Then perform whatever logic your algo requires using that data at that time. You can slice and dice and add and subtract columns from the returned dataframe using the many pandas methods, which typically give much more flexibility than the factor and filter methods.
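As a sketch of what that could look like for the NUM_LONG / NUM_SHORT case, the snippet below works on a made-up dataframe standing in for the pipeline output. The column names ('universe', 'score'), the tickers, and the 20% sizing rule are all assumptions for illustration, not anything taken from the original algo.

```python
import pandas as pd

# Hypothetical pipeline output: a 'universe' filter column and a 'score' factor column.
output = pd.DataFrame(
    {
        "universe": [True, True, False, True, True, True],
        "score": [0.9, -0.4, 0.7, 0.1, -0.8, 0.3],
    },
    index=["AAA", "BBB", "CCC", "DDD", "EEE", "FFF"],  # made-up tickers
)

# Keep only the securities that passed the universe filter, then count them.
tradable = output[output["universe"]]
universe_size = len(tradable.index)

# Assumed sizing rule: 20% of the universe on each side, at least one name.
num_long = max(1, universe_size // 5)
num_short = max(1, universe_size // 5)

# Pick the highest and lowest scores using the sizes computed above.
longs = tradable["score"].nlargest(num_long).index.tolist()
shorts = tradable["score"].nsmallest(num_short).index.tolist()
```

Because the counts are derived after the data is fetched, they follow the universe size automatically each day without any special logic inside the pipeline definition.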

Leo M

Jun 19, 2017

Hi Dan, thanks for your idea. I will try it out.
