Universe filter in pipeline not being applied?

I'm new to Quantopian, and maybe I just don't know where to look or I'm interpreting this wrong. My main goal is to do some factor analysis with Alphalens, so I built my pipeline from the tutorials, and it runs fine with no errors. My understanding is that the universe filters the stocks. However, when I extract all the unique equities and count them, it always returns 9657, and the same happens if I use Q500US instead. It drops to 9089 if I remove the Q1500US filter entirely.

Is the Q1500US filter adding more stocks? That seems to defeat the purpose of a filter. I realize I should expect the number of unique filtered stocks to be somewhat larger than 1500, because Q1500US() is evaluated each day and its membership changes over time, but I'd still expect switching to Q500US() to make at least some difference.

Any help understanding what's going on here or how to fix this is much appreciated.


You are using filters correctly, and yes, they work as expected. Adding a Q1500US screen to a pipeline limits the assets in the output (i.e., level 1 of the returned dataframe's index) to only those that pass the filter. There will be ~1500 unique securities each day. Because the Q1500US universe is dynamic, with securities moving in and out, the number of unique securities over a range of days will be somewhat higher than that.
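
To see why the count over a date range exceeds the daily figure, here is a small sketch on made-up data (plain pandas, not actual Quantopian output). It assumes the pipeline result is a dataframe indexed by (date, asset); the tickers and the boolean in_universe column are hypothetical stand-ins for a universe filter like Q1500US.

```python
import pandas as pd

# Hypothetical toy data: a (date, asset) MultiIndex like pipeline output,
# with a boolean column standing in for a universe filter such as Q1500US.
dates = pd.to_datetime(["2017-01-03", "2017-01-04", "2017-01-05"])
tickers = ["AAA", "BBB", "CCC", "DDD"]
index = pd.MultiIndex.from_product([dates, tickers], names=["date", "asset"])

in_universe = [
    True, True, False, False,   # 2017-01-03: AAA, BBB pass
    True, False, True, False,   # 2017-01-04: AAA, CCC pass
    True, False, True, False,   # 2017-01-05: AAA, CCC pass
]
frame = pd.DataFrame({"in_universe": in_universe}, index=index)

# Applying the screen keeps only the rows that pass the filter.
screened = frame[frame["in_universe"]]

# Number of assets passing the filter on each individual day.
per_day = screened.groupby(level="date").size()

# Number of distinct assets that passed on at least one day.
total_unique = screened.index.get_level_values("asset").nunique()

print(per_day.tolist())  # [2, 2, 2]
print(total_unique)      # 3
```

Each day has two members, but because membership changes, three distinct assets appear over the whole range. The same effect makes a real Q1500US run show somewhat more than 1500 unique assets.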

Why does the size show more than 9000 unique securities? The short answer is that the set of labels a MultiIndex stores for each level isn't always the same as the labels that actually appear in the rows of the dataframe. The simple solution is to use get_level_values instead, like this:

# unique asset labels among the rows that are actually present
assets = result.index.get_level_values(level=1).unique()
display(len(assets))

That will display the number of assets you were expecting (~1500).
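
The post doesn't show exactly how the count was taken, but one common way to get an inflated number is to read the MultiIndex levels directly (for example result.index.levels[1]) rather than the row labels. Here is a minimal sketch with toy data, assuming a (date, asset) index; the score column and the "positive scores only" filter are hypothetical stand-ins for a screen.

```python
import pandas as pd

# Hypothetical (date, asset) frame: one date, five assets.
index = pd.MultiIndex.from_product(
    [pd.to_datetime(["2017-01-03"]), ["AAA", "BBB", "CCC", "DDD", "EEE"]],
    names=["date", "asset"],
)
result = pd.DataFrame({"score": [0.5, -1.2, 0.3, 2.0, -0.4]}, index=index)

# Filter rows, analogous to a screen: keep positive scores only.
filtered = result[result["score"] > 0]

# The level still lists every asset the index was built with...
print(len(filtered.index.levels[1]))  # 5

# ...while get_level_values reflects only the rows actually present.
present = filtered.index.get_level_values(level=1).unique()
print(len(present))  # 3
```

Only three rows survive the filter, yet the level still holds all five asset labels, which is the same kind of mismatch as seeing 9000+ assets after a 1500-name screen.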

So, what's up? The issue comes down to how dataframe indexes work. A MultiIndex stores the possible labels for each level separately from the rows, and filtering rows doesn't remove labels from those levels, so the number of elements in a level isn't tied to the number of rows in the dataframe. Jamie McCorriston wrote a very good post on this same issue (https://www.quantopian.com/posts/am-i-doing-something-wrong-pipeline-row-count-star-not-consistent-star). There is also a thread on GitHub discussing this behavior that may be of interest (https://github.com/pandas-dev/pandas/issues/2770).
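
The stale labels live in the MultiIndex levels, and filtering rows leaves them in place. If you do want the levels themselves to match the rows, pandas 0.20 and later provide MultiIndex.remove_unused_levels(). This sketch assumes such a pandas version, which may not match the version the Quantopian research environment shipped; the data is made up for illustration.

```python
import pandas as pd

# Hypothetical toy frame indexed by (date, asset).
index = pd.MultiIndex.from_product(
    [pd.to_datetime(["2017-01-03", "2017-01-04"]), ["AAA", "BBB", "CCC"]],
    names=["date", "asset"],
)
frame = pd.DataFrame({"value": range(6)}, index=index)

# Drop every row for asset "CCC".
subset = frame[frame.index.get_level_values("asset") != "CCC"]

# The level still contains the dropped label.
print(list(subset.index.levels[1]))  # ['AAA', 'BBB', 'CCC']

# Rebuild the index so its levels only hold labels used by some row.
cleaned = subset.index.remove_unused_levels()
print(list(cleaned.levels[1]))  # ['AAA', 'BBB']
```

remove_unused_levels returns a new MultiIndex rather than modifying the dataframe, so you would assign it back (for example subset.index = cleaned) if you want the dataframe's index updated.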

I generally use get_level_values whenever I want to look at the labels actually present in the rows and am not concerned with the index structure itself.
