Sector Average Values are Different Depending on how They're Calculated?

The attached notebook attempts to determine the average PE ratio by sector. I calculate this value three different ways over the same time period, but the final method, shown in the last cell, doesn't match the results of the other two (the custom factor and demean columns shown in the notebook preview). Any ideas why?


Your custom factor isn't quite correct. It should be something like this:


```python
# Custom factor for sector PE
import numpy as np
import pandas as pd
from quantopian.pipeline import CustomFactor
from quantopian.pipeline.data import Fundamentals, morningstar

class SectorPE(CustomFactor):
    # Default inputs
    inputs = [Fundamentals.pe_ratio, morningstar.asset_classification.morningstar_sector_code]
    window_length = 252  # only the most recent row ([-1]) is used in compute below
    # Build a dataframe of the latest pe_ratio and sector code. The [-1] index
    # selects the most recent day in the window.
    def compute(self, today, assets, out, pe, sectors):
        df = pd.DataFrame(index=assets, data={"pe_ratio": pe[-1],
                                             "sector_code": sectors[-1]})
        # The original code is below
        # out[:] = df.groupby("sector_code").transform(np.mean).values.flatten()
        # Select the pe_ratio column so only it is averaged within each sector.
        out[:] = df.groupby("sector_code").pe_ratio.transform(np.mean).values
```

The single line that isn't quite right is:

```python
out[:] = df.groupby("sector_code").transform(np.mean).values.flatten()
```

The dataframe 'df' has two columns, 'sector_code' and 'pe_ratio'. Calling transform on the whole grouped dataframe applies the mean to every non-grouping column rather than to 'pe_ratio' specifically. To transform only the 'pe_ratio' column, add the column name after the groupby method; that selects, and calculates the mean of, just 'pe_ratio'. Then use the values attribute (as in the original) to get the result as a one-dimensional NumPy array. There's no need for the flatten method.
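To see what the column selection does outside Quantopian, here is a small standalone pandas sketch. The sector codes and PE values are made up (hypothetical data, not Quantopian output), and it passes the string "mean", which pandas accepts in place of np.mean. With only one value column, both forms give the same per-row sector means, because pandas leaves the grouping column out of the transform result; selecting 'pe_ratio' mainly makes the intent explicit.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: five assets in two made-up sector codes.
df = pd.DataFrame(
    {"pe_ratio": [10.0, 20.0, 30.0, 8.0, 12.0],
     "sector_code": [101, 101, 101, 206, 206]},
    index=["A", "B", "C", "D", "E"],
)

# Column-selected form: each row gets the mean pe_ratio of its sector.
selected = df.groupby("sector_code")["pe_ratio"].transform("mean").values

# Whole-frame form: the grouping column is excluded from the transform output,
# so only pe_ratio remains and flatten() turns the (5, 1) array into (5,).
whole = df.groupby("sector_code").transform("mean").values.flatten()

print(selected)                       # [20. 20. 20. 10. 10.]
print(np.array_equal(selected, whole))  # True
```

This matches the follow-up below, where the suggested change left the numbers unchanged.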

See attached notebook.


Thanks for the reply, Dan, but this didn't change the results.
My custom factor for calculating the mean PE per sector still matches the values from my demean method. These columns are shown next to each other in the first output of data.
In the next cell, I tried doing the same thing a different way that allows for easy plotting, but those values don't match the other two methods.

Ahhh. Sorry, I didn't understand which calculations you felt didn't match.

So... the reason the various approaches don't match is that the groups being averaged contain different securities. In the two pipeline calculations, no filter or mask is applied, so the mean PE is calculated across all securities in each sector. In the 'post pipeline' calculation, however, the pipeline output has been filtered (screened) by the tradable_filter_notNull_PE filter, which includes only stocks in the Q500US universe.
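A minimal pandas sketch of why the numbers diverge, assuming hypothetical data: the "in_universe" column is a made-up stand-in for the tradable_filter_notNull_PE screen, which is not reproduced here. The same sector averaged over all securities and over only the screened ones gives different means.

```python
import pandas as pd

# Hypothetical data: six assets in two made-up sectors. "in_universe" stands
# in for a screen such as tradable_filter_notNull_PE (not reproduced here).
df = pd.DataFrame({
    "pe_ratio":    [10.0, 20.0, 90.0, 8.0, 12.0, 40.0],
    "sector_code": [101, 101, 101, 206, 206, 206],
    "in_universe": [True, True, False, True, True, False],
})

# Like the unmasked pipeline factors: every security in the sector is averaged.
all_means = df.groupby("sector_code")["pe_ratio"].mean()

# Like the post-pipeline calculation: only rows that passed the screen remain.
screened_means = df[df["in_universe"]].groupby("sector_code")["pe_ratio"].mean()

print(all_means.to_dict())       # sector 101: 40.0, sector 206: 20.0
print(screened_means.to_dict())  # sector 101: 15.0, sector 206: 10.0
```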

Add that mask to the two pipeline calculations and the numbers match. Alternatively, remove the screen from the pipeline (and keep the pipeline calculations without a mask) and the numbers will also match. They will differ from the masked results, but all the calculations will be consistent with each other.
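Both fixes can be checked with a rough pandas emulation. This is a sketch under stated assumptions, not Quantopian code: the helper names sector_mean_factor and post_pipeline_mean are hypothetical, the "in_universe" column is a made-up stand-in for the screen, and sector_mean_factor only imitates the effect of passing a mask to a pipeline factor (rows outside the mask are ignored in the average and get NaN themselves).

```python
import pandas as pd

# Hypothetical data standing in for pipeline output; "in_universe" plays the
# role of the screen (e.g. tradable_filter_notNull_PE), which is not reproduced.
df = pd.DataFrame({
    "pe_ratio":    [10.0, 20.0, 90.0, 8.0, 12.0, 40.0],
    "sector_code": [101, 101, 101, 206, 206, 206],
    "in_universe": [True, True, False, True, True, False],
})
universe = df["in_universe"]


def sector_mean_factor(frame, mask=None):
    """Hypothetical helper: per-row sector mean PE over rows where mask is True.

    Imitates a masked pipeline factor: rows outside the mask are left out of
    the average and are themselves set to NaN.
    """
    if mask is None:
        mask = pd.Series(True, index=frame.index)
    pe = frame["pe_ratio"].where(mask)
    return pe.groupby(frame["sector_code"]).transform("mean").where(mask)


def post_pipeline_mean(frame):
    """Hypothetical helper: per-row sector mean over whatever rows remain."""
    return frame.groupby("sector_code")["pe_ratio"].transform("mean")


# Option 1: mask the factor and screen the output, so both sides agree.
masked = sector_mean_factor(df, universe)[universe]
screened = post_pipeline_mean(df[universe])
print((masked == screened).all())  # True

# Option 2: no mask and no screen, so both sides agree (with different numbers).
unmasked = sector_mean_factor(df)
unscreened = post_pipeline_mean(df)
print((unmasked == unscreened).all())  # True
```

Mixing the two (a screened post-pipeline mean against an unmasked factor) reproduces the mismatch described in the question.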

See attached notebook.