Sectors in Pipeline

Hello, I've been thinking about sectors lately. I would expect stocks within a sector to usually be more strongly correlated with each other than with the rest of the market (and sectors are easier to work with quantitatively than other correlation-inducing factors such as supply chain, geopolitical risk, etc.). I'm wondering whether there's some exploitable alpha in exploring sector dynamics.

I was wondering if anybody is doing anything with sectors within pipeline. I'm having a hard time figuring out if it makes sense or is possible to do any sort of grouping or ranking per sector, as opposed to relying entirely on the optimizer to brute force sector neutrality. Would you do this within a single pipeline or would you duplicate one pipeline for each sector? Or is it better to do these calculations entirely outside of pipeline?

For example, fundamental factors are likely to vary widely between sectors. A "value" factor may broadly rank utilities highly and technology stocks poorly, when what you really want is to compare value within each sector. I assume the optimizer will adjust the weights so that you get this outcome, but does it? Or does it instead emphasize the sectors that are balanced and leave the unbalanced sectors out of the portfolio? At any rate, I imagine there are scenarios where this muddies things. There seem to be situations where having control over both net and gross sector weight would be useful.

Or what if you wanted to rank stocks by whether they are underperforming or outperforming their sector ETF, for some sort of sector-based mean-reversion algorithm? I realize you can do this outside of pipeline, but is it possible within pipeline?
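
To make the idea concrete, here is a minimal plain-Python sketch of ranking stocks by their return relative to their sector ETF. It runs outside of pipeline, and the tickers, returns, and the `excess_vs_sector` helper are all hypothetical names and numbers chosen for illustration, not Quantopian API.

```python
# Hypothetical inputs: trailing returns over one lookback window.
stock_returns = {"AAA": 0.04, "BBB": -0.02, "CCC": 0.01, "DDD": 0.03}
stock_sector = {"AAA": 311, "BBB": 311, "CCC": 207, "DDD": 207}
sector_etf_returns = {311: 0.02, 207: 0.025}  # e.g. XLK and XLU


def excess_vs_sector(stock_returns, stock_sector, sector_etf_returns):
    """Return each stock's trailing return minus its sector ETF's return."""
    excess = {}
    for ticker, ret in stock_returns.items():
        etf_ret = sector_etf_returns.get(stock_sector[ticker])
        if etf_ret is None:
            continue  # no benchmark available for this sector; skip the stock
        excess[ticker] = ret - etf_ret
    return excess


excess = excess_vs_sector(stock_returns, stock_sector, sector_etf_returns)

# Most underperforming first: candidates to buy in a mean-reversion setup.
ranked = sorted(excess, key=excess.get)
print(ranked)  # ['BBB', 'CCC', 'DDD', 'AAA']
```

Stocks whose sector has no benchmark return are simply dropped here; whether that is the right behavior depends on the strategy.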

What do people do about Communication Services? It's obviously so new, you can't backtest very far. If I try to use symbol('XLC') in pipeline outside of its lifetime it throws up its hands and gives up. (XLRE isn't very old either.) For example, if I wish to use pipeline to calculate a stock's beta to its sector, this seems like it's going to cause some problems. Any ideas?
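
For reference, a beta to a sector benchmark is just the covariance of the stock's returns with the benchmark's returns divided by the benchmark's variance. The sketch below is plain Python rather than pipeline code; the `beta` helper and its `min_obs` parameter are hypothetical, and `None` is used here to mark days on which the ETF did not yet exist. It returns `None` instead of failing when too little benchmark history is available.

```python
from statistics import mean


def beta(stock_returns, benchmark_returns, min_obs=3):
    """OLS beta of a stock to a benchmark, or None if there is too little data."""
    # Keep only the days on which both series have a return.
    pairs = [(s, b) for s, b in zip(stock_returns, benchmark_returns)
             if s is not None and b is not None]
    if len(pairs) < min_obs:
        return None
    s_mean = mean(s for s, _ in pairs)
    b_mean = mean(b for _, b in pairs)
    cov = sum((s - s_mean) * (b - b_mean) for s, b in pairs)
    var = sum((b - b_mean) ** 2 for _, b in pairs)
    return cov / var if var else None


# Hypothetical daily returns; the stock moves exactly twice as much as the ETF.
etf = [0.01, -0.005, 0.015, 0.0]
stock = [0.02, -0.01, 0.03, 0.0]
full_beta = beta(stock, etf)

# Same window, but the ETF only exists for the last two days.
short_etf = [None, None, 0.015, 0.0]
short_beta = beta(stock, short_etf)  # too few observations, so None
```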

This post is a bit of a fishing expedition. I've been thinking a lot about how to approach sectors, but the more I think about it the more questions it raises.

7 responses

In case this is useful to anybody:

    sector_ETFs = {}  
    sector_ETFs[101] = symbol('XLB') #101 - Basic Materials  
    sector_ETFs[102] = symbol('XLY') #102 - Consumer Cyclical  
    sector_ETFs[103] = symbol('XLF') #103 - Financial Services  
    sector_ETFs[104] = symbol('XLRE')#104 - Real Estate  
    sector_ETFs[205] = symbol('XLP') #205 - Consumer Defensive  
    sector_ETFs[206] = symbol('XLV') #206 - Healthcare  
    sector_ETFs[207] = symbol('XLU') #207 - Utilities  
    sector_ETFs[308] = symbol('XLC') #308 - Communication Services  
    sector_ETFs[309] = symbol('XLE') #309 - Energy  
    sector_ETFs[310] = symbol('XLI') #310 - Industrials  
    sector_ETFs[311] = symbol('XLK') #311 - Technology  

Hi Viridian,

A post by Pravin may be of interest to you. I'd be keen to see how you could take it further from where he left off; as Pravin noted:

One of the very important ingredients to a mean variance optimization is the
expected returns. I used some crude way to predict next day's excess returns.

As for sector ETFs, I have a sector dict that is slightly different; I have not checked the discrepancies:

sectorETF = {  
    'XLB': 'Basic Materials',  
    'XLY': 'Consumer Cyclical',  
    'XLF': 'Financial Services',  
    'IYR': 'Real Estate',  # XLRE  
    'XLP': 'Consumer Defensive',  
    'XLV': 'Healthcare',  
    'XLU': 'Utilities',  
    'IYZ': 'Communication Services',  # XTL  
    'XLE': 'Energy',  
    'XLI': 'Industrials',  
    'XLK': 'Technology'  
}

Hope this helps.

Interesting. His leverage is out of control: more than 4x out-of-sample. Alpha out-of-sample is nowhere near as good as in-sample. But it does appear to have some alpha (albeit with slippage turned off). It would be interesting to apply pipeline to this idea to dynamically select the stocks and not be limited to a single sector, and to hedge with stocks instead of an ETF.

Would this help?

from quantopian.pipeline.classifiers.morningstar import Sector  
factor.demean(groupby=Sector())  # `factor` stands for any pipeline Factor
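
To illustrate what a sector-grouped demean does to a factor, here is a small plain-Python sketch. It is not how pipeline implements it; the `demean_by_group` helper, the tickers, and the scores are hypothetical. After subtracting each sector's mean, a cheap utility and a cheap tech stock end up on the same footing even though their raw "value" scores differ a lot.

```python
from collections import defaultdict


def demean_by_group(values, groups):
    """Subtract from each value the mean of the group it belongs to."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for key, value in values.items():
        sums[groups[key]] += value
        counts[groups[key]] += 1
    return {key: value - sums[groups[key]] / counts[groups[key]]
            for key, value in values.items()}


# Hypothetical raw value scores: utilities (207) score high, tech (311) low.
value_score = {"U1": 0.8, "U2": 0.6, "T1": 0.3, "T2": 0.1}
sector = {"U1": 207, "U2": 207, "T1": 311, "T2": 311}

relative = demean_by_group(value_score, sector)
# U1 and T1 are now both about +0.1 above their own sector's average.
```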

Oh cool, thanks, yeah that should be useful.

Hi @Viridian, I think that "normalization transforms" of various kinds for converting things to a common basis often turn out to be very useful. Doing "normalized-by-sector" comparisons definitely makes a lot of sense. I tried to do this myself a year or two ago. I knew exactly what I wanted to do, but my Python skills are not very strong, so I worked on the logic and had someone else help me with the coding. His skills in that area were a bit too far ahead of mine and sometimes lost me, but basically what we did was the following:

For each sector, standardize all data arrays by the median values of the sector, then multiply each array by its weight (or 1 if no weights were provided), and then for each asset find the mean of its non-NaN values.
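
A rough sketch of that recipe in plain Python follows. It is a reconstruction under assumptions, not the original code: "standardize by the median" is read here as subtracting the sector median (dividing by it would be another reading), and the `sector_composite` helper, factor names, and numbers are hypothetical.

```python
import math
from statistics import median


def sector_composite(factors, sectors, weights=None):
    """Combine factors into one score per asset, relative to sector medians.

    factors: {factor_name: {asset: value}}, with NaN marking missing data.
    sectors: {asset: sector_code}.
    weights: optional {factor_name: weight}; missing weights default to 1.
    """
    weights = weights or {}
    per_asset = {asset: [] for asset in sectors}
    for name, values in factors.items():
        weight = weights.get(name, 1.0)
        # Collect this factor's non-NaN values per sector to get the medians.
        by_sector = {}
        for asset, value in values.items():
            if not math.isnan(value):
                by_sector.setdefault(sectors[asset], []).append(value)
        for asset, value in values.items():
            if math.isnan(value):
                continue  # missing values do not count toward the asset's mean
            sector_median = median(by_sector[sectors[asset]])
            # ASSUMPTION: "standardize" means subtracting the sector median.
            per_asset[asset].append(weight * (value - sector_median))
    return {asset: sum(vals) / len(vals) if vals else math.nan
            for asset, vals in per_asset.items()}


sectors = {"A": 101, "B": 101, "C": 101, "D": 206, "E": 206}
factors = {
    "value": {"A": 1.0, "B": 2.0, "C": 3.0, "D": 10.0, "E": 20.0},
    "quality": {"A": 0.5, "B": math.nan, "C": 1.5, "D": 4.0, "E": 2.0},
}
scores = sector_composite(factors, sectors, weights={"value": 2.0})
```

Asset B has no "quality" value, so its score is the mean of the one factor it does have.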

I will see what I can dig up and then share more ideas with you if you are interested. Cheers, best regards, TonyM.

Hey, y'all. I came up with this code to create a pipeline for each sector. My early testing is showing slight improvements in my algo performance vs not taking a sector-centric approach. Obviously this will depend on your strategy.

However, I'm not sure how to handle situations where you want to run a backtest where the lookback window is earlier than the launch of the sector ETF. I realize I can swap out XLRE and XLC for ETFs mentioned above that have been around longer, but that only pushes things back a bit and does not solve the underlying issue.

Can anybody help with the code? How do I make this fail gracefully? For example, is it possible to make the algo fall back on another ETF when hitting time periods over which the first choice isn't available? Or is there a way to just skip a pipeline when the benchmark ETF data isn't available to compute the entire lookback window?

import pandas as pd  
import quantopian.algorithm as algo  
import quantopian.optimize  as opt  
from quantopian.pipeline                 import Pipeline, CustomFactor  
from quantopian.pipeline.data.builtin    import USEquityPricing  
from quantopian.pipeline.factors         import SimpleBeta  
from quantopian.pipeline.filters         import QTradableStocksUS  
from quantopian.pipeline.classifiers.morningstar import Sector  
from quantopian.algorithm                import pipeline_output  

def initialize(context):  
    # Using the SPDR ETF for each sector as its benchmark  
    context.sector_ETFs = {  
        101: symbol('XLB'),  
        102: symbol('XLY'),  
        103: symbol('XLF'),  
        104: symbol('XLRE'),  
        205: symbol('XLP'),  
        206: symbol('XLV'),  
        207: symbol('XLU'),  
        # 308: symbol('XLC'),  # Communication Services left out: XLC is too new for most backtests  
        309: symbol('XLE'),  
        310: symbol('XLI'),  
        311: symbol('XLK')  
        }  
    # Loop through each sector and create a pipeline for each.  
    for sector_code in context.sector_ETFs:  
        algo.attach_pipeline(sector_beta_pipeline(context, sector_code), str(sector_code) )  

def sector_beta_pipeline(context, sector_code):  
    mask = Sector().eq(sector_code) # Only select stocks in this sector.  
    benchmark = context.sector_ETFs[sector_code] # Sector ETF  
    simple_beta = SimpleBeta(benchmark, regression_length=90)  
    mask &= simple_beta.notnull()

    mask &= QTradableStocksUS()  
    mask &= USEquityPricing.volume.latest > 400000

    pipe = Pipeline(  
        columns={  
            'sector_beta': simple_beta,  
            'sector_code': Sector(),  
        },  
        screen=mask  
    )  
    return pipe  

def before_trading_start(context, data):  
    # Take the individual sector pipelines and combine their outputs into a single dataframe.  
    pipes = []  
    for sector_code in context.sector_ETFs:  
        pipes.append( algo.pipeline_output(str(sector_code)) )  
    context.output = pd.concat( pipes )  
    context.sector_beta = context.output['sector_beta']  
    context.sector_code = context.output['sector_code']  
    record(plen = len(context.output.index))
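
On the fallback question, one possible approach is to keep an ordered list of candidate ETFs per sector and pick the first one whose history covers the whole lookback window. The sketch below is plain Python and only shows the selection logic; how to wire the result into `attach_pipeline` is left open. The `pick_benchmark` helper and `SECTOR_BENCHMARKS` table are hypothetical, and the first-available dates are rough assumptions (rounded forward to a month start) that should be checked against real listing dates.

```python
from datetime import date, timedelta

# Candidates per sector in order of preference: (ticker, first usable date).
# ASSUMPTION: dates are approximate placeholders, not verified listing dates.
SECTOR_BENCHMARKS = {
    104: [("XLRE", date(2015, 11, 1)), ("IYR", date(2000, 7, 1))],
    308: [("XLC", date(2018, 7, 1)), ("IYZ", date(2000, 6, 1))],
}


def pick_benchmark(candidates, as_of, lookback_days):
    """Return the first ticker with a full lookback window of history, else None.

    lookback_days is in calendar days, so a 90-trading-day regression needs
    roughly 130 calendar days.
    """
    window_start = as_of - timedelta(days=lookback_days)
    for ticker, first_date in candidates:
        if first_date <= window_start:
            return ticker
    return None  # caller can skip this sector for now


recent = pick_benchmark(SECTOR_BENCHMARKS[308], date(2019, 6, 1), 130)
early = pick_benchmark(SECTOR_BENCHMARKS[308], date(2018, 9, 1), 130)
too_early = pick_benchmark(SECTOR_BENCHMARKS[308], date(2000, 7, 1), 130)
```

Under these assumed dates, `recent` is the preferred ETF, `early` falls back to the older one, and `too_early` is `None`, which the caller could treat as "skip this sector's pipeline".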