OneHot encoding Sectors within pipeline factor call?

I'd like to convert the sector information from Sector() into a one-hot encoded array and return those columns rather than the standard Sector column. I have a function that takes the sector values and turns them into a binary array, but I'm not sure how to override the Sector method so that it returns the encoded values.

MORNINGSTAR_SECTOR_CODE = {  
     -1: 'Misc',  
    101: 'Basic Materials',  
    102: 'Consumer Cyclical',  
    103: 'Financial Services',  
    104: 'Real Estate',  
    205: 'Consumer Defensive',  
    206: 'Healthcare',  
    207: 'Utilities',  
    308: 'Communication Services',  
    309: 'Energy',  
    310: 'Industrials',  
    311: 'Technology',
}

import pandas as pd
from sklearn import preprocessing

def oneHot_sectors(sector_keys):
    ##- Convert the Sectors column into binary labels
    sector_keys = list(sector_keys)  # materialize so the keys can be reused as the index below
    sector_binarizer = preprocessing.LabelBinarizer()
    strlbls = [str(k) for k in sector_keys]  # LabelBinarizer didn't like float values, so convert to strings
    sector_binarizer.fit(strlbls)
    sector_labels_bin = sector_binarizer.transform(strlbls)  # this is now 12 binary columns from 1 categorical

    ##- Create a pandas DataFrame from the new binary labels
    colNames = ["S_Label_" + str(i) for i in range(sector_labels_bin.shape[1])]
    # one row per sector code, one column per binary label
    sLabels = pd.DataFrame(data=sector_labels_bin, index=sector_keys, columns=colNames)
    return sLabels

Now, calling
oneHot_sectorLabels = oneHot_sectors(MORNINGSTAR_SECTOR_CODE.keys())
creates a pandas DataFrame with 12 rows and 12 columns: one row per sector code, holding the binary encoding of that code.

What I'd like to do is create a CustomFactor which will map the Sector to the correct row and return those values. Is this possible?
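
To make the mapping I'm after concrete, here is a minimal sketch of just the lookup step in plain Python, with no Quantopian or pipeline APIs involved. The names SECTOR_CODES, CODE_TO_COLUMN, one_hot_row and encode_sectors are hypothetical, made up for illustration. It assumes the columns follow the ascending order of the sector codes and that an unknown or missing code should give an all-zero row; both are assumptions, not something the platform dictates.

```python
# Sector codes in ascending order, matching the keys of MORNINGSTAR_SECTOR_CODE.
SECTOR_CODES = [-1, 101, 102, 103, 104, 205, 206, 207, 308, 309, 310, 311]

# Column position of each code in the one-hot vector (hypothetical helper table).
CODE_TO_COLUMN = {code: i for i, code in enumerate(SECTOR_CODES)}


def one_hot_row(sector_code):
    """Return a 12-element 0/1 list with a single 1 in the sector's column.

    Assumption: a code that is not in SECTOR_CODES (including NaN)
    yields an all-zero row.
    """
    row = [0] * len(SECTOR_CODES)
    # Float codes such as 311.0 compare and hash equal to the int 311,
    # so they find the same dictionary entry without an explicit cast.
    column = CODE_TO_COLUMN.get(sector_code)
    if column is not None:
        row[column] = 1
    return row


def encode_sectors(sector_codes):
    """Map a sequence of per-asset sector codes to a list of one-hot rows."""
    return [one_hot_row(code) for code in sector_codes]


# Example: three assets in Technology, Energy and Basic Materials.
rows = encode_sectors([311.0, 309.0, 101.0])
```

Each entry of rows is the binary vector I would want the factor to return for that asset, one column per sector.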