I'd like to convert the sector information from Sector() into a one-hot encoded array and return those columns instead of the standard Sector column. I have a function that takes the sector values and turns them into a binary array, but I'm not sure how to override the Sector method so that it returns the correct value.
```python
MORNINGSTAR_SECTOR_CODE = {
    -1: 'Misc',
    101: 'Basic Materials',
    102: 'Consumer Cyclical',
    103: 'Financial Services',
    104: 'Real Estate',
    205: 'Consumer Defensive',
    206: 'Healthcare',
    207: 'Utilities',
    308: 'Communication Services',
    309: 'Energy',
    310: 'Industrials',
    311: 'Technology',
}
```
```python
import pandas as pd
from sklearn import preprocessing

def oneHot_sectors(sector_keys):
    ##- Convert the Sectors column into binary labels
    sector_keys = list(sector_keys)  # accept dict.keys() or any iterable
    sector_binarizer = preprocessing.LabelBinarizer()
    # LabelBinarizer didn't like float values, so convert to strings.
    # Use a list, not a one-shot map() iterator, so fit and transform see the same labels.
    strlbls = [str(k) for k in sector_keys]
    sector_binarizer.fit(strlbls)
    sector_labels_bin = sector_binarizer.transform(strlbls)  # this is now 12 binary columns from 1 categorical
    ##- Create a pandas DataFrame from the new binary labels
    colNames = ["S_Label_" + str(i) for i in range(sector_labels_bin.shape[1])]
    sLabels = pd.DataFrame(data=sector_labels_bin, index=sector_keys, columns=colNames)
    return sLabels
```
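For comparison, here is a minimal sketch of the same 12-column table built with NumPy alone, assuming the codes are sorted numerically. `LabelBinarizer` orders its classes by sorting the string labels; for these codes the string order ('-1', '101', ..., '311') happens to match the numeric order, so row i of `np.eye` lines up with the binarizer's output. The helper name `onehot_table` is hypothetical.

```python
import numpy as np
import pandas as pd

# Same mapping as above (Morningstar sector codes).
MORNINGSTAR_SECTOR_CODE = {
    -1: 'Misc', 101: 'Basic Materials', 102: 'Consumer Cyclical',
    103: 'Financial Services', 104: 'Real Estate', 205: 'Consumer Defensive',
    206: 'Healthcare', 207: 'Utilities', 308: 'Communication Services',
    309: 'Energy', 310: 'Industrials', 311: 'Technology',
}

def onehot_table(codes):
    """Hypothetical helper: one row per sector code, one binary column per code."""
    codes = sorted(codes)  # fixes the column order
    n = len(codes)
    return pd.DataFrame(
        np.eye(n, dtype=int),  # row i has a single 1, in column i
        index=codes,
        columns=["S_Label_" + str(i) for i in range(n)],
    )

table = onehot_table(MORNINGSTAR_SECTOR_CODE.keys())
healthcare_row = table.loc[206]  # Healthcare is the 7th code, so S_Label_6 is set
```

This skips the string conversion entirely, since `np.eye` never needs to inspect the labels.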
Now

```python
oneHot_sectorLabels = oneHot_sectors(MORNINGSTAR_SECTOR_CODE.keys())
```

creates a pandas DataFrame with 12 columns, a binary encoding of the sector labels, indexed by sector code.
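With the frame indexed by sector code, mapping each asset's code to its row is a label lookup. The sketch below assumes the assets' codes arrive as a 1-D array (the `asset_codes` values are made up for illustration) and uses a stand-in for the frame returned by `oneHot_sectors()`. It uses `reindex` rather than `.loc`, because `.loc` raises a `KeyError` for a code missing from the index, while `reindex` yields a row of NaN.

```python
import numpy as np
import pandas as pd

# Stand-in for oneHot_sectors(...): index = sector code, 12 binary columns.
codes = [-1, 101, 102, 103, 104, 205, 206, 207, 308, 309, 310, 311]
table = pd.DataFrame(np.eye(len(codes), dtype=int), index=codes,
                     columns=["S_Label_" + str(i) for i in range(len(codes))])

# Hypothetical sector codes for four assets (illustration only).
asset_codes = np.array([311, 103, 311, -1])

# One row per asset, in asset order; repeated codes simply repeat the row.
per_asset = table.reindex(asset_codes).to_numpy()
```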
What I'd like to do is create a CustomFactor that maps each asset's Sector code to the matching row of this frame and returns those values. Is this possible?
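It should be possible in principle. In zipline-style pipelines a CustomFactor can declare an `outputs` list, and `compute()` then receives `out` as a record array with one field per output name. The sketch below is only the body such a `compute()` might run, written as a plain function so it can be tried without Pipeline. It takes a (window_length, n_assets) window of sector codes, uses the latest row, and sets one 0/1 field per code. Whether Sector() can be passed directly as a CustomFactor input on your platform is an assumption to check. `fill_onehot`, `SECTOR_CODES`, and `OUTPUT_NAMES` are hypothetical names.

```python
import numpy as np

SECTOR_CODES = [-1, 101, 102, 103, 104, 205, 206, 207, 308, 309, 310, 311]
OUTPUT_NAMES = ["S_Label_" + str(i) for i in range(len(SECTOR_CODES))]

def fill_onehot(out, sector):
    """Hypothetical body of CustomFactor.compute(self, today, assets, out, sector).

    sector: 2-D array (window_length x n_assets) of sector codes.
    out:    record array with one float field per name in OUTPUT_NAMES.
    """
    latest = sector[-1]  # most recent code for each asset
    for name, code in zip(OUTPUT_NAMES, SECTOR_CODES):
        out[name][:] = (latest == code)  # 1.0 where the asset is in this sector

# Try it outside Pipeline with a fake window: 2 days x 3 assets.
window = np.array([[311, 103, -1],
                   [311, 206, -1]])
out = np.recarray(window.shape[1], dtype=[(n, float) for n in OUTPUT_NAMES])
fill_onehot(out, window)
```

Where zipline's `Classifier.eq` is available, a filter-based alternative may avoid the CustomFactor altogether: one `Sector().eq(code)` filter per code, added as pipeline columns, gives one boolean column per sector.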