Community members who followed the original announcement posts for the Pipeline API may recall hints about a third expression type in the original system design. Classifiers have been on our roadmap from the very beginning, since they enable a number of important operations that involve grouping expressions on Factor outputs. Classifiers were cut from the original launch to make the rest of Pipeline available sooner, but we always planned to add them eventually.
Today, I'm excited to announce that Pipeline's third major expression type is finally available. Attached to this post is a notebook with several detailed examples of working with Classifiers.
Some highlights from the notebook:
- There's now a new base expression type: `Classifier`. In the same way that `Factors` are expressions producing numerical-valued results, and `Filters` are expressions producing boolean-valued results, `Classifiers` are pipeline expressions producing categorical-valued results. Another way of thinking about classifiers is that they're computations that produce labels for assets. Canonical examples of classifiers are sector codes and quartiles/quintiles/deciles of another factor (e.g. deciles of stocks by market cap).
- There are two new `Factor` methods, `demean()` and `zscore()`, that take an optional `groupby` argument, which can be passed a classifier. These methods produce new Factors that apply normalizations to the daily output of the original Factor. A detailed example of how this process works can be found in the new Normalizing Results section of the help docs.
- There are two new builtin classifiers, and several more in the works.
- There are several new `Factor` methods that produce classifiers by producing quantile labels. The most general of these is `Factor.quantiles`, which accepts a bin count as an argument. Convenience aliases are available for `quartiles` (`quantiles(4)`), `quintiles` (`quantiles(5)`), and `deciles` (`quantiles(10)`).
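To make the grouping idea concrete, here's a rough pure-Python sketch of what grouped normalization and quantile labeling do to a single day's factor output. This is only an illustration of the semantics, not how Pipeline implements them: the helper names (`grouped_demean`, `grouped_zscore`, `quantile_labels`), the toy tickers, and the sector labels are all made up for the example, and the choice of population standard deviation and of breaking ties by input order are assumptions on my part.

```python
from collections import defaultdict
from statistics import mean, pstdev


def group_values(values, labels):
    """Collect one day's factor values by classifier label."""
    groups = defaultdict(list)
    for asset, value in values.items():
        groups[labels[asset]].append(value)
    return groups


def grouped_demean(values, labels):
    """Subtract each group's mean from the values of its members."""
    means = {lab: mean(vals) for lab, vals in group_values(values, labels).items()}
    return {a: v - means[labels[a]] for a, v in values.items()}


def grouped_zscore(values, labels):
    """Demean within each group, then divide by the group's standard deviation.

    Assumption: population standard deviation. Groups with zero spread
    (e.g. a single member) are not handled in this sketch.
    """
    groups = group_values(values, labels)
    stats = {lab: (mean(vals), pstdev(vals)) for lab, vals in groups.items()}
    return {
        a: (v - stats[labels[a]][0]) / stats[labels[a]][1]
        for a, v in values.items()
    }


def quantile_labels(values, bins):
    """Label each asset 0..bins-1 by the rank of its value (equal-count bins).

    Assumption: ties are broken by input order.
    """
    ranked = sorted(values, key=values.get)
    n = len(ranked)
    return {asset: i * bins // n for i, asset in enumerate(ranked)}


# Hypothetical single-day data: a market-cap-like factor and a sector label.
market_cap = {"A": 10.0, "B": 20.0, "C": 30.0, "D": 100.0, "E": 200.0, "F": 300.0}
sector = {"A": "tech", "B": "tech", "C": "tech",
          "D": "energy", "E": "energy", "F": "energy"}

demeaned = grouped_demean(market_cap, sector)   # each value relative to its sector mean
zscored = grouped_zscore(market_cap, sector)    # comparable across sectors
terciles = quantile_labels(market_cap, 3)       # a classifier-like labeling, 3 bins
```

In this toy data, the smallest stock in each sector ends up with the same z-score even though the two sectors differ in scale by 10x, which is the point of normalizing within groups. The output of `quantile_labels` is itself a labeling of assets, so it could in turn serve as the grouping for another normalization.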
I think having the ability to perform grouped aggregations and normalizations opens the door to many more sophisticated quant workflows, so I'm excited to see what the community builds with this new functionality. As always, I'm also interested to hear feedback on how these features could be made more useful. Are there other natural candidates for built-in classifiers (e.g. exchange ID or country code) that could enable better algorithms? Are there other normalizations like demean and zscore that could be made Factor methods (one that I have my eye on right now is the existing rank() method)? Are there other interesting possible additions to the Filter/Factor/Classifier algebra? Feedback from users on these (or other) topics would be greatly appreciated.
Happy coding,
-Scott