Hi All,
One of the biggest remaining holes in Pipeline since the launch of Classifiers has been the lack of support for string-typed data. Support for strings was merged into Zipline about a week ago, and as of today we have support on Quantopian for loading string data in Pipelines.
There are two major use-cases for strings:
- Converting them into booleans via string-matching predicates (e.g. "startswith").
- Using them as grouping keys to transform numerical expressions (e.g. Z-Score asset returns by country code).
The groupby use-case works for strings exactly the way it does for integer columns like SectorCode. The Classifier announcement post provides an overview of grouping operations, and there's a new Working with Strings section in the Pipeline docs that provides another example with a string column.
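To make the grouping idea concrete, here is a minimal plain-Python sketch of z-scoring returns within each country code. It is not the Pipeline API: the function name `zscore_by_group`, the sample data, and the choice of population standard deviation are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean, pstdev


def zscore_by_group(values, groups):
    """Z-score each value against the values that share its group key.

    Hypothetical helper for illustration; not part of Pipeline or Zipline.
    """
    # Collect the values belonging to each string key.
    buckets = defaultdict(list)
    for asset, key in groups.items():
        buckets[key].append(values[asset])

    # Per-group mean and population standard deviation (an assumption).
    stats = {key: (mean(vals), pstdev(vals)) for key, vals in buckets.items()}

    result = {}
    for asset, key in groups.items():
        mu, sigma = stats[key]
        # A group with zero spread has no meaningful z-score.
        result[asset] = (values[asset] - mu) / sigma if sigma > 0 else float("nan")
    return result


# Made-up returns and country codes, for illustration only.
returns = {"A": 0.01, "B": 0.03, "C": -0.02, "D": 0.04}
country = {"A": "US", "B": "US", "C": "CA", "D": "CA"}
zscores = zscore_by_group(returns, country)
```

The string key only decides which assets are compared with one another, which is why a string column can stand in for an integer one like SectorCode here.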
The use-case of implementing filters based on string data is supported by a suite of new methods on Classifier:
More information on each of these methods is available in the Classifier API Reference.
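As a rough picture of what a string-based filter computes, here is a plain-Python sketch that turns a column of strings into booleans and combines the results. It only mimics the semantics: `string_filter`, the exchange labels, and the symbols are hypothetical, and the real method names and signatures are the ones in the API reference.

```python
import re

# Made-up symbol -> exchange strings, for illustration only.
exchange = {"AAA": "NYSE", "BBB": "NASDAQ", "CCC": "OTCPK", "DDD": "NYSE ARCA"}


def string_filter(column, predicate):
    """Apply a string predicate to every asset, giving one boolean per asset.

    Hypothetical helper for illustration; not part of Pipeline or Zipline.
    """
    return {asset: predicate(value) for asset, value in column.items()}


# Prefix match, in the spirit of a "startswith" predicate.
is_otc = string_filter(exchange, lambda s: s.startswith("OTC"))

# Regular-expression match anchored at the start of the string.
is_nyse_family = string_filter(exchange, lambda s: re.match(r"NYSE", s) is not None)

# Membership in a fixed set of allowed values.
in_major = string_filter(exchange, lambda s: s in {"NYSE", "NASDAQ"})

# Boolean results combine like any other filters.
tradable = {asset: in_major[asset] and not is_otc[asset] for asset in exchange}
```

Once the strings have been reduced to booleans, they behave like any other filter, so several criteria can be combined into a single universe definition.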
To demonstrate the kinds of operations one might want to do with string-based filters, I've attached a notebook that implements 9 common universe selection criteria in Pipeline and analyzes their outputs.
This analysis is a step toward eventually providing recommended synthetic trading universes (e.g. a "Quanto 500" or "Quanto 3000") as efficient Pipeline built-ins, so I'd like to hear about any other filtering criteria that could be included in the analysis.
- Scott