Great question: how does one import classifiers using self-serve data? First, why would one want to import a classifier? There are plenty of reasons, but take a simple example. Say one has a dataset of analyst recommendations whose values are 'buy', 'hold', or 'sell' (or perhaps just the numeric values 1, 2, 3 respectively). The nice thing about classifiers is that they can be used for grouping (as Robert alluded to above). So, as an example, one could create a filter for the five lowest-priced stocks in each rating.
from quantopian.pipeline.data.builtin import USEquityPricing

# Assume my_self_serve is the imported dataset name with a string field called 'rating'
price = USEquityPricing.close.latest
rating = my_self_serve.rating.latest
lowest_priced_in_each_rating = price.bottom(5, groupby=rating)
What makes this work is setting the column type to 'string'. When setting up the self-serve data feed, select 'string' for any columns one wants to use as classifiers. Then, magically, when fetching the latest property of a dataset column (e.g. my_self_serve.rating.latest) one will get a classifier.
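To make the grouping idea concrete, here is a minimal plain-Python sketch of what a "bottom N within each group" selection computes. It is only an illustration on toy data, not how Pipeline implements it; the function name bottom_n_by_group, the tickers, and the prices are all made up for the example.

```python
from collections import defaultdict


def bottom_n_by_group(prices, groups, n):
    """Return the assets holding the n lowest prices within each group.

    Hypothetical helper for illustration only; not part of any Quantopian API.
    """
    # Bucket (price, asset) pairs by their group label (the "classifier").
    by_group = defaultdict(list)
    for asset, price in prices.items():
        by_group[groups[asset]].append((price, asset))

    # Keep the n cheapest members of every bucket.
    selected = set()
    for members in by_group.values():
        members.sort()
        selected.update(asset for _, asset in members[:n])
    return selected


# Toy data: made-up tickers, prices, and ratings.
prices = {"AAA": 10.0, "BBB": 25.0, "CCC": 5.0, "DDD": 40.0, "EEE": 7.5, "FFF": 90.0}
ratings = {"AAA": "buy", "BBB": "buy", "CCC": "buy", "DDD": "hold", "EEE": "hold", "FFF": "sell"}

cheapest = bottom_n_by_group(prices, ratings, n=2)
print(sorted(cheapest))
```

With n=2, the 'buy' group keeps its two cheapest names and drops the third, while the smaller 'hold' and 'sell' groups keep everything they have.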
That's the straightforward way to make a classifier from self-serve data. Set the column type to 'string'. Stop reading here if that's all you care about.
So, since you are still reading, you may want to know a bit more....
Below is the code for the latest property of a BoundColumn. It can be found on GitHub here (https://github.com/quantopian/zipline/blob/master/zipline/pipeline/data/dataset.py)
@property
def latest(self):
    dtype = self.dtype
    if dtype in Filter.ALLOWED_DTYPES:
        Latest = LatestFilter
    elif dtype in Classifier.ALLOWED_DTYPES:
        Latest = LatestClassifier
    else:
        assert dtype in Factor.ALLOWED_DTYPES, "Unknown dtype %s." % dtype
        Latest = LatestFactor

    return Latest(
        inputs=(self,),
        dtype=dtype,
        missing_value=self.missing_value,
        ndim=self.ndim,
    )
Without getting into the specifics of the code, notice that it's creating a filter, classifier, or factor based upon the BoundColumn datatype. This is 'the implicit' way to create a classifier. It's 'implicit' because the type of object being created is implied by the type of data in the BoundColumn.
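The same dispatch-on-dtype idea can be shown in a few lines of standalone Python. This is only a sketch: the dtype sets below are illustrative stand-ins, not zipline's actual ALLOWED_DTYPES definitions, and it assumes a 'string' column arrives as an object dtype while a 'number' column arrives as float64.

```python
# Illustrative stand-ins only. These names and sets are assumptions made
# for this sketch, not copied from zipline.
FILTER_DTYPES = {"bool"}
CLASSIFIER_DTYPES = {"object", "int64"}
FACTOR_DTYPES = {"float64", "datetime64[ns]"}


def latest_kind(dtype):
    """Pick the kind of term implied by a column's dtype name."""
    if dtype in FILTER_DTYPES:
        return "filter"
    elif dtype in CLASSIFIER_DTYPES:
        return "classifier"
    elif dtype in FACTOR_DTYPES:
        return "factor"
    raise ValueError("Unknown dtype %s." % dtype)


# Assumed mapping: 'string' column -> object, 'number' column -> float64.
print(latest_kind("object"))
print(latest_kind("float64"))
```

Under those assumptions, a 'string' column lands in the classifier branch without any extra work, while a 'number' column lands in the factor branch.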
Now, one could also create a classifier by 'explicitly' creating it using the CustomClassifier class. You may want to do this if the BoundColumn data type is 'number' or if one wanted to do some other data manipulation.
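from quantopian.pipeline import CustomClassifier
import numpy as np

class Rating_Classifier(CustomClassifier):
    # Assume my_self_serve is the imported dataset with a numeric 'rating' column
    inputs = [my_self_serve.rating]
    window_length = 1
    dtype = np.int64
    # Label to use when no rating is available
    missing_value = 9999

    def compute(self, today, assets, out, rating):
        # Copy the single row of ratings straight into the integer output
        out[:] = rating

rating = Rating_Classifier()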
The key to setting up a CustomClassifier is to specify the dtype as 'np.int64'. Also set missing_value to some number to display if the rating is missing. One can then set the column type in the dataset setup to 'number' and use those numbers as a classifier.
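One thing that may be worth checking is how missing rows behave when numeric ratings are turned into integer labels. The sketch below assumes (and this is only an assumption about the data feed) that a 'number' column arrives as floats with NaN wherever a rating is absent. It shows, in plain Python, the kind of mapping one would want: real ratings become ints and NaN becomes the sentinel. The helper name to_int_labels is hypothetical.

```python
import math

MISSING = 9999  # same sentinel as missing_value in the classifier above


def to_int_labels(ratings, missing=MISSING):
    """Map float ratings to int labels, sending NaN to the sentinel.

    Hypothetical helper for illustration; not part of any Quantopian API.
    """
    return [missing if math.isnan(r) else int(r) for r in ratings]


# Toy input: three real ratings and one missing value.
labels = to_int_labels([1.0, 3.0, float("nan"), 2.0])
print(labels)
```

If missing ratings do show up as NaN, the same masking could be done inside compute before writing to out, so that unrated stocks group under the sentinel rather than under whatever a raw float-to-int cast produces.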
Again, great question. Good luck.