Hi everyone,
As Quantopian integrates more data onto the platform, a more varied range of datasets is being incorporated into the contest. We thought it might be helpful for you all to see how often these datasets are being used. The following graph shows the number of currently active contest entries that import each dataset.
The above graph was populated by counting dataset imports. We don't have insight into how the dataset is used within the algorithm, so depending on how the import statement was formed, we might only have information regarding the module that was imported, rather than the specific dataset. Examples of this are the “Unknown FactSet” and “Unknown Estimates” labels, which count the number of contest algorithms that explicitly import the quantopian.pipeline.data.factset and quantopian.pipeline.data.factset.estimates modules, respectively.
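To illustrate why the form of the import statement matters, here is a rough sketch of how imports could be counted from algorithm source code. It is not necessarily how the graph was produced; it only assumes that imports are read statically with Python's standard `ast` module. The helper names (`imported_names`, `count_imports`) and the sample entries are hypothetical, and the dataset name in the first sample string is only there for illustration.

```python
import ast
from collections import Counter

# Root package for pipeline datasets (taken from the module names above).
DATA_ROOT = "quantopian.pipeline.data"


def _under_root(name):
    """True if a dotted name is DATA_ROOT itself or lives beneath it."""
    return name == DATA_ROOT or name.startswith(DATA_ROOT + ".")


def imported_names(source):
    """Return the dotted names under DATA_ROOT that a piece of source imports."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # `import a.b.c` only reveals the module path, not which
            # dataset inside the module ends up being used.
            for alias in node.names:
                if _under_root(alias.name):
                    found.add(alias.name)
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            if _under_root(node.module):
                # `from a.b import C` also reveals the imported name C.
                for alias in node.names:
                    found.add(node.module + "." + alias.name)
    return found


def count_imports(sources):
    """Count how many entries import each name (each entry counts once per name)."""
    counts = Counter()
    for source in sources:
        counts.update(imported_names(source))
    return counts


# Hypothetical contest entries, reduced to their import lines.
entries = [
    "from quantopian.pipeline.data.factset.estimates import PeriodicConsensus\n",
    "import quantopian.pipeline.data.factset.estimates as fe\n",
    "from quantopian.pipeline.data import factset\n",
]
counts = count_imports(entries)
for name, n in sorted(counts.items()):
    print(name, n)
```

In this sketch, only the first entry names a specific dataset. The second and third entries reveal just a module, which is the kind of case that would fall under a label like “Unknown Estimates” or “Unknown FactSet”.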
As the graph shows, some datasets are used more than others. Uniqueness isn’t a measure in the contest, but it is part of the allocation evaluation process. We recognize that we don’t yet give you a metric indicating the “uniqueness” of your strategy, but we hope that showing dataset usage can help you identify opportunities to write a unique strategy. In this case, there’s a good chance that using some of the less-used datasets, like Estimates, will lead to a unique strategy that is less correlated with existing ones. We encourage you to submit contest entries using FactSet Estimates. Go to this post for an example algorithm using FactSet Estimates, and refer to the Data Reference to learn more about the dataset.
Hope everyone finds this feedback helpful.