Starting today, you can include custom datasets or signals in your algorithm for use in the contest and, by extension, the funding process.
Self-Serve Data lets you upload your own time-series data to Quantopian and access it in research and the IDE directly via Pipeline. You can upload historical data and then live-update your data on a regular basis via FTP, Google Sheets, or Dropbox.
Important Concepts
Your custom data will be processed similarly to Quantopian Partner Data. To accurately represent your data in pipeline and avoid lookahead bias, your data will be collected, stored, and surfaced in a point-in-time manner. You can learn more about our process for working with point-in-time data in our forum post How is it Collected, Processed, and Surfaced?. A more detailed description, Three Dimensional Time: Working with Alternative Data, is also available.
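To make the point-in-time idea concrete, here is a minimal, illustrative sketch (not Quantopian's actual implementation): each record carries both the date it describes and a timestamp for when it was collected, and any query only returns records that were already known at the query date. The column names and values are hypothetical.

```python
import pandas as pd

# Hypothetical signal data: the value for 2018-06-01 was only *known*
# (collected) on 2018-06-02, so a backtest running on 2018-06-01
# must not be allowed to see it.
records = pd.DataFrame({
    "asof_date": pd.to_datetime(["2018-06-01", "2018-06-04"]),
    "timestamp": pd.to_datetime(["2018-06-02", "2018-06-05"]),  # when collected
    "signal": [0.8, 0.3],
})

def point_in_time_view(records, query_date):
    """Return only the records that were already known on query_date."""
    return records[records["timestamp"] <= pd.Timestamp(query_date)]

# On 2018-06-01 the first record is not yet visible; by 2018-06-02 it is.
print(len(point_in_time_view(records, "2018-06-01")))  # 0
print(len(point_in_time_view(records, "2018-06-02")))  # 1
```

Filtering on the collection timestamp rather than the as-of date is what prevents lookahead bias: a simulation can never act on data before the date it actually arrived.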
Once on the platform, your datasets are visible to and importable only by you. If you share an algorithm or notebook that uses one of your private datasets, other community members won't be able to import the dataset. Your dataset will be downloaded and stored on Quantopian-maintained servers, where it is encrypted at rest. Self-Serve data is considered "Private Content" under Quantopian's Terms of Use.
Example Notebooks
The Introduction to Self-Serve Data notebook (attached below) shows you how to format your data, upload it via Self-Serve Data, and then import and use it in Pipeline.
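As a rough sketch of the expected shape, a dataset has one row per asset per day, with a date column, a symbol column for mapping to assets, and one or more value columns. The column names here ("date", "symbol", "score") are illustrative; see the notebook for the exact requirements.

```python
import io
import pandas as pd

# Hypothetical CSV in the one-row-per-asset-per-day shape.
csv_text = """date,symbol,score
2018-06-01,AAPL,0.92
2018-06-01,MSFT,0.41
2018-06-04,AAPL,0.88
2018-06-04,MSFT,0.47
"""

df = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])

# Check the one-row-per-asset-per-day constraint before uploading:
assert not df.duplicated(subset=["date", "symbol"]).any()
print(df.shape)  # (4, 3)
```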
The Self-Serve Data - How does it work? notebook (in a comment below) explains how your data is processed, covers some considerations when creating your dataset, and shows how to check and monitor your dataset loads.
Additional Details
Our Self-Serve Data Help documents how to prepare, upload, access, and monitor your data loads.
Initially, Self-Serve Data will support custom datasets in CSV format with a single row of data per asset per day, which fits naturally into the expected pipeline format. Records that cannot be symbol-mapped to assets in the Quantopian US equities database will be skipped. Optional live-update files will be downloaded each trading day, between 07:00 and 10:00 UTC, and compared against existing dataset records. Brand-new records will be added to the base tables, and updates will be added to the deltas table based on the Partner Data Processing logic.
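The add/update split described above can be sketched in a few lines of pandas. This is a hedged illustration of the idea, not the actual Partner Data Processing code: records whose (date, symbol) key is not yet in the base table count as new, while changed values for existing keys become deltas.

```python
import pandas as pd

# Hypothetical existing base table and a freshly downloaded file.
existing = pd.DataFrame({
    "date": pd.to_datetime(["2018-06-01", "2018-06-01"]),
    "symbol": ["AAPL", "MSFT"],
    "score": [0.92, 0.41],
})
download = pd.DataFrame({
    "date": pd.to_datetime(["2018-06-01", "2018-06-01", "2018-06-04"]),
    "symbol": ["AAPL", "MSFT", "AAPL"],
    "score": [0.92, 0.45, 0.88],  # MSFT value revised; 06-04 AAPL row is new
})

key = ["date", "symbol"]
merged = download.merge(existing, on=key, how="left",
                        suffixes=("", "_old"), indicator=True)

# Keys absent from the base table -> new records for the base tables.
new_records = merged.loc[merged["_merge"] == "left_only", key + ["score"]]
# Existing keys whose value changed -> updates for the deltas table.
deltas = merged.loc[(merged["_merge"] == "both")
                    & (merged["score"] != merged["score_old"]), key + ["score"]]

print(len(new_records))  # 1 (the 2018-06-04 AAPL row)
print(len(deltas))       # 1 (the revised MSFT row)
```

Unchanged rows (the AAPL 2018-06-01 record) fall into neither bucket, so re-downloading identical data is a no-op.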
We are always interested in learning more about your use cases and potential datasets; you can help us by filling out the Self-Serve Data survey.
Update: Learn more about Analyzing a Signal and Creating a Contest Algorithm with Self-Serve Data in our new post, which includes a Self-Serve Data pipeline example and a template algorithm for incorporating your data, and shows you how to analyze your data using Alphalens.
If you have any questions or issues, please contact us at [email protected]