Pipeline Datasets - How to Create?

Referencing the below documentation:

https://www.quantopian.com/help#importing-datasets

How would one create a pipeline-compatible dataset? What is needed, and what data structure does it take (a multi-index DataFrame, a Panel, something else)?

Thanks!

6 responses

Importing in this context refers to a Python import statement for accessing datasets that are already integrated into the Quantopian platform.

If you'd like to import your own data, you can use the Fetcher feature.

Disclaimer

The material on this website is provided for informational purposes only and does not constitute an offer to sell, a solicitation to buy, or a recommendation or endorsement for any security or strategy, nor does it constitute an offer to provide investment advisory services by Quantopian. In addition, the material offers no opinion with respect to the suitability of any security or specific investment. No information contained herein should be regarded as a suggestion to engage in or refrain from any investment-related course of action as none of Quantopian nor any of its affiliates is undertaking to provide investment advice, act as an adviser to any plan or entity subject to the Employee Retirement Income Security Act of 1974, as amended, individual retirement account or individual retirement annuity, or give advice in a fiduciary capacity with respect to the materials presented herein. If you are an individual retirement or other investor, contact your financial advisor or other fiduciary unrelated to Quantopian about whether any given investment idea, strategy, product or service described herein may be appropriate for your circumstances. All investments involve risk, including loss of principal. Quantopian makes no guarantees as to the accuracy or completeness of the views expressed in the website. The views are subject to change, and may have become unreliable for various reasons, including changes in market conditions or economic circumstances.

What if I wanted to import data and then integrate into pipeline in research, for a thousand symbols?

Ah, great question. I haven't seen an example of this, but research generally allows loading external data (see the examples in the tutorials, iirc).

Adam, Josh is right that you can upload data to research and then read it into a notebook. However, you cannot include it as a DataSet in a pipeline; unfortunately, only the built-in DataSets can be used. Any other data has to be combined with the pipeline output after the pipeline has run.
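
As a rough illustration of combining external data with the pipeline output, here is a minimal pandas-only sketch. It does not call any Quantopian API: `pipeline_output` is a hand-built stand-in for a computed pipeline result, and `external`, `my_signal`, and the numbers are hypothetical. It also assumes the external data can be keyed by the same (date, asset) pairs; on the platform the asset level holds asset objects rather than plain ticker strings, so symbols would need to be mapped first.

```python
import pandas as pd

# Stand-in for a computed pipeline result in research:
# one row per (date, asset) pair. Values are made up.
dates = pd.to_datetime(["2017-01-03", "2017-01-04"])
symbols = ["AAPL", "MSFT"]
index = pd.MultiIndex.from_product([dates, symbols], names=["date", "asset"])
pipeline_output = pd.DataFrame(
    {"close": [116.15, 62.58, 116.02, 62.30]}, index=index
)

# Hypothetical external data, e.g. read from an uploaded CSV,
# re-indexed to the same (date, asset) levels.
external = pd.DataFrame(
    {
        "date": pd.to_datetime(["2017-01-03", "2017-01-03", "2017-01-04"]),
        "asset": ["AAPL", "MSFT", "AAPL"],
        "my_signal": [0.8, -0.2, 0.5],
    }
).set_index(["date", "asset"])

# Left join keeps every pipeline row; rows with no external data get NaN.
combined = pipeline_output.join(external, how="left")

# Filtering on the external column happens in pandas, after the pipeline has run.
filtered = combined[combined["my_signal"] > 0]
```

The filtering here is ordinary DataFrame indexing rather than a pipeline screen, so it runs on the already-computed output.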

Thank you Josh and Jamie.

If I combine the data with the pipeline output afterwards, I lose the ability to use pipeline's own filtering on it.

I believe the pipeline output is a multi-index DataFrame, so if I wanted to add data to the pipeline via the pipe.add method, what data structure would I need for it to integrate properly? Would it simply be a multi-index DataFrame with date and symbol as the index levels?

Thank you.

Hi Adam,

Unfortunately, data cannot be added to a pipeline via pipe.add. Note that the output DataFrame is only produced when the pipeline is computed. Everything involved in constructing your pipeline (i.e. creating and adding factors/screens) is done before the computation. To add data to your pipeline output, you have to do it after the pipeline is computed. To get a better sense of how all this works, I would strongly recommend going through the Pipeline Tutorial.

To answer your question about the DataFrame index: it's a MultiIndex in research, but just an index of assets in an algorithm (the date is implied to be the current simulation date).
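
To make the index difference concrete, here is a small pandas-only sketch. The frames are hand-built stand-ins rather than real pipeline results, and the level names `date` and `asset` and the column `close` are assumptions for illustration. Taking a single date's cross-section of the research-style frame gives something shaped like what an algorithm would see on that day.

```python
import pandas as pd

# Research-style output: MultiIndex of (date, asset). Values are made up.
dates = pd.to_datetime(["2017-01-03", "2017-01-04"])
symbols = ["AAPL", "MSFT"]
index = pd.MultiIndex.from_product([dates, symbols], names=["date", "asset"])
research_output = pd.DataFrame(
    {"close": [116.15, 62.58, 116.02, 62.30]}, index=index
)

# Algorithm-style output for one simulation day: the date level is dropped,
# leaving a plain index of assets.
todays_output = research_output.xs(pd.Timestamp("2017-01-04"), level="date")
```

Any external data you want to line up with the output therefore needs a (date, asset) key in research, but only an asset key inside an algorithm.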