Research Data Directory, Community Data, Fetcher and Courtesy

Hello,
I was hoping to play in Research with the datasets from https://atlas.media.mit.edu, the Observatory of Economic Complexity, and to combine that data with the data Quantopian makes available. I made CSV files from the compressed TSV files they provide, but one ends up being about 180 MB and another 2.2 GB, which I would guess is beyond courtesy, if not capacity, to upload. What are the file size limits of read_csv in Research and fetch_csv for algorithms, and what are the limits of the data directory itself? Would it be OK to just break the files into bite-size chunks and use those?
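As a rough illustration of the "bite-size chunks" idea, here is a minimal sketch that splits one large CSV into numbered parts using only the Python standard library. The function name `split_csv`, the `max_chars` parameter, and the part-naming scheme are all made up for this example, and the right size budget depends on whatever limit the platform actually enforces. Size is counted in characters (equal to bytes for ASCII data), and a part is closed only after it reaches the budget, so a part can overshoot by one row.

```python
import csv
import os


def split_csv(src_path, out_dir, max_chars):
    """Split src_path into numbered CSV parts of roughly max_chars each.

    The header row is repeated at the top of every part so that each part
    can be read on its own. Returns the list of part paths, in order.
    """
    os.makedirs(out_dir, exist_ok=True)
    base = os.path.splitext(os.path.basename(src_path))[0]
    part_paths = []
    out_file = None
    writer = None
    written = 0  # characters written to the current part so far

    with open(src_path, newline="", encoding="utf-8") as src:
        reader = csv.reader(src)
        header = next(reader)
        for row in reader:
            # Open a new part if none is open or the current one is full.
            if out_file is None or written >= max_chars:
                if out_file is not None:
                    out_file.close()
                name = "%s_part%03d.csv" % (base, len(part_paths))
                part_path = os.path.join(out_dir, name)
                out_file = open(part_path, "w", newline="", encoding="utf-8")
                writer = csv.writer(out_file)
                # writerow returns the number of characters written.
                written = writer.writerow(header)
                part_paths.append(part_path)
            written += writer.writerow(row)

    if out_file is not None:
        out_file.close()
    return part_paths
```

Each resulting part could then be loaded separately and the pieces concatenated, at the cost of holding the combined data in memory.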
For something like the Economic Complexity dataset, there might be a lot of other people interested in using it too. We already have the fundamentals dataset from Morningstar available; would you consider setting up a few more shared databases if there was enough interest in a particular dataset? Maybe establish a 'commons' for datasets that people have uploaded and wouldn't mind sharing? I'm looking at https://github.com/quantopian/zipline/wiki/How-To-code-a-data-source and wondering whether the data could be formatted that way, but the sids requirement makes it not really appropriate. The goal of the analysis would be to detect correlations between the commodities and countries and the sids, but the data is far from that input format.
For more personal datasets, it would be nice to be able to point fetch_csv at files within one's Research data directory. Is that functionality being considered?

Thanks,
Nathan

2 responses

Hi Nathan,

I like the idea of a data commons. We've been collecting ETF data in a similar spirit on the Quantapolis wiki. Unfortunately, we intermittently blow the fuse on the web space. (See? The wiki is currently down.) This should be fixed soon, though, and we might be able to host some of the datasets. Let me know how your search for web space goes.

Hi Nathan,
Your problem is not unique. In fact, there was a related thread earlier today covering some of the same topics: https://www.quantopian.com/posts/linking-research-output-to-fetch-csv-input

The file size limit in Research is 34 MB, so that certainly isn't a workable option for data of this size.

Eventually, we hope to have lots of free datasets available through the same system that supports the Store, and I could envision a world one day where you can add your own data there to make it accessible to others, but that is likely a ways away.

Also, if these datasets are on Quandl, getting access to them in Quantopian is a bit easier. I'll pass this information along to Josh (who is in charge of data) so he can take a look.