Quantopian data with Zipline?

Is it possible to use Quantopian data with Zipline? The reason I ask is that I am getting significantly different results running the same algorithm in Zipline and in Quantopian. The only difference, as I understand it, is the data source (Yahoo for Zipline, Quantopian data for Quantopian).

I could use Yahoo data for Quantopian, but doesn't that defeat the purpose?

33 responses

It looks like Zipline's Yahoo data fetcher uses the adjusted data column by default. These are prices adjusted for both splits and dividends. The Quantopian backtester data is adjusted for splits but not dividends.

This can make a difference in algorithms that are making trades based on price data.
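To make the adjustment difference concrete, here is a minimal sketch in plain Python. It assumes one common back-adjustment convention (scale every close before the ex-date by 1 - dividend / previous close); whether Yahoo uses exactly this formula is an assumption, and the prices, the function names and the $2 dividend are hypothetical.

```python
def dividend_adjust(closes, ex_index, dividend):
    """Back-adjust closes for one cash dividend that goes ex at ex_index.

    Assumed convention: multiply every close before the ex-date by
    (1 - dividend / close on the day before the ex-date).
    """
    factor = 1.0 - dividend / closes[ex_index - 1]
    return [c * factor if i < ex_index else c for i, c in enumerate(closes)]


def simple_returns(prices):
    """Bar-to-bar simple returns."""
    return [b / a - 1.0 for a, b in zip(prices, prices[1:])]


# Hypothetical split-adjusted closes: a $2 dividend goes ex at index 2,
# so the price drops from 100 to 98 with no change in investor wealth.
closes = [100.0, 100.0, 98.0, 98.0]
adjusted = dividend_adjust(closes, ex_index=2, dividend=2.0)

raw_returns = simple_returns(closes)    # shows a 2% "loss" on the ex-date
adj_returns = simple_returns(adjusted)  # the ex-date drop disappears
```

An algorithm that triggers on a 2% daily drop would trade on the first series and not on the second, which is one way the two backtests can diverge even though the underlying market history is the same.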

Dave, the Quantopian dataset isn't something we can share. We have the right to use the data, but we do not have the right to share it.

Yahoo's data uses different adjustment methods for dividends, which is a common reason they differ.

Yahoo's data also has survivorship issues - they don't provide data on companies that have gone bankrupt.

Disclaimer

The material on this website is provided for informational purposes only and does not constitute an offer to sell, a solicitation to buy, or a recommendation or endorsement for any security or strategy, nor does it constitute an offer to provide investment advisory services by Quantopian. In addition, the material offers no opinion with respect to the suitability of any security or specific investment. No information contained herein should be regarded as a suggestion to engage in or refrain from any investment-related course of action as none of Quantopian nor any of its affiliates is undertaking to provide investment advice, act as an adviser to any plan or entity subject to the Employee Retirement Income Security Act of 1974, as amended, individual retirement account or individual retirement annuity, or give advice in a fiduciary capacity with respect to the materials presented herein. If you are an individual retirement or other investor, contact your financial advisor or other fiduciary unrelated to Quantopian about whether any given investment idea, strategy, product or service described herein may be appropriate for your circumstances. All investments involve risk, including loss of principal. Quantopian makes no guarantees as to the accuracy or completeness of the views expressed in the website. The views are subject to change, and may have become unreliable for various reasons, including changes in market conditions or economic circumstances.

Dan, can you recommend a decent data provider for 1-minute bars?

Sorry, I've never looked into it. I'm always in Quantopian, so I've never done the search for alternate data sources.

In his new book Ernest Chan mentions CSI, Kibot, and TickData

Also, you can get a month of 5 minute bars for free from Stooq.

Thanks Dennis and Dan. It would be great if your data provider was willing to give discounts to users of Quantopian.
CSI doesn't have anything finer than daily data for any decent timeframe. Kibot doesn't seem to have delisting data or viable corporate actions.
TickData is good, but it costs $50k and you need to use their TickWrite software unless you are super important.

We definitely perceive the data we provide on the website as one of the reasons that people will use Quantopian rather than Zipline.

I should have asked earlier. I'm curious, what is it that makes you prefer Zipline? There are a lot of good answers to the question, and I'd like to understand your particular case.

I have historically traded a much bigger universe than Quantopian allows. I was looking to run something on the Russell 1000 plus liquid ETFs plus the most liquid ADRs.

I will echo Michael's statement, possibly for a different reason, though.

I don't have a separate 'discovery' pipeline set up yet. So ideally I'd like to be able to process thousands of stocks in my algo as a kind of real-time screening algorithm.

OK, thanks for that feedback. I like the answers particularly because they are things we plan on providing in Quantopian in the long run.

Hello Dan,

For me, the reason is that I use an Emacs environment with Vim keybindings, which makes it easier to handle my code and load different modules.
An easier transfer of code between Quantopian and Zipline would be helpful.
Maybe I am a bit special with this Emacs environment, but I expect that others are used to complex programming environments as well.

Best Regards
Fabian

Hi Dan,

I find debugging much easier using a Python IDE with a debugger. Hence Zipline and PyScripter or Eclipse PyDev or IPython for me. Print/log.info helps in Quantopian, but for more complex algorithms a debugger wins hands-down. Any plans to spruce up Quantopian's debugging capabilities in the future?

Cheers,
Dave

Fabian and Dave -

Thanks for the feedback. I've heard both of those answers before but it's good to get reinforcement.

We've been kicking around a few solutions to the Emacs problem. Perhaps something like connecting to GitHub: you can use whatever IDE you want, save to GitHub, and GitHub writes to Quantopian. It gets pretty powerful in that way.

The debugger is harder, but is also on the list. We have to keep the debugger on our server in order to keep the data on our servers. Sometimes we swear at our IDE like a sailor because of the debugging. We see the need.

I concur on the IDE and the debugging. I also need to incorporate futures, currencies and equity data which I either bought or recorded. Also, I usually start investigating ideas in IPython before implementing them.

My other immediate need is to run many identical instances to parameterize a model. I use several parameters in my main equity model and like to generate a smoothed surface of the performance differences across the parameters.
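A parameter sweep of this kind can be sketched as a loop over a grid, collecting one performance number per parameter pair. The example below is only an illustration in plain Python: run_backtest is a hypothetical toy stand-in (a moving-average crossover on a synthetic price series), not a Zipline or Quantopian call, and the grids are arbitrary.

```python
from itertools import product


def moving_average(prices, window, end):
    """Mean of the `window` prices ending at index `end` (inclusive)."""
    return sum(prices[end - window + 1:end + 1]) / window


def run_backtest(prices, fast, slow):
    """Toy stand-in for a full backtest (hypothetical helper).

    Holds one unit long over bar t whenever the fast average was above
    the slow average as of bar t - 1, and returns the summed P&L.
    """
    pnl = 0.0
    for t in range(slow, len(prices)):
        fast_ma = moving_average(prices, fast, t - 1)
        slow_ma = moving_average(prices, slow, t - 1)
        if fast_ma > slow_ma:
            pnl += prices[t] - prices[t - 1]
    return pnl


# Synthetic price series: an upward drift plus a repeating wiggle.
prices = [100 + 0.5 * t + (3 if t % 7 < 3 else -3) for t in range(120)]

fast_grid = [2, 3, 5]
slow_grid = [10, 20, 30]

# One backtest per (fast, slow) pair; the dict is the raw performance surface.
surface = {
    (fast, slow): run_backtest(prices, fast, slow)
    for fast, slow in product(fast_grid, slow_grid)
}
best = max(surface, key=surface.get)
```

The surface dict can then be smoothed or plotted. Looking at the neighbourhood of best rather than the single best cell is the usual reason for building a surface in the first place, since an isolated peak is often a sign of overfitting.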

Dan,

Just to reinforce what Michael said, strategy optimization by looping through different share universes and portfolios, as well as strategy parameters, is easily achieved with Zipline. If you haven't already, you should possibly consider allowing something like this with Quantopian, though I concede that this could severely load your servers if not strictly controlled.

The truth is that it's not too difficult to move a strategy from Zipline to Quantopian, so you can have the best of both worlds. Perhaps a simpler (i.e. more automated) way of doing this would be a good solution to both the debugger and optimization issues.

Dave

Hi Dan & all,

I've posed the question before, but does anyone understand, fundamentally, why access to a clean and complete data set is expensive in the first place? In other words, we wouldn't be having this discussion if the data were free or dirt cheap.

A few thoughts and questions:

  • The Quantopian data set is updated daily, which suggests that the process for collecting and cleaning the data is fully automated. And in this age of "big data" infrastructure, the price should be dropping, right? So what's the story here?
  • My sense is that it would be in the best interest of the industry to subsidize the distribution of clean and complete data to retail traders. In other words, take a cue from Quantopian and expand the market by removing a barrier to entry. Is there any movement in this direction?
  • Has Quantopian's data vendor published, in detail, how they collect and reduce the data? Perhaps it's not a big deal, and Quantopian could reverse-engineer their system to build up an independent data set.
  • Who is the Quantopian data vendor? Last I heard, Quantopian would need permission from the vendor to use their name. Have you requested permission? If so, what was the response?

Grant

@Grant, you can get 5 min bars from Stooq that go back a month or hourly bars that go back 3 months.

All you have to do is remember to visit the site every month or 3. Oh and get a separate list of dividends/splits. Plus make sure the data is correct by validating it against other data sources.

Kidding aside, I think data providers have every right to charge for their service. And having a solid paid data service is part of the Quantopian value-add. It's unreasonable to ask them to give away their edge. They are already improving Zipline, which is open source. Don't ask them to do everything for free.

Hello Dennis,

My basic point is that the data should be dirt cheap for Quantopian, since my assumption is that the whole collection and "cleaning" process is automated and highly efficient (but this assumption could be wrong). The fact that Stooq can provide free data not that different from Quantopian's suggests that I'm on the right track. I don't think that in the long run, the Quantopian value-add will end up being their data set (particularly since it ends up highly limiting the tools that can be applied to the problem...hence the use of Zipline offline by some members).

Grant

Grant, there are a lot of exchange fees around market data. I used to parse recorded level 2 data and I promise you it's very easy (less than a week to build a parser).

I am not asking for free market data, but the price just keeps going up. I just wanted a trustworthy source. Historical market data is not Quantopian's value add.

Quantopian's value add will be building a platform for trading, including other inputs, getting other data sources at reduced rates, etc. Actually supporting live trading is huge, as live trading is a constant task and things will go wrong.

Michael,

Interesting to hear that "the price just keeps going up." Intuitively, not what I'd expect, but perhaps it is a kind of monopoly situation with high demand.

Grant

That is exactly the problem. It's exchange fees that there is no way to get around.

For stuff like this, ECNs will end up publishing or giving out intraday data in exchange for trying to produce more flow.

Michael and Dave, thanks for the thoughts. It matches some ideas I've got in my head.

Grant, it helps to look at the data problem in more than one part. The first part is historical data. There are a limited number of people who were recording stock data in, say, 2005 - you'd be amazed at how many companies just threw it away because they thought it had no value. There are relatively few players out there who have all the data. They have no interest in giving it away when they can charge for it instead.

The next part is live data. That's controlled by the exchanges, and you can only get access to it if you pay for it. (I know there are a few ways to scrape that data, but they're not reliable enough to trade on - at least not that I've seen).

Another part is data granularity. There are lots of uses for data with daily granularity: understanding the value of portfolios and uses like that. But there aren't a lot of uses for minute-level or tick-level data (at least, not until Quantopian came along!). There's not a lot of incentive for people to invest in creating that kind of data source because there isn't a lot of use in knowing what your IRA was worth at 2:52PM on March 22 - who cares?

Maybe in the future we'll come up with some hybrid model of 1) old data we paid for, 2) medium-term data collected and distributed as open source and 3) live data that we pay for. For the moment, the amount of data in (2) isn't big enough to be useful for us, especially when you can't do live trading without (3).

Dan, a lot of people lost their historical recordings due to data agreements. Companies change, teams move, etc. Everyone I know has had a live collector for years.

It may also be worth building a corporate actions and fundamental data product that people can buy/enable in Quantopian. Bloomberg back office is a really expensive competitor product.

Let me add my thoughts to this thread.
Over the last week I have spent quite a bit of time working on both Quantopian and Zipline.
I have been an avid Python developer, machine learning hobbyist and quant system developer for the past 10 years.
Trading Solutions, a product by Neurodimensions Inc., has been my application of choice so far.
But if everything goes well, Quantopian will be my platform of the future, and you guys have made that choice easy for me.

Let me tell you what I like about Quantopian.
1. I like everything about running my already developed algorithm in Quantopian.
2. The fact that it's in the cloud and I don't have to worry about maintaining my servers, downtime, etc.
3. I like very much the fact that it's Python-based, and the toolset: pandas, numpy, etc.
4. I like that the data source is part of the service. I would certainly like to see more types of data as part of the service.
5. I like the web as a way to monitor, start and stop my algorithms, etc.
6. I like very much the idea of the community and sharing of algorithms.
7. I like the idea of limiting the algorithm to see only past data for trading, which removes any possibility of my making an error and being misled by dubious signals that accidentally look ahead.

Here is a list of things I don't like or would love to see happen.
1. I will never develop new algorithms using the Quantopian web interface, at least not for anything other than toy problems. That would be my absolute last choice. Right now I develop in Zipline (thank you very much for open sourcing it) and then manually port the code to Quantopian.
2. I would really love for Zipline to be more fully functional, as in at least having an IPython/pylab/matplotlib-based equivalent of the backtester results, statistics, plot diagrams, etc. It's sort of what I am doing right now, but visualizing the backtesting results is a bit of a pain for lack of a standard GUI results window. I am trying to hack one together right now and will try to push it back when I have something that I like.
3. When done developing algorithms, I would like a simple script/button with which I can push/pull/update the algorithm between Zipline and Quantopian.
4. I would love to have access to the same data from Zipline, so that I can develop the models on my desktop and not be forced to go to the web just for the data. I don't even need very current data; equivalent data that is, say, a month out of date would be fine, just for development/ML-training/backtesting purposes.
5. Developing the model can take quite a bit of iteration and computation power if you are running machine learning algorithms. I would prefer to do it on the desktop or be able to buy EC2-style compute power through Quantopian. At the least, I would like to develop and trial new algorithms on my desktop with access to historical data that may be up to a month out of date, and then push and run them in the cloud with more current data.
6. Once models are developed, they may need to be retrained periodically, and I would love to see this done in the cloud itself. The way it is currently, I don't see how it can be done. batch_transform works for producing signals from trained ML models, but is not the best way to train the ML models themselves. Ideally there would be an API that allows adaptive/learning code to run after market close or during weekends, at specific intervals or on specific conditions (say, when the model might be losing touch with reality, or when training win/loss ratios or E(profit/loss) hit critical limits). It would be helpful to have access to future data for training/optimization purposes.
7. I would really like to see a way of building/displaying/manipulating training/ideal signals that have visibility into the future to identify ideal buy/sell/hold points, and to be able to "DISPLAY" them on the graphs. This is also what is used for training ML models. Trading Solutions had it and it was awesome.
8. Though I ask for 7, I would like it to be clearly quarantined through a special decorator, so that it is only applied to a quarantined learning/adaptive stage of the algorithm and not to active trading.
9. The ability to share code between my algorithms in the form of a shared library that I can import from within my algorithm. But I could live with that ability just on my desktop, where I can organize the code as libraries and the Quantopian push/pull functionality could combine them before pushing to the cloud.

My 2 cents.
Sarvi

Great list Sarvi. A couple things ...

You can get 5 minute data from Stooq for your offline prototyping work. Zipline can already get daily historical prices from Yahoo.

There was another thread where Dan suggested they would like to have GitHub integration. That would allow you to develop code offline and 'update' your Quantopian algorithm very easily.

Yeah, I have the Stooq data.
There is not enough history in the 5-minute/1-hour bars for actual training, but it is nevertheless useful for development.

I missed one above. Updated with 9.

Sarvi, this is really helpful feedback. I really appreciate all of the thoughtful ideas.

On #7 above, have you looked at the "record" feature? You could use that to place markers on the backtest.

Yes, I did look at record to display my signal, but I am not sure how to achieve what I am looking for, which in effect requires looking ahead.
Meaning, the "idealsignal" I am calculating is ideally computed by looking ahead at the next, say, X bars of data to see if there is going to be at least Y% of short/long profit, and then displaying the signal at the current bar.

In the current Quantopian way, since I don't have access to future bars, I look back over the last X bars and see if there was a short/long profit. Now I would like to display/record the signal with a lag of X bars for display purposes.
This is not a problem for me during training in batch mode, since I can use the signal the way I calculate it now and shift the numpy array by X bars when using it in the algorithm. So I sort of have it working for batch training, but I can't seem to display the signal the way I want it, i.e. to show the points at which to short/long the stock to expect at least Y% profit in the next X bars.
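The look-back-and-shift idea described here can be sketched as follows. This is only an illustration in plain Python under a simplifying assumption: the label compares the price X bars apart rather than scanning every bar inside the window. The function names, the price series and the values of X and Y are hypothetical.

```python
def classify(change, threshold):
    """+1 for a long opportunity, -1 for a short one, 0 for hold."""
    if change >= threshold:
        return 1
    if change <= -threshold:
        return -1
    return 0


def forward_label(prices, horizon, threshold):
    """Ideal signal at bar t, computed from prices[t + horizon] (needs future data)."""
    return [
        classify(prices[t + horizon] / prices[t] - 1.0, threshold)
        for t in range(len(prices) - horizon)
    ]


def backward_label(prices, horizon, threshold):
    """The same comparison made causally: the value known at bar t
    describes the opportunity that started at bar t - horizon."""
    labels = [None] * horizon  # nothing can be known for the first bars
    for t in range(horizon, len(prices)):
        labels.append(classify(prices[t] / prices[t - horizon] - 1.0, threshold))
    return labels


prices = [100, 101, 103, 102, 99, 97, 98, 102, 105, 104]  # hypothetical bars
X, Y = 3, 0.02  # look X bars ahead for at least a 2% move

ideal = forward_label(prices, X, Y)
lagged = backward_label(prices, X, Y)

# Shifting the causal series back by X bars recovers the ideal signal.
realigned = lagged[X:]
assert realigned == ideal
```

The shift is the only step that uses hindsight, so it can be confined to training or plotting code, while the trading logic only ever sees lagged.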

Sarvi

PS: As a suggestion, you should consider some options to record() that would allow me to draw up/down arrows on the graph. Not all signals are smooth lines. Some may be buy/sell/hold signals, which are best shown as red down arrows or green up arrows on the stock graph itself.

Has anyone looked at the data provided by netfonds.se? They have intraday data for a large number of stocks, though I am not sure how "accurate" the data is. I have written a script/batch job that runs nightly and saves the data in a MATLAB structure (it has been running for almost two months now). I was working on writing this data to a MySQL DB I set up, but I recently came across this site (Quantopian), so I may abandon this project in favor of Quantopian. I do agree with others that GitHub integration would be very helpful for collaborating on algos with others. Is there any update on the GitHub integration?

-Alex

Hi Alex,

We've done some work on the GitHub integration with Quantopian, but we don't currently have an ETA for when it will be available.

Regards,

Jonathan Kamens

Is there any word on when zipline will be compatible with Python 3?

@ Saravanan Shanmugham

Hi, I am trying to use Zipline for backtesting. It would be great if you could write a blog post describing the steps.
I am new to Python and all my clients are Indian, so we need to backtest Indian stocks and, more importantly, I need to do it for the futures market.

I would really appreciate that.

regards,

Rishant Kumar Singh,
Integrated Economics, IIT Kanpur,
Co-founder, Nextgenquant - An algo trading firm

FWIW, I recently switched from Quantopian to Zipline because futures trading was broken in Quantopian for several weeks. They just fixed it a few days ago, but it raised the following concern for me: if it goes down again, I can't revert to an "older" Quantopian build (that's under your control, not mine). Also, it seems the futures data you're using isn't the full 24-hour data (although I'm grateful to you for providing whatever data you do have). With Zipline I can ingest my own dataset to get the full 24-hour 1-minute bar data, so I won't miss important overnight trading sessions.