Newbie Question - Would it be possible to include external indicators into an algorithm?

Hi, I was wondering if it would be possible to include external indicators or factors in an algorithm. By external, I mean things like weather, news headlines, Twitter messages, viral videos, etc. How would this be done?

For example, when the government announced they were implementing new gun control laws, the stocks of gun-related companies rose. Getting that type of news feed wouldn't be hard via a script of some sort; it could then be passed in as a signal that tells an algo when to start monitoring other indicators and lead to a buy/sell order.

Another example: Monitoring natural disasters to start an algo or not. (earthquake data, solar flares, cloud movements, etc.)

I always wanted to make something like this, but I never had the know-how to get to this point. Hopefully I'm in the correct place.

This would be interesting to backtest, but I'm not sure how to go about it.

Hi Patrick,

You are most definitely in the right place. All of your ideas sound like fascinating experiments.
Right now, you have to paste the data into your algorithm script. We still have a lot of work to do to round out our market data offering, but we have had numerous requests for custom datasources.
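
For example, here is a minimal sketch of pasting a small set of external events directly into an algorithm script; the dates, labels, and the event_for helper are made up purely for illustration and are not a Quantopian API:

    import pandas as pd

    # External events pasted straight into the script (dates and labels are made up).
    NEWS_EVENTS = {
        pd.Timestamp('2013-01-16'): 'gun_control_announcement',
        pd.Timestamp('2013-02-12'): 'solar_flare_warning',
    }

    def event_for(bar_time):
        """Look up a pasted event for the current bar's date, if any."""
        return NEWS_EVENTS.get(bar_time.normalize())

    # Inside handle_data you would pass the current bar's timestamp instead.
    print(event_for(pd.Timestamp('2013-01-16 14:31')))  # 'gun_control_announcement'
    print(event_for(pd.Timestamp('2013-01-17 14:31')))  # None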

thanks,
fawce

Disclaimer

The material on this website is provided for informational purposes only and does not constitute an offer to sell, a solicitation to buy, or a recommendation or endorsement for any security or strategy, nor does it constitute an offer to provide investment advisory services by Quantopian. In addition, the material offers no opinion with respect to the suitability of any security or specific investment. No information contained herein should be regarded as a suggestion to engage in or refrain from any investment-related course of action as none of Quantopian nor any of its affiliates is undertaking to provide investment advice, act as an adviser to any plan or entity subject to the Employee Retirement Income Security Act of 1974, as amended, individual retirement account or individual retirement annuity, or give advice in a fiduciary capacity with respect to the materials presented herein. If you are an individual retirement or other investor, contact your financial advisor or other fiduciary unrelated to Quantopian about whether any given investment idea, strategy, product or service described herein may be appropriate for your circumstances. All investments involve risk, including loss of principal. Quantopian makes no guarantees as to the accuracy or completeness of the views expressed in the website. The views are subject to change, and may have become unreliable for various reasons, including changes in market conditions or economic circumstances.

I currently do automated scripting and website scraping, and I can more or less grab any type of data in any application and feed it back into your system.

How are you persisting the data?

It's just that I never did it for trading, but I would imagine it would work somehow... I use it for other things, like automating my business workflows: printing invoices, manual data entry, etc. What do you mean by persisting data?

This is just an idea and I don't even know if it would work, but I imagine it would.

I would be happy to use the pandas DataSource to grab Yahoo data for the VIX index!

I just tried everything I could think of to get the VIX index data into Quantopian. It looks like HTTP fetches are blocked, and pasting the data into the editor seems to have hung my Chrome.

I got it to paste, but pandas.core.frame.DataFrame.from_records is also blacklisted. Trying other methods...
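
For reference, a minimal from_records sketch of the kind of call that gets blocked here; from_records itself is standard pandas, only the VIX rows below are made up:

    import pandas as pd

    # Rows pasted into the editor as (date, close) tuples for the VIX (values made up).
    records = [
        ('2013-01-02', 14.68),
        ('2013-01-03', 14.56),
        ('2013-01-04', 13.83),
    ]

    vix = pd.DataFrame.from_records(records, columns=['date', 'close'], index='date')
    print(vix)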

There is a balance to be struck between giving algorithms access to useful Python functionality, and protecting the stability and security of our backtesting infrastructure and data.

For example, we can't allow algorithm code to read from or write to the filesystem on our servers, and many of the library endpoints that could be used to import data enable filesystem access.

As another example, if we allowed algorithms to use urllib2 to fetch data from arbitrary URLs, that library could also be used to export our minutely price data to an arbitrary URL, which would violate the license terms under which we've acquired that data; these terms only allow the data to be used on our site, and not exported.

We know that we need to provide a way for users' external real-time data to be accessible from within algorithms running on our platform, but because of the issues outlined above, it's not a simple problem to solve. We do intend to solve it, and hearing from you all about what you need in terms of data access helps guide us toward the correct solution, so thanks!

Well, I would think a good non-threatening start would be some way (any way) of getting time series data into a pandas DataFrame/Series/Panel within the simulations. Last night I couldn't even find a way to cut and paste data into a DataFrame....

The second step, I think, ought to be some kind of arbitrated/proxied access to web assets -- it doesn't have to be the complete urllib...
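
As a concrete illustration of that first step, here is a minimal sketch of cutting and pasting CSV text into a DataFrame, assuming the io and pandas imports were allowed; the values are made up:

    from io import StringIO

    import pandas as pd

    # Hypothetical pasted CSV text; the values are illustrative only.
    csv_text = "date,vix_close\n2013-01-02,14.68\n2013-01-03,14.56\n2013-01-04,13.83\n"

    vix = pd.read_csv(StringIO(csv_text), index_col='date', parse_dates=True)
    print(vix.head())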

Hi Simon,

I like what you are proposing quite a lot. I think one safe thing would be to expose a "get from yahoo" method that uses the built-in pandas functionality internally.
A similar magic method could be provided for grabbing arbitrary CSV data via http/s.
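
To make that concrete, here is a rough sketch of what such helpers might look like internally; the function names and the use of pandas_datareader (the successor to the old pandas.io.data module) are assumptions for illustration, not an actual Quantopian API:

    import pandas as pd

    try:
        # Yahoo access used to live in pandas.io.data; it later moved out to pandas_datareader.
        from pandas_datareader import data as web
    except ImportError:
        web = None

    def get_from_yahoo(symbol, start, end):
        """Hypothetical wrapper around the pandas/pandas_datareader Yahoo reader (daily bars)."""
        if web is None:
            raise RuntimeError("pandas_datareader is not installed")
        return web.DataReader(symbol, 'yahoo', start, end)

    def get_csv(url, date_column='date'):
        """Hypothetical wrapper for grabbing arbitrary CSV data via http/s."""
        return pd.read_csv(url, index_col=date_column, parse_dates=True)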

In both cases, I have one nagging concern. We're hard at work adding live market data to enable live market simulations and live trading. What behavior should we expect or require from user defined datasources such as these? Any thoughts or suggestions?

thanks,
fawce

I think it would be fair to expect them to be static data only up until 'yesterday'. It would be impolite to encourage users to poll some data site every minute anyway. If you decided to include the get from yahoo method, it would be helpful if it metered out the data in the normal handle_data callback to continue to assist in preventing "look ahead" bias.
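
A minimal sketch of that metering idea: pre-load the external series, then only ever expose values stamped at or before the current bar. The series values here are made up, and inside handle_data you would use the simulation clock instead of a hard-coded timestamp:

    import pandas as pd

    # Hypothetical pre-loaded external series, e.g. daily VIX closes (values made up).
    external = pd.Series(
        [14.68, 14.56, 13.83],
        index=pd.to_datetime(['2013-01-02', '2013-01-03', '2013-01-04']),
    )

    def latest_external_value(series, now):
        """Return the most recent value stamped at or before `now` (None if none yet)."""
        visible = series.loc[:now]  # label slicing is inclusive of `now`, never beyond it
        return visible.iloc[-1] if len(visible) else None

    print(latest_external_value(external, pd.Timestamp('2013-01-03')))  # 14.56
    print(latest_external_value(external, pd.Timestamp('2013-01-01')))  # None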

Personally, I think it might be time for me to look into zipline directly :)

I've been using a small script that I wrote to get pricing data (including VIX) from Yahoo with success for the past 3 years or so. It's only closing prices and it's written in Haskell, but I suppose it would be quite easy to port it to whatever framework/language you'd like.

I don't think getting the data is the main issue, at least for me; it's getting the data into the hosted Quantopian backtests.

Patrick, I think complex event processing is the type of technology you could use for some of your trading ideas.
This company provides software to the financial community specifically for those kinds of tasks: www.progress.com/en/apama/complex-event-processing.html
I think there are some open source alternatives out there too...

Some comments:

  • Members and Quantopian might want to have a look at https://www.recordedfuture.com/this-is-recorded-future/recorded-future-api/.
  • I don't understand Fawce's concern "What behavior should we expect or require from user defined datasources such as these?" Fawce, perhaps you could elaborate.
  • It seems like a pretty severe restriction that "we can't allow algorithm code to read from or write to the filesystem on our servers" as Jonathan states above. Could Quantopian use a volatile RAM drive?
  • Note that there are physical solutions to preclude bi-directional data flow (e.g. see http://en.wikipedia.org/wiki/Unidirectional_network) which are not as inefficient as "air gaps." I gather that Quantopian may have limited flexibility in this area, since my assumption is that you have outsourced your physical infrastructure, right?
  • Besides the licensing issue highlighted by Jonathan above, it seems that Quantopian might have other legal concerns in enabling members to be free-wheeling in their ability to grab external data. There might be liability on the part of Quantopian if the external data were illegally obtained, even though the algorithm was configured solely by a member, without direction from Quantopian.
  • It seems like many of the concerns and constraints in this thread could be relieved if Quantopian offered some form of an API that could be embedded in Python code running on a member's PC/server. Has there been any consideration of this? What has been the feedback at the meetups?

Grant

  • Recorded Future is indeed doing some neat stuff.
  • Zipline is built to process time-series data, specifically minute-bar data. For live trading, we're building something that is essentially a minute-bar generator - it processes a live data feed and pumps out a new bar for every stock once per minute (a rough sketch of the idea follows after this list). Fawce was asking about what sort of data structure, uptime requirements, and responsiveness make sense for external data sources. For instance, what happens if your algo uses two data sources, but one of them stops responding, or responds with a latency that keeps the algo from running frequently? This is obviously a solvable problem, but we haven't settled on a solution yet. Feedback is welcome.
  • I think that what Jik was trying to say is that we can't allow code to read or write to the same filesystem that has our OS and application code. We could build a feature to permit writing to a separate drive. That sort of happens already with our logging feature; it's obviously more restricted than having access to create, modify, and read files, but it's the same idea. I suspect this one is pretty far down the road, but I can always be persuaded to reprioritize!
  • It seems to me that a software method to manage the data flow will be sufficient. Simon's examples are great ones. Is there a physical solution that you see as easier?
  • It's an interesting question as to our liability. At some point I'll ask our lawyers about that one. I tend to only cross that kind of bridge when I get there, though.
  • We've certainly considered a bunch of different ways to manage the execution of code and the storage of data. Currently, we've made a large chunk of the code available through Zipline, and we've made our data source available through the API on the Quantopian website. It's the right fit so far for our resources and license restrictions. We've been thinking a lot about ways to edit the code in another place; one option would be to run your code from github, which lets you edit it wherever you want. The intractable part of this is that I can't see any way to have our data available anywhere that isn't one of our servers.
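
For what it's worth, here is a rough sketch of the minute-bar generator idea mentioned above; it is purely illustrative (made-up ticks, simplified bar logic), not Quantopian's actual implementation:

    import pandas as pd

    def minute_bars(ticks):
        """Aggregate (timestamp, sid, price, size) ticks into per-minute OHLCV bars,
        yielding (minute, {sid: bar}) once per minute."""
        current_minute = None
        bars = {}
        for ts, sid, price, size in ticks:
            minute = ts.floor('min')
            if current_minute is not None and minute != current_minute:
                yield current_minute, bars
                bars = {}
            current_minute = minute
            bar = bars.setdefault(sid, {'open': price, 'high': price,
                                        'low': price, 'close': price, 'volume': 0})
            bar['high'] = max(bar['high'], price)
            bar['low'] = min(bar['low'], price)
            bar['close'] = price
            bar['volume'] += size
        if bars:
            yield current_minute, bars

    # Toy tick stream (made-up values).
    ticks = [
        (pd.Timestamp('2013-03-01 09:30:05'), 'AAPL', 430.10, 100),
        (pd.Timestamp('2013-03-01 09:30:40'), 'AAPL', 430.50, 200),
        (pd.Timestamp('2013-03-01 09:31:02'), 'AAPL', 430.25, 150),
    ]
    for minute, bars in minute_bars(ticks):
        print(minute, bars)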

First off, I think it would be helpful to focus on what sort of trading systems Quantopian is trying to support. All the batch processing stuff is building "daily" bar aggregates, if I am not mistaken. However, it's running these every minute, as if you'd want to enter a 20MA/200MA crossover the minute it happens. I am not at all sure that you do. In my humble opinion (as someone who's not put a live black box into production yet), you'd want daily systems to run either after or just before the close if you want to allow MOC orders. In this scenario, a slow secondary data source is immaterial, since there isn't any high-frequency action happening.
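
A sketch of gating a daily system to the last minute before the close (the 16:00 close is an assumption, and early closes are ignored here):

    import pandas as pd

    MARKET_CLOSE = pd.Timestamp('2013-03-01 16:00').time()  # assumed regular NYSE close

    def is_last_minute_of_session(bar_time, close=MARKET_CLOSE):
        """True if this minute bar is the final one before the close, i.e. the
        moment a daily system would run its logic and submit MOC-style orders."""
        next_minute = (bar_time + pd.Timedelta(minutes=1)).time()
        return next_minute == close

    print(is_last_minute_of_session(pd.Timestamp('2013-03-01 15:59')))  # True
    print(is_last_minute_of_session(pd.Timestamp('2013-03-01 10:30')))  # False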

If Quantopian wants to support "high frequency trading", or perhaps more aptly "medium frequency trading", then I think more effort should be put into visualizing how the systems are behaving on the data, and into making it easier to work with multifrequency data (or providing better examples of multifrequency data handling). Batch transforms could run on any periodicity of data. Multifrequency data sets could be referenced in the data handlers.
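
A rough illustration of the multifrequency idea with plain pandas: start from made-up minute bars and downsample them to 30-minute bars (this is just resampling, not a Quantopian feature):

    import numpy as np
    import pandas as pd

    # One made-up session of minute bars for a single stock.
    minutes = pd.date_range('2013-03-01 09:31', '2013-03-01 16:00', freq='min')
    minute_bars = pd.DataFrame({
        'price': 430 + np.cumsum(np.random.randn(len(minutes))) * 0.05,
        'volume': np.random.randint(100, 1000, len(minutes)),
    }, index=minutes)

    # Downsample to 30-minute bars for a slower-frequency signal.
    bars_30m = pd.DataFrame({
        'open': minute_bars['price'].resample('30min').first(),
        'high': minute_bars['price'].resample('30min').max(),
        'low': minute_bars['price'].resample('30min').min(),
        'close': minute_bars['price'].resample('30min').last(),
        'volume': minute_bars['volume'].resample('30min').sum(),
    })
    print(bars_30m)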

To be honest, I expect the sweet spot is around 30m bars; less than that and one is prey for the HFTs and general noise, and more than that and it's hard to create normalized bars that make sense. But I am just an amateur! Definitely better debugging of what is happening at the minute level would be helpful. And thanks for all the great work; I am loving zipline, which better suits my workflow, but if I come up with anything educational I'll be sure to backport it to Quantopian!

Hi Dan,

Regarding the trading system that Quantopian is envisioning, do you have any documentation that you would be willing to share (e.g. system schematic, timing diagrams, state diagrams, etc.)? Will there be a system-wide common clock for synchronization (e.g. of various data streams)? Also, how much latency can you tolerate between the actual markets and your trading system? Presumably, you will have to tolerate some delay, right?

Grant

If the P/E ratio is under 10, questionable news will typically make the price appreciate. Conversely, if the P/E ratio is higher than 70, questionable news will more often make the price drop. I hope those extreme parameters are helpful when attempting to quantify qualitative data such as questionable news.
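
Expressed as a crude rule of thumb (the thresholds are just the ones quoted above, and the boolean "questionable news" input is a stand-in for whatever news classifier you use):

    def expected_reaction(pe_ratio, questionable_news):
        """Crude heuristic from the thresholds above: low-P/E names tend to rise on
        questionable news, very high-P/E names tend to fall; otherwise no call."""
        if not questionable_news:
            return 'no signal'
        if pe_ratio < 10:
            return 'likely to appreciate'
        if pe_ratio > 70:
            return 'likely to drop'
        return 'unclear'

    print(expected_reaction(8, True))   # 'likely to appreciate'
    print(expected_reaction(85, True))  # 'likely to drop'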