Runtime error due to timeout in Quantopian Open

All three of my algorithms running in April's Quantopian Open have been excluded from the contest due to a runtime error. The algos are actually the same code running with different parameters; in all three cases the error happens on the same line and is a

TimeoutException: Too much time spent in handle_data call.

The line where the algorithm ran into the exception is the middle one of

    closes = history(_tail, _frequency, "price")  
    highs = history(_tail, _frequency, "high")  
    lows = history(_tail, _frequency, "low")  

This is the very start of my handler function, and I'm fetching those history DataFrames in order to evaluate ATR.

The suggestion I received was

optimizing your code to run faster[…] avoiding multiple calls to history()

I wonder how I am supposed to make my code faster if the problem is inside a history() call itself.

The values for _tail are roughly 100. I don't think the length of the history is the issue, as the algorithm with the shortest value stopped two days before the other two.

Those are the only three history() calls I make (and I make them because I need them). My system doesn't even run on a minute-level basis: my handle_data has just a return statement, and I'm using a custom function scheduled for daily execution.

The (custom) IDE debugger doesn't step into the platform's functions - that's quite expected, of course - but how am I supposed to debug an issue that happens inside the history() call?

My algos weren't top contenders, but I'm quite annoyed by this issue. Has anyone else experienced this timeout exception, or does anyone have a hint on how to prevent it?

As a side note, I'm setting the universe at a 2% interval, which is the maximum the minute simulation will accept.

To Quantopian: could we have daily live trading algorithms, possibly with a wider timeout? That should make a big difference on your side.


I have had the same problems for the past months and it's getting frustrating. There seems to be no "guaranteed execution time" for our algos on the servers, and it's up to luck and server load whether they make it or not every morning.

I have been working hard to reduce the amount of processing in my algos (less securities in DollarVolumeUniverse, lighter history calls, spread strategies in multiple scheduler calls, etc.), but it doesn't seem to improve anything.

I'm not happy that someone else has the same issue, but at least I'm not alone in this.

The point is that fixing something without knowing what's broken is shooting in the dark.

I came up with those two points, history length and universe size, by guessing, but I have no actual idea of what's going wrong.
I guess one could print millisecond-resolution timestamps to the log and use those for post-mortem analysis.
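That post-mortem idea can be sketched in a few lines. Here's a minimal, hypothetical helper (the name and label format are my own, not a platform API) that prints wall-clock timestamps with millisecond resolution; calling it before and after each history() call and diffing the log entries after a timeout would show which call was running when the clock ran out:

```python
from datetime import datetime

def log_checkpoint(label):
    # Format the current UTC time with millisecond resolution and print it.
    # After a timeout, consecutive log entries bracket the slow call.
    now = datetime.utcnow()
    stamp = "%s.%03d  %s" % (now.strftime("%Y-%m-%d %H:%M:%S"),
                             now.microsecond // 1000, label)
    print(stamp)
    return stamp
```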

I backtested the very same algo using zipline before going live; in about 90 seconds I could backtest from 2002 to 2015. A single loop while daily trading really shouldn't take that long.

The multiple scheduler calls idea is interesting, but it would complicate things a bit.

I think the only reasonable option would be knowing what's affecting the history() calls, if the runtime exception description is actually accurate. It could be some external factor suspending the execution (for whatever reason) and leading to a timeout that isn't the algorithm's fault at all.

Let's wait for a reply on the matter from support.

Or, you could try 3 different versions of your code and resubmit them.

Leave one alone.

Split the periods selected in history() in half for a second.

And take your logic out of handle_data and use schedule_function for a third.
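On the schedule_function suggestion: assuming the per-day workload can be divided by security, one could split the universe into chunks and hand each chunk to its own scheduled slot, keeping any single call well under the per-call time budget. A minimal, platform-independent sketch of the splitting step (chunk_universe is a hypothetical helper name, not a Quantopian API):

```python
def chunk_universe(securities, n_slots):
    # Round-robin split: slot i gets securities i, i+n_slots, i+2*n_slots, ...
    # Each scheduled call then processes only its own chunk.
    return [securities[i::n_slots] for i in range(n_slots)]
```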

I actually saw this exact issue during a backtest last night, with no rhyme or reason behind it. I re-ran it: no problems.

For the record, I have 2 very similar algos running in the April contest. Both are at 1.2% universe.

One got disqualified this morning at this history call:
history(120, '1d', 'price').mean()

And the other one is still running, and successfully executed
history(130, '1d', 'price').mean()

Both ran full minutes backtests with no problem. It seems to be a very random issue with "big" history calls.

Market Tech:
my algos don't have handle_data logic; maybe I wasn't clear in my first message. I already had a single function scheduled for daily execution. This is my actual handle_data for all three algos:

def handle_data(context, data):  
    return  

also notice how my 50-day history got stopped a day before (not two, as I wrote earlier) my 100-day history() call. This doesn't square with the idea that the issue is the length parameter.

As far as I can see my processing is quite lean; if I had to guess a culprit, I'd have picked the talib ATR() call rather than history().

Charles Piché:
That reinforces the idea that it's an issue on platform side.

I could, of course, put several algorithm instances in live trading and go bug-hunting with a trial-and-error approach, but this is not reliable: the errors don't always occur (or the algos wouldn't have lasted a week), and I don't get runtime errors when I send algorithms to live trading or backtest on a minute basis.

The real point is that we'd need to hear what the actual reason for the history() call timeout is. It's too messy to blindly debug this.

Even working around the issue (by reducing history or the universe) doesn't fix the fact that I'm out of the April contest for something that I don't quite think is actually my fault.

Here's some code that will time-out. Unfortunately, I don't have a fix, other than to maybe use the timer code I illustrate below to sort out what's taking so long. Sounds like there just needs to be a lot more margin.

If it turns out to be the talib ATR call, you might be able to write a more efficient one using Pandas/numpy. Sounds reasonably straightforward (http://en.wikipedia.org/wiki/Average_true_range).
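For reference, a true-range/ATR computation on the column-per-security DataFrames that history() returns might look like the sketch below. Treat it as an illustration, not the exact talib formula: it uses a plain rolling mean rather than Wilder's smoothing, and the modern pandas .rolling API rather than the pd.rolling_mean of this thread's era.

```python
import numpy as np
import pandas as pd

def atr(highs, lows, closes, period=14):
    # True range = max(high - low, |high - prev_close|, |low - prev_close|),
    # computed element-wise per security; ATR here is its rolling mean.
    prev_close = closes.shift(1)
    tr = np.maximum(highs - lows,
                    np.maximum((highs - prev_close).abs(),
                               (lows - prev_close).abs()))
    return tr.rolling(period).mean()
```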

import time

def initialize(context):  
    context.stocks = sid(24)  
    context.iterations = 9000000000  # busy-loop long enough to force a timeout  

def handle_data(context, data):  
    start = time.clock()  
    for k in xrange(context.iterations):  # burn CPU until the limit trips  
        pass  
    elapsed = time.clock() - start  
    print elapsed  # seconds spent in the loop this bar  

Grant Kiehne:
thanks, but the point wasn't to just reproduce a timeout (a banal `while True: pass` does the job) but to reproduce the history() call issue.
That said, talib uses Cython; it's unlikely that you'll implement a faster algorithm in pure Python, even if you exploit numpy's acceleration.


I did try some testing: a 2-year backtest period with history lengths from 10 to 3000 and universe sizes from 0.1% to 10%.
Even in the worst-case scenario of 3000 bars on a 10% set_universe() argument, the average time for the history call is about 35 ms, with values starting at 50 ms and going down.

With a 250-bar history, still quite long, and a set_universe width of 10%, the average total execution time for the three history() calls is 4.7 ms, with a max value of ~7 ms.
This is four orders of magnitude smaller than the timeout my algos suffered.

I noticed that on subsequent runs with the same parameters the times can vary greatly, from 370 to 460 µs for 8 securities. This probably means the backtester is affected by contextual events (system load, GC running, page swapping?), but even so I cannot justify a 700x slowdown.

We haven't heard anything from Quantopian yet and at least a couple of people other than me (Charles and another person who directly contacted me) have suffered from unexpected timeout exception.

Can we have an official answer about this matter being checked?

edit
Here's the code:

import time

def initialize(context):  
    width = 10  
    context.security = set_universe(universe.DollarVolumeUniverse(100-width, 100))  
    context.max = 0  
    context.time = 0  
    context.counter = 0  

def handle_data(context, data):  
    start = time.clock()  
    h = history(250, '1d', 'high')  
    h = history(250, '1d', 'low')  
    h = history(250, '1d', 'price')  
    end = time.clock()

    # time.clock() returns seconds; *1e6 converts to microseconds (us)  
    spent = (end-start)*1e6  
    context.counter += 1  
    context.time += spent  
    context.max = max(context.max, spent)  
    average = context.time/context.counter

    print("history length=%4d    history width=%4d    average time=%6.0f us    max time=%6.0f us" % (  
            len(h.index), len(h.columns), average, context.max))

# Results  
#  
# | length | width | securities no. | average time (us) |  
# | ------ | ----- | -------------- | ----------------- |  
# | 10     | 0.1   |  8             | 443               |  
# | 10     | 1     |  80            | 370               |  
# | 10     | 10    |  801           | 413               |  
# | 100    | 0.1   |  8             | 395               |  
# | 100    | 1     |  80            | 384               |  
# | 100    | 10    |  801           | 781               |  
# | 1000   | 0.1   |  8             | 381               |  
# | 1000   | 1     |  80            | 779               |  
# | 1000   | 10    |  801           | 4164              |  
# | 3000   | 0.1   |  8             | 498               |  
# | 3000   | 1     |  80            | 1548              |  
# | 3000   | 10    |  797           | 35434             |  

Thanks for the tests; those are very interesting results indeed. I also hope that we get an official answer about this problem soon. Some of my algos were put out of the February, March, and now April contests because of this. I was also told to work on code optimizations, which I did, but now I'm more and more convinced that there is some kind of "lag" issue in '1d' history calls on the contest servers.

It could get very problematic if it happened on a winner's algo.

As a suggestion to the Quantopian team, you could consider providing, at the end of a backtest, some statistics that would help developers avoid this problem. Code could be right on the edge of timing out, and users would be setting themselves up for disappointment. The implementation could be something as simple as adding the line 'execution_stats()' when the backtest is initialized. Also, perhaps issuing a warning would be helpful, such as "WARNING: Execution exceeded 40 seconds per bar. Fatal error will result if execution exceeds 50 seconds per bar."

I just think there are not enough virtual CPU cycles going around for everyone.

@Andrea
What happens if you use

h = history(250, '1d', 'price', ffill=False)  

instead?

Our goal is to keep contest algorithms running continuously. When that doesn't happen we want to fix it, and we're working on the timeout issue.

Charles had reached out to us privately and we're working with him to re-enter his algo in the contest. @Andrea I'd like to do the same for you. Can you email us at [email protected] and include the live algorithm URL for your entry?

If someone else had the same issue, reach out to us and we'll help re-qualify the algo in the contest.

Disclaimer

The material on this website is provided for informational purposes only and does not constitute an offer to sell, a solicitation to buy, or a recommendation or endorsement for any security or strategy, nor does it constitute an offer to provide investment advisory services by Quantopian. In addition, the material offers no opinion with respect to the suitability of any security or specific investment. No information contained herein should be regarded as a suggestion to engage in or refrain from any investment-related course of action as none of Quantopian nor any of its affiliates is undertaking to provide investment advice, act as an adviser to any plan or entity subject to the Employee Retirement Income Security Act of 1974, as amended, individual retirement account or individual retirement annuity, or give advice in a fiduciary capacity with respect to the materials presented herein. If you are an individual retirement or other investor, contact your financial advisor or other fiduciary unrelated to Quantopian about whether any given investment idea, strategy, product or service described herein may be appropriate for your circumstances. All investments involve risk, including loss of principal. Quantopian makes no guarantees as to the accuracy or completeness of the views expressed in the website. The views are subject to change, and may have become unreliable for various reasons, including changes in market conditions or economic circumstances.

James Jack:
interesting question.
I ran two backtests per length, averaged the average times, and kept the bigger max time.

Results

| length | width | sec. # | avg. (us) | max (us) | ffill |
| ------ | ----- | ------ | --------- | -------- | ----- |
| 250    | 10    |  798   | 2463      | 4069     | True  |
| 250    | 10    |  798   | 1372      | 1563     | False |
| 1000   | 10    |  799   | 15383     | 33080    | True  |
| 1000   | 10    |  799   | 4241      | 15510    | False |

Average time reduces quite a bit, but there's definitely something happening in the background that affects individual cycles: the max time for the 1000-bar call with ffill=False (over consecutive runs with the same arguments) was 6 ms on one run and 29 ms on the other.

Here's a follow-up on my situation. I got another algo disqualified from the April contest this morning.

I found that the disqualifications often happen during the night before the first trading day of the week. Is there a special process running on the servers at that time? Also, why would the algos need to run during the night? Mine timed out in a function that was scheduled to be called at 11:36 am. Is there some kind of big re-run of last week's data?

I would like to better understand the algos execution cycle. Any more information would be welcome!

My intuition is that it has nothing to do with your code, and that there's some performance bug in the Quantopian platform. Perhaps the history call gets blocked waiting (maybe some needed compute resource holds a lock?). We'd have to leave it to the Quantopian developers to investigate, because, as was said earlier, we don't have the debugging tools (e.g. a profiler) necessary for this task.

Alex Baranosky:
that's a pretty safe bet. Considering the results of the aforementioned test, there's no way three history() calls (again, those are all I had in my handling functions) would break the 50 s limit; even with ffill left at True, my max value was in the span of 10 ms.

There are two bad parts to this:
1. I'm still out of April's Quantopian Open even though I provided the requested info 5 days ago (and got a receipt confirmation a day later)
2. the issue isn't known and isn't under user control anyway, so it's likely to happen again

An algo of mine was disqualified on Apr 6. It had been in the contest running since March 2. I was told that they had run it in a backtest (2013-2015) and that the backtest had not completed after 2 days.

It's a big shame that Quantopian doesn't communicate more frequently on this issue :/

Recently, I too had an overnight time-out error for a simulated live trading algo:

Stopped on 4/10/2015, 3:15:11 AM
TimeoutException: Too much time spent in handle_data call.
There was a runtime error on line 49.

In the absence of technical details from Quantopian, we are obliged to speculate. I'm wondering if whatever overnight QC checks Quantopian performs on live trading algos are bogging down the system. I think that there is a backtest run, before and after any code changes, to verify that the results are the same. And there's a fetcher file update as well. So maybe, if all of the algos are run in parallel, it is too much load on the system, and the timeouts result? Ideally, the execution time should be independent of any overall load on the system. Each algo should run in its own real-time sandbox, right?

When I look on http://status.quantopian.com/, it gives the appearance that everything has been working just dandy.

Also, I'm guessing that there is a central OHLCV minute bar database; each algo does not have its own copy running locally. So, one can imagine that the central database gets taxed if a multitude of algos demand data simultaneously.

We are working with the people who had their algos DQ'd from the contest because of timeout issues in handle_data(). We've reached out to some people individually and are working 1-1 with them to requalify the algo in the contest. If you're in this category and haven't heard from us, shoot us an email at [email protected]. Our dev team is still investigating the behavior in our back-end and profiling the performance.

To join in on the speculation, if in realtime/paper trading, all the algos bang on the history server at the same moment of every minute, you'd have very spiky load, and if there were any bounded queues anywhere they might max out.

Assuming none of the algorithms are trying to front-run one another, or do complicated order management sub-minute, they could just stagger the algos' execution throughout the minute, though it might be unfair to the algos that run late.
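That staggering could be as simple as hashing each algorithm's id to a deterministic offset within the minute. A sketch (the algo id and the 60-second window are assumptions, not anything Quantopian has described):

```python
import hashlib

def stagger_offset(algo_id, window_s=60):
    # Hash the algorithm id to a stable offset in [0, window_s), so each
    # algo fires at its own second of the minute instead of all at :00.
    digest = hashlib.sha1(algo_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % window_s
```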

Simon, another possibility is for the infrastructure to add more resources for the high-load time periods.

I think with algorithmic trading they're always going to have widely varying resource usages throughout the day.

I've got a bit more information about the problems that this thread is covering.

It's not particularly intuitive, but some of the biggest load times for us are in the morning when we're warming algorithms up. During the actual trading hours it's more smooth. We have done a lot of optimizing work to get the warmup better, but we started running into a database problem on history calls. We're reasonably sure that is what caused the contest algo problems.

We think we turned the corner this morning with significantly improved performance. We've got several other pending changes to give it more buffer, and we're monitoring it very closely.

It's worth noting that we have different tiers of response depending on what is affected. The highest tier is algorithms trading real money. For obvious reasons, they get drop-everything-and-fix-it treatment for important issues. Real-money algos were almost totally unaffected by this problem, and the affected customer got a personal email when it happened. Contest algorithm problems are the next-most important. They get a lot of attention, but in this case they didn't get enough. We should have been better, earlier, at responding and explaining the problem as we were working on it.

I'm sorry that this has caused the problem that it did. We've learned a lesson about the contest algorithms in particular and how we can be better at communicating about them.


Thank you Dan. I guess you must all be quite busy these days at the Quantopian HQ, but yes, I think acknowledging a problem as soon as possible is always a good communication idea ;) I know it was affecting only a few members, and I'm sorry if I looked a little frustrated in this public thread, but it seemed the problem was not taken very seriously until now. I'm looking forward to seeing the new improvements, and I hope the system can accept as many new contest entries as possible. For me, and I would guess many others, the contest is a great way to learn about the market and the available tools, and to build confidence in your platform before perhaps eventually stepping into real-money trading.

And thank you Andrea for going public about it. I'm sure it helped speed up the fixing process.

Hi Dan,

Is there a common (central) database, or does each user or algo get their own version to run in their own sandbox virtual server? I realize that it may start to touch on some of your proprietary system architecture details, but it would be interesting to have a better picture of what goes on under the Quantopian hood.

Grant

I realize that it may start to touch on some of your proprietary system architecture details

Yes, it does.

And it doesn't make any difference or have any impact on how algorithms are implemented in Quantopian.

And it may (indeed, probably will) change over time.


Thanks Jonathan,

Certainly, you have a business to run, with your own hard-earned IP--makes complete sense that you would protect it.

Regarding your comments about whether or not it would be helpful for algo writers to know the nitty-gritty details of your system, I don't have enough information to make a definite assessment. Intuitively, my sense is that if we had the information, it would help, if anything as context. For one, it seems that testing under your simulated live trading could be quite different from launching paper/real money at IB. So, it is not obvious that the risk of technical problems would be mitigated completely by running Quantopian paper trading. What are the differences that we might need to understand? Or have they been brought into alignment with the recent changes?

Grant

I can't think of any difference between paper trading and trading through IB beyond the obvious ones:

  • Commissions are set by IB in one, by a zipline model in the other.
  • Slippage is modeled by IB in IB paper, slippage is as-experienced in IB real money, and slippage is set by a zipline model in regular paper trading.

As for technical differences, again, the only difference is the obvious one: one sends trades to the broker, the other generates trade fills internally.

In the case of the contest algos being stopped, the problem did affect real-money algorithms in the same way it affected contest algorithms. The difference in experience for real money customers is because we prioritize real-money algorithms ahead of simulated ones in response levels. Real-money algorithms are treated with the highest levels of importance, and contest algorithms are on the next lower tier.

The actual operations processes we use are pretty complex, and it's more than I'd like to go into. I think it's more important to understand what our commitments are. Real-money algorithms get the best treatment we can possibly deliver; I think the reason for that is pretty obvious. Contest algorithms, other live algorithms, backtests, etc. are all less urgent. Obviously we want to provide the best experience we can across all of these, but as a practical matter they don't get identical treatment.