Runtime error due to timeout in Quantopian Open

All three of my algorithms running in April's Quantopian Open have been excluded from the contest due to a runtime error. The algos are actually the same code running with different parameters; in all three cases the error happens on the same line and is a

TimeoutException: Too much time spent in handle_data call.

The line where the algorithm ran into the exception is the middle one of

    closes = history(_tail, _frequency, "price")  
    highs = history(_tail, _frequency, "high")  
    lows = history(_tail, _frequency, "low")  

This is the very start of my handler function, and I'm fetching those history DataFrames in order to evaluate ATR.

The suggestion I received was

optimizing your code to run faster[…] avoiding multiple calls to history()

I wonder how I am supposed to make my code faster if the problem is inside a history() call itself.

The values for _tail are roughly 100. I don't think the length of the history is the issue, as the algorithm with the shortest value stopped two days before the other two.

Those are the only three history() calls I make (and I make them because I need them). My system doesn't even run on a minute-level basis: my handle_data has just a return statement, and I'm using a custom function scheduled for daily execution.

The (custom) IDE debugger doesn't step into the platform's functions - that's quite expected, of course - but how am I supposed to debug an issue that happens inside the history() call?

My algos weren't top contenders, but I'm quite annoyed by this issue. Has anyone else experienced this timeout exception, or does anyone have a hint on how to prevent it?

As a side note, I'm setting the universe at a 2% interval, which is the maximum the minute simulation will accept.

To Quantopian: could we have daily live trading algorithms, possibly with a wider timeout? That should make a big difference on your side.


I have had the same problems for the past months and it's getting frustrating. There seems to be no "guaranteed execution time" for our algos on the servers, and it's up to luck and server load whether they make it or not every morning.

I have been working hard to reduce the amount of processing in my algos (less securities in DollarVolumeUniverse, lighter history calls, spread strategies in multiple scheduler calls, etc.), but it doesn't seem to improve anything.

I'm not happy that someone else has the same issue, but at least I'm not alone in this.

The point is that fixing something without knowing what's broken is shooting in the dark.

I came up with those two points, history length and universe size, by guessing, but I have no actual idea of what's going wrong.
I guess one could print millisecond-resolution timestamps to the log and use those for post-mortem analysis.
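That post-mortem idea can be sketched in a few lines. Here's a minimal, hypothetical helper (the name and label format are my own, not a platform API) that prints wall-clock timestamps with millisecond resolution; calling it before and after each history() call and diffing the log entries after a timeout would show which call was running when the clock ran out:

```python
from datetime import datetime

def log_checkpoint(label):
    # Format the current UTC time with millisecond resolution and print it.
    # After a timeout, consecutive log entries bracket the slow call.
    now = datetime.utcnow()
    stamp = "%s.%03d  %s" % (now.strftime("%Y-%m-%d %H:%M:%S"),
                             now.microsecond // 1000, label)
    print(stamp)
    return stamp
```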

I backtested the very same algo using zipline before going live; in about 90 seconds I could backtest from 2002 to 2015. A single loop while daily trading really shouldn't take that long.

The multiple scheduler calls idea is interesting, but it would complicate things a bit.

I think the only reasonable option would be knowing what's affecting the history() calls, if the runtime exception description is actually accurate. It could be some external factor suspending the execution (for whatever reason) and leading to a timeout that isn't the algorithm's fault at all.

Let's wait for a reply on the matter from support.

Or, you could try 3 different versions of your code and resubmit them.

Leave one alone.

Split the periods selected in history() in half for a second.

And take your logic out of handle_data and use schedule_function for a third.
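On the schedule_function suggestion: assuming the per-day workload can be divided by security, one could split the universe into chunks and hand each chunk to its own scheduled slot, keeping any single call well under the per-call time budget. A minimal, platform-independent sketch of the splitting step (chunk_universe is a hypothetical helper name, not a Quantopian API):

```python
def chunk_universe(securities, n_slots):
    # Round-robin split: slot i gets securities i, i+n_slots, i+2*n_slots, ...
    # Each scheduled call then processes only its own chunk.
    return [securities[i::n_slots] for i in range(n_slots)]
```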

I actually saw this exact issue during a backtest last night, with no rhyme or reason behind it. I re-ran it: no problems.

For the record, I have 2 very similar algos running in the April contest. Both are at 1.2% universe.

One got disqualified this morning at this history call:
history(120, '1d', 'price').mean()

And the other one is still running, and successfully executed
history(130, '1d', 'price').mean()

Both ran full minutes backtests with no problem. It seems to be a very random issue with "big" history calls.

Market Tech:
my algos don't have handle_data logic; maybe I wasn't clear in my first message. I already had a single function scheduled for daily execution. This is my actual handle_data for all three algos:

def handle_data(context, data):  
    return  

also notice how my 50-day history got stopped a day before (not two, as I wrote earlier) my 100-day history() call. This doesn't square with the idea that the issue is the length parameter.

As far as I can see my processing is quite lean; if I had to guess a culprit, I'd have picked the talib ATR() call rather than history().

Charles Piché:
That reinforces the idea that it's an issue on platform side.

I could, of course, put several algorithm instances in live trading and go bug-hunting with a trial-and-error approach, but this is not reliable: the errors don't always occur (or the algos wouldn't have lasted a week), and I don't get runtime errors when I send algorithms to live trading or backtest on a minute basis.

The real point is that we'd need to hear what the actual reason for the history() call timeout is. It's too messy to blindly debug this.

Even working around the issue (by reducing history or the universe) doesn't fix the fact that I'm out of the April contest for something that I don't quite think is actually my fault.

Here's some code that will time-out. Unfortunately, I don't have a fix, other than to maybe use the timer code I illustrate below to sort out what's taking so long. Sounds like there just needs to be a lot more margin.

If it turns out to be the talib ATR call, you might be able to write a more efficient one using Pandas/numpy. Sounds reasonably straightforward (http://en.wikipedia.org/wiki/Average_true_range).
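For reference, a true-range/ATR computation on the column-per-security DataFrames that history() returns might look like the sketch below. Treat it as an illustration, not the exact talib formula: it uses a plain rolling mean rather than Wilder's smoothing, and the modern pandas .rolling API rather than the pd.rolling_mean of this thread's era.

```python
import numpy as np
import pandas as pd

def atr(highs, lows, closes, period=14):
    # True range = max(high - low, |high - prev_close|, |low - prev_close|),
    # computed element-wise per security; ATR here is its rolling mean.
    prev_close = closes.shift(1)
    tr = np.maximum(highs - lows,
                    np.maximum((highs - prev_close).abs(),
                               (lows - prev_close).abs()))
    return tr.rolling(period).mean()
```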

import time

def initialize(context):  
    context.stocks = sid(24)  
    context.iterations = 9000000000  # busy-loop long enough to force a timeout  

def handle_data(context, data):  
    start = time.clock()  
    for k in xrange(context.iterations):  # burn CPU until the limit trips  
        pass  
    elapsed = time.clock() - start  
    print elapsed  # seconds spent in the loop this bar  

Grant Kiehne:
thanks, but the point wasn't to just reproduce a timeout (a banal `while True: pass` does the job) but to reproduce the history() call issue.
That said, talib uses Cython; it's unlikely that you'll implement a faster algorithm in pure Python, even if you exploit numpy's acceleration.


I did try some testing: a 2-year backtest period with history lengths from 10 to 3000 and universe sizes from 0.1% to 10%.
Even in the worst-case scenario of 3000 bars on a 10% set_universe() argument, the average time for the history call is about 35 ms, with values starting at 50 ms and going down.

With a 250-bar history, still quite long, and a set_universe width of 10%, the average total execution time for the three history() calls is 4.7 ms, with a max value of ~7 ms.
This is four orders of magnitude smaller than the timeout my algos suffered.

I noticed that on subsequent runs with the same parameters the times can vary greatly, from 370 to 460 µs for 8 securities. This probably means the backtester is affected by contextual events (system load, GC running, page swapping?), but even so I cannot justify a 700x slowdown.

We haven't heard anything from Quantopian yet and at least a couple of people other than me (Charles and another person who directly contacted me) have suffered from unexpected timeout exception.

Can we have an official answer about this matter being checked?

edit
Here's the code:

import time

def initialize(context):  
    width = 10  
    context.security = set_universe(universe.DollarVolumeUniverse(100-width, 100))  
    context.max = 0  
    context.time = 0  
    context.counter = 0  

def handle_data(context, data):  
    start = time.clock()  
    h = history(250, '1d', 'high')  
    h = history(250, '1d', 'low')  
    h = history(250, '1d', 'price')  
    end = time.clock()

    # time.clock() returns seconds; *1e6 converts to microseconds (us)  
    spent = (end-start)*1e6  
    context.counter += 1  
    context.time += spent  
    context.max = max(context.max, spent)  
    average = context.time/context.counter

    print("history length=%4d    history width=%4d    average time=%6.0f us    max time=%6.0f us" % (  
            len(h.index), len(h.columns), average, context.max))

# Results  
#  
# | length | width | securities no. | average time (us) |  
# | ------ | ----- | -------------- | ----------------- |  
# | 10     | 0.1   |  8             | 443               |  
# | 10     | 1     |  80            | 370               |  
# | 10     | 10    |  801           | 413               |  
# | 100    | 0.1   |  8             | 395               |  
# | 100    | 1     |  80            | 384               |  
# | 100    | 10    |  801           | 781               |  
# | 1000   | 0.1   |  8             | 381               |  
# | 1000   | 1     |  80            | 779               |  
# | 1000   | 10    |  801           | 4164              |  
# | 3000   | 0.1   |  8             | 498               |  
# | 3000   | 1     |  80            | 1548              |  
# | 3000   | 10    |  797           | 35434             |  

Thanks for the tests; those are very interesting results indeed. I also hope that we get an official answer about this problem soon. Some of my algos were put out of the February, March, and now April contests because of this. I was also told to work on code optimizations, which I did, but now I'm more and more convinced that there is some kind of "lag" issue in '1d' history calls on the contest servers.

It could get very problematic if it happened on a winner's algo.

As a suggestion to the Quantopian team, you could consider providing, at the end of a backtest, some statistics that would help developers avoid this problem. Code could be right on the edge of timing out, and users would be setting themselves up for disappointment. The implementation could be something as simple as adding the line 'execution_stats()' when the backtest is initialized. Also, perhaps issuing a warning would be helpful, such as "WARNING: Execution exceeded 40 seconds per bar. Fatal error will result if execution exceeds 50 seconds per bar."

I just think there are not enough virtual CPU cycles going around for everyone.

@Andrea
What happens if you use

h = history(250, '1d', 'price', ffill=False)  

instead?

Our goal is to keep contest algorithms running continuously. When that doesn't happen we want to fix it, and we're working on the timeout issue.

Charles had reached out to us privately and we're working with him to re-enter his algo in the contest. @Andrea I'd like to do the same for you. Can you email us at [email protected] and include the live algorithm URL for your entry?

If someone else had the same issue, reach out to us and we'll help re-qualify the algo in the contest.

Disclaimer

The material on this website is provided for informational purposes only and does not constitute an offer to sell, a solicitation to buy, or a recommendation or endorsement for any security or strategy, nor does it constitute an offer to provide investment advisory services by Quantopian. In addition, the material offers no opinion with respect to the suitability of any security or specific investment. No information contained herein should be regarded as a suggestion to engage in or refrain from any investment-related course of action as none of Quantopian nor any of its affiliates is undertaking to provide investment advice, act as an adviser to any plan or entity subject to the Employee Retirement Income Security Act of 1974, as amended, individual retirement account or individual retirement annuity, or give advice in a fiduciary capacity with respect to the materials presented herein. If you are an individual retirement or other investor, contact your financial advisor or other fiduciary unrelated to Quantopian about whether any given investment idea, strategy, product or service described herein may be appropriate for your circumstances. All investments involve risk, including loss of principal. Quantopian makes no guarantees as to the accuracy or completeness of the views expressed in the website. The views are subject to change, and may have become unreliable for various reasons, including changes in market conditions or economic circumstances.

James Jack:
interesting question.
I ran two backtests per length, averaged the average times, and kept the bigger max time.

Results

| length | width | sec. # | avg. (us) | max (us) | ffill |
| ------ | ----- | ------ | --------- | -------- | ----- |
| 250    | 10    |  798   | 2463      | 4069     | True  |
| 250    | 10    |  798   | 1372      | 1563     | False |
| 1000   | 10    |  799   | 15383     | 33080    | True  |
| 1000   | 10    |  799   | 4241      | 15510    | False |

Average time reduces quite a bit, but there's definitely something happening in the background that affects individual cycles: the max time for the 1000-bar call with ffill=False (over consecutive runs with the same arguments) was 6 ms on one run and 29 ms on the other.

Here's a follow-up on my situation. I got another algo disqualified from the April contest this morning.

I found that the disqualifications often happen during the night before the first trading day of the week. Is there a special process running on the servers at that time? Also, why would the algos need to run during the night? Mine timed out in a function that was scheduled to be called at 11:36 am. Is there some kind of big re-run of last week's data?

I would like to better understand the algos execution cycle. Any more information would be welcome!

My intuition is that it has nothing to do with your code, and that there's some performance bug in the Quantopian platform. Perhaps the history call gets blocked waiting (maybe some needed compute resource holds a lock?). We'd have to leave it to the Quantopian developers to investigate, because, as was said earlier, we don't have the debugging tools (e.g. a profiler) necessary for this task.

Alex Baranosky:
that's a pretty safe bet. Considering the results of the aforementioned test, there's no way three history() calls (again, those are all I had in my handling functions) would break the 50 s limit; even with ffill left at True, my max value was in the span of 10 ms.

There are two bad parts to this:
1. I'm still out of April's Quantopian Open even though I provided the requested info 5 days ago (and got a receipt confirmation a day later)
2. the issue isn't known and isn't under user control anyway, so it's likely to happen again

An algo of mine was disqualified on Apr 6. It had been in the contest running since March 2. I was told that they had run it in a backtest (2013-2015) and that the backtest had not completed after 2 days.

It's a big shame that Quantopian doesn't communicate more frequently on this issue :/

Recently, I too had an overnight time-out error for a simulated live trading algo:

Stopped on 4/10/2015, 3:15:11 AM
TimeoutException: Too much time spent in handle_data call.
There was a runtime error on line 49.

In the absence of technical details from Quantopian, we are obliged to speculate. I'm wondering if whatever overnight QC checks Quantopian performs on live trading algos are bogging down the system. I think that there is a backtest run, before and after any code changes, to verify that the results are the same. And there's a fetcher file update as well. So maybe, if all of the algos are run in parallel, it is too much load on the system, and the timeouts result? Ideally, the execution time should be independent of any overall load on the system. Each algo should run in its own real-time sandbox, right?

When I look on http://status.quantopian.com/, it gives the appearance that everything has been working just dandy.

Also, I'm guessing that there is a central OHLCV minute bar database; each algo does not have its own copy running locally. So, one can imagine that the central database gets taxed if a multitude of algos demand data simultaneously.

We are working with the people who had their algos DQ'd from the contest because of timeout issues in handle_data(). We've reached out to some people individually and are working 1-1 with them to requalify the algo in the contest. If you're in this category and haven't heard from us, shoot us an email at [email protected]. Our dev team is still investigating the behavior in our back-end and profiling the performance.

To join in on the speculation, if in realtime/paper trading, all the algos bang on the history server at the same moment of every minute, you'd have very spiky load, and if there were any bounded queues anywhere they might max out.

Assuming none of the algorithms are trying to front-run one another, or do complicated order management sub-minute, they could just stagger the algos' execution throughout the minute, though it might be unfair to the algos that run late.
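That staggering could be as simple as hashing each algorithm's id to a deterministic offset within the minute. A sketch (the algo id and the 60-second window are assumptions, not anything Quantopian has described):

```python
import hashlib

def stagger_offset(algo_id, window_s=60):
    # Hash the algorithm id to a stable offset in [0, window_s), so each
    # algo fires at its own second of the minute instead of all at :00.
    digest = hashlib.sha1(algo_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % window_s
```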

Simon, another possibility is for the infrastructure to add more resources for the high-load time periods.

I think with algorithmic trading they're always going to have widely varying resource usages throughout the day.

I've got a bit more information about the problems that this thread is covering.

It's not particularly intuitive, but some of the biggest load times for us are in the morning when we're warming algorithms up. During the actual trading hours it's more smooth. We have done a lot of optimizing work to get the warmup better, but we started running into a database problem on history calls. We're reasonably sure that is what caused the contest algo problems.

We think we turned the corner this morning with significantly improved performance. We've got several other pending changes to give it more buffer, and we're monitoring it very closely.

It's worth noting that we have different tiers of response depending on what is affected. The highest tier is algorithms trading real money. For obvious reasons, they get drop-everything-and-fix-it treatment for important issues. Real-money algos were almost totally unaffected by this problem, and the affected customer got a personal email when it happened. Contest algorithm problems are the next-most important. They get a lot of attention, but in this case they didn't get enough. We should have been better, earlier, at responding and explaining the problem as we were working on it.

I'm sorry that this has caused the problem that it did. We've learned a lesson about the contest algorithms in particular and how we can be better at communicating about them.


Thank you Dan. I guess you must all be quite busy these days at the Quantopian HQ, but yes, I think acknowledging a problem as soon as possible is always a good communication idea ;) I know it was affecting only a few members, and I'm sorry if I looked a little frustrated in this public thread, but it seemed the problem was not taken very seriously until now. I'm looking forward to seeing the new improvements, and I hope the system can accept as many new contest entries as possible. For me, and I would guess many others, the contest is a great way to learn about the market and the available tools, and to build confidence in your platform before perhaps eventually stepping into real-money trading.

And thank you Andrea for going public about it. I'm sure it helped speed up the fixing process.

Hi Dan,

Is there a common (central) database, or does each user or algo get their own version to run in their own sandbox virtual server? I realize that it may start to touch on some of your proprietary system architecture details, but it would be interesting to have a better picture of what goes on under the Quantopian hood.

Grant

I realize that it may start to touch on some of your proprietary system architecture details

Yes, it does.

And it doesn't make any difference or have any impact on how algorithms are implemented in Quantopian.

And it may (indeed, probably will) change over time.


Thanks Jonathan,

Certainly, you have a business to run, with your own hard-earned IP--makes complete sense that you would protect it.

Regarding your comments about whether or not it would be helpful for algo writers to know the nitty-gritty details of your system, I don't have enough information to make a definite assessment. Intuitively, my sense is that if we had the information, it would help, if anything as context. For one, it seems that testing under your simulated live trading could be quite different from launching paper/real money at IB. So, it is not obvious that the risk of technical problems would be mitigated completely by running Quantopian paper trading. What are the differences that we might need to understand? Or have they been brought into alignment with the recent changes?

Grant

I can't think of any difference between paper trading and trading through IB beyond the obvious ones:

  • Commissions are set by IB in one, by a zipline model in the other.
  • Slippage is modeled by IB in IB paper, slippage is as-experienced in IB real money, and slippage is set by a zipline model in regular paper trading.

As for technical differences, again, the only difference is the obvious one: one sends trades to the broker, the other generates trade fills internally.

In the case of the contest algos being stopped, the problem did affect real-money algorithms in the same way it affected contest algorithms. The difference in experience for real money customers is because we prioritize real-money algorithms ahead of simulated ones in response levels. Real-money algorithms are treated with the highest levels of importance, and contest algorithms are on the next lower tier.

The actual operations processes we use are pretty complex, and it's more than I'd like to go into. I think it's more important to understand what our commitments are. Real-money algorithms get the best treatment we can possibly deliver; I think the reason for that is pretty obvious. Contest algorithms, other live algorithms, backtests, etc. are all less urgent. Obviously we want to provide the best experience we can across all of these, but as a practical matter they don't get identical treatment.