Optimization Issue: Too much time is spent in handle_data

I'm trying to complete my volatility forecasting research with a final model called DCC GARCH. This involves maximizing the quasi-likelihood for multivariate returns over a long time period (e.g. 252 days). As a result, the time spent in handle_data will usually range from 30 seconds up to two minutes. Is there any way to increase that limit? (I got time lol) Or if anyone else has experience and success implementing the Dynamic Conditional Correlation GARCH model on Quantopian, please let me know.
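For readers unfamiliar with the model, here is a minimal sketch (plain NumPy, outside Quantopian) of the DCC(1,1) quasi-log-likelihood being maximized. The parameterization `(a, b)` and the use of the sample correlation as the unconditional target are standard choices, but the function names and shapes here are illustrative, not the poster's actual code:

```python
import numpy as np

def dcc_quasi_loglik(params, eps):
    # eps: (T, N) standardized residuals from univariate GARCH fits.
    # params: (a, b), assumed to satisfy a, b >= 0 and a + b < 1
    # (a real implementation would enforce these constraints).
    a, b = params
    T, N = eps.shape
    S = np.cov(eps, rowvar=False)   # unconditional correlation target
    Q = S.copy()
    ll = 0.0
    for t in range(T):
        d = 1.0 / np.sqrt(np.diag(Q))
        R = Q * np.outer(d, d)      # rescale Q_t to a correlation matrix
        e = eps[t]
        # Correlation part of the Gaussian quasi-log-likelihood.
        ll += -0.5 * (np.log(np.linalg.det(R))
                      + e @ np.linalg.solve(R, e) - e @ e)
        # DCC recursion for the next period's quasi-correlation.
        Q = (1 - a - b) * S + a * np.outer(e, e) + b * Q
    return -ll                      # negated, for use with a minimizer

rng = np.random.default_rng(0)
eps = rng.standard_normal((252, 5))
print(dcc_quasi_loglik((0.05, 0.90), eps))
```

The T-step recursion over 252 days is inherently sequential, which is why the optimizer's repeated calls to this function dominate the runtime.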

10 responses

I've heard rumors that if you store history data in your context, you might have more time to process it in the morning in before_trading_start, but I've never tried it.

How would I go about doing that Simon?

I don't know if Simon has a better solution. I haven't had any success implementing anything; my current one passed 8 labels across 500 stocks OK, with an inefficient use of append which I haven't worked out how to get rid of yet:

# Pre_test settings

import pandas as pd

DEFAULT_A = 1  
DEFAULT_B = 1  
DEFAULT_C = 0

# These set up the frame for the default tracking variables

BASE_LABELS = ["Var 1", "Var 2", "Var 3"]  
BASE_VALUES = [DEFAULT_A, DEFAULT_B, DEFAULT_C]  
BASE_FRAME = pd.DataFrame(columns=BASE_LABELS)

def initialize(context):  

    # Store the tracking frame in context so it persists across bars.
    context.Tracking = BASE_FRAME  

You can get rid of the defaults if you like, and it is easy to append additional rows based on the base frame. You don't need to do this if you know how many stocks you will be using, and you can use it in before_trading_start. I haven't tried timing it, though.
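One way to avoid the repeated-append pattern mentioned above is label-based insertion with `.loc`, which grows the frame in place instead of copying it on every call. A minimal sketch (plain pandas, outside Quantopian; the `add_row` helper and ticker names are hypothetical):

```python
import pandas as pd

BASE_LABELS = ["Var 1", "Var 2", "Var 3"]
BASE_VALUES = [1, 1, 0]
BASE_FRAME = pd.DataFrame(columns=BASE_LABELS)

def add_row(frame, label, values=BASE_VALUES):
    # .loc-based insertion adds one row keyed by label, avoiding
    # DataFrame.append, which copies the entire frame each call.
    frame.loc[label] = values
    return frame

tracking = BASE_FRAME.copy()
for symbol in ["AAPL", "MSFT", "IBM"]:   # hypothetical tickers
    add_row(tracking, symbol)

print(tracking.shape)  # (3, 3)
```

In an algorithm, `tracking` would live in `context.Tracking` so the rows persist between calls.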

Here's some code to play with. The log output is:

2015-08-21 PRINT 59.299565
End of logs.

Try commenting out these build-skip lines:

    # if context.build:  
    #     context.build = False  
    #     return  

The algo will fail the build, with a confusing build error:

--- Error Execution timeout.

It seems that the build allows for about 20 seconds max dwell in before_trading_start(), so maybe the Q engineers are trying to build in some margin, since they can't guarantee how fast code will execute at any given time?

My hunch is that before_trading_start() allows 60 seconds of computation time (if you apply the trick to make it through the build), since if I tweak up the dwell time by setting context.iterations = 6*1000000000, I obtain:

TimeoutException: Call to before_trading_start timed out
There was a runtime error on line 18.

I recall asking about before_trading_start() and the allowed dwell, and I think the answer was consistent with my findings today. So there is nothing gained by using before_trading_start(), unless you need results before 9:31 am when Q trading starts. In that case, as Simon points out, I think you just need to store the trailing window of OHLCV bars in context (although I'd have to test whether you can capture the final bar of the prior day).

I'm wondering, though, when before_trading_start() runs in live trading. If it runs at 8 am, then there's a lot of wasted time that could be used for computations.

Maybe the Q team can shed some light on their design choices for before_trading_start()?

import time

def initialize(context):  
    context.stocks = sid(24)  
    # Empty-loop iteration count used to burn CPU time for the test.
    context.iterations = 5*1000000000  
    # Flag so the first (build-time) call returns immediately,
    # dodging the ~20 second build timeout.
    context.build = True  
def before_trading_start(context):  
    if context.build:  
        context.build = False  
        return  
    start = time.clock()  
    for k in xrange(context.iterations):  
        pass  
    # Log how long the dwell loop ran before any timeout fired.
    elapsed = time.clock() - start  
    print elapsed  
def handle_data(context, data):  
    pass  

Interesting approach... I guess I have to stick with that then... Anyone know if replicating this in the research environment allows unlimited time?

Generally, there are no time limits in the research environment that I'm aware of (although I wouldn't count on uptimes lasting days/weeks), but I'm not sure if the time-out issues encountered above carry over, since presumably you'd be using zipline there. You might contact Q support before wasting time coding, only to find out you still have the time-out problem.

One hack, if your routine will support it, would be to spread the computation over many calls to handle_data(). For example, over a trading day you could get a 390x enhancement (one call per minute bar), if the work could be broken up.

Grant, the bottleneck is over one optimization call so I'm not too sure how that can be spread out.

One thing that I DID notice is that if you set a conditional breakpoint that can NEVER be hit, the backtest doesn't time out in the preliminary 'build algorithm' stage. However, something that takes longer than the timeout with the conditional breakpoint can take hours to calculate, and when I return to the computer, I always seem to get an unknown error with no debug message.

the bottleneck is over one optimization call so I'm not too sure how that can be spread out.

If your computation takes N steps, and those steps can be broken up into chunks of M steps each, then you could spread the chunks over multiple calls to handle_data(). Say I'm summing the integers 1 through 6 in a loop. In the first pass, I'd do 1 + 2 + 3, and in the second pass, I'd do 4 + 5 + 6. The result of the first computation would be stored in context (e.g. context.temp). As long as there are no memory leaks, you may be able to run indefinitely. And the approach should be compatible with live trading, since my understanding is that everything in context gets stored overnight and is available the next day.
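The summing example above can be sketched in plain Python (the `Context` stand-in and call pattern are illustrative, not the Quantopian API):

```python
class Context:
    # Stand-in for Quantopian's context object, which persists
    # between calls and is stored overnight.
    pass

def handle_data(context, chunk):
    # Resume the running sum stored in context from the prior call.
    if not hasattr(context, "temp"):
        context.temp = 0
    for x in chunk:
        context.temp += x

context = Context()
data = [1, 2, 3, 4, 5, 6]
handle_data(context, data[:3])   # first pass: 1 + 2 + 3
handle_data(context, data[3:])   # second pass: 4 + 5 + 6
print(context.temp)  # 21
```

Any computation whose intermediate state fits in context can be resumed the same way, one chunk per bar.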

The other angle is that if your code runs "30 seconds to up to two minutes," maybe you just need to make it more efficient. Python (like MATLAB) supports crazy fast vectorized computations via NumPy.
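As a toy illustration of the speedup being suggested: a GARCH recursion is sequential in time, but the per-step algebra across all stocks can usually be vectorized. The workload below (demeaned squared returns) is a made-up stand-in, chosen only to show the loop-vs-broadcast pattern:

```python
import numpy as np

rng = np.random.default_rng(0)
returns = rng.standard_normal((252, 50))   # hypothetical (days, stocks)

def slow(r):
    # Element-by-element Python loops: demeaned squared returns.
    out = np.empty_like(r)
    mu = r.mean(axis=0)
    for t in range(r.shape[0]):
        for i in range(r.shape[1]):
            out[t, i] = (r[t, i] - mu[i]) ** 2
    return out

def fast(r):
    # One broadcasted NumPy expression does the same work in C.
    return (r - r.mean(axis=0)) ** 2

assert np.allclose(slow(returns), fast(returns))
```

If the objective function evaluates in 0.8 seconds, profiling it for loops like `slow()` is often where the biggest wins are.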

My read is that if you need scalable computational "umph!" you should consider how to leverage the Quantopian research platform. The paradigm for the backtester is that with a few clicks, one should be able to roll right into paper or real-money trading. So, the backtester can be used for research, but it is more of a development and verification tool (and as such, has relevant time limits built in, so that under live trading the algo doesn't fail).

Grant, I am using the scipy minimize() func. I did time everything out: the optimization takes about 55 seconds at best, but the objective function only takes 0.8 seconds to evaluate. Do you think I can spread that out between multiple handle_data calls?

Also, I agree with your statement about it being an easily hackable platform, but it shuts out computationally heavy strategies that traders may implement over monthly or weekly periods. The paradigm shouldn't only be intra-day trading, where the most important factor could be execution time. If Q has the ability to support Python and its various libraries, then it should prepare for more complex and computationally heavy strategies.

If you are making a single call to the optimizer, then maybe have a look at the optimizer settings and the number of times it calls the objective function.

Another angle is that maybe another optimization method would be faster? Which scipy minimization routine are you using?

There's a discussion on http://docs.scipy.org/doc/scipy/reference/tutorial/optimize.html about pre-conditioning to speed up optimization. You might have a look.