Technical questions regarding loading of data and the RAM limit on backtests

Hello everyone. I have 10-20 machine learning models that are about 35 MB each. I would like to load them; is there a way?

Tech support, what is our RAM limit?

13 responses

Jamie mentions on https://www.quantopian.com/posts/master-thesis-ideas that the RAM limit for the research platform is 4 GB. If they are in Python, you could try pasting them into the IPython IDE.

Thanks Grant. I'm not familiar with the research platform but I'll look into it.

I haven't kept up with improvements to the research platform. It appears that csv files can be uploaded, and presumably used in analyses. Also, it looks like there is support for uploading .ipynb files. And text files can be created, too (not sure how they fit in...maybe just for taking notes?).

So I've found the Quantopian platform to be pretty frustrating in general for my applications. I have what seem to be working models when backtested on my machine, but I can't find a way to load them into the IDE.

What do you need to load? Data or code or both? What do you mean, specifically, by "10 -20 machine learning models that are about 35 Mb each"?

Grant, good questions.

I have a model I wrote; call it M. M takes a dataframe df and returns Mx. Mx is my trained machine learning model, which I write to disk as a pickle file. Normally, when I need this model, I just load the pickle file. Ideally, Quantopian would implement pickle.load in a fetcher.
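For readers unfamiliar with this workflow, here is a minimal sketch of the train-once, load-later pattern described above. The trainer M below is a stand-in (it just computes column means), not the poster's actual model:

```python
import os
import pickle
import tempfile

def M(df):
    # Stand-in trainer: "fits" a model by learning column means.
    # A real M would run an actual machine learning fit on df.
    return {col: sum(vals) / len(vals) for col, vals in df.items()}

# Train once and persist the fitted model to disk.
df = {"AAPL": [1.0, 2.0, 3.0], "MSFT": [2.0, 4.0, 6.0]}
Mx = M(df)
path = os.path.join(tempfile.gettempdir(), "model.pkl")
with open(path, "wb") as f:
    pickle.dump(Mx, f)

# Later (a new session): reload instead of retraining.
with open(path, "rb") as f:
    Mx_loaded = pickle.load(f)

assert Mx_loaded == Mx
```

The friction on Quantopian is exactly the last step: there is no way to hand the IDE an arbitrary pickle file, so the `pickle.load` half of the round trip has nowhere to read from.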

The other option is that I fetch a CSV into df, paste in "def M(df):", and call Mx = M(df) online. The only problem with this is that I have to load a much smaller data set and limit my fitting time. And apparently, fetchers also disqualify you from the competition (and the fund?).

Why don't you simply do your computations within Quantopian? Type your model M into the IDE and run it there. Or have you tried this? Or perhaps Quantopian doesn't have the right datasets? Guess I'm not following why you need to load anything if you could compute it on the Quantopian platform.

So I think it's possible, but it's very hacky and would probably not transfer when you switch to paper trading or live trading.

I can run a backtest from 2002 to 2012, gathering data from the "history" fetcher as I pass through time. After 2012, I can use models built on data from the previous 10 years to execute trades. So what happens when you click paper trade? I suspect the models appended to context (for example, context.Mx1) will disappear when the algorithm is restarted, since I don't think context.Mx1 will persist from backtest to paper trading.

I could also build the models if I could fetch specific time ranges, say Jan 2004 to Apr 2004, but from what I've read in the documentation, history does not support a time range.

P.S. Hmm, maybe I'll make a feature request.

Yeah, you'd have to be able to construct the model after the algo starts. There are ~5 minutes of compute time in before_trading_start (you could also run for up to ~50 seconds every trading minute, for that matter, so with 390 trading minutes per day you'd have up to 325 minutes of compute per day going this route). Day to day, if you can store intermediate results in context, then you can run your code over N days, with N*5 minutes of compute time using before_trading_start. The downside is that trading wouldn't start for N days.
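The idea of spreading a long fit over N days can be sketched as follows. This is an illustration, not Quantopian's actual API: the Context class and the chunks argument are hypothetical stand-ins (the real before_trading_start takes only context and data), and the per-chunk "work" here is just a sum:

```python
class Context(object):
    # Stand-in for Quantopian's context object, which persists
    # between calls within a single algorithm run.
    pass

def before_trading_start(context, data, chunks):
    # Process one chunk of the training workload per day, accumulating
    # partial results on context so state survives between calls.
    if not hasattr(context, "partial_sums"):
        context.partial_sums = []
        context.day = 0
    if context.day < len(chunks):
        context.partial_sums.append(sum(chunks[context.day]))
        context.day += 1
    # Only after all N chunks are consumed is the "model" complete.
    context.model_ready = (context.day == len(chunks))

context = Context()
chunks = [[1, 2], [3, 4], [5, 6]]  # N = 3 days of work
for _ in range(3):
    before_trading_start(context, None, chunks)

assert context.model_ready
assert sum(context.partial_sums) == 21
```

Note that this only survives within one run; as discussed above, context would be reinitialized when switching from backtest to paper trading, so the N-day warm-up would start over.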

What look-back window do you actually need on a rolling basis? It sounds like only 3 months, in practice? Daily bars or minutely? How many stocks?

Grant, my current approach is 5k stocks over an n-day look-back.

I can do a 2-month look-back, but I've found that while it gives good recommendations, it is not sufficient to overcome commission and slippage. Since I'm trading 100 stocks a day, I pay on average 20% per year in commission and slippage; that's $200k on a $1M portfolio. Interestingly, the further I look back, the better my results (although the improvement does drop off exponentially). When you are trading on a daily basis, that +0.0001 ROI makes a difference.

I just realized I can probe it myself. The RAM limit on the IDE is about 1 GB.

OK, never mind. I've just tried different allocations, and the memory error seems almost arbitrary.

Update: the problem was with the way I was probing RAM. Sometimes errors are raised and the "print" output is not flushed before the error is thrown. The IDE appears to have about 2.25-2.5 GB.

toan

import numpy as np

def test_mem(context, data):
    # Probe available RAM by allocating ever-larger arrays until
    # a MemoryError is thrown.
    a = 1310
    b = 100
    x1 = np.random.random([a, b * 2**10])  # about 1 GB held as a baseline
    sz = 1
    while True:
        sz *= 2
        x = np.random.random([a, b * sz])  # roughly sz MB
        print(sz, "MB")

Regarding loading ML models, I have thought about this a lot myself but haven't implemented it yet. You could serialize the model to a file (with pickle, for example), [optionally] gzip it, and base64-encode it. Then you copy/paste the base64-encoded string into the source code; at initialization, you first decode the base64 and then gunzip it, and you have the model. This is as close to an optimal solution as you can get with the current Quantopian implementation.

Some examples of the zipping/base64 part here:
http://stackoverflow.com/questions/10577385/putting-gzipped-data-into-a-script-as-a-string
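A minimal round trip of the pickle/gzip/base64 approach using only the standard library. The model dict here is a placeholder for a real fitted model; in practice the encoded string would be produced offline and pasted into the algorithm's source:

```python
import base64
import gzip
import io
import pickle

# Placeholder for a real trained model object.
model = {"weights": [0.1, 0.2, 0.3], "intercept": -0.05}

# Offline step: pickle, gzip, then base64-encode. The resulting
# string is what you would paste into your algorithm's source code.
raw = pickle.dumps(model)
buf = io.BytesIO()
with gzip.GzipFile(fileobj=buf, mode="wb") as f:
    f.write(raw)
encoded = base64.b64encode(buf.getvalue())

# At initialization: decode the base64, gunzip, and unpickle
# to recover the original model.
decoded = base64.b64decode(encoded)
restored = pickle.loads(gzip.GzipFile(fileobj=io.BytesIO(decoded)).read())

assert restored == model
```

Gzipping before encoding matters because base64 inflates its input by about a third; compressing first keeps the pasted string (and the source file) closer to the original pickle's size.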

Mikko, thanks! I will look into it today. I'll let you know if it works.