Hi @Jean, @Josh, @Jamie,
[I started off writing this as a response to the latest comment in this thread, but then thought it might warrant a separate post.]
Thank you for the continued effort to improve effective use of limited resources in Research.
Here's my current experience in Research from a single-user perspective:
- From a fresh start (all active notebooks, or NBs, killed), just listing my NBs in Research (I have quite a few) uses up 1% of my allocated memory, which leads me to think I don't get much memory to start with.
- Opening up a single NB, without running any cells, uses up another 4-5% of memory (see the attached NB as an example). Memory utilization is now at 5-6% without actually running anything.
- Having only the attached NB open and trying to run 'all cells' from the top results in memory maxing out, and the kernel restarting, about midway through the NB.
- There's no way for me to tell whether a particular cell will max out memory, other than trial and error, which can be quite time-consuming. (A rough way I could monitor this from inside a notebook is sketched just below this list.)
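For what it's worth, here's a rough sketch of how I've been thinking one could watch memory from inside a notebook, using only the standard library and pandas. Whether these numbers line up with Q's per-user memory quota is just my assumption, so please treat it as a starting point rather than anything authoritative:

```python
# Rough sketch: monitor memory from inside a notebook using only the
# standard library and pandas. Whether these numbers correspond to the
# Research memory quota is an assumption on my part.
import resource

import numpy as np
import pandas as pd


def kernel_peak_memory_mb():
    """Peak resident memory of this kernel in MB (ru_maxrss is KB on Linux)."""
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024.0


def frame_memory_mb(df):
    """Memory actually held by a DataFrame, including object columns."""
    return df.memory_usage(deep=True).sum() / 1e6


print('Kernel peak memory: %.0f MB' % kernel_peak_memory_mb())

# Example: a 10-year daily panel across ~2000 assets, plus a 63-day SMA.
prices = pd.DataFrame(np.random.randn(2520, 2000))
sma_63 = prices.rolling(63).mean()

print('prices: %.0f MB, sma_63: %.0f MB'
      % (frame_memory_mb(prices), frame_memory_mb(sma_63)))
print('Kernel peak memory now: %.0f MB' % kernel_peak_memory_mb())
```

Checking the big intermediate DataFrames cell by cell like this would at least tell me which step blows up, even if it doesn't prevent the blow-up.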
The attached NB is only slightly modified from Thomas W's excellent NB. Yes, it runs over 10 years instead of 7, and uses a 63-day SMA instead of a 5-day one, but it also only uses a single field from Morningstar, rather than something from a Premium Dataset, which I believe uses more resources.
Maybe I have unrealistic expectations, but since Q offers data back to 2002, I would have expected to be able to research this data in full, using a reasonably complex factor with reasonable 'smoothing' etc., without running into the memory limit?
As I see it, my only option is trial and error to find out how long a period I can research for a specific factor (the workable period length likely depends heavily on factor complexity), which can be quite time-consuming and, in my view, is very unfortunate. Or maybe I just need to learn how to write more 'memory-effective' code (admittedly I'm not the best Python programmer)? One rough idea I've been considering is sketched below.
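In case it's useful to anyone else (or in case someone can tell me it's the wrong approach), the sketch below runs the pipeline a year at a time, downcasts each chunk to float32, and explicitly frees intermediates. `make_pipeline()` is just a placeholder for whatever factor pipeline is in the attached NB, and the memory savings are my assumption rather than something I've measured:

```python
# Sketch: run a pipeline in one-year chunks, downcast each chunk to float32,
# and free temporaries as we go. make_pipeline() is a placeholder for your
# own pipeline; the memory savings here are assumed, not measured.
import gc

import pandas as pd
from quantopian.research import run_pipeline

chunks = []
for year in range(2003, 2013):                 # ten calendar years
    start, end = '%d-01-01' % year, '%d-12-31' % year
    result = run_pipeline(make_pipeline(), start, end)

    # float64 -> float32 roughly halves the footprint of numeric columns.
    for col in result.columns:
        if result[col].dtype == 'float64':
            result[col] = result[col].astype('float32')

    chunks.append(result)
    del result
    gc.collect()                               # release the chunk's temporaries

factor_data = pd.concat(chunks)
del chunks
gc.collect()
```

Even if this doesn't shrink the final DataFrame much, my hope is that it keeps peak usage during the pipeline runs lower.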
In your capacity planning, how was the current memory limit determined? Does memory 'scale up' as more users start using the Q platform, or do we each get a smaller slice of the same memory pie as more users are added? Is memory allocated per user 'fixed,' or 'dynamically' allocated depending on current/projected usage during the day?
I would be quite happy to pay a reasonable 'subscription fee' for a 'premium Q service' if it meant I could have more memory to do more effective research. I'm just a single person, but if Quantopian Enterprise (QE) offers more memory in Research at a reasonable price, I might look into that. The QE FAQ page doesn't mention anything about more memory, however, and I actually prefer not to have access to the FactSet holdback period (so I can test for overfitting in the Contest), so I don't really want to pay for access to the 'holdback period.'
If you or anyone else has any ideas or suggestions on how I can make better use of memory in Research, I'd be all ears. For example, maybe there's a time of day when I'm more likely to be allocated more memory?
Thank you for an otherwise excellent platform, and for your continued effort to improve the Research experience!
Joakim
PS: I did see the Alphalens pull request on GitHub to 'Speed up compute_forward_returns and get_clean_factor', which is great, and might help with memory utilization as well?
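In the meantime, the one Alphalens knob I'm aware of that might reduce memory is requesting fewer forward-return periods in get_clean_factor_and_forward_returns, since each period adds a column to the merged factor/returns DataFrame it builds. A minimal sketch, where my_factor and pricing are placeholders for whatever is already in the notebook, and the memory impact is my assumption rather than a measured result:

```python
# Minimal sketch: request only the forward-return horizons actually needed.
# my_factor and pricing are placeholders; the memory impact is assumed.
import alphalens

factor_data = alphalens.utils.get_clean_factor_and_forward_returns(
    factor=my_factor,      # MultiIndex (date, asset) Series of factor values
    prices=pricing,        # wide DataFrame of daily prices
    quantiles=5,
    periods=(1, 5, 21),    # fewer horizons -> smaller merged DataFrame
)
```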