SID limit of 10

Since there is no sid-by-ticker function available, I cooked up a macro that extracted the SIDs for all symbols listed on the NASDAQ-100, but when building the script I get this error:

"Algorithim must reference 1 to 10 SIDs, inclusive"

I was curious why this limit exists. As far as I can deduce, the sample passed to the handle_data function actually contains the full range of SIDs (there is no requirement to flag their usage in the initialize function), so I can only guess that the limit is a precaution against algorithms that do too much heavy lifting.

After discovering the error, and second-guessing why it appeared, I refrained from running any backtests, but I did find that the error goes away if the sid function is assigned to a different identifier. If the intention is to restrict the number of calls to this function, I suggest adding a runtime limitation as well, if one is not already in place.
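
For illustration, a minimal sketch of the aliasing just described (nasdaq100_ids is a made-up placeholder for whatever ID list is used); since the build-time check apparently looks for literal calls to sid, routing them through another name slips past it:

lookup = sid  # alias the built-in sid function under another name

# none of these calls are spotted by the build-time scan, so the
# "1 to 10 SIDs" error disappears even though many more are referenced
stocks = [lookup(n) for n in nasdaq100_ids]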

4 responses

We basically scan your code for your calls to the sid() function and extract the SIDs you want to use (I would not recommend overriding it). We then send only the SIDs you actually use to the handle_data() function. Do you have sid(INT) somewhere in your code? If you want to trade a portfolio, check out my DMA algorithm for how this can be done.
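
As a sketch of the pattern the scan expects (the SID integers below are illustrative):

def initialize(context):
    # literal sid(INT) calls are what the build-time scan extracts
    context.stocks = [sid(24), sid(5061)]

def handle_data(context, data):
    for stock in context.stocks:
        order(stock, 10)  # trade the pre-declared portfolio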

Streaming all SIDs would be extremely taxing on bandwidth and computation time. Having said that, we can certainly think about increasing the upper limit if there is demand for it.

I see - so the data dict only contains the SIDs that are explicitly referenced in the code. So if, for example, I do:

someNums = [1, 2, 3, 4]

someSids = map(lambda x: sid(x), someNums)

then the test data.available(sid(1)) will be false in the context of the handle_data function.

This does in fact make perfect sense. If all one-minute sample data for all available SIDs were crammed into memory, streaming them all would be quite a bit easier. A modest estimate for 10 years of 1-minute samples for 10,000 SIDs weighs in at 195 GB, so it isn't impossible. I'm not sure whether it would solve any real problem, but the limit would go away - at the cost and discomfort of an architecture-level rewrite, so it is probably not the best of ideas. :)
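
For what it's worth, the estimate checks out if one assumes a record size of roughly 20 bytes per one-minute bar (the record size is my assumption, not a known Quantopian figure):

# back-of-the-envelope check of the ~195 GB figure
years, trading_days, minutes_per_day = 10, 252, 390
sids, bytes_per_bar = 10000, 20  # bytes_per_bar is an assumption
total = years * trading_days * minutes_per_day * sids * bytes_per_bar
print(total / 1e9)  # ~196.6 GB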

Hi @Rune, thanks for the feedback.

We've considered holding all the data in memory, and your estimates line up with our own. The architectural issue you infer amounts to the fact that we run simulations in independent processes, so loading all the data into memory doesn't get that data to the child processes running the simulations. A large memory cache of trades may still make sense, especially if we find highly localized queries from members' algorithms. For example, if most algorithms want to screen for the 1,000 largest-cap stocks, caching those would be a mere 20 GB.

Another idea I've been mulling is to lazily load the data as the algorithm requests it. We actually want members to have the sensation that data holds the entire universe, which is why we pre-parse today. But the illusion is too shallow right now, and people like you want to provide a function or make some other late-breaking request for data. Getting a key-error in that situation is very disconcerting. Maybe a mixture of all three (parsing, lazy loading, a big memory cache) is the way forward.
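
A minimal sketch of the lazy-loading idea (the class and its fetch callable are hypothetical, not Quantopian internals):

class LazyData(object):
    # hypothetical mapping that loads a SID's bars on first access
    # instead of relying on the pre-parse step
    def __init__(self, fetch):
        self._fetch = fetch  # assumed callable: sid -> bar data
        self._cache = {}

    def __getitem__(self, sid):
        if sid not in self._cache:
            self._cache[sid] = self._fetch(sid)  # load on demand
        return self._cache[sid]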

Since the set of trades is static data, some clever assumptions can probably be made about how to organize its storage. If that works well, there is a single, simple, uniform way to feed data to users' backtesting algorithms, and key-errors should only appear when accessing a SID that doesn't exist at a particular point in time. The amount of magic required to provide this data to an arbitrary set of child processes is another cup of tea, though, and its difficulty largely depends on how things are bolted together on the server side.
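
One conventional way to hand static, read-only data to many child processes (a sketch of the general technique, not a claim about how Quantopian's servers work) is to memory-map a packed file, so forked workers share the same physical pages:

import mmap
import os

# map a hypothetical packed trade file read-only (POSIX-only flags);
# forked child processes that map the same file share its pages
fd = os.open("trades.bin", os.O_RDONLY)
size = os.fstat(fd).st_size
trades = mmap.mmap(fd, size, prot=mmap.PROT_READ)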

Anyway, thank you very much for clarifying this. The pre-parsing does make sense - it may be a good idea for the documentation to state explicitly that code is scanned for SID references and that this step controls which symbols are accessible in the backtesting hook.