bug w/ context.stocks (list vs. set) & batch transform?

posted Mar 7, 2013

Recently, I noted some odd behavior by the batch transform. I found that it apparently re-orders the sids if context.stocks is a list. The order is maintained if context.stocks is a set. The outputs for the respective cases are below:

context.stocks = [sid(351), sid(1419), sid(1787), sid(25317), sid(3321), sid(3951), sid(4922)]

2013-02-26 handle_data:25 DEBUG <type 'list'>
2013-02-26 handle_data:26 DEBUG [ 2.4601 87.18 99.43 13.89 31.8 20.58 102.35 ]
2013-02-26 handle_data:27 DEBUG [[ 13.89 87.18 20.58 31.8 102.35 99.43 2.4601]]

context.stocks = {sid(351), sid(1419), sid(1787), sid(25317), sid(3321), sid(3951), sid(4922)}

2013-02-26 handle_data:25 DEBUG <type 'set'>
2013-02-26 handle_data:26 DEBUG [ 13.89 87.18 20.58 31.8 102.35 99.43 2.4601]
2013-02-26 handle_data:27 DEBUG [[ 13.89 87.18 20.58 31.8 102.35 99.43 2.4601]]

Is this the expected behavior, or is it a bug? If it is not a bug, I suggest making this clear in your documentation and training, since it is a subtle difference that can result in erroneous output (as I learned).

The "Add Backtest" button is not available, so I am posting the code in-line:

import numpy as np

# globals for the get_prices batch transform decorator
R_P = 1  # refresh period
W_L = 1  # window length

def initialize(context):
    # swap the two assignments below to compare list vs. set behavior
    #context.stocks = [sid(351), sid(1419), sid(1787), sid(25317), sid(3321), sid(3951), sid(4922)]
    context.stocks = {sid(351), sid(1419), sid(1787), sid(25317), sid(3321), sid(3951), sid(4922)}

def handle_data(context, data):
    prices = np.zeros(len(context.stocks))

    # get prices, in the iteration order of context.stocks
    for i, stock in enumerate(context.stocks):
        prices[i] = data[stock].price

    # get batch transform prices; call once and reuse the result
    prices_bt = get_prices(data, context)
    if prices_bt is None:
        log.debug("get_prices(data,context) is None")
        return

    log.debug(type(context.stocks))
    log.debug(prices)
    log.debug(prices_bt)

@batch_transform(refresh_period=R_P, window_length=W_L) # set globals R_P & W_L above  
def get_prices(datapanel,sids):  
    return datapanel['price'].values

6 responses

Grant Kiehne

Mar 7, 2013

Attached, please find the algorithm that I could not attach to the post above (no "Add Backtest" button). --Grant

Grant Kiehne

Mar 15, 2013

Hello all,

Any insights on this? I'd like to close it out.

Thanks,

Grant

John Fawcett

Mar 15, 2013

Hi Grant,

So sorry for the delay - Dan forwarded this to me when you first submitted it and I didn't get back to look at it until now. I've been working on a really fun new feature, and I let this slip - sorry!

The issue you have uncovered is by design. We don't guarantee any order on the data parameter to handle_data or to the dataframe columns in the datapanel sent to batch transform. Instead, we decided to go the route of using keys/column labels.
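To illustrate the keys-over-position approach, here is a minimal sketch in plain Python. The integer ids are stand-ins for the sid(...) objects, a plain dict stands in for a label-keyed price container, and prices_in_order is a hypothetical helper name; none of this is Quantopian API. The prices are the ones from the log output above.

```python
# Stand-ins (assumptions): integer ids play the role of sid(...) objects and
# a plain dict plays the role of a container keyed by security label.
stocks = [351, 1419, 1787, 25317, 3321, 3951, 4922]

# Prices stored in an arbitrary order, as a keyed container might hold them.
prices_by_sid = {
    25317: 13.89, 1419: 87.18, 3951: 20.58, 3321: 31.8,
    4922: 102.35, 1787: 99.43, 351: 2.4601,
}

def prices_in_order(prices_by_sid, order):
    """Look each id up by key so the result follows `order`, not storage order."""
    return [prices_by_sid[s] for s in order]

ordered = prices_in_order(prices_by_sid, stocks)
print(ordered)  # [2.4601, 87.18, 99.43, 13.89, 31.8, 20.58, 102.35]
```

Because every value is fetched by its key, the result depends only on the order of the ids passed in, whatever order the container happens to use internally.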

Thanks for pointing this out.

thanks,
fawce

Disclaimer

The material on this website is provided for informational purposes only and does not constitute an offer to sell, a solicitation to buy, or a recommendation or endorsement for any security or strategy, nor does it constitute an offer to provide investment advisory services by Quantopian. In addition, the material offers no opinion with respect to the suitability of any security or specific investment. No information contained herein should be regarded as a suggestion to engage in or refrain from any investment-related course of action as none of Quantopian nor any of its affiliates is undertaking to provide investment advice, act as an adviser to any plan or entity subject to the Employee Retirement Income Security Act of 1974, as amended, individual retirement account or individual retirement annuity, or give advice in a fiduciary capacity with respect to the materials presented herein. If you are an individual retirement or other investor, contact your financial advisor or other fiduciary unrelated to Quantopian about whether any given investment idea, strategy, product or service described herein may be appropriate for your circumstances. All investments involve risk, including loss of principal. Quantopian makes no guarantees as to the accuracy or completeness of the views expressed in the website. The views are subject to change, and may have become unreliable for various reasons, including changes in market conditions or economic circumstances.

Grant Kiehne

Mar 16, 2013

Thanks Fawce,

No problem...just didn't want it to slip by. I'm still not quite clear on what's going on. My goal is to have a sure-fire way of capturing a trailing window of security data in a numpy ndarray. For the code above, the batch transform re-orders the columns when the sids are defined in a Python list. But if I define the sids with a Python set, the batch transform maintains the sid order.

Some questions:

1. I am not clear from your response that the behavior highlighted by my code above will be consistent. If I call the batch transform with the sids defined by a Python set, will I always get back an ndarray with the columns ordered according to the sid order in def initialize(context)?
2. When I call the batch transform with a Python list of sids, the ndarray columns are re-ordered. What criteria are used by the batch transform to order the columns? Or is it random?
3. What would be the best practice? From your response, it sounds like I should be using keys explicitly when extracting data from the dataframe columns. Is there a way to re-write the code above so that I obtain the same result, regardless of the Python data type of context.stocks (list or set)?
4. When using set_universe, are the sids in a Python list or set?

Grant

John Fawcett

Mar 17, 2013

Hi Grant,

Thanks for detailing your thoughts. I'll jump right into your questions:
1. My read on the ordering is that the set is dumping the sids in order of their hashes, and that datapanel['price'].values is dumping the prices out in the order of the sid hashes. In other words, it is a coincidence of the implementation of the dataframe.values property and the implementation of set. Neither of those two objects should be considered ordered lists. The list, on the other hand, is ordered as you coded it. It should retain that order until you modify the list.
2. It is deterministic, but based on the need to index into the columns, and so appears to be in order of the hash of the column key.
3. It really just depends on how you want to code your function. Key access is available on the dataframe, and if you want an ndarray in a known order, pandas has a built-in method that returns the columns in the order you request. See the attached backtest for an example using the dataframe.as_matrix function.
4. The sids in the universe are available via the data parameter to handle_data - which is not sorted. You can iterate through the keys with:

for stock in data:
    price = data[stock].price
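On the question of getting the same result from a list or a set, one option is to derive a single explicit ordering and use it everywhere. The sketch below is plain Python with integer ids standing in for sid(...) objects; stable_order is a hypothetical helper, and sorting real security objects directly is an assumption (they may need a key function).

```python
# Stand-ins (assumption): integer ids in place of sid(...) objects.
stocks_list = [351, 1419, 1787, 25317, 3321, 3951, 4922]
stocks_set = set(stocks_list)

def stable_order(stocks):
    """Return the ids in ascending order, whatever container they came in."""
    return sorted(stocks)

# Iterating a set follows its internal hash layout, not the order written in
# the source, so this printout should not be relied upon.
print(list(stocks_set))

# Sorting gives both containers the same, reproducible order.
print(stable_order(stocks_set) == stable_order(stocks_list))  # True
```

Any deterministic rule works here; sorting is just the simplest one that does not depend on how the collection was built.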

thanks,
fawce

Grant Kiehne

Mar 20, 2013

Thanks Fawce,

This does the trick:

@batch_transform(refresh_period=R_P, window_length=W_L) # set globals R_P & W_L above  
def get_prices(datapanel,sids):  
    return datapanel['price'].as_matrix(sids)[0]

Alternatively, for a window length greater than 1 (e.g. W_L = 5), I can use:

@batch_transform(refresh_period=R_P, window_length=W_L) # set globals R_P & W_L above  
def get_prices(datapanel,sids):  
    return datapanel['price'].as_matrix(sids)

Time increases with increasing row number (with the last row corresponding to the current tick).

The code you provided would seem to maintain the ordering explicitly.
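For readers on a recent pandas, DataFrame.as_matrix is no longer available (it was deprecated and later removed). A rough standalone equivalent, assuming pandas 0.24 or later, selects the columns by label and converts the result. The frame below holds toy values, the integer labels stand in for sids, and window_as_array is a hypothetical helper rather than part of the batch transform API.

```python
import pandas as pd

# Stand-ins (assumptions): integer labels in place of sid(...) objects.
sids = [351, 1419, 1787]

# Toy price frame: rows are bars (oldest first); the columns are deliberately
# stored in a different order than `sids`.
price_frame = pd.DataFrame(
    [[99.0, 2.0, 87.0],
     [99.43, 2.4601, 87.18]],
    columns=[1787, 351, 1419],
)

def window_as_array(frame, sids):
    """Return a 2-D ndarray with one column per id, in the order given by `sids`."""
    return frame.loc[:, list(sids)].to_numpy()

window = window_as_array(price_frame, sids)
latest = window[-1]  # last row is the most recent bar
print(latest)
```

If sids is a set, list(sids) still follows the set's own iteration order, so pass a list (or a sorted copy) when the column order matters.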

Grant
