Data variable

When I create a function that references Data, I need to pass Data to the function, i.e. it is not global. There appears to be no conflict in giving the variable I pass to the function the same name, "Data". Now, the Data variable is a DataPanel, is it not?

I ask because I noticed something bizarre. I created a function that performs a batch_transform and passed the Data variable to it, calling it Data inside the function as well. Everything seemed to work until I tried a "prices.mean" transform: the program says it does not recognize that transform.

So I tried playing with the function in the API Docs.

The original function runs without issue:

@batch_transform(window_length=10)  
def get_averages(datapanel):  
  # get the dataframe of prices  
  prices = datapanel['price']

  # return a dataframe with one row showing the averages for each stock.  
  return prices.mean()  

But if I replace 'datapanel' with 'data' I get the same error, mean transform not recognized:

@batch_transform(window_length=10)  
def get_averages(data):  
  # get the dataframe of prices  
  prices = data['price']

  # return a dataframe with one row showing the averages for each stock.  
  return prices.mean()


So I tried a total made up name, yabbazabba, for the data variable and that works:

@batch_transform(window_length=10)  
def get_averages(yabbazabba):  
  # get the dataframe of prices  
  prices = yabbazabba['price']

  # return a dataframe with one row showing the averages for each stock.  
  return prices.mean()  

**So the question: it appears that there is something special about the name 'data' for a variable, and that it is essentially reserved. Is this true? The default seems to be to pass 'data' to other functions under the name 'data', except in a batch_transform, where the default seems to be 'datapanel'. At least that seems to be the tendency in the API Docs. I guess I'm just confused as to the cause of this particular error. If data is not global, why should it behave differently from yabbazabba?**

Obviously this problem is easy to work around but it has me curious as to what is going on...

6 responses

Hi Daniel,

You've definitely identified a bug. There's something wrong with our algorithm validation functionality (note that the error you're getting occurs at build time, when we do validation, not at run time) that we need to fix. I've opened an issue about this in our bug database. In the meantime, yes, the correct workaround is to just use a different name for the variable.

Jonathan Kamens

Disclaimer

The material on this website is provided for informational purposes only and does not constitute an offer to sell, a solicitation to buy, or a recommendation or endorsement for any security or strategy, nor does it constitute an offer to provide investment advisory services by Quantopian. In addition, the material offers no opinion with respect to the suitability of any security or specific investment. No information contained herein should be regarded as a suggestion to engage in or refrain from any investment-related course of action as none of Quantopian nor any of its affiliates is undertaking to provide investment advice, act as an adviser to any plan or entity subject to the Employee Retirement Income Security Act of 1974, as amended, individual retirement account or individual retirement annuity, or give advice in a fiduciary capacity with respect to the materials presented herein. If you are an individual retirement or other investor, contact your financial advisor or other fiduciary unrelated to Quantopian about whether any given investment idea, strategy, product or service described herein may be appropriate for your circumstances. All investments involve risk, including loss of principal. Quantopian makes no guarantees as to the accuracy or completeness of the views expressed in the website. The views are subject to change, and may have become unreliable for various reasons, including changes in market conditions or economic circumstances.

No worries. Thanks Jonathan.

On a related note - when I print prices.mean to the log, I observe that the "mean" transform requires (0) to specify averaging across rows (averaging over a given security) as opposed to columns. So, just to be sure: in the data variable, is each row a ticker and each column a date, and not the other way around? (Or I should say: the first index is the ticker symbol, the second index is the date, and I suppose the third index is the property type, e.g. price, volume, etc.?)

For example (if the universe is only ticker SPY):

@batch_transform(window_length=5, refresh_period=1)  
def ranking(dataP, context):  
    pricesDaily=dataP['price']  
    print("Prices:")  
    print(pricesDaily)  
    print("")

    means=pricesDaily.mean  
    print("Means:")  
    print(means)  
    return(0)  

and the output is:

2013-04-11 PRINT Prices:
2013-04-11 PRINT                            8554
2013-04-08 00:00:00+00:00  156.190
2013-04-09 00:00:00+00:00  156.740
2013-04-10 00:00:00+00:00  158.700
2013-04-11 00:00:00+00:00  159.210
2013-04-12 00:00:00+00:00  158.798
2013-04-11 PRINT
2013-04-11 PRINT Means:
2013-04-11 PRINT <bound method DataFrame.mean of                            8554
2013-04-08 00:00:00+00:00  156.190
2013-04-09 00:00:00+00:00  156.740
2013-04-10 00:00:00+00:00  158.700
2013-04-11 00:00:00+00:00  159.210
2013-04-12 00:00:00+00:00  158.798>

whereas if I amend means=pricesDaily.mean to means=pricesDaily.mean(0), the output becomes:

2013-04-11 PRINT Prices:
2013-04-11 PRINT                            8554
2013-04-08 00:00:00+00:00  156.190
2013-04-09 00:00:00+00:00  156.740
2013-04-10 00:00:00+00:00  158.700
2013-04-11 00:00:00+00:00  159.210
2013-04-12 00:00:00+00:00  158.798
2013-04-11 PRINT
2013-04-11 PRINT Means:
2013-04-11 PRINT 8554    157.9276
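
The difference between the two outputs can be reproduced outside the backtester. The following is a minimal sketch in plain pandas, assuming a DataFrame built by hand from the prices printed above (the variable names are illustrative only). It suggests that the first output is simply what Python prints for a method that was referenced but never called, and that the 0 argument selects the default axis rather than being required.

```python
import pandas as pd

# Hand-built stand-in for dataP['price']: rows are dates, the single
# column is sid 8554. Values are copied from the log output above.
dates = pd.to_datetime(
    ["2013-04-08", "2013-04-09", "2013-04-10", "2013-04-11", "2013-04-12"],
    utc=True,
)
prices = pd.DataFrame(
    {8554: [156.190, 156.740, 158.700, 159.210, 158.798]}, index=dates
)

# Without parentheses this is the method object itself, not a result.
# Printing it gives the "<bound method DataFrame.mean of ...>" text.
method_only = prices.mean

# Calling the method averages down each column (axis 0), one value per sid.
column_means = prices.mean(0)

# axis=0 is also the default for DataFrame.mean, so this is equivalent.
default_means = prices.mean()

print(column_means)
```

In this layout each column is a security and each row is a date, so averaging along axis 0 yields one mean per security.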

And last question (sorry for all these questions), does the data variable implicitly read the context variable in the initialize function? In other words, the securities included in the backtest seem to be defined by equating something within context to a series of calls to sid().

Thanks again for all the help.

I'm going to leave your question about the shape of the data in the datapanel to Thomas to answer, because that's much more in his wheelhouse than mine.

With regard to your other question, yes, when we compile your algorithm we look for either set_universe calls or sid() calls and use them to determine which stocks' data should be fed to handle_data and batch transforms. Incidentally, that's why you can't use variables in sid() calls: you have to specify an integer literal there.
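
To illustrate why a scan of the source code can only see literal integers, here is a small sketch using Python's standard ast module. It is not Quantopian's actual implementation, which is not shown in this thread; the function name find_literal_sids and the sample algorithm text are hypothetical.

```python
import ast


def find_literal_sids(source):
    """Return the integer literals passed directly to sid(...) in source.

    Hypothetical helper for illustration; not part of any Quantopian API.
    """
    found = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id == "sid"
            and len(node.args) == 1
            and isinstance(node.args[0], ast.Constant)
            and isinstance(node.args[0].value, int)
        ):
            found.append(node.args[0].value)
    return sorted(found)


# Made-up algorithm text: two literal sid() calls and one using a variable.
algo_source = '''
def initialize(context):
    context.stocks = [sid(24), sid(8554)]
    n = 17526
    context.other = sid(n)
'''

sids = find_literal_sids(algo_source)
print(sids)
```

The call sid(n) is skipped because the scan never runs the code, so it has no way to know what n holds.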

Awesome - I'm really starting to understand (and really like) this program. Thanks again.

Hi Daniel,

The dimension names of a Panel (a 3D DataFrame) are as follows (from the Pandas docs):
- items: axis 0, each item corresponds to a DataFrame contained inside
- major_axis: axis 1, it is the index (rows) of each of the DataFrames
- minor_axis: axis 2, it is the columns of each of the DataFrames

The Panel that the batch_transform creates is as follows:
- items : fields (e.g. price, volume)
- major_axis : datetime
- minor_axis : sids

Thus, to get the mean for every sid for every field, you'll want to call panel.mean(axis=1). This will return a DataFrame (2D) with the field names as columns and the sids as index:

       close_price     high      low  open_price    price      volume
24         177.252  183.712  172.726     180.344  177.252  57760335.4
17526       32.778   33.024   32.036      32.416   32.778     14194.8

I'm attaching a backtest that illustrates this.
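
For readers on a recent pandas release, where Panel is no longer available, the same layout can be approximated with a dict of DataFrames. This is only a sketch under that assumption: the field names follow the description above, while the prices, volumes and dates are made up for illustration.

```python
import pandas as pd

dates = pd.date_range("2013-04-08", periods=3, tz="UTC")
sids = [24, 17526]

# One DataFrame per field (the Panel "items").
# Rows are datetimes (major_axis), columns are sids (minor_axis).
panel_like = {
    "price": pd.DataFrame(
        [[10.0, 1.0], [20.0, 2.0], [30.0, 3.0]], index=dates, columns=sids
    ),
    "volume": pd.DataFrame(
        [[100.0, 10.0], [200.0, 20.0], [300.0, 30.0]], index=dates, columns=sids
    ),
}

# Averaging over the datetime axis of every field mirrors panel.mean(axis=1):
# the result has sids as the index and field names as the columns.
means = pd.DataFrame(
    {field: frame.mean(axis=0) for field, frame in panel_like.items()}
)

print(means)
```

Each per-field mean is a Series indexed by sid, and collecting them into a DataFrame places the fields side by side as columns.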

Thanks Thomas - That is exactly what I thought.