New Feature: Batch Transforms!

This week we're rolling out a new feature: batch transforms. One of the most frequent requests from our members has been for more flexibility with trailing time frames. For example, our built-in moving average transforms are pretty nifty, but what if you want to use a 3-day moving average on some operations and a 5-day moving average on others? Or what if you want your volume-weighted average to use a volume-based window instead of a date-based window? The answer is to use batch transforms.

Think of a batch transform as a trailing window of data. You define how many days are in the trailing window and how often the window is refreshed. The longer the window and the more frequent the refresh, the slower the backtest will run. The trailing window has price data and volume data. Every time the batch is refreshed, it also runs any calculations that you have requested. Least squares calculations, aggregations, averages, etc. are all possible.
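
To make the window/refresh idea concrete, here is a minimal sketch in plain Python. It is not Quantopian's implementation: the `TrailingWindow` class, its `update` method, and the exact refresh timing are assumptions made up for illustration.

```python
from collections import deque


class TrailingWindow:
    """Hypothetical stand-in for a batch transform (illustration only)."""

    def __init__(self, func, refresh_period, window_length):
        self.func = func
        self.refresh_period = refresh_period
        self.window = deque(maxlen=window_length)  # keeps only the newest bars
        self.bars_since_refresh = 0
        self.cached = None  # stays None until the window first fills

    def update(self, bar):
        """Add one bar; recompute only when the window is full and a refresh is due."""
        self.window.append(bar)
        self.bars_since_refresh += 1
        window_full = len(self.window) == self.window.maxlen
        due = self.cached is None or self.bars_since_refresh >= self.refresh_period
        if window_full and due:
            self.cached = self.func(list(self.window))
            self.bars_since_refresh = 0
        return self.cached


def mean(values):
    return sum(values) / len(values)


# A 3-bar window that is recomputed every 2 bars.
transform = TrailingWindow(mean, refresh_period=2, window_length=3)
results = [transform.update(p) for p in [10, 11, 12, 13, 14, 15]]
```

In this sketch the transform returns None until the window has filled, and between refreshes it returns the previously computed value rather than recalculating.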

Imagine an algorithm where every month you re-calculate the signals, and then trade on those signals for a month, and then recalculate again, and so on. Batch transform is built for that.

You can read a bit more documentation in our updated help doc.

To get started, you can also clone the example below. In this example, the refresh period is one day and the trailing window is 10 days. So, every day, the last 10 days of data are loaded. The batch transform finds the max price and min price in that window. The algorithm uses those as trading signals: it goes long when the price hits a min, and it goes short when the price hits a max.
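
The signal logic described above can be sketched in plain Python without the platform. This is not the example algorithm itself: the `minmax_signal` helper and the price list are made up, and it assumes the 10-day window includes the current day.

```python
def minmax_signal(window_prices):
    """Return 'long' if the newest price is the window low,
    'short' if it is the window high, otherwise None."""
    current = window_prices[-1]
    if current == min(window_prices):
        return "long"
    if current == max(window_prices):
        return "short"
    return None


# Made-up daily prices, purely for illustration.
prices = [10.0, 10.5, 9.8, 10.2, 11.0, 10.7, 10.1, 9.9, 10.4, 10.6, 9.7, 11.2, 10.3]
WINDOW = 10

signals = []
for i in range(WINDOW - 1, len(prices)):
    window = prices[i - WINDOW + 1:i + 1]  # the last 10 days, including today
    signals.append(minmax_signal(window))
```

Each entry in `signals` corresponds to one day on which a full 10-day window was available.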

Disclaimer

The material on this website is provided for informational purposes only and does not constitute an offer to sell, a solicitation to buy, or a recommendation or endorsement for any security or strategy, nor does it constitute an offer to provide investment advisory services by Quantopian. In addition, the material offers no opinion with respect to the suitability of any security or specific investment. No information contained herein should be regarded as a suggestion to engage in or refrain from any investment-related course of action as none of Quantopian nor any of its affiliates is undertaking to provide investment advice, act as an adviser to any plan or entity subject to the Employee Retirement Income Security Act of 1974, as amended, individual retirement account or individual retirement annuity, or give advice in a fiduciary capacity with respect to the materials presented herein. If you are an individual retirement or other investor, contact your financial advisor or other fiduciary unrelated to Quantopian about whether any given investment idea, strategy, product or service described herein may be appropriate for your circumstances. All investments involve risk, including loss of principal. Quantopian makes no guarantees as to the accuracy or completeness of the views expressed in the website. The views are subject to change, and may have become unreliable for various reasons, including changes in market conditions or economic circumstances.

8 responses

Hello Dan,

I decided to give this a try:

def initialize(context):  
    context.stock = sid(16841)

def handle_data(context, data):  
    prices = get_price_history(data)  
    log.debug(prices)

@batch_transform(refresh_period=5, window_length=30)  
def get_price_history(datapanel):  
    prices = datapanel['price']  
    return prices  

In the log output, eventually (after a bunch of "None" returns) I get:

2012-02-15 handle_data:10 DEBUG
                           16841
2012-01-03 00:00:00+00:00  179.03
2012-01-04 00:00:00+00:00  177.50
2012-01-05 00:00:00+00:00  177.63
2012-01-06 00:00:00+00:00  182.67
2012-01-09 00:00:00+00:00  178.57
2012-01-10 00:00:00+00:00  179.34
2012-01-11 00:00:00+00:00  178.88
2012-01-12 00:00:00+00:00  175.93
2012-01-13 00:00:00+00:00  178.51
2012-01-17 00:00:00+00:00  181.63
2012-01-18 00:00:00+00:00  189.38
2012-01-19 00:00:00+00:00  194.44
2012-01-20 00:00:00+00:00  190.93
2012-01-23 00:00:00+00:00  186.15
2012-01-24 00:00:00+00:00  187.00
2012-01-25 00:00:00+00:00  187.77
2012-01-26 00:00:00+00:00  193.29
2012-01-27 00:00:00+00:00  195.37
2012-01-30 00:00:00+00:00  192.15
2012-01-31 00:00:00+00:00  194.51
2012-02-01 00:00:00+00:00  179.46
2012-02-02 00:00:00+00:00  181.72
2012-02-03 00:00:00+00:00  187.68
2012-02-06 00:00:00+00:00  183.09
2012-02-07 00:00:00+00:00  184.19
2012-02-08 00:00:00+00:00  185.50
2012-02-09 00:00:00+00:00  184.98
2012-02-10 00:00:00+00:00  185.63
2012-02-13 00:00:00+00:00  191.59
2012...

So, it appears that potentially we have access to (datetime, price) pairs and their corresponding (datetime, volume) pairs. How can we get at the datetime field?

Also, perhaps somebody could provide a code example that does a time series analysis on a trailing window of price/volume data? The minmax function in Dan's example above does not explicitly operate on (datetime, price) pairs, since it just finds the min and max over a trailing window. An example of a time series analysis would be a polynomial curve fit of price versus time.

Hi Grant,

That's correct. The pandas dataframe/datapanel is extremely powerful. For a much more exhaustive list of its features, see the pandas docs, specifically the part on datapanel indexing (a datapanel is a 3d/stacked dataframe).

You can access the datetime via the index, e.g. datapanel['price'].index. You can then use that index together with, e.g., an OLS regression.
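
As a rough sketch of using the index for a fit of price versus time, the snippet below runs outside the platform on made-up data. The `prices` Series stands in for one sid's column of datapanel['price'], and it uses NumPy's `polyfit` (a least-squares polynomial fit) rather than a statsmodels OLS; both choices are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Made-up prices on a daily UTC index, standing in for one sid's price column.
index = pd.date_range("2012-01-03", periods=6, freq="D", tz="UTC")
prices = pd.Series([100.0, 102.0, 104.0, 106.0, 108.0, 110.0], index=index)

# Convert the timestamps to elapsed days so the fit has a numeric x axis.
elapsed = prices.index - prices.index[0]
elapsed_days = np.asarray(elapsed.total_seconds()) / 86400.0

# Degree-1 least-squares fit of price versus time; raise deg for a curve.
slope, intercept = np.polyfit(elapsed_days, prices.values, deg=1)
```

Here `slope` is in price units per day and `intercept` is the fitted price at the first timestamp in the window.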

Thanks Thomas,

I'll give it a try. By the way, if it ain't obvious by now, simple plotting capability would be handy. Kinda scary to be doing time series analysis without having a look at the time series...

Grant

Hello Dan & Thomas,

I'm kinda confused by the datapanel thingy (granted, I have not yet read any Python docs, as Thomas recommended). What data variable type/container/object does the function below return? Seems like if I ask for prices, I should get a list of prices (in an array/vector). But I get the prices along with their datetime stamps, separated by spaces.

@batch_transform(refresh_period=5, window_length=30)  
def get_price_history(datapanel):  
    prices = datapanel['price']  
    return prices  

Hi @Grant,

The datapanel is a pandas Panel object, which is sort of like a dictionary of dataframes. The dataframe is a two-dimensional structure, with a row index and a column index. In our use, the row index is always datetime, and the columns are always sids.

Because each cell in the dataframe can only have one value (i.e. price, or volume, but not both), we build a three-dimensional datapanel that has a dataframe for prices, and a dataframe for volume.
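
To illustrate the "dictionary of dataframes" picture, here is a small sketch that uses a plain dict of DataFrames instead of a real Panel (later pandas releases removed the Panel class). The second sid (99999) and all volume figures are made up; the three prices for sid 16841 come from the log output earlier in this thread.

```python
import pandas as pd

index = pd.date_range("2012-01-03", periods=3, freq="D", tz="UTC")
sids = [16841, 99999]  # 99999 is a hypothetical placeholder sid

# One DataFrame per field: rows are datetimes, columns are sids.
panel_like = {
    "price": pd.DataFrame(
        [[179.03, 50.0], [177.50, 51.0], [177.63, 52.0]], index=index, columns=sids
    ),
    "volume": pd.DataFrame(
        [[1000, 200], [1100, 210], [900, 190]], index=index, columns=sids
    ),
}

prices = panel_like["price"]    # DataFrame of prices for every sid
one_sid = prices[16841]         # Series: prices for a single sid, indexed by datetime
price_array = one_sid.values    # plain NumPy array of prices, no timestamps
timestamps = one_sid.index      # the matching datetimes
```

Selecting a field gives a DataFrame, selecting a sid from that gives a Series, and `.values` gives the bare array of numbers.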

You can imagine a future where there are other dataframes in the panel: news sentiment, earnings estimates, really anything that is company-specific.

I hope that helps!

thanks,
fawce

Thanks Fawce,

When I get the chance, I'll look into the documentation. In the meantime, a simple tutorial on how to extract and analyze the data would be helpful. How can we do time series analyses, for example? Dan provides the .min() and .max() functions above, as an example. Generally, what functions can be applied? A few simple examples would be helpful.

Also, if I understand correctly, since the row index of a dataframe corresponds to datetime, there will be cases where some cells (correct term?) in a row are empty for certain sids. If a sid does not trade in a given historical minute, what is the value of the corresponding cell in the dataframe, None?

Also, in the decorator, can refresh_period and window_length be variables? For example (with r_p and w_l changing as the algorithm is run):

@batch_transform(refresh_period=r_p, window_length=w_l)  

Grant, you're right, we need more examples on this one. I need to learn more about it myself, so I'll work up some more tutorials and examples on batch transforms.

There's one more example readily available, and that's from Zipline. The example code is here. This is the batch transform part of it:

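import statsmodels.api as sm  # needed for sm.add_constant and sm.OLS

@batch_transform(refresh_period=10, window_length=10)
def ols_transform(data, sid1, sid2):
    """Computes regression coefficient (slope and intercept)
    via Ordinary Least Squares between two SIDs.
    """
    p0 = data.price[sid1]
    # prepend=False keeps the constant as the last column, so the fitted
    # params come back in (slope, intercept) order.
    p1 = sm.add_constant(data.price[sid2], prepend=False)
    slope, intercept = sm.OLS(p0, p1).fit().params

    return slope, intercept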

As for the question "what functions can be applied": the answer is super long, because it can be almost anything.

You are correct that the datapanel sometimes has holes. We actually ran into a bug with that right after we launched! We implemented an interpolation method available in pandas. In the current implementation, the data is filled forward from the last known value where possible and the data is dropped where it is not. In the future, we plan to make this behavior configurable within your algorithm. That info will be added to the docs shortly.
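
For reference, a fill-forward-then-drop pass can be sketched with standard pandas calls on made-up data. This is only an illustration of the general technique, not the platform's code; in particular, dropping whole rows (rather than columns) is an assumption.

```python
import numpy as np
import pandas as pd

index = pd.date_range("2012-01-03", periods=4, freq="D", tz="UTC")

# Made-up prices with holes (NaN) where a sid did not trade.
raw = pd.DataFrame(
    {"A": [np.nan, 10.0, np.nan, 12.0], "B": [5.0, np.nan, 6.0, 6.5]},
    index=index,
)

was_missing = raw.isna()   # remember where the holes were before filling
filled = raw.ffill()       # carry the last known value forward
cleaned = filled.dropna()  # drop rows that still have no earlier value to carry
```

Keeping a mask like `was_missing` is one way to preserve the information that a sid did not trade, even after the holes have been filled.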

Finally - can you have variables in the decorator? I know the decorator is only called once, so you certainly can't have dynamic variables there. I did run a quick test with global variables, and that appears to work.
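
The "only called once" behavior is ordinary Python and can be shown without the platform. In the sketch below, `windowed` is a hypothetical decorator factory invented for illustration; it is not batch_transform.

```python
def windowed(window_length):
    """Hypothetical decorator factory (not batch_transform)."""
    def decorate(func):
        def wrapper(values):
            # window_length was fixed when the decorator line ran.
            return func(values[-window_length:])
        return wrapper
    return decorate


w_l = 3  # module-level (global) value, read once when the decorator line runs


@windowed(window_length=w_l)
def window_sum(values):
    return sum(values)


w_l = 5  # rebinding later does not affect the already-decorated function

result = window_sum([1, 2, 3, 4, 5, 6])  # still sums only the last 3 values
```

A global works as a decorator argument because it is evaluated at definition time, but changing it while the algorithm runs has no effect on the decorated function.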

Thanks Dan,

In the datapanel, it seems like you are gonna create a problem when there is missing data by interpolating. There should be an indication that the sid did not trade in a given minute (which for some algorithms could be useful information). Interpolating, etc. sounds like a bad idea. And what do you mean by "the data is dropped"?

I'll see if I can sort out the OLS example you provided...thanks.

Grant