mavg() function seems slow

I'm a little confused by this, and it's causing execution errors in one of the algorithms I'm working on (too much time spent in handle_data). The .mavg() method seems to take quite a while to execute on the first day, during its first call on a security. Now I'm sure there's a lot of fancy buffering and data fetching/optimization going on in the depths of Quantopian (Quantopia?), but this doesn't seem quite right. Should I take a different approach to getting these values for multiple securities? Is there a way to distribute this processing so I can get through the first bar successfully?

10 responses

Sorry, I just realized that the Backtest display in the forums doesn't display the logged data directly. What I was trying to show is this logged output:
2014-01-02 PRINT XLY: 39.1845059395s
2014-01-02 PRINT XLF: 0.000547885894775s
2014-01-02 PRINT XLK: 0.000494003295898s
2014-01-02 PRINT XLE: 0.000482082366943s
: :

The attached revision, using rolling_mean() rather than mavg(), is about 200x faster in my testing -- it takes less than one-fifth of a second to compute all the moving averages for all the securities.

Great, thanks for the solution/workaround Michael.

It would be nice if someone from Quantopian would take a look at this because it definitely seems like something undesirable is going on. Either that or I'm calling this in a manner that they didn't anticipate.

I think I can explain it. There are two improvements, and the first is the biggest. That mavg(200) is actually calling for the moving average of the minute bars, so more than 700,000 prices are being loaded (200*390*9). Michael is using daily data, which saves you a factor of 390.
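To make the arithmetic concrete, here is a rough sketch of the count. The 200 and 390 come from the reply above; reading the factor of 9 as the number of securities is an assumption, and the variable names are only for illustration.

```python
# Rough count of prices behind mavg(200) in minute mode versus daily mode.
WINDOW_DAYS = 200          # argument passed to mavg()
MINUTES_PER_SESSION = 390  # minute bars per trading day, per the reply above
NUM_SECURITIES = 9         # assumption: the "9" is the number of securities

minute_prices = WINDOW_DAYS * MINUTES_PER_SESSION * NUM_SECURITIES
daily_prices = WINDOW_DAYS * NUM_SECURITIES

print(minute_prices)                  # 702000
print(daily_prices)                   # 1800
print(minute_prices // daily_prices)  # 390
```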

The second improvement is that rather than 4 calls to history(), he calls it just once and computes the mean of the relevant part of the dataframe, as needed.
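As a rough illustration of the single-fetch idea, the sketch below builds one DataFrame and takes the mean of its trailing rows for each window. It runs outside Quantopian: fake_history is a hypothetical stand-in for history(bar_count, '1d', 'price'), the prices are random, and the window lengths are assumptions rather than the ones in the attached algorithm.

```python
import numpy as np
import pandas as pd


def fake_history(bar_count, symbols, seed=0):
    """Hypothetical stand-in for history(bar_count, '1d', 'price')."""
    rng = np.random.default_rng(seed)
    # Random-walk prices: one row per day, one column per security.
    data = 100 + rng.standard_normal((bar_count, len(symbols))).cumsum(axis=0)
    index = pd.bdate_range("2013-01-01", periods=bar_count)
    return pd.DataFrame(data, index=index, columns=symbols)


symbols = ["XLY", "XLF", "XLK", "XLE"]
windows = [20, 50, 100, 200]  # assumed window lengths, for illustration only

# Fetch once, sized for the longest window.
prices = fake_history(max(windows), symbols)

# Each average is the column-wise mean of the last w rows of the same frame.
averages = {w: prices.tail(w).mean() for w in windows}

print(averages[200]["XLY"])
```

Each entry in averages is a Series indexed by symbol, so averages[50]["XLF"] is the 50-day mean for XLF without a second fetch.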

Disclaimer

The material on this website is provided for informational purposes only and does not constitute an offer to sell, a solicitation to buy, or a recommendation or endorsement for any security or strategy, nor does it constitute an offer to provide investment advisory services by Quantopian. In addition, the material offers no opinion with respect to the suitability of any security or specific investment. No information contained herein should be regarded as a suggestion to engage in or refrain from any investment-related course of action as none of Quantopian nor any of its affiliates is undertaking to provide investment advice, act as an adviser to any plan or entity subject to the Employee Retirement Income Security Act of 1974, as amended, individual retirement account or individual retirement annuity, or give advice in a fiduciary capacity with respect to the materials presented herein. If you are an individual retirement or other investor, contact your financial advisor or other fiduciary unrelated to Quantopian about whether any given investment idea, strategy, product or service described herein may be appropriate for your circumstances. All investments involve risk, including loss of principal. Quantopian makes no guarantees as to the accuracy or completeness of the views expressed in the website. The views are subject to change, and may have become unreliable for various reasons, including changes in market conditions or economic circumstances.

Ahah! Thanks Dan. That was actually the first thing I looked for in the documentation, which states:

mavg(days)
Moving average price for the given security for the given number of trailing days.

From that I concluded that daily data was being used. At the very least, I think the increment of the data being used should be documented. In addition, I would suggest that a new syntax be introduced as an improvement. I believe one similar to the history() function would suffice.

mavg(bar_count, frequency, field, ffill=True)  

Thoughts?

The documentation does need to be improved there. Somehow we lost the important detail that it is minutes in minute mode, and days in daily mode.

That's an interesting idea on the API change. I've been on the fence in general whether to double-down on "helper" functions like mavg or to let pandas do its magic.

In other words, don't use mavg() with long windows in minute mode anymore? Then the function could just as well be binned.

I would suggest using the pandas calculation of mavg; it's faster:

mavg_200 = data[stock].mavg(200)

# equivalent to

mavg_200 = history(200, '1d', 'price')[stock].mean()  

The pandas calculation being the second or first, Alisa?

The "pandas" version would be
mavg_200 = history(200, '1d', 'price')[stock].mean()
though that only calculates the mean of a single slice, so it's not really a moving average, just the mean at that point in time. For a true moving average, you would use rolling_mean from pandas. Depending on your needs, you can get by with .mean() if you only care about the value for the current day.

http://pandas.pydata.org/pandas-docs/stable/generated/pandas.rolling_mean.html
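To show the difference between the mean of one slice and a rolling mean, here is a small sketch on made-up prices. It uses Series.rolling(window).mean(), which is how newer pandas releases spell the rolling_mean function linked above; the data and the window of 3 are assumptions for illustration.

```python
import pandas as pd

# Made-up daily prices; in an algorithm these would come from history().
prices = pd.Series([10.0, 11.0, 12.0, 13.0, 14.0, 15.0])
window = 3

# Mean of a single slice: one number, valid for the latest bar only.
latest_mean = prices.tail(window).mean()

# Rolling mean: one value per bar, NaN until a full window is available.
rolling = prices.rolling(window).mean()

print(latest_mean)
print(rolling.tolist())
```

The last value of the rolling series matches the single-slice mean, which is why .mean() is enough when only the current bar matters.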