EMA calculations on Quantopian are different than those on other trading platforms. It could be a bug.

Hi,

About 2 weeks ago I found that EMA calculations on Quantopian are markedly different from those on other trading platforms. At first, I thought it was because I am new to Quantopian and might have missed something in my code, so I checked it again, but it looks right. Then I thought it could be talib, so I ported the MultiCharts EMA to Quantopian to check. The talib.EMA and ported MultiCharts EMA calculations do match, so there is nothing wrong with talib after all. I also compared Quantopian data with the Yahoo EOD data that I use on MultiCharts, and the two are very similar. At this point, I have reason to believe that there could be a bug in the Quantopian backtesting engine or its calculations.

Here is the talib.EMA code:

"""
Quantopian EMA calculation is markedly different from other EMA implementations (MultiCharts, TradingView, OptionsXpress, Yahoo, etc.).  
"""
import talib


def initialize(context):
    context.bar_interval = '1d'
    context.bar_count = 120
    context.slow_ema_length = 22
    context.fast_ema_length = 12
    context.spy = sid(8554)  # SPY


def before_trading_start(context, data):
    test_function(context, data)


def test_function(context, data):
    # Fetch the trailing window once and reuse it for both EMAs.
    highs = data.history(context.spy, 'high', context.bar_count, context.bar_interval)
    slow_ema = talib.EMA(highs, timeperiod=context.slow_ema_length)
    fast_ema = talib.EMA(highs, timeperiod=context.fast_ema_length)
    # True where the fast EMA is below the slow EMA.
    trend_is_down = [fast_ema[i] < slow_ema[i] for i in range(context.bar_count)]
    # Print only the 10 most recent bars.
    print(slow_ema[context.bar_count - 10:])
    print(fast_ema[context.bar_count - 10:])
    print(trend_is_down[context.bar_count - 10:])

Here is the ported MultiCharts EMA code:

"""
MultiCharts EMA calculation test.  
"""
def initialize(context):
    context.bar_interval = '1d'
    context.bar_count = 120
    context.slow_ema_length = 22
    context.fast_ema_length = 12
    context.spy = sid(8554)  # SPY


def before_trading_start(context, data):
    test_function(context, data)


def test_function(context, data):
    # Fetch the trailing window once and reuse it for both EMAs.
    highs = data.history(context.spy, 'high', context.bar_count, context.bar_interval)
    slow_ema = EMA(highs, context.slow_ema_length)
    fast_ema = EMA(highs, context.fast_ema_length)
    # True where the fast EMA is below the slow EMA.
    trend_is_down = [fast_ema[i] < slow_ema[i] for i in range(context.bar_count)]
    # Print only the 10 most recent bars.
    print(slow_ema[context.bar_count - 10:])
    print(fast_ema[context.bar_count - 10:])
    print(trend_is_down[context.bar_count - 10:])


def EMA(price, length):
    """
    MultiCharts EMA implementation.
    """
    var0 = 2.0 / (length + 1)  # smoothing factor
    ema = 0.0
    ema_list = []
    for i in range(len(price)):
        if i == 0:
            # Seed the EMA with the first price in the window.
            ema = price[i]
        else:
            ema = ema + var0 * (price[i] - ema)
        ema_list.append(ema)
    return ema_list

If you build and run them from 2016-10-07 to 2016-10-10, they should produce lists consisting entirely of True values, matching other trading or charting platforms (MultiCharts, TradingView, OptionsXpress, Yahoo, etc.). But that is not the case. Changing 'high' to 'close' makes no difference.

Can the Quantopian community validate my findings? I would also appreciate it if you could check the following talib indicators on Quantopian: PLUS_DI, MINUS_DI, SAR and BBANDS, and compare them with the same calculations on other trading platforms. If something as simple as EMA is different, other more complex indicators can be different as well. If you think the difference is caused by Quantopian's fixed trailing data, I can tell you that MultiCharts uses the same fixed trailing data. This post clearly explains that, given enough fixed trailing data, the difference in trailing data will not matter at all.

There are also other posts that tried to highlight this issue using other indicators: this and this. I think it is best to address this issue now rather than later. This could be a big bug that has not yet been discovered, and I would hate to see it blow up in the future. Can Quantopian validate my findings as well?

Thanks,

Hengki

12 responses

Hengki,

In signal-processing terms, EMA is an IIR (infinite impulse response) filter.
Its value depends on the starting value of the filtered data.
If the starting points are the same, the results will be the same; if not, they will differ.
Over time the EMA will mature.
To get more stable results, you need to calculate the EMA on a data set at least 5 times longer than your slow_ema_length; the more the better.
See the difference between the attached backtests, run with bar_count_2 = 22 and bar_count_2 = 126.
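
The dependence on the starting value can be sketched in plain Python 3 (this is not Quantopian code). The `ema` and `seed_gap` helpers and the flat price series are illustrative assumptions; the recursion is the usual one with alpha = 2 / (length + 1). Two EMAs of the same series that differ only in their seed drift together by a factor of (1 - alpha) per bar:

```python
def ema(prices, length, seed):
    """Recursive EMA with alpha = 2 / (length + 1), starting from `seed`."""
    alpha = 2.0 / (length + 1)
    value = seed
    out = []
    for price in prices:
        value = value + alpha * (price - value)
        out.append(value)
    return out


def seed_gap(length, bars, seed_a=100.0, seed_b=110.0):
    """Gap between two EMAs of one flat series that differ only in seed."""
    prices = [105.0] * bars  # synthetic data, for illustration only
    a = ema(prices, length, seed_a)
    b = ema(prices, length, seed_b)
    return abs(a[-1] - b[-1])


if __name__ == "__main__":
    # Same window lengths as the two backtests mentioned above.
    for bars in (22, 126):
        print(bars, seed_gap(22, bars))
```

With a 22-bar EMA and seeds 10 apart, the gap is still about 1.35 after 22 bars, but only about 0.0001 after 126 bars.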

Hi Vladimir,

Thanks for showing that with enough trailing data (at least 5 times the slow EMA length) the results will be more stable. My sample code uses 120 bars of trailing data, which is more than 5 times the slow EMA length (22).

Hengki,

I have tested your EMA_MC code and did not find a significant difference from TA_EMA:

Date        High     EMA_TA       EMA_MC
10/7/2016   216.21   216.1951839  216.1952302
10/10/2016  216.34   216.2086655  216.208698
10/11/2016  215.54   216.1957054  216.1957353
10/12/2016  213.59   215.9734651  215.9734993
10/13/2016  212.22   215.7010775  215.7011015
10/14/2016  214.155  215.6819387  215.6819529
10/17/2016  213.16   215.5035167  215.5035256
10/18/2016  214.19   215.4122777  215.412287
10/19/2016  214.035  215.2953967  215.2954111
10/20/2016  213.79   215.2208939  215.2209043
10/21/2016  213.08   215.0847575  215.0847451
10/24/2016  215.04   215.1663506  215.166321

The difference appears only from the 4th decimal place onward and is unlikely to be the reason for false signals.
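
One plausible source of a gap this small is the seed. The ported MultiCharts code starts from the first price, while TA-Lib, as far as I know, starts from the simple average of the first `timeperiod` values in its default mode (treat that as an assumption, not a confirmed detail). A minimal sketch in plain Python 3, with hypothetical helper names and a synthetic rising series rather than Quantopian data:

```python
def ema_first_value_seed(prices, length):
    """EMA seeded with the first price (as in the ported MultiCharts code)."""
    alpha = 2.0 / (length + 1)
    out = [prices[0]]
    for price in prices[1:]:
        out.append(out[-1] + alpha * (price - out[-1]))
    return out


def ema_sma_seed(prices, length):
    """EMA seeded with the simple average of the first `length` prices.

    Assumption: this mirrors TA-Lib's default seeding. The first
    length - 1 outputs are None because no value is defined yet.
    """
    alpha = 2.0 / (length + 1)
    out = [None] * (length - 1)
    value = sum(prices[:length]) / float(length)
    out.append(value)
    for price in prices[length:]:
        value = value + alpha * (price - value)
        out.append(value)
    return out


# Synthetic, steadily rising series of 120 bars (illustration only).
prices = [200.0 + 0.25 * i for i in range(120)]

mc = ema_first_value_seed(prices, 22)
ta = ema_sma_seed(prices, 22)
gap_early = abs(mc[21] - ta[21])  # first bar where both are defined
gap_late = abs(mc[-1] - ta[-1])   # last of the 120 bars
```

From bar 22 onward both series follow the same recursion, so the gap shrinks by a factor of (1 - alpha) per bar. Over a 120-bar window it ends up several orders of magnitude smaller than at the start.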

The discrepancy with the MultiCharts EMA signals may come from the Quantopian data itself, which differs significantly from Yahoo historical data and is not adjusted for splits and dividends the way most of us are used to.

Hi Vladimir,

Thanks for confirming that the talib.EMA and MC EMA calculations on Quantopian are nearly identical and that the differences should not affect any robust algo. I stated that in my first post. The issue here is that EMA calculations on other trading or charting platforms such as MultiCharts, tradingview.com (data provided by interactivedata.com), OptionsXpress (data provided by esignal.com), esignal.com, etc. are markedly different from EMA calculations on Quantopian. The data services that I mentioned, except Yahoo, are all split- and dividend-adjusted.

Those other trading or charting platforms show that the fast EMA (12) is below the slow EMA (22) for the SPY ETF in October 2016 (up to October 24, 2016). It does not matter whether you use high or close prices. That is not the case on Quantopian. I wish I had access to CQG or a Bloomberg Terminal to show the EMA calculations on these expensive data services. If any Quantopian community member has access to CQG or a Bloomberg Terminal, can you validate my findings? I only have access to esignal.com, Yahoo and tradingview.com.

Hengki,

As I mentioned above:
The discrepancy with the MultiCharts EMA signals may come from the Quantopian data itself, which differs significantly from Yahoo historical data and is not adjusted for splits and dividends the way most of us are used to.

Did you compare Quantopian data with data from the other providers you have access to (esignal.com, tradingview.com)?

For the sake of clarity:

  1. data.history() and pipeline are split- and dividend-adjusted as of the simulation date. There is a great explanation of our price adjustments here.
  2. Our open/close/high/low/volume data does differ from the data in Yahoo and other sources for other reasons. You can read about data sources in our FAQ.

Dan,

data.history() and pipeline are split- and dividend-adjusted as of the simulation date.

The key phrase, as you explained to me some time ago, is "as of the simulation date".
That means that all prices before a split or dividend are as traded and so not adjusted.
With that kind of "adjustment", an algorithm cannot do meaningful EMA or other technical calculations using the values in the full window.

Vladimir, I know we've struggled to explain this concept to you in the past. I'll try explaining it again, with a different approach.

  1. Every day Hengki's code calls test_function(context, data)
  2. test_function(context, data) calls data.history() twice - I'll focus on the call where it requests a 22-day window for context.slow_ema_length
  3. The history data comes back as 22 data points, one for each of the most-recent 22 days.
  4. Each of those 22 data points is split- and dividend-adjusted as of the date being simulated

Within that example, as a thought experiment, imagine that there was a 2:1 stock split 11 days ago. Imagine that the company is incredibly stable, and the value of the company didn't change over those 22 days. In that case, the history data returned will be 22 equal values. The as-traded price 22 days ago, before the split, was twice as high as it is today, but the split-adjusted price is equal to today's price.

The EMA calculation will then, using those split-adjusted values, be able to calculate the day-to-day returns of the stock.
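
The thought experiment can be sketched in plain Python 3. This is only an illustration of the idea, not Quantopian's actual adjustment code; the `adjust_for_splits` and `last_ema` helpers, the prices, and the 12-bar EMA length are all assumptions made for the example:

```python
def adjust_for_splits(as_traded, splits):
    """Return prices adjusted as of the last bar in the window.

    as_traded: list of as-traded prices, oldest first.
    splits: dict mapping bar index -> split ratio taking effect at that
            bar (2.0 for a 2:1 split). Earlier prices are divided by it.
    """
    adjusted = list(as_traded)
    for index, ratio in splits.items():
        for i in range(index):
            adjusted[i] = adjusted[i] / ratio
    return adjusted


def last_ema(prices, length):
    """Final value of an EMA seeded with the first price."""
    alpha = 2.0 / (length + 1)
    value = prices[0]
    for price in prices[1:]:
        value = value + alpha * (price - value)
    return value


# 22-bar window: 11 bars at 100 before a 2:1 split, then 11 bars at 50.
as_traded = [100.0] * 11 + [50.0] * 11
adjusted = adjust_for_splits(as_traded, {11: 2.0})

ema_adjusted = last_ema(adjusted, 12)    # flat series, EMA stays at 50
ema_as_traded = last_ema(as_traded, 12)  # still decaying from 100 toward 50
```

On the adjusted window the EMA is exactly 50, the same as every price in it. On the as-traded window the EMA is still well above 50 on the last bar, which would look like a falling trend that never happened.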

Hi Vladimir and Dan,

Yes, I did compare esignal, tradingview and other brokerages' data with Quantopian data. Here's how I did it:

  1. I launched esignal, tradingview and other brokerages' charting platforms and lined them up side by side.
  2. I pulled split- and dividend-adjusted SPY data on each charting platform and set the trailing data to 120 bars (or 6 months).
  3. I added fast EMA (12) and slow EMA (22) on each SPY chart.
  4. They all showed that fast EMA is below slow EMA in October 2016. Whether I used high or close prices, they all showed the same results.
  5. The sample code in my first post does not produce all True values (fast EMA is below slow EMA) for SPY in October 2016 using Quantopian data.
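
A side-by-side check like the one in the steps above can also be done numerically instead of by eye. The sketch below is plain Python 3 with hypothetical helper names (`ema`, `trend_is_down`, `compare_sources`) and two synthetic price series standing in for exports from two data services. It reports the largest relative price difference and the bars where the fast-below-slow signal disagrees:

```python
def ema(prices, length):
    """EMA seeded with the first price, alpha = 2 / (length + 1)."""
    alpha = 2.0 / (length + 1)
    out = [prices[0]]
    for price in prices[1:]:
        out.append(out[-1] + alpha * (price - out[-1]))
    return out


def trend_is_down(prices, fast=12, slow=22):
    """True on each bar where the fast EMA is below the slow EMA."""
    return [f < s for f, s in zip(ema(prices, fast), ema(prices, slow))]


def compare_sources(prices_a, prices_b, fast=12, slow=22):
    """Return (max relative price difference, bars where signals differ)."""
    max_rel = max(abs(a - b) / b for a, b in zip(prices_a, prices_b))
    sig_a = trend_is_down(prices_a, fast, slow)
    sig_b = trend_is_down(prices_b, fast, slow)
    disagree = [i for i, (x, y) in enumerate(zip(sig_a, sig_b)) if x != y]
    return max_rel, disagree


# Synthetic stand-ins for two providers: a falling series and a copy
# that is 0.1% higher on every bar (illustration only, not real data).
source_a = [215.0 - 0.02 * i for i in range(120)]
source_b = [p * 1.001 for p in source_a]

max_rel, disagree = compare_sources(source_a, source_b)
```

If the same check on real exports shows small price differences but many disagreeing bars, the data windows or adjustment dates probably differ; if the prices themselves differ a lot, the data source is the first thing to look at.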

Moving averages such as EMA are among the simplest and most effective guards for defending your algos against bad data. If they cannot handle Quantopian data because it is so different from the rest of the data services, I do not know what to do anymore. Bad input --> bad output. Your algos will make trading decisions based on false information, and those decisions will be wrong. I hope Quantopian will fix this issue soon. I understand that it can take time, depending on the severity of the issue.

Hi Dan,

Thank you very much for your detailed explanation of the price adjustment process.
This time you reached your goal.
I can confirm that, despite the ugly-looking price plot during stock splits,
they affect neither equity calculation nor signal generation.
The main source of confusion was that the price plot is not adjusted (it is as traded),
while all other calculations receive adjusted data.

PS
"not adjusted for splits and dividends the way most of us are used to."
The key phrase is "the way most of us are used to".
What about adding an adj_price field to data.history, which could be plotted to see what is really feeding the calculations?

VY

Well, this is a very interesting thread... my experience mirrors that of the opening poster. After spending many hours migrating my system over to Quantopian, I was stumped as to why I was getting abnormal results. Looking deeper, I realized that talib.EMA produces results significantly different from other platforms... and it's not the closing (adjusted or otherwise) prices.

Has there been a definitive answer for this?