Data Inaccuracies

I am attempting to validate my stock factors to ensure they are calculating things correctly, but in the process I discovered that the data provided seems to be wildly inaccurate. Just one example of the many I found:

AAPL - Dec 18th

Quantopian: 105.841
Yahoo: 106.03
Google: 106.03
Market Watch: 106.03
Portfolio123: 106.03
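
To put the AAPL gap in relative terms, here is a small Python sketch that works out the absolute and percentage difference from the two prices quoted above. The variable names are illustrative only and are not part of any Quantopian API.

```python
# Closing prices for AAPL on Dec 18th, as quoted in the post.
quantopian_close = 105.841
reference_close = 106.03  # Yahoo, Google, Market Watch and Portfolio123 agree

# Absolute gap in dollars and the same gap relative to the reference price.
abs_diff = reference_close - quantopian_close
rel_diff_pct = abs_diff / reference_close * 100

print("absolute difference: %.3f" % abs_diff)
print("relative difference: %.4f%%" % rel_diff_pct)
```

The gap is $0.189, or a little under 0.18% of the reference close.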

In fact, nearly every close price is off by at least $0.01, and many are off by far more. Nearly every Gain/Return calculation I perform is off by anywhere from 0.3% to 3% (at least for the stocks I have manually checked so far). My Gain/Return calculations over an entire industry are off by even wider margins, sometimes over 12% (compared with Portfolio123 and several other online databases that report industry performance). The other sources are very consistent with each other, differing by fractions of a percent up to about 1%, while industry performance over the same period as calculated on Quantopian is off by catastrophic amounts. I accept that this might be partly due to missing data: some of my stocks come up as NaN when computing industry performance (many don't seem to have an industry code available), so I am prepared for a small difference. But a consumer discretionary stock whose industry Quantopian says dropped by around 12% over the past month, while other sources say -25%, is just ridiculous.

Now, I am prepared to accept that I may have a bug in my computations, so in the interest of transparency, here is the code:

import numpy as np  
from numpy import ma  
import pandas as pd  
import talib as ta  
from quantopian.pipeline.factors import SimpleMovingAverage  
from quantopian.algorithm import attach_pipeline, pipeline_output  
from quantopian.pipeline import Pipeline, CustomFactor  
from quantopian.pipeline.data.builtin import USEquityPricing  
import quantopian.pipeline.data.morningstar as ms

def GainPct(offset=0, nbars=2):  
    class GainPctFact(CustomFactor):  
        window_length = nbars + offset  
        inputs = [USEquityPricing.close]  
        def compute(self, today, assets, out, close):  
            num_bars, num_assets = close.shape  
            newest_bar_idx = (num_bars - 1) - offset  
            oldest_bar_idx = newest_bar_idx - (nbars - 1)  
            print close[:,2] # Dump AAPL close prices  
            out[:] = ((close[newest_bar_idx] - close[oldest_bar_idx]) / close[oldest_bar_idx]) * 100  
    return GainPctFact()

def GainPctInd(offset=0, nbars=2):  
    class GainPctIndFact(CustomFactor):  
        window_length = nbars + offset  
        inputs = [USEquityPricing.close, ms.asset_classification.morningstar_industry_code]  
        def compute(self, today, assets, out, close, industries):  
            num_bars, num_assets = close.shape  
            newest_bar_idx = (num_bars - 1) - offset  
            oldest_bar_idx = newest_bar_idx - (nbars - 1)

            # Compute the gain percents for all stocks  
            asset_gainpct = ((close[newest_bar_idx] - close[oldest_bar_idx]) / close[oldest_bar_idx]) * 100

            # For each industry, build a list of the per-stock gains over the given window  
            unique_ind = np.unique(industries[0,])  
            for industry in unique_ind:  
                ind_view = asset_gainpct[industries[0,] == industry]  
                ind_mean = np.nanmean(ind_view)  
                out[industries[0,] == industry] = ind_mean  
    return GainPctIndFact()


# The initialize function is the place to set your tradable universe and define any parameters.  
def initialize(context):  
    pipe = Pipeline()  
    attach_pipeline(pipe, name='my_pipeline')

    gainpct = GainPct(0, 20)  
    #gainpctind_off0 = GainPctInd()  
    #gainpctind_off1 = GainPctInd(1)  
    #gainpctind1wk = GainPctInd(0, 5)  
    gainpctind4wk = GainPctInd(0, 20)  
    #gainpctindprevwk = GainPctInd(5, 5)  
    #pipe.add(gainpctind_off0, name='gainpctind_off0')  
    #pipe.add(gainpctind_off1, name='gainpctind_off1')  
    #pipe.add(gainpctind1wk, name='gainpctind1wk')  
    pipe.add(gainpct, name='gainpct')  
    pipe.add(gainpctind4wk, name='gainpctind4wk')  
    #pipe.add(gainpctindprevwk, name='gainpctindprevwk')  

def before_trading_start(context, data):  
    results = pipeline_output('my_pipeline')  
    print results.head(15)  
    update_universe(results.sort('gainpctind4wk').index[:10])  

# The handle_data function is run every bar.  
def handle_data(context,data):  
    # Record and plot the leverage of our portfolio over time.  
    record(leverage = context.account.leverage)

    # We also want to monitor the number of long and short positions  
    # in our portfolio over time. This loop will check our position sizes  
    # and add the count of longs and shorts to our plot.  
    longs = shorts = 0  
    for position in context.portfolio.positions.itervalues():  
        if position.amount > 0:  
            longs += 1  
        if position.amount < 0:  
            shorts += 1  
    record(long_count=longs, short_count=shorts)  
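
The window indexing used in the two factors above can be sanity-checked outside the platform. What follows is a minimal stand-alone NumPy sketch, assuming made-up prices and industry codes. `gain_pct` and `industry_mean` are hypothetical helper names, not Quantopian functions, and nothing here uses the pipeline API.

```python
import numpy as np

def gain_pct(close, offset=0, nbars=2):
    """Percent gain per asset between the oldest and newest bar of the window.

    close is a (num_bars, num_assets) array with the oldest row first,
    mirroring the array passed to the factors' compute() methods.
    """
    num_bars = close.shape[0]
    newest_bar_idx = (num_bars - 1) - offset
    oldest_bar_idx = newest_bar_idx - (nbars - 1)
    return (close[newest_bar_idx] - close[oldest_bar_idx]) / close[oldest_bar_idx] * 100

def industry_mean(values, industry_codes):
    """Assign each asset the NaN-ignoring mean of its industry's values."""
    out = np.full(values.shape, np.nan)
    # Assets with a NaN code belong to no industry and keep NaN in the output.
    for code in np.unique(industry_codes[~np.isnan(industry_codes)]):
        members = industry_codes == code
        out[members] = np.nanmean(values[members])
    return out

# Made-up prices: 3 bars (rows) for 4 assets (columns).
close = np.array([
    [100.0, 50.0, 20.0, 10.0],
    [105.0, 49.0, 20.0, np.nan],
    [110.0, 45.0, 21.0, 12.0],
])
# Made-up industry codes; NaN marks an asset with no code assigned.
codes = np.array([1.0, 1.0, 2.0, np.nan])

gains = gain_pct(close, offset=0, nbars=3)
by_industry = industry_mean(gains, codes)
print(gains)
print(by_industry)
```

One thing this makes visible: a window of nbars bars spans nbars - 1 bar-to-bar changes, so GainPct(0, 20) measures the change across 19 intervals. A source that reports a 20-trading-day or calendar-month return would be covering a different span.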
3 responses

Hey there, we get this question a lot. It's answered in the FAQ here.

For more discussion check out these threads:
- https://www.quantopian.com/posts/bug-in-live-trader-wrong-closing-price-statistics
- https://www.quantopian.com/posts/wrong-values-of-spy-returned
- https://www.quantopian.com/posts/inaccurate-prices
- https://www.quantopian.com/posts/problems-with-data-feeds-prices
- https://www.quantopian.com/posts/spy-closes-yahoo-v-quantopian-history

Disclaimer

The material on this website is provided for informational purposes only and does not constitute an offer to sell, a solicitation to buy, or a recommendation or endorsement for any security or strategy, nor does it constitute an offer to provide investment advisory services by Quantopian. In addition, the material offers no opinion with respect to the suitability of any security or specific investment. No information contained herein should be regarded as a suggestion to engage in or refrain from any investment-related course of action as none of Quantopian nor any of its affiliates is undertaking to provide investment advice, act as an adviser to any plan or entity subject to the Employee Retirement Income Security Act of 1974, as amended, individual retirement account or individual retirement annuity, or give advice in a fiduciary capacity with respect to the materials presented herein. If you are an individual retirement or other investor, contact your financial advisor or other fiduciary unrelated to Quantopian about whether any given investment idea, strategy, product or service described herein may be appropriate for your circumstances. All investments involve risk, including loss of principal. Quantopian makes no guarantees as to the accuracy or completeness of the views expressed in the website. The views are subject to change, and may have become unreliable for various reasons, including changes in market conditions or economic circumstances.

I appreciate the response, and after reviewing everything you linked me to, I am a bit more comfortable with the stock close prices. However, I am still concerned about all the NaN values for industry code in the Morningstar data. I know you can't control their data, but I'm wondering if there is a more complete source out there for a lot of the Morningstar data.

Hi Eliot,

I dug into the Morningstar data. For the most recent date, I found 123 securities in our database with no value for Morningstar industry code. Every one of them had a high sid value. My assumption is that Morningstar hasn't yet assigned an industry code to these 123 securities because they are more recent additions to the market. The assignment is likely a human-executed task, so they probably do it less often than daily.

Hope that helps,
Josh
