Data Inaccuracies

I am validating my stock factors to make sure they compute correctly, and in the process I found that the price data provided seems to be wildly inaccurate. One example of the many I found:

AAPL - Dec 18th

Quantopian: 105.841
Yahoo: 106.03
Google: 106.03
Market Watch: 106.03
Portfolio123: 106.03

In fact, nearly every close price is off by at least $0.01, and many are off by far more. Nearly every Gain/Return calculation I perform is off by anywhere from 0.3% to 3% on nearly every stock (at least the ones I have manually checked so far). My Gain/Return calculations over an entire industry are off by even wider margins, sometimes over 12%, compared to Portfolio123 and several other online databases that report industry performance.

All of the other sources are very consistent with each other: they differ by fractions of a percent, up to about 1%. Industry performance over the same time period as calculated on Quantopian, however, is off by catastrophic amounts. I accept that this might be partly due to missing data for various stocks. For example, some of my stocks come up with NaN when computing the industry performance (many don't seem to have an industry code available), so I am prepared for it to differ by a small margin. But having a stock in consumer discretionary telling me its industry dropped by around 12% over the past month while other sources say it dropped 25% is just ridiculous.
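To put the AAPL example in perspective, the gap can be written as a relative error, and the same arithmetic shows how far a gap of that size would move a window return on its own. This is only a rough sketch: the helper names are mine, and the 110.00 starting price is a made-up value for illustration, not a quote from any of the sources above.

```python
def pct_diff(observed, reference):
    """Relative difference of observed vs. reference, in percent."""
    return (observed - reference) / reference * 100.0


def gain_pct(oldest_close, newest_close):
    """Percent gain from the oldest close to the newest close."""
    return (newest_close - oldest_close) / oldest_close * 100.0


# Closes quoted in the post for AAPL on Dec 18th.
quantopian_close = 105.841
reference_close = 106.03

# Relative error of the Quantopian close against the other sources.
price_gap = pct_diff(quantopian_close, reference_close)

# Hypothetical close at the start of the window (illustrative only).
start_close = 110.00

# Same window return computed from each ending close.
gain_quantopian = gain_pct(start_close, quantopian_close)
gain_reference = gain_pct(start_close, reference_close)

# Difference between the two returns, in percentage points.
gain_gap = gain_quantopian - gain_reference
```

With these numbers the close differs by roughly 0.18%, and the resulting return differs by roughly 0.17 percentage points. A gap that size in one endpoint would not by itself account for return differences of 3% or 12%, which suggests (without proving) that something beyond the last close is contributing.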

Now, I am prepared to accept that I may have a bug in my computations, so in the interest of transparency, here is the code:

import numpy as np  
from numpy import ma  
import pandas as pd  
import talib as ta  
from quantopian.pipeline.factors import SimpleMovingAverage  
from quantopian.algorithm import attach_pipeline, pipeline_output  
from quantopian.pipeline import Pipeline, CustomFactor  
from quantopian.pipeline.data.builtin import USEquityPricing  
import quantopian.pipeline.data.morningstar as ms

def GainPct(offset=0, nbars=2):  
    class GainPctFact(CustomFactor):  
        window_length = nbars + offset  
        inputs = [USEquityPricing.close]  
        def compute(self, today, assets, out, close):  
            num_bars, num_assets = close.shape  
            newest_bar_idx = (num_bars - 1) - offset  
            oldest_bar_idx = newest_bar_idx - (nbars - 1)  
            print close[:,2] # Debug: dump closes for column 2 (assumed to be AAPL; column order follows `assets`)
            out[:] = ((close[newest_bar_idx] - close[oldest_bar_idx]) / close[oldest_bar_idx]) * 100  
    return GainPctFact()

def GainPctInd(offset=0, nbars=2):  
    class GainPctIndFact(CustomFactor):  
        window_length = nbars + offset  
        inputs = [USEquityPricing.close, ms.asset_classification.morningstar_industry_code]  
        def compute(self, today, assets, out, close, industries):  
            num_bars, num_assets = close.shape  
            newest_bar_idx = (num_bars - 1) - offset  
            oldest_bar_idx = newest_bar_idx - (nbars - 1)

            # Compute the gain percents for all stocks  
            asset_gainpct = ((close[newest_bar_idx] - close[oldest_bar_idx]) / close[oldest_bar_idx]) * 100

            # For each industry, average the per-stock gains over the given window.
            # Note: row 0 is the oldest bar in the window, so stocks are grouped by
            # their industry code as of the start of the window.
            ind_codes = industries[0]
            for industry in np.unique(ind_codes):
                members = ind_codes == industry
                out[members] = np.nanmean(asset_gainpct[members])
    return GainPctIndFact()


# The initialize function is the place to set your tradable universe and define any parameters.  
def initialize(context):  
    pipe = Pipeline()  
    attach_pipeline(pipe, name='my_pipeline')

    gainpct = GainPct(0, 20)  
    #gainpctind_off0 = GainPctInd()  
    #gainpctind_off1 = GainPctInd(1)  
    #gainpctind1wk = GainPctInd(0, 5)  
    gainpctind4wk = GainPctInd(0, 20)  
    #gainpctindprevwk = GainPctInd(5, 5)  
    #pipe.add(gainpctind_off0, name='gainpctind_off0')  
    #pipe.add(gainpctind_off1, name='gainpctind_off1')  
    #pipe.add(gainpctind1wk, name='gainpctind1wk')  
    pipe.add(gainpct, name='gainpct')  
    pipe.add(gainpctind4wk, name='gainpctind4wk')  
    #pipe.add(gainpctindprevwk, name='gainpctindprevwk')  

def before_trading_start(context, data):  
    results = pipeline_output('my_pipeline')  
    print results.head(15)  
    update_universe(results.sort('gainpctind4wk').index[:10])  

# The handle_data function is run every bar.  
def handle_data(context,data):  
    # Record and plot the leverage of our portfolio over time.  
    record(leverage = context.account.leverage)

    # We also want to monitor the number of long and short positions
    # in our portfolio over time. This loop will check our position sizes
    # and add the count of longs and shorts to our plot.
    longs = shorts = 0  
    for position in context.portfolio.positions.itervalues():  
        if position.amount > 0:  
            longs += 1  
        if position.amount < 0:  
            shorts += 1  
    record(long_count=longs, short_count=shorts)
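
One way to check the industry-averaging logic independently of the platform's data is to run the same arithmetic on a small hand-made array where the answer is obvious. The sketch below is plain NumPy, not the Quantopian API; the function name, the toy prices and the industry codes are all invented for illustration. It takes a single row of codes and treats -1 as "no industry code", which is my assumption and not something the factor above does (that factor groups by row 0 of the industry input and averages every code it finds).

```python
import numpy as np


def industry_mean_gain(close, industry_codes, missing=-1):
    """Per-asset mean percent gain of the asset's industry.

    close: 2-D array, rows are bars (oldest first), columns are assets.
    industry_codes: 1-D array with one code per asset.
    missing: sentinel code meaning "no industry" (an assumption here).
    """
    close = np.asarray(close, dtype=float)
    codes = np.asarray(industry_codes)

    # Percent gain from the oldest bar to the newest bar, per asset.
    gain = (close[-1] - close[0]) / close[0] * 100.0

    out = np.full(gain.shape, np.nan)
    for code in np.unique(codes):
        if code == missing:
            continue  # leave assets without a code as NaN
        members = codes == code
        valid = members & ~np.isnan(gain)
        if valid.any():
            # Average only the members that have a usable gain.
            out[members] = gain[valid].mean()
    return out


# Toy data: two bars, five assets. Asset 3 has no starting price,
# asset 4 has no industry code.
close = np.array([
    [100.0, 50.0, 20.0, np.nan, 10.0],
    [110.0, 45.0, 22.0, 30.0, 11.0],
])
codes = np.array([101, 101, 202, 202, -1])

result = industry_mean_gain(close, codes)
```

Here industry 101 averages +10% and -10% to 0, industry 202 falls back to its single valid member at +10%, and the asset with no code stays NaN. If the same function gives sensible numbers on toy data but the factor output still disagrees with outside sources, that points toward the inputs (prices, adjustments, industry membership, equal vs. cap weighting) rather than the averaging itself.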