I am trying to validate my stock factors to make sure they compute what I expect, but in the process I discovered that the price data Quantopian provides appears to be wildly inaccurate. Here is just one of the many examples I found:
AAPL - Dec 18th
Quantopian: 105.841
Yahoo: 106.03
Google: 106.03
Market Watch: 106.03
Portfolio123: 106.03
In fact, nearly every close price is off by at least $0.01, and many are off by far more. Nearly every Gain/Return calculation I perform, on nearly every stock I have manually checked so far, is off by anywhere from 0.3% to 3%. My Gain/Return calculations over an entire industry are off by even wider margins, sometimes over 12%, compared to Portfolio123 and several other online databases that report industry performance. The other sources are very consistent with each other: they differ by fractions of a percent, up to about 1%. Industry performance over the same time period as calculated on Quantopian, however, is off by catastrophic amounts.
I will accept that this might be partly due to missing data for various stocks. For example, some of my stocks come up as NaN when computing the industry performance (many don't seem to have an industry code available), so I am prepared for the results to differ by a small margin. But a consumer discretionary stock telling me its industry dropped by around 12% over the past month, while other sources say -25%, is just ridiculous.
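To put the AAPL example on the same percentage scale as the Gain/Return figures, here is a minimal sketch in plain Python. The helper names `pct_diff` and `gain_pct` and the starting price of 110.00 are hypothetical, chosen only for illustration; the two closes are the ones quoted above.

```python
def pct_diff(value, reference):
    """Percent difference of `value` relative to `reference`."""
    return (value - reference) / reference * 100.0


def gain_pct(old_close, new_close):
    """Percent gain from `old_close` to `new_close`."""
    return (new_close - old_close) / old_close * 100.0


quantopian_close = 105.841   # Quantopian, AAPL, Dec 18th
reference_close = 106.03     # Yahoo, Google, Market Watch, Portfolio123

# Size of the price discrepancy as a percentage of the reference close.
price_gap = pct_diff(quantopian_close, reference_close)

# Hypothetical starting close (NOT real data), identical for both sources,
# to show how a gap in the final close carries into a computed return.
start_close = 110.00
gain_quantopian = gain_pct(start_close, quantopian_close)
gain_reference = gain_pct(start_close, reference_close)
return_gap = gain_quantopian - gain_reference

print("price gap:  %.3f%%" % price_gap)
print("return gap: %.3f percentage points" % return_gap)
```

Under these assumptions the price gap is about -0.18%, and it shifts the computed return by a similar amount when only one endpoint of the window differs.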
Now, I am prepared to accept that I may have a bug in my computations, so in the interest of transparency, here is the code:
import numpy as np
from numpy import ma
import pandas as pd
import talib as ta
from quantopian.pipeline.factors import SimpleMovingAverage
from quantopian.algorithm import attach_pipeline, pipeline_output
from quantopian.pipeline import Pipeline, CustomFactor
from quantopian.pipeline.data.builtin import USEquityPricing
import quantopian.pipeline.data.morningstar as ms
def GainPct(offset=0, nbars=2):
    class GainPctFact(CustomFactor):
        window_length = nbars + offset
        inputs = [USEquityPricing.close]
        def compute(self, today, assets, out, close):
            num_bars, num_assets = close.shape
            newest_bar_idx = (num_bars - 1) - offset
            oldest_bar_idx = newest_bar_idx - (nbars - 1)
            print close[:,2] # Dump AAPL close prices
            out[:] = ((close[newest_bar_idx] - close[oldest_bar_idx]) / close[oldest_bar_idx]) * 100
    return GainPctFact()
def GainPctInd(offset=0, nbars=2):
    class GainPctIndFact(CustomFactor):
        window_length = nbars + offset
        inputs = [USEquityPricing.close, ms.asset_classification.morningstar_industry_code]
        def compute(self, today, assets, out, close, industries):
            num_bars, num_assets = close.shape
            newest_bar_idx = (num_bars - 1) - offset
            oldest_bar_idx = newest_bar_idx - (nbars - 1)
            # Compute the gain percents for all stocks
            asset_gainpct = ((close[newest_bar_idx] - close[oldest_bar_idx]) / close[oldest_bar_idx]) * 100
            # For each industry, build a list of the per-stock gains over the given window
            unique_ind = np.unique(industries[0,])
            for industry in unique_ind:
                ind_view = asset_gainpct[industries[0,] == industry]
                ind_mean = np.nanmean(ind_view)
                out[industries[0,] == industry] = ind_mean
    return GainPctIndFact()
# The initialize function is the place to set your tradable universe and define any parameters.
def initialize(context):
    pipe = Pipeline()
    attach_pipeline(pipe, name='my_pipeline')
    gainpct = GainPct(0, 20)
    #gainpctind_off0 = GainPctInd()
    #gainpctind_off1 = GainPctInd(1)
    #gainpctind1wk = GainPctInd(0, 5)
    gainpctind4wk = GainPctInd(0, 20)
    #gainpctindprevwk = GainPctInd(5, 5)
    #pipe.add(gainpctind_off0, name='gainpctind_off0')
    #pipe.add(gainpctind_off1, name='gainpctind_off1')
    #pipe.add(gainpctind1wk, name='gainpctind1wk')
    pipe.add(gainpct, name='gainpct')
    pipe.add(gainpctind4wk, name='gainpctind4wk')
    #pipe.add(gainpctindprevwk, name='gainpctindprevwk')
def before_trading_start(context, data):
    results = pipeline_output('my_pipeline')
    print results.head(15)
    update_universe(results.sort('gainpctind4wk').index[:10])
# The handle_data function is run every bar.
def handle_data(context,data):
    # Record and plot the leverage of our portfolio over time.
    record(leverage = context.account.leverage)
    # We also want to monitor the number of long and short positions
    # in our portfolio over time. This loop will check our position sizes
    # and add the count of longs and shorts to our plot.
    longs = shorts = 0
    for position in context.portfolio.positions.itervalues():
        if position.amount > 0:
            longs += 1
        if position.amount < 0:
            shorts += 1
    record(long_count=longs, short_count=shorts)
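
To check the index arithmetic and the per-industry averaging independently of Quantopian, the same logic can be run on a few hand-made numbers. This is a minimal sketch in plain Python, not the factors themselves: the function names, the prices, and the industry codes below are all made up for illustration, and NaN gains are skipped to mimic what `np.nanmean` does in `GainPctInd`.

```python
import math


def window_indices(num_bars, offset=0, nbars=2):
    """Mirror the index arithmetic used in GainPct / GainPctInd."""
    newest = (num_bars - 1) - offset
    oldest = newest - (nbars - 1)
    return oldest, newest


def gain_pct(closes, offset=0, nbars=2):
    """Percent change between the oldest and newest bar of the window."""
    window_length = nbars + offset
    if window_length > len(closes):
        raise ValueError("not enough bars for the requested window")
    # Keep only the trailing bars, as window_length does for a CustomFactor.
    window = closes[-window_length:]
    oldest, newest = window_indices(len(window), offset, nbars)
    return (window[newest] - window[oldest]) / window[oldest] * 100.0


def industry_mean_gain(gains, industries):
    """Mean gain per industry code, ignoring NaN gains.

    An industry whose gains are all NaN is simply absent from the result.
    """
    totals = {}
    for gain, code in zip(gains, industries):
        if math.isnan(gain):
            continue
        running_sum, count = totals.get(code, (0.0, 0))
        totals[code] = (running_sum + gain, count + 1)
    return {code: s / n for code, (s, n) in totals.items()}


# Made-up closes for one stock, oldest first (NOT real data).
closes = [100.0, 102.0, 101.0, 105.0, 110.0]
full = gain_pct(closes, offset=0, nbars=5)   # compares 100.0 with 110.0
prev = gain_pct(closes, offset=1, nbars=2)   # skips newest bar: 101.0 vs 105.0

# Made-up per-stock gains and industry codes (NOT real Morningstar codes).
gains = [10.0, float("nan"), 4.0, -2.0]
industries = [101, 101, 202, 202]
means = industry_mean_gain(gains, industries)

print("full window gain: %.4f" % full)
print("previous 2-bar gain: %.4f" % prev)
print(sorted(means.items()))
```

One thing this makes visible: a window of `nbars` bars spans `nbars - 1` bar-to-bar intervals, so `GainPct(0, 20)` measures the change across 19 intervals. That may be worth keeping in mind when comparing against another site's 4-week figure, which could be defined over a different span.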