Thanks Jamie -
My main concern at this point is the unpredictability. For one of my contest entries, I thought it would be interesting to run a Pyfolio tear sheet that includes the out-of-sample data. The link is: https://www.quantopian.com/live_algorithms/592eb4761760420010d11150 .
I can no longer run even a two-year backtest (which had been run automatically for me to enter the contest). In fact, I just re-tested the code, and it runs for only a few days of simulated time before hitting a memory error. This is pure speculation, but could something have been changed in preparation for your revamp of the way fundamental data is accessed in Pipeline that reduced the memory available to algorithms?
Perhaps you could provide a little tutorial on how Pipeline, and the backtester in general, handles memory. For example, in the code below, when I call make_factors() and build combined_alpha, does the memory required to pull in each factor's trailing window of data and do the computation stay tied up, such that scaling to a very large number of factors would cause a memory problem? Or does the Pipeline API free that memory once each custom factor's compute() call completes?
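To make the scaling concern concrete, here is the kind of back-of-the-envelope arithmetic I have in mind (a rough sketch; the ~8,000-asset universe and float64 dtype are my assumptions, not anything I know about the internals):

import numpy as np

# Rough estimate of the input-window memory held by one custom factor,
# assuming float64 data and ~8,000 assets (both are my assumptions,
# not Quantopian internals).
def window_bytes(window_length, num_inputs, num_assets=8000):
    return window_length * num_inputs * num_assets * np.dtype(np.float64).itemsize

# e.g. the Volatility factor below: 4 OHLC inputs over a 3*252-day window
mb = window_bytes(3 * 252, 4) / 1e6  # ~190 MB, if nothing is freed

If each factor's windows stay resident, a few dozen factors like that would exhaust RAM quickly; if they are freed after compute(), scaling up should be cheap.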
Also, when you revamp the way fundamental data is accessed in Pipeline, will you still chunk the Pipeline computations? Or will the file system be fast enough that chunking is no longer required (since you won't need to make slow calls to a remote database)? If you are reading from a local SSD, it should be pretty zippy. Or maybe you are loading up on RAM and can hold the entire file system there? How big is the fundamental database? Presumably chunking requires more memory, leaving less for the algo. Chunking also consumes the 5-minute before_trading_start computational window in an unrealistic way: during live trading there is plenty of time available (I think), but due to backtest chunking, in practice the 5-minute window is not fully available, and in some cases can be largely consumed.
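For what it's worth, here is my mental model of the chunking, so you can correct it if it's wrong (purely illustrative; compute_chunk stands in for whatever the engine actually does):

import pandas as pd

# My mental model of chunked Pipeline execution -- illustrative only,
# not Quantopian's actual implementation.
def run_in_chunks(dates, compute_chunk, chunk_days=126):
    # compute_chunk is a stand-in for the expensive pass that pulls all
    # trailing windows and runs every CustomFactor.compute() for a whole
    # chunk at once -- where I suspect the big memory spike happens.
    for i in range(0, len(dates), chunk_days):
        chunk = dates[i:i + chunk_days]
        results = compute_chunk(chunk)
        for day in chunk:
            yield day, results.loc[day]  # cheap per-day lookups afterwards

# Toy usage: a fake "pipeline" that just returns a DataFrame of zeros.
dates = pd.date_range('2016-01-04', periods=504, freq='B')
fake = lambda chunk: pd.DataFrame(0.0, index=chunk, columns=['alpha'])
daily = list(run_in_chunks(dates, fake))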
import numpy as np
import pandas as pd
from quantopian.pipeline import CustomFactor
from quantopian.pipeline.data.builtin import USEquityPricing
from quantopian.pipeline.data import morningstar
from quantopian.pipeline.data.psychsignal import stocktwits

# get_weights() and preprocess() are helper functions defined elsewhere in
# the algo; get_weights() returns a per-asset (estimate, weight) pair for a
# k-day lookback.

def make_factors():

    class OptRev5d(CustomFactor):
        inputs = [USEquityPricing.open, USEquityPricing.high,
                  USEquityPricing.low, USEquityPricing.close]
        window_length = 5
        def compute(self, today, assets, out, open, high, low, close):
            p = (open + high + low + close) / 4  # OHLC average price
            a = np.zeros(len(assets))  # weighted sum of estimates
            w = np.zeros(len(assets))  # sum of weights
            for k in range(1, len(p) + 1):
                a_k, w_k = get_weights(p[-k:, :], close[-1, :])
                a += w_k * a_k
                w += w_k
            out[:] = preprocess(a / w)
    class OptRev30d(CustomFactor):
        inputs = [USEquityPricing.open, USEquityPricing.high,
                  USEquityPricing.low, USEquityPricing.close]
        window_length = 30
        def compute(self, today, assets, out, open, high, low, close):
            p = (open + high + low + close) / 4
            a = np.zeros(len(assets))
            w = np.zeros(len(assets))
            for k in range(3, len(p) + 1):
                a_k, w_k = get_weights(p[-k:, :], close[-1, :])
                a += w_k * a_k
                w += w_k
            out[:] = preprocess(a / w)
    class MessageSum(CustomFactor):
        inputs = [stocktwits.bull_scored_messages,
                  stocktwits.bear_scored_messages,
                  stocktwits.total_scanned_messages]
        window_length = 21
        def compute(self, today, assets, out, bull, bear, total):
            # Total message volume over the window, negated so that
            # quieter names rank higher.
            out[:] = preprocess(-(np.nansum(bull, axis=0) +
                                  np.nansum(bear, axis=0)))

    class Volatility(CustomFactor):
        inputs = [USEquityPricing.open, USEquityPricing.high,
                  USEquityPricing.low, USEquityPricing.close]
        window_length = 3 * 252
        def compute(self, today, assets, out, open, high, low, close):
            p = (open + high + low + close) / 4
            price = pd.DataFrame(data=p, columns=assets)
            # Since we rank largest-is-best, invert the standard deviation.
            out[:] = preprocess(1 / np.log(price).diff().std())

    class Yield(CustomFactor):
        inputs = [morningstar.valuation_ratios.total_yield]
        window_length = 1
        def compute(self, today, assets, out, syield):
            out[:] = preprocess(syield[-1])

    class Momentum(CustomFactor):
        inputs = [USEquityPricing.open, USEquityPricing.high,
                  USEquityPricing.low, USEquityPricing.close]
        window_length = 252
        def compute(self, today, assets, out, open, high, low, close):
            p = (open + high + low + close) / 4
            # 12-month return (skipping the most recent month) minus the
            # most recent month's return.
            out[:] = preprocess((p[-21] - p[-252]) / p[-252] -
                                (p[-1] - p[-21]) / p[-21])

    class Quality(CustomFactor):
        inputs = [morningstar.income_statement.gross_profit,
                  morningstar.balance_sheet.total_assets]
        window_length = 3 * 252
        def compute(self, today, assets, out, gross_profit, total_assets):
            norm = gross_profit / total_assets  # gross profitability
            # z-score of today's value against its 3-year history
            out[:] = preprocess((norm[-1] - np.mean(norm, axis=0)) /
                                np.std(norm, axis=0))

    return {
        'OptRev5d': OptRev5d,
        'OptRev30d': OptRev30d,
        'MessageSum': MessageSum,
        'Volatility': Volatility,
        'Yield': Yield,
        'Momentum': Momentum,
        'Quality': Quality,
    }
factors = make_factors()

combined_alpha = None
for name, f in factors.iteritems():  # Python 2 on Quantopian
    if combined_alpha is None:
        combined_alpha = f(mask=universe)  # universe: trading-universe filter defined elsewhere
    else:
        combined_alpha = combined_alpha + f(mask=universe)
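For completeness, the combined factor then gets attached in initialize() in the usual way (a sketch; 'factor_pipeline' is just a name I picked):

from quantopian.algorithm import attach_pipeline
from quantopian.pipeline import Pipeline

def initialize(context):
    # combined_alpha and universe are built as above / elsewhere in the algo
    pipe = Pipeline(columns={'combined_alpha': combined_alpha},
                    screen=universe)
    attach_pipeline(pipe, 'factor_pipeline')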