Here's an algo for your consideration. Will post tear sheet next. Appears to be good-to-go for the contest.
Try to address the -9% dip with a 2013 start (a three-year loss).
And give 2008 a go; Q might want to know about this error message around Aug 26, where the NaN column in RiskModelExposure() is hidden:
ValueError: NaN or Inf values provided to FactorExposure for argument 'loadings'.
Rows/Columns with NaNs:
row=Equity(32430 [SHA]) col='industrials'
row=Equity(32430 [SHA]) col='momentum'
row=Equity(32430 [SHA]) col='short_term_reversal'
row=Equity(32430 [SHA]) col='size'
row=Equity(32430 [SHA]) col='value'
... (1 more)
Rows/Columns with Infs:
None
There was a runtime error on line 213.
As a workaround, consider something like:
context.risk_loading_pipeline = pipeline_output('risk_loading_pipeline').dropna()
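For context, here is a minimal sketch of where that .dropna() would sit in the usual contest-style wiring of the risk loading pipeline into the RiskModelExposure constraint. Scheduling of rebalance is omitted, and the TargetWeights objective (context.weights) is a hypothetical placeholder for whatever weights the algo already computes:

import quantopian.optimize as opt
from quantopian.algorithm import attach_pipeline, order_optimal_portfolio, pipeline_output
from quantopian.pipeline.experimental import risk_loading_pipeline

def initialize(context):
    # Attach the risk loading pipeline alongside the alpha pipeline.
    attach_pipeline(risk_loading_pipeline(), 'risk_loading_pipeline')

def before_trading_start(context, data):
    # Drop securities whose factor loadings contain NaNs so the constraint does not raise.
    context.risk_loading_pipeline = pipeline_output('risk_loading_pipeline').dropna()

def rebalance(context, data):
    objective = opt.TargetWeights(context.weights)  # hypothetical weights computed elsewhere
    constraints = [
        opt.experimental.RiskModelExposure(
            context.risk_loading_pipeline,
            version=opt.Newest,
        ),
    ]
    order_optimal_portfolio(objective=objective, constraints=constraints)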
Holy smokes: with the workaround I added above and a 2008 start, not only did it get through the crash, but on Aug 26 it is up +1% instead of down -2.7%, and it reaches +38% by May 2010. This suggests work needed by Q on the risk model constraint, and possibly an easy improvement.
Academically, Grant, just wondering what it means in terms of IP issues when an algorithm adapted from a published research paper has been in the Quantopian Community for critiques and input for years, open to competitors in full public access?
@Karl -
I don't have a clue on the IP question. If you are looking for permission to use anything that I've posted on Quantopian, I hereby grant it (permission is automatically granted by the Quantopian Terms of Use). Presumably, Quantopian has to address this issue formally when they license algos. The algo I posted above would be an interesting test case. I have to figure that Quantopian will provide some legal advice regarding copyright/licensing/IP to authors, upon request.
Here's a longer backtest, with the suggestion above:
context.risk_loading_pipeline = pipeline_output('risk_loading_pipeline').dropna()
I also dropped the Direction factor that was not being used.
Using nanfill on Volatility. The lower results seem to be saying that the more NaNs, the higher the results will be: 11% forward-filling NaNs, 29% leaving NaNs alone. Looks like a count of around half a million NaNs on the first day; no idea why it reports three times each. Original date range ...
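For reference, a minimal sketch of what a forward-fill nanfill helper could look like (the actual helper used here may differ). It replaces each NaN with the most recent non-NaN value above it in the same column of the (window_length, n_assets) array, leaving any leading NaNs untouched:

import numpy as np

def nanfill(arr):
    # arr: 2D window of shape (window_length, n_assets), as passed to CustomFactor.compute.
    # Forward-fill NaNs down each column (the time axis); leading NaNs stay NaN.
    mask = np.isnan(arr)
    idx = np.where(~mask, np.arange(arr.shape[0])[:, None], 0)
    np.maximum.accumulate(idx, axis=0, out=idx)
    return arr[idx, np.arange(arr.shape[1])]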
One mystery is why there are NaNs in the first place. I'm a bit confused: I would have expected that with the QTradableStocksUS universe and Pipeline we'd be NaN-free. What am I missing? Is it that the Pipeline data are not forward-filled, or are the NaNs errors in the data?
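One quick way to investigate is to log NaN counts per column of the pipeline output each morning. The pipeline name here is a hypothetical placeholder, and log is the built-in Quantopian logger:

from quantopian.algorithm import pipeline_output

def before_trading_start(context, data):
    out = pipeline_output('pipeline')  # hypothetical pipeline name
    # Count NaNs per column to see which inputs are responsible.
    log.info('\n' + out.isnull().sum().to_string())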
Meanwhile, run this and look at the logging window: all zeros for fcf, for example. One way to address that is a longer window in the factor (maybe 5 or 22 would do, or 63+ to cover a quarter), then forward-fill and return the last row:
import numpy as np
from quantopian.pipeline import CustomFactor
from quantopian.pipeline.data import Fundamentals

class fcf(CustomFactor):
    inputs = [Fundamentals.fcf_yield]
    window_length = 88  # originally 1
    def compute(self, today, assets, out, fcf_yield):
        # nanfill and preprocess are helpers defined elsewhere in the algo
        fcf_yield = nanfill(fcf_yield)  # seems fine even without this, given the longer window vs 1; why?
        out[:] = preprocess(np.nan_to_num(fcf_yield[-1]))  # [-1] added: use only the most recent row
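And, as a hedged sketch under the same assumptions, how the factor might be wired into a pipeline masked to the QTradableStocksUS (the function and column names are just placeholders):

from quantopian.pipeline import Pipeline
from quantopian.pipeline.filters import QTradableStocksUS

def make_pipeline():
    universe = QTradableStocksUS()
    return Pipeline(
        columns={'fcf': fcf(mask=universe)},  # mask limits computation to the tradable universe
        screen=universe,
    )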