Hey Gary,
I'll answer your questions first.
a) The reason to have opted for commenting out the mavg bit
No good reason for this, I was just toggling settings and that happened to be commented out in the version I shared.
b) Having shied away from the minute mode required for history(), in the resulting object, what does iloc stand for and what does dropna() do? I presume the latter skips non-useful records.
The iloc method is an indexing operation; it selects by integer position. DataFrame.iloc[0] is the first row and DataFrame.iloc[-1] is the last row. You are correct about dropna(): it drops any NaN values from the data. I went with daily data to keep from placing too many orders; the costs are brutal at the intraday timeframe if the universe is large.
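As a quick illustration of iloc and dropna(), here is a minimal sketch on a made-up DataFrame. The dates, sid names and prices are invented for the example; it only mimics the rough shape of what history() gives back.

```python
import numpy as np
import pandas as pd

# Hypothetical price frame: rows are days, columns are made-up sids.
prices = pd.DataFrame(
    {"sid_a": [10.0, 10.5, 11.0], "sid_b": [20.0, np.nan, 21.0]},
    index=pd.date_range("2014-01-01", periods=3),
)

first_row = prices.iloc[0]    # first row, selected by integer position
last_row = prices.iloc[-1]    # last row

# The middle row has a NaN for sid_b; dropna() on a Series removes that entry.
clean_row = prices.iloc[1].dropna()

# On a DataFrame, dropna() removes every row that contains a NaN by default.
clean_prices = prices.dropna()
```

Chaining them, as in history(...).iloc[-1].dropna(), would give the latest row with the missing sids removed.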
c) Happen to have a favorite resource for info on talib
talib documentation is crap. I usually call help() in a Python terminal to find out what the arguments are, then do some googling to learn what the indicator actually does. There are blogs out there with strategies for each one. So to answer your question: no, I don't know of any single resource for the talib functions. There's a list of them here, but the rest is all Google for me.
talib functions return numpy arrays for the most part; some of them return a tuple of arrays. That link should give you all the info you need about the data structures talib uses. I have also experimented with some of the talib functions, and it looks like the calculations are done using a rolling window, but I would re-confirm this for each one.
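To make the rolling-window point concrete, here is a sketch of a simple moving average in plain pandas. It does not call talib at all; the assumption is only that a talib indicator with a lookback argument behaves in a similar way, which is worth checking per function. The prices and the window length are made up.

```python
import pandas as pd

closes = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])  # made-up closing prices
window = 3  # hypothetical lookback length

# Each output value only uses the most recent `window` inputs.
sma = closes.rolling(window).mean()

# The first window - 1 outputs are NaN because the window is not full yet,
# so a dropna() afterwards leaves only the usable values.
usable = sma.dropna()
latest = usable.iloc[-1]  # most recent indicator value
```

This is also why the iloc[-1] and dropna() calls show up together: the latest value is the one you trade on, and the NaNs come from the warm-up period.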
Is there any way this code could be adapted to filter for high volatility fairly easily?
Yes, adding a volatility calculation should be pretty straightforward, depending on how you prefer to calculate it. Using the standard deviation of prices would look something like this:
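import numpy as np
# 50 days of daily prices: one row per day, one column per sid
prices = history(50, '1d', 'price')
volatility = prices.std()  # standard deviation of each stock's price
# keep only the stocks above the 90th percentile of volatility
upper_percentile = np.percentile(volatility, 90)
most_volatile_stocks = volatility[volatility > upper_percentile]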
most_volatile_stocks will be a Series containing the volatility of the most volatile stocks in the universe; the sids will be the index of the Series.
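Here is a self-contained sketch of the same filter that runs outside the backtester. The random price frame stands in for what history() returns, and the sid names and volatility levels are made up. It also swaps in the standard deviation of daily returns instead of raw prices, which is an alternative choice so that high-priced stocks don't look more volatile just because of their price level.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n_days, n_stocks = 50, 20

# Give each made-up stock its own daily return volatility (0.5% to 5%).
daily_vol = np.linspace(0.005, 0.05, n_stocks)
returns = rng.normal(0.0, daily_vol, size=(n_days, n_stocks))
prices = pd.DataFrame(
    100 * np.cumprod(1 + returns, axis=0),
    columns=["sid_%d" % i for i in range(n_stocks)],
)

# Volatility as the standard deviation of daily percentage returns.
volatility = prices.pct_change().dropna().std()

# Keep only the stocks strictly above the 90th percentile.
upper_percentile = np.percentile(volatility, 90)
most_volatile_stocks = volatility[volatility > upper_percentile]
```

With 20 stocks the 90th percentile cut leaves the top two; in a bigger universe it scales with the number of sids.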
Changes made for this backtest
I modified the original version to hopefully get some better results. This one also uses the accumulation distribution oscillator and the money flow index. The biggest change I made is that I turned it into a contrarian strategy instead of a momentum strategy. The thought process is that the stocks in the higher percentiles of the indicators are overbought and the lower percentile stocks are oversold, so there is more potential upside for the lower percentiles.
I split the percentiles down the middle and considered any stock in the upper half of every indicator to be overbought, and any stock in the lower half of every indicator to be oversold. I selected new stocks on a weekly basis and remained as close to market neutral as possible. I did the 50/50 split so that more stocks would get selected. The idea is that it is easier to hit a target with a shotgun than a rifle, and on average, the lower percentile stocks should have more potential upside.
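The selection rule described above could be sketched roughly like this. The indicator values and sid names are made up, and the column names are only labels; in the algo they would come from the talib calculations. The equal 50/50 long/short weighting is an assumption for the example, not necessarily what the backtest does.

```python
import pandas as pd

# Hypothetical indicator readings: one row per stock, one column per indicator.
indicators = pd.DataFrame(
    {
        "rsi": [80.0, 25.0, 60.0, 30.0, 55.0, 40.0],
        "mfi": [75.0, 20.0, 35.0, 45.0, 70.0, 30.0],
        "adosc": [9.0, -4.0, 6.0, -2.0, 1.0, -7.0],
    },
    index=["sid_a", "sid_b", "sid_c", "sid_d", "sid_e", "sid_f"],
)

# Percentile rank of each stock within each indicator (between 0 and 1).
ranks = indicators.rank(pct=True)

# Overbought: upper half of every indicator. Oversold: lower half of every one.
overbought = ranks[(ranks > 0.5).all(axis=1)].index
oversold = ranks[(ranks <= 0.5).all(axis=1)].index

# Contrarian and roughly market neutral: long the oversold names, short the
# overbought ones, with equal total exposure on each side.
longs = pd.Series(0.5 / len(oversold), index=oversold)
shorts = pd.Series(-0.5 / len(overbought), index=overbought)
target_weights = pd.concat([longs, shorts])
```

Stocks that land in different halves on different indicators (sid_c and sid_d here) get no position, which is the cost of requiring agreement across every indicator. A real version would also need to guard against either list being empty before dividing.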
I am not generally a technical analysis fan, but I think a composite of several indicators should be better than any one on its own. I would not recommend trading this as is; there are some bugs that need attention, and TA is very hit-or-miss, to say the least. I really wanted to demo a method of sorting and ranking a universe of stocks based on several different metrics, whatever those metrics happen to be.
BTW, I forgot to add logging statements. I'll add some on the next iteration of the algo and share the result. The docstring at the top of the code should have been deleted as well.
Any thoughts?
David