This is the third part of our series on Machine Learning on Quantopian. Most of the code is borrowed from Part 1, which showed how to train a model on static data, and Part 2, which showed how to train a model in an online fashion. Both of those lived in research, so neither was a functional algorithm. I highly recommend reading them first, as it will make the code here much clearer.
It was pleasantly easy to copy over the code from research to make a functional algorithm. The new Optimization API made the portfolio construction and trade execution part very simple. Thus, with a few lines of code we have an algorithm with the following desirable properties:
- Uses Machine Learning on a Factor-based workflow.
- Retrains the model periodically.
- Trades a large universe of stocks, using the Q1500US universe (I chose a subset of 1000 stocks here).
- Beta-neutral by going long-short.
- Sector-neutral, thanks to the new Optimization API.
- Sets strict limits on maximum weight of any individual stock.
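To make the long-short and weight-cap properties concrete, here is a minimal plain-Python sketch. It is not the Optimization API and not the algorithm's actual code: the function name `long_short_weights` and its parameters are hypothetical, it only achieves dollar neutrality (equal long and short exposure) as a rough stand-in for beta neutrality, and it ignores sector constraints entirely.

```python
def long_short_weights(scores, n_stocks, max_weight):
    """Equal-weight the highest- and lowest-scored names (hypothetical helper).

    scores: dict mapping ticker -> predicted score (higher = more bullish).
    n_stocks: total number of names to hold, split 50% long and 50% short.
    max_weight: cap on the absolute weight of any single name.
    Returns a dict mapping ticker -> signed portfolio weight.
    """
    n_side = min(n_stocks, len(scores)) // 2
    if n_side == 0:
        return {}
    ranked = sorted(scores, key=scores.get)  # ascending by score
    shorts, longs = ranked[:n_side], ranked[-n_side:]
    # Each side targets 50% gross exposure, but no name may exceed the cap.
    w = min(0.5 / n_side, max_weight)
    weights = {ticker: w for ticker in longs}
    weights.update({ticker: -w for ticker in shorts})
    return weights


example_scores = {"A": 0.9, "B": 0.7, "C": 0.5, "D": 0.3, "E": 0.2, "F": 0.1}
example_weights = long_short_weights(example_scores, n_stocks=4, max_weight=0.2)
print(example_weights)
```

With four names and a 20% cap, each position would want 25% but is clipped to 20%, so the book ends up only 80% invested while staying balanced between longs and shorts.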
I also tried to make this algorithm template-like. If you clone it, you should be able to very easily put in your own alpha factors, and they will be automatically picked up and incorporated. You can also configure this algorithm with a few prominent high-level settings:
N_STOCKS_TO_TRADE = 1000 # Will be split 50% long and 50% short
ML_TRAINING_WINDOW = 250 # Number of days to train the classifier on, easy to run out of memory here
PRED_N_FWD_DAYS = 1 # Train on returns over N days into the future
TRADE_FREQ = date_rules.week_start() # How often to trade, for daily, set to date_rules.every_day()
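As a rough illustration of how the training window and the forward-return horizon interact, the sketch below pairs each day's factor values with the return realized `n_fwd` days later and keeps only the most recent `window` labelled days. It is a plain-Python sketch under stated assumptions, not the algorithm's code: `build_training_set` is a hypothetical helper, and the labelling rule (1 if a stock's forward return beats that day's cross-sectional median, else 0) is chosen here only for simplicity.

```python
import statistics


def build_training_set(factors, prices, window, n_fwd):
    """Build (X, y) from daily cross-sections (hypothetical helper).

    factors[d][s]: list of factor values for stock s on day d.
    prices[d][s]:  closing price of stock s on day d.
    window: number of most recent labelled days to train on.
    n_fwd:  how many days ahead the target return is measured.
    """
    # The last n_fwd days have no forward return yet, so they cannot be labelled.
    last_labelled = len(prices) - n_fwd  # exclusive upper bound
    first = max(0, last_labelled - window)
    X, y = [], []
    for d in range(first, last_labelled):
        fwd = [prices[d + n_fwd][s] / prices[d][s] - 1.0
               for s in range(len(prices[d]))]
        median = statistics.median(fwd)
        for s, ret in enumerate(fwd):
            X.append(factors[d][s])
            y.append(1 if ret > median else 0)  # assumed labelling rule
    return X, y


# Toy data: 4 days, 3 stocks, 1 factor per stock.
toy_prices = [[10.0, 20.0, 30.0],
              [11.0, 20.0, 27.0],
              [11.0, 22.0, 27.0],
              [12.0, 22.0, 26.0]]
toy_factors = [[[d + s / 10.0] for s in range(3)] for d in range(4)]
X, y = build_training_set(toy_factors, toy_prices, window=2, n_fwd=1)
print(len(X), y)
```

The number of training rows grows as window times universe size, which is why a larger ML_TRAINING_WINDOW gets expensive quickly on a universe of this size.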
However, this is definitely not the be-all-end-all algorithm. There are still many missing pieces and some rough edges:
- Ideally, we could train on a longer window, like 6 months. But it's very easy to run out of memory here. We are acutely aware of this problem and are working hard to resolve it. Until then, we have to make do with being limited in this regard.
- As you can see, performance is not great. I actually think this is quite honest. The alpha factors I included are all widely known and probably cannot be expected to carry a significant amount of alpha. No ML fanciness will convert a non-predictable signal into a predictable one. I also noticed that it is very hard to get good performance out of this. That, however, is a good thing from my perspective. It means that because we are making so many bets, it is very difficult to overfit this strategy by making a few lucrative bets that pay off big-time but can't be expected to recur out-of-sample.
- I deactivated all trading costs to focus on the pure performance. Obviously this would need to be taken into account to create a viable strategy.
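To get a feel for how much the disabled trading costs from the last point could matter, a back-of-the-envelope estimate multiplies the fraction of the portfolio traded at each rebalance by an assumed cost per dollar traded. The sketch below is illustrative only: `traded_fraction` and `annual_cost_drag` are hypothetical helpers, and the 5 basis points cost and 52 weekly rebalances are made-up inputs, not measurements from this algorithm.

```python
def traded_fraction(old_weights, new_weights):
    """Sum of absolute weight changes, as a fraction of portfolio value."""
    tickers = set(old_weights) | set(new_weights)
    return sum(abs(new_weights.get(t, 0.0) - old_weights.get(t, 0.0))
               for t in tickers)


def annual_cost_drag(traded_per_rebalance, rebalances_per_year, cost_bps):
    """Approximate yearly return lost to costs (all inputs are assumptions)."""
    return traded_per_rebalance * rebalances_per_year * cost_bps / 10000.0


# Toy rebalance: one long (B -> E) and one short (D -> F) are swapped out.
old = {"A": 0.25, "B": 0.25, "C": -0.25, "D": -0.25}
new = {"A": 0.25, "E": 0.25, "C": -0.25, "F": -0.25}
traded = traded_fraction(old, new)
drag = annual_cost_drag(traded, rebalances_per_year=52, cost_bps=5)
print(traded, drag)
```

In this toy case the whole portfolio value is traded at each weekly rebalance, and at the assumed 5 basis points that would cost about 2.6% per year, which is the kind of hurdle a viable strategy would have to clear.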
As I said, there is still some work required and we will keep pushing the boundaries. Until then, please check this out, improve it, and poke holes in it.