Today we named our 7th winner of the Quantopian Open. The judging process was harder than usual. For the first time we had to invoke the "financially prudent" rule and disqualify two of the entries.
Over the last 7 months we've learned a lot about judging algorithms. We've distilled those lessons into software tools, which we've shared with the community and open-sourced for further scrutiny and improvement.
We ran these new tools on the contest leaderboard, and they showed us that two of the leading algorithms wouldn't be financially prudent investments. The prize, of course, goes to the winner, but the trading risk is entirely ours, and we need to manage that risk. When an algorithm isn't financially prudent, it isn't eligible to win the prize.
We thought it would be helpful to share the risks we see in these algorithms so that we can all learn from them, as a community.
Algo 1: Too Unpredictable
The first algorithm lacks consistency, and it is impossible for us to determine what it will do going forward. That's easiest to explain with these charts from pyfolio.
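If you want to produce the same kind of charts for your own algorithm, pyfolio's returns tear sheet will do it. Here's a minimal sketch - the returns series below is synthetic, standing in for a real backtest, and the dates simply mirror this post's timeline:

```python
import numpy as np
import pandas as pd
import pyfolio as pf

# Synthetic daily returns standing in for a real backtest.
np.random.seed(42)
dates = pd.date_range('2013-07-01', '2015-08-28', freq='B')
returns = pd.Series(np.random.normal(0.0005, 0.01, len(dates)), index=dates)

# Everything on or after live_start_date is treated as out-of-sample:
# pyfolio shades that region and reports the two periods' stats separately.
pf.create_returns_tear_sheet(returns, live_start_date='2015-08-03')
```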
Here is the algorithm's performance from July 2013 through this past Friday, August 28th. The algorithm was submitted to the contest early on August 3rd. As you can see, it drops 19% in its out-of-sample test period.
However, when you look at the paper trading results on the leaderboard, the algorithm shows positive returns of more than 20% over that same out-of-sample period. That is head-scratchingly different - why would an algorithm behave differently in the month of August depending on what day the test started? The divergence suggests the algorithm's bets depend on how much history it has seen, which is a warning sign in itself. You gain additional information - and uncertainty - by looking at the algorithm's long/short exposure over the longer backtest:
What you see there is the algorithm making a dramatically different bet on the first day of out-of-sample testing. After a couple years of generally flat investment and unmanaged (declining) leverage, the algorithm suddenly doubles its long-only exposure, and it also doubles the number of stocks that it holds.
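Those exposure and holdings series are easy to derive yourself. Here's a sketch, assuming a pyfolio-style positions frame - one column of dollar value per asset plus a 'cash' column - with hypothetical tickers for illustration:

```python
import pandas as pd

def exposure_summary(positions):
    """Long/short exposure, gross leverage, and holdings count per day."""
    equity = positions.drop(columns='cash')
    portfolio_value = positions.sum(axis=1)  # holdings plus cash
    return pd.DataFrame({
        'long': equity[equity > 0].sum(axis=1) / portfolio_value,
        'short': equity[equity < 0].sum(axis=1) / portfolio_value,
        'gross_leverage': equity.abs().sum(axis=1) / portfolio_value,
        'num_holdings': (equity != 0).sum(axis=1),
    })

# Hypothetical positions around the out-of-sample boundary: long exposure
# and the holdings count both jump on the first day of paper trading.
positions = pd.DataFrame(
    {'AAA': [50000.0, 100000.0], 'BBB': [0.0, 50000.0],
     'cash': [50000.0, -50000.0]},
    index=pd.to_datetime(['2015-07-31', '2015-08-03']))
print(exposure_summary(positions))
```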
Adding all of that up: we have an algorithm whose activity, risk, and performance are unpredictable. We don't have an in-sample backtest with matching out-of-sample trading behavior. That's not something we can prudently invest in.
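A cheap consistency check here is to take the daily returns from the long backtest and from the leaderboard's paper trading, and compound each over the shared out-of-sample window; for a well-behaved algorithm the two numbers should be close. A sketch, with synthetic stand-in series:

```python
import numpy as np
import pandas as pd

def oos_cumulative_return(returns, live_start_date):
    """Compound only the returns on or after the out-of-sample start."""
    oos = returns.loc[pd.Timestamp(live_start_date):]
    return (1 + oos).prod() - 1

# Synthetic stand-ins for two runs of the same algorithm: one backtested
# from July 2013, one paper-traded from the submission date forward.
np.random.seed(0)
idx = pd.date_range('2015-08-03', '2015-08-28', freq='B')
long_backtest = pd.Series(np.random.normal(-0.011, 0.005, len(idx)), index=idx)
paper_trading = pd.Series(np.random.normal(0.010, 0.005, len(idx)), index=idx)

# A large gap means behavior depends on the simulation's start date.
gap = (oos_cumulative_return(paper_trading, '2015-08-03')
       - oos_cumulative_return(long_backtest, '2015-08-03'))
print('out-of-sample divergence: {:+.1%}'.format(gap))
```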
Algo 2: Too Overfit
We disqualified the second algorithm because it was overfit. The first thing that concerned us was the month-to-month comparison of the algorithm's returns.
As you can see, the return for July was very high, far higher than any other month's. That comes through most clearly in the Bayesian cone. Below, I have a Bayesian cone that uses January 1st as its start date.
What you see there is an algorithm that behaves in a certain way for many months, and then, the month before the contest starts, takes off dramatically. The algorithm exits the Bayesian cone; it's no longer behaving the way it used to. That's what it looks like when a machine learning algorithm has been trained on a specific set of data. The training period looks fantastic, but it's all in-sample. Once the algorithm goes out-of-sample, the performance drops dramatically. It did pretty well in August, but that looks like luck rather than a repeatable trading strategy.
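pyfolio will draw this cone for you - a sketch, assuming its Bayesian tear sheet (which requires PyMC3) and reusing the synthetic `returns` series from the first example just to show the call:

```python
import pyfolio as pf

# The model is fit on returns before live_start_date and projects a cone
# of plausible paths forward from that date; setting it to January 1st
# reproduces the view above. Walking out of the cone means the algorithm
# no longer behaves like its past self.
pf.create_bayesian_tear_sheet(returns, live_start_date='2015-01-01')
```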
I know that some people will look at that graph and think we're crazy not to use this algorithm. The argument goes something like this: "Who cares what's going on, so long as you're making money?" We don't subscribe to that school of thought. When you don't know what's going on, it hurts you just as often as it helps you. Furthermore, we have experience watching overfit machine-learning algorithms, and they always crumble in time. We don't believe this algorithm can maintain its upside, and we expect it to fall apart over the coming 6 months of the prize period. It's not a financially prudent investment.
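The month-by-month check itself takes one line of pandas on any daily returns series - resampling compounds each calendar month, and one month towering over all the others is the tell:

```python
# Compound daily returns into calendar-month returns
# (reusing the `returns` series from the first sketch).
monthly = returns.resample('M').apply(lambda r: (1 + r).prod() - 1)
print(monthly.map('{:+.1%}'.format))
```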
Algo 3: The Winner
Having shown you a couple of algorithms that we weren't comfortable investing in, I'd like to show you one that we do like.
This algorithm was submitted on May 29th, so it has three months of out-of-sample performance to study. As you can see, it has stayed within the predictive cone for that entire period, and it is making pretty good money.
There are some nits to pick in the full tear sheet review; the algorithm doesn't do everything right. But on balance, it passes the prudence test: the backtest and the out-of-sample results are consistent and positive. It's a good one to manage the $100,000, and we hope to write the author a big check at the end of the prize period.
Wrapping Up
It's important that algorithm writers keep the long term in mind and not optimize for a single month's performance. The fund page has a list of "What we look for" and "What we don't want" that should be helpful. The way to win the contest, and the way to get into the fund, is to build a fundamentally sound algorithm based on a good investment thesis.
Finally, over the coming weeks we plan to do a lot of education on using pyfolio to evaluate your own backtests.
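In the meantime, the one call worth knowing is the full tear sheet. A sketch:

```python
import pyfolio as pf

# `returns` is your algorithm's daily returns series (the synthetic one
# from the first sketch works for a dry run). Passing positions and
# transactions frames as well unlocks the exposure and turnover plots
# used throughout this post.
pf.create_full_tear_sheet(returns, live_start_date='2015-08-03')
```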