Recently, we’ve updated you on our progress allocating capital to community-authored investment strategies. We have learned quite a bit about what Quantopian needs to do to continue growing, as well as what community members need to do to be successful quants. We are dedicated to spreading opportunity and giving everyone the tools and guidance to create the next $50M algo. In this post, I’d like to share two things you can work on in your own algorithms: portfolio construction and avoiding overfitting.
First, some background
Quantopian is a crowd-sourced asset manager. We partner with community members like you to find and exploit market inefficiencies (alpha). We provide all the data, technology, and training for you to research and test your ideas. You do the creative work and keep ownership of whatever you create here. You give us permission to run simulations and evaluate your work. When we find algorithms that we think produce alpha, we offer a licensing/royalty agreement. If you accept our offer, we’ll use your algorithm to direct capital (largest to date is a single algorithm that has been allocated $50M), and pay you a percentage of the returns.
With over 200,000 members, the Quantopian community is the largest quantitative research group in the world. At a traditional fund, the primary research challenge is sourcing enough strategies. At Quantopian, we have the complement of that problem: filtering through a huge number of ideas to find the best ones. Quantopian members have been contributing investment algorithms for years, and we are now the custodian of the world’s largest database of investment algorithms.
Our Learnings
As a crowd-sourced asset manager, our job at Quantopian is to carefully evaluate the performance of each algorithm in our database, while maintaining the privacy of your intellectual property. What did we learn from evaluating such a large number of investment algorithms? We found we need to teach and then guide the community in two areas:
- Portfolio Construction
- Avoiding Overfitting
Portfolio Construction
Evaluating a large set of strategies in our database, we learned that the Quantopian community needed more guidance on how to construct and maintain a market-neutral portfolio. The most common faults we found were structural (e.g., trading illiquid stocks, high position concentration). Our new contest rules grew out of this work. Now you can use the contest rules as a clear guideline for creating a structurally sound algorithm.
It turns out that these structural properties are easy to check on in-sample data because they generally don’t change out-of-sample. In practice, this means we can automatically check these criteria as soon as you’re done coding. In fact, we do: the full backtest screen now reports whether your algorithm’s backtest satisfies all the criteria.
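To make this concrete, here is a minimal sketch of the kind of structural check you can run yourself on a table of daily portfolio weights. The DataFrame layout, the asset names, and the 5% / 10% thresholds are illustrative placeholders, not the contest rules themselves.

```python
import pandas as pd

def structural_checks(weights, max_position=0.05, max_net=0.10):
    """Return pass/fail flags for a couple of structural properties.

    weights: DataFrame of portfolio weights (rows = dates, columns = assets),
    expressed as fractions of total capital; short positions are negative.
    """
    concentration = weights.abs().max(axis=1)  # largest single-position weight per day
    net = weights.sum(axis=1)                  # net (long minus short) exposure per day
    return {
        "position_concentration_ok": bool((concentration <= max_position).all()),
        "net_exposure_ok": bool((net.abs() <= max_net).all()),
    }

# Hypothetical example: two days of weights for three assets.
weights = pd.DataFrame(
    {"AAA": [0.03, 0.04], "BBB": [-0.03, -0.02], "CCC": [0.01, -0.01]},
    index=pd.to_datetime(["2018-01-02", "2018-01-03"]),
)
print(structural_checks(weights))
# {'position_concentration_ok': True, 'net_exposure_ok': True}
```

Because these properties come from how you build the portfolio rather than from how the market behaves, a check like this on your in-sample backtest tells you most of what you need to know.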
Overfitting
Of course, portfolio construction is just the beginning. Creating a structurally sound algorithm positions you to tackle the next challenge, which is quite a bit deeper: building an algorithm that performs well on new data (out-of-sample performance). If you’ve heard any talks by our Managing Director of Portfolio Management and Research, Jess Stauth, you know that a major pitfall for out-of-sample performance is overfitting. Overfitting means your model is brittle and will fail when it encounters new data.
Competing in the contest is a great way to test your model for overfitting. Our scoring function emphasizes consistent performance over time, and is calculated on a daily basis against new data.
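To illustrate the spirit of that idea (this is not the actual contest scoring function), here is a toy consistency-oriented score: it scales cumulative out-of-sample returns by their realized volatility and is recomputed each day as new data arrives, so steady performance beats a few lucky spikes.

```python
import numpy as np
import pandas as pd

def daily_consistency_scores(out_of_sample_returns):
    """out_of_sample_returns: pd.Series of daily returns earned on new data."""
    cumulative = out_of_sample_returns.cumsum()
    realized_vol = out_of_sample_returns.expanding(min_periods=20).std()
    # One score per new out-of-sample day; high volatility drags the score down.
    return cumulative / realized_vol

# Hypothetical example: 120 business days of small, steady daily returns.
rng = np.random.default_rng(1)
returns = pd.Series(rng.normal(0.0003, 0.005, 120),
                    index=pd.bdate_range("2018-06-01", periods=120))
print(daily_consistency_scores(returns).tail())
```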
Overfitting can happen in myriad ways, and as you explore potential strategies, you need to keep asking yourself if overfitting could be creeping in.
What does overfitting look like?
- Bad data hygiene -- repeated backtests over your entire available data history without reserving any data for out-of-sample testing (see the sketch after this list).
- Excessive precision without accuracy -- parameters tuned to 5 decimal places but tested on just a few thousand data points.
- Similarly, enormous parameter space -- your model is tuned with 100s of parameters, but you have only 10,000 data points to test against.
- Rare event exploitation -- your regime detection model triggers once in a ten year simulation, perfectly timing the reversal of your two ranking factors.
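The first item, data hygiene, is the easiest one to act on. Below is a minimal, self-contained sketch of the workflow: tune freely on an in-sample window, then evaluate the chosen parameters exactly once on a reserved holdout. The synthetic price series and the toy moving-average signal are stand-ins for your own data and strategy.

```python
import numpy as np
import pandas as pd

# Synthetic daily prices standing in for real data (about 10 years of business days).
rng = np.random.default_rng(0)
prices = pd.Series(100 * np.exp(np.cumsum(rng.normal(0, 0.01, 2520))),
                   index=pd.bdate_range("2008-01-02", periods=2520))

# Reserve the holdout up front and touch it exactly once, after all tuning is done.
in_sample = prices.loc[:"2015-12-31"]
holdout = prices.loc["2016-01-01":]

def strategy_returns(px, lookback):
    """Toy signal: long when price is above its moving average, flat otherwise."""
    signal = (px > px.rolling(lookback).mean()).astype(float)
    return signal.shift(1) * px.pct_change()

def sharpe(returns):
    returns = returns.dropna()
    return np.sqrt(252) * returns.mean() / returns.std()

# Tune the lookback on in-sample data only...
best = max(range(10, 200, 10), key=lambda lb: sharpe(strategy_returns(in_sample, lb)))
# ...then evaluate the chosen parameter exactly once on the reserved holdout.
print(best,
      sharpe(strategy_returns(in_sample, best)),
      sharpe(strategy_returns(holdout, best)))
# A large gap between the in-sample and holdout numbers is a warning sign of overfitting.
```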
Overfitting isn’t specific to investment models. Check out this story of a Kaggler Dropping 50 Spots in 1 Minute. The public leaderboard in Kaggle competitions is very similar to backtest results on Quantopian. Not only are those results not necessarily indicative of future performance, but the more effort you put into optimizing them in-sample, the more likely you are to overfit and suffer a catastrophe when your model is released into the wild.
Our goals are totally aligned with yours here: when we talk about overfitting, it’s because it is the single biggest problem faced by our community members. For a more in-depth look, the Quantopian Lecture Series has an entire lecture devoted to overfitting in quantitative finance. We hope you’ll come away with a greater understanding of what it is and how you can avoid it, but if you have questions, we’d love to hear them on the Quantopian forums.