Gamed / backtest-overfitted algos in the contest?

Hi! Let's say, hypothetically, that someone wants to ensure that he or she will have the best backtest in the contest and writes an algo that overfits heavily... How can the Quantopian team guarantee to the other Q Open contestants that this algo will be disqualified?

Suppose someone makes a good, honest algo that earns a beautiful profit with low beta and low drawdown, but whose annual returns, Sharpe, and Calmar ratio are not good enough to secure 1st place, and then decides to add some cheating lines of code that run only during the backtest. Is Q able to see that? How does Quantopian detect it without looking at the code?

Take my algo as an example. I worked night and day and came up with a good strategy. The results of the 2-year backtest:
Annual Returns 22.28% // Sharpe 2.472 // Max Drawdown -3.652% // Calmar Ratio 6.101 // Beta 0.03256
I could do better, a lot better, but I liked the strategy and I think it has great potential in live trading. Whatever.

The point is that a Calmar ratio of 1700+ seems out of this world. I work with statistics: when something looks highly improbable, almost impossible, you may want to check it twice. This is medium- to low-frequency trading, not HFT. I don't think even Renaissance Technologies has algos with that Calmar for MFT.

320% 2-year return with -0.095% max DD?! Come on!*
* I'm not accusing anyone yet, but the rational part of my brain is very suspicious.
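As a back-of-the-envelope check (assuming the standard definition, Calmar = annualized return / |max drawdown|), those leaderboard numbers imply a four-digit ratio no matter how you annualize:

# Implied Calmar for a 320% two-year return with a -0.095% max drawdown,
# assuming the standard definition: annualized return / abs(max drawdown).
total_return = 3.20            # 320% over 2 years
years = 2.0
max_drawdown = 0.00095         # 0.095%

annualized = (1.0 + total_return) ** (1.0 / years) - 1.0   # ~104.9% per year
calmar = annualized / max_drawdown                          # ~1100
print("Annualized return: %.1f%%" % (annualized * 100))
print("Implied Calmar:    %.0f" % calmar)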

So, if I wanted to game the backtest of an algo, it would be extremely easy**:
1. Search for securities or ETFs with very high returns in the past 2 years, e.g. ADXS, BLUE, CBMG, ESPR, or even NASDAQ-100 and DJIA components with short periods of very high positive percentage change.
2. Cut out the drawdown periods by buying and selling at exactly the perfect moments with lines of code like:

import datetime

def handle_data(context, data):
    pos = context.portfolio.positions
    today = get_datetime().date()
    # Enter and exit ESPR on hand-picked dates that bracket a known run-up
    if not pos and today == datetime.date(2015, 3, 5):
        order_target_percent(sid(44989), 0.2)   # ESPR: buy at the chosen bottom
    elif pos and today == datetime.date(2015, 3, 23):
        order_target_percent(sid(44989), 0)     # ESPR: sell at the chosen top
3. Lower the beta by finding more opportunities to short sell some stocks (a crude sketch follows after this list).
4. Be clever and use similar stocks with high liquidity in the NASDAQ-100 / DJIA with similar periods of high return and low volatility. It needs to look like the algo is trading the same strategy on every stock it uses.
5. Work for a couple of weeks with a stock screener to find the best stocks.
6. Use low market exposure in order to get a very low DD.
7. Use a similar strategy for the live trading algo, even if the results are not so consistent with the backtest.
8. Don't get greedy.
9. Voilà! Eureka! You are in the top 10.
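For illustration, the crudest version of step 3 is simply to short the index against the long book (a minimal sketch, assuming the Quantopian API, SPY as the market proxy, and an arbitrary one-to-one hedge; sid(8554) is SPY):

def initialize(context):
    context.spy = sid(8554)   # SPY, used as the market proxy

def handle_data(context, data):
    # Hold the cherry-picked long and short the index against it;
    # the offsetting market exposure drags the measured beta toward zero.
    order_target_percent(sid(44989), 0.5)       # ESPR long leg
    order_target_percent(context.spy, -0.5)     # SPY short leg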

Excuse my English, I'm not a native speaker.
**P.S.: Please don't use this example to create a contest algo, because I would be extremely disappointed.

10 responses

I understand what you are trying to say. Just looking at all those exaggerated numbers discourages me from entering the contest. But then, who cares about the contest? Hopefully the algo gets selected into the fund.

Hi Beginner! Thanks for your reply. Well, I care about the contest, because I was so excited about the whole Quantopian concept: the team, the transparency, the attention to detail, the passion for creating something new and different, the use of IEX order routing, the fairness, the belief in real quantitative science... in finding the Holy Grail of algo trading systems. And now I feel disappointment in my trading-algo-developer soul.

This concern seems to have been raised in past contests too. I'm hoping the consistency metric would ring an alarm bell if any sort of over-fitting has been done, though the 35% weight given to the backtest (assuming the submission is done on the last day) is still debatable. One suggestion I have, to further penalize any temptation to over-fit, is to weigh the backtest inversely with the consistency, in addition to decreasing it linearly from 70% to 0% over 60 days of live trading.
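A minimal sketch of that suggestion (my own reading, not an official formula): the 70%-to-0% linear decay over 60 days, additionally scaled by a consistency number in [0, 1], so an inconsistent algo loses its backtest advantage faster:

def backtest_weight(days_live, consistency):
    # Linear decay from 0.70 to 0.0 over 60 days of live trading,
    # scaled by consistency in [0, 1] to penalize over-fit backtests.
    base = 0.70 * max(0.0, 1.0 - days_live / 60.0)
    return base * consistency

def final_score(backtest_score, paper_score, days_live, consistency):
    w = backtest_weight(days_live, consistency)
    return w * backtest_score + (1.0 - w) * paper_score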

Weighing the backtest inversely with the consistency seems a really good idea, indeed.

There is a huge thread with all possible ways to game the contest and how to counter them; I don't think Quantopian has done much to fix it. Paper trading will filter some out, but some algos get a good boost from backtesting.

My bad. I just found this thread: https://www.quantopian.com/posts/scoring-changes-for-the-april-quantopian-open.

Dan Dunn: Consistency Between Paper Trading Results and Backtesting
The second change is that algorithms are now scored on how consistent they are between their paper trading returns and backtesting returns. The more consistent you are, the better. This factor is added to the calculation at the very end. After we compute what used to be referred to as your final score, we now multiply it by the consistency number, and the result is the new final score. This is applied gradually over the first few days of trading while the paper trading record is very volatile, and is fully applied at 20 days of trading.
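As I read that description, the multiplier ramps in over the first 20 trading days, something like this (illustrative only; Quantopian hasn't published the exact formula):

def effective_multiplier(consistency, days_live):
    # Phase the consistency factor in linearly over the first 20 days,
    # staying close to neutral (1.0) while the paper record is volatile.
    ramp = min(days_live / 20.0, 1.0)
    return 1.0 + (consistency - 1.0) * ramp

# new_final_score = old_final_score * effective_multiplier(consistency, days_live)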

Adrian, I'm glad you found that post, because it's exactly the right answer. A few more words on the topic:

You asked about the contestant who "decides to add some cheating lines of code that run only during the backtest. Is Q able to see that? How does Quantopian detect it without looking at the code?"

We certainly had some of that happening earlier this winter. We had a few contestants who wrote algorithms that effectively said "Run this code during the backtest period, and run this other code in the paper trading period." They'd submit a few different versions, each one placing an aggressive paper trading bet, and hope that one of them hit.
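For concreteness, that trick is as simple as branching on the wall clock, something like this sketch (illustrative, with a made-up cutoff date and placeholder strategy functions, not anyone's actual submission):

import pytz
from datetime import datetime

CUTOFF = datetime(2015, 4, 1, tzinfo=pytz.UTC)   # hypothetical submission day

def backtest_logic(context, data):
    pass   # placeholder: the over-fit code that paints the pretty backtest

def live_gamble(context, data):
    pass   # placeholder: the aggressive bet that runs in paper trading

def handle_data(context, data):
    # Before the cutoff we must be in the backtest; after it, paper trading.
    if get_datetime() < CUTOFF:
        backtest_logic(context, data)
    else:
        live_gamble(context, data)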

As you say, we are constrained in how to handle this. Without looking at the code, how can we know it is happening? The answer is to look at the algorithm's output. It's possible to over-fit your algorithm to the past, but you can't over-fit it to the future. An over-fit algorithm definitionally cannot stay consistent between its in-sample and out-of-sample performance, so it will slip down the charts.

I feel like this comes up at the beginning of every month. Someone yesterday suggested that we just not put new algorithms in the leaderboard for the first couple of weeks to help prevent that.


My thought is that the backtest date range should not be disclosed, thus making it difficult to over-fit over the scoring period.

@Jamie - Cool idea! Quantopian could select a unique 2-year date range for each monthly contest, and make it nearly impossible to over-fit that period.

This could be similar to using out-of-sample data for backtests (since you can't design algorithms based on the test period):
http://blog.quantopian.com/9-mistakes-quants-make-that-cause-backtests-to-lie-by-tucker-balch-ph-d/
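A minimal sketch of how such a window could be drawn each month (hypothetical; the data pool bounds and seeding are my assumptions):

import random
from datetime import date, timedelta

def draw_backtest_window(seed, pool_start=date(2008, 1, 1),
                         pool_end=date(2015, 6, 1)):
    # Pick an undisclosed, uniformly random 2-year window inside the
    # available data pool for this month's contest.
    rng = random.Random(seed)
    offset = rng.randrange((pool_end - pool_start).days - 730)
    start = pool_start + timedelta(days=offset)
    return start, start + timedelta(days=730)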

@Dan Dunn
Thank you very much for your response!