Mystery at Contest 34/35...Case of Flawed Scoring System

This morning I was surprised when I checked the leaderboard of Contest 34/35: the "previous" leader for a couple of weeks running, Bright Red Elephant, was nowhere to be found! Either he dropped out and stopped his algo ("guilty conscience") or was disqualified. I need confirmation on this from Quantopian. I brought up his performance anomaly in this post here and discussed it in more detail here

In a nutshell, he was able to receive the no. 1 ranking with numbers like these (category rank in parentheses):

Paper Trading Score: 86.50 (rank 1)
Annual Returns: 0.0003249% (rank 300)
Annual Volatility: 0.00009459% (rank 1)
Sharpe: 3.486 (rank 31)
Max Drawdown: -0.00003036% (rank 1)
Stability: 0.8895 (rank 18)
Sortino Ratio: 5.843 (rank 48)
Beta: 0.00001323 (rank 1)
Correlation: 15.90%

Really, would anyone in their right mind invest in a fund with these numbers?! I have an idea as to how this guy 'gamed' the contest: by deploying a scalping algorithm. There are a few more entries lurking in the top 50 with numbers like these.

Which brings me to my point: the scoring mechanism is totally flawed. The culprit is Q's incorrect calculation, or misinterpretation, of the Sharpe and Sortino ratios, which do not account for the risk-free rate being subtracted from portfolio returns, as intended by their original authors. Had these metrics been calculated correctly and as intended, the "leader" would have negative Sharpe and Sortino ratios at prevailing risk-free rates (around 1.05%) and would thus receive a lower ranking. Why, you ask, is the risk-free rate adjustment important? Because it establishes the minimum threshold for an equity portfolio, or any other alternative asset class, to beat. Simply put, why would you invest in an equity fund, with all its associated transaction costs and risks, that underperforms a risk-free 3-month Treasury Bill? This is a very basic but important investment principle.
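To make this concrete, here is a rough sketch of the textbook Sharpe formula (my own illustration with made-up numbers, not Q's actual code) showing how the risk-free adjustment flips the sign for a near-zero-return, near-zero-volatility stream like the one above:

```python
import numpy as np

def sharpe_ratio(daily_returns, risk_free_annual=0.0):
    """Annualized Sharpe ratio; subtracts the risk-free rate, as intended."""
    rf_daily = risk_free_annual / 252.0            # simple de-annualization
    excess = np.asarray(daily_returns) - rf_daily  # excess over the risk-free rate
    return np.mean(excess) / np.std(excess) * np.sqrt(252)

# A made-up "scalper" return stream: tiny positive drift with near-zero
# volatility, similar in scale to the leaderboard numbers above.
tiny = np.tile([1.2e-8 + 6e-8, 1.2e-8 - 6e-8], 126)

print(sharpe_ratio(tiny))          # ~3.17 with no risk-free adjustment
print(sharpe_ratio(tiny, 0.0105))  # deeply negative once a 1.05% rate is subtracted
```

Without the adjustment the ratio looks stellar; with it, the same returns are exposed as worse than holding T-bills.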

I am really impressed by Quantopian's design of a highly hedged long/short equity fund that mitigates risks with diversification and risk dispersal techniques, that is why I am baffled as to how this very basic investment principle of establishing a minimum threshold of returns via the risk free rate, slipped through their thought / design process. I have tried to reach out to Q regarding this through some earlier posts in the forum and got no response or were indifferent. I think the management of Q owes it to the community to explain this anomaly. Perhaps, they do have a legitimate reason/ logic for this and I what to hear and understand it. If it's a honest mistake, own up to it and rectify it. We all make mistakes, it's human nature.

1 response

I got indirect confirmation that the above-mentioned contestant was disqualified in the post by Dan Dunn here

There are still some entries in the top 50 with this type of performance: low returns, low volatility, low drawdowns, high stability, and high Sharpe and Sortino ratios (as per their computation, which does not factor in the risk-free rate) that received high rankings. As I mentioned above, this exposes the flaws of the current scoring system. The flaws stem from two things: (1) the incorrect calculation of the Sharpe and Sortino ratios, which doesn't factor in the risk-free rate as the minimum threshold to beat, and/or (2) the absence of a definition of what Q is looking for in terms of minimum required returns. This can easily be solved by correcting (1), establishing (2), or both.
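For completeness, the Sortino ratio has the same issue: as originally defined, it measures excess return over a minimum acceptable return (commonly the risk-free rate) against downside deviation only. A minimal sketch of that definition (my own, not Q's implementation):

```python
import numpy as np

def sortino_ratio(daily_returns, required_annual=0.0):
    """Annualized Sortino ratio: excess return over a minimum acceptable
    return, divided by downside deviation (only below-target days count)."""
    target_daily = required_annual / 252.0
    excess = np.asarray(daily_returns) - target_daily
    downside = np.minimum(excess, 0.0)              # keep only below-target returns
    downside_dev = np.sqrt(np.mean(downside ** 2))  # downside deviation
    return np.mean(excess) / downside_dev * np.sqrt(252)

# The same tiny-drift, near-zero-volatility stream as before.
tiny = np.tile([1.2e-8 + 6e-8, 1.2e-8 - 6e-8], 126)

print(sortino_ratio(tiny))          # positive at a 0% required return
print(sortino_ratio(tiny, 0.0105))  # negative once the risk-free rate is the target
```

Set the required return to zero, as the contest effectively does, and a scalper looks great; set it to the risk-free rate and the ratio goes negative.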

Dan Dunn:

It's worth noting that in the upcoming new contest, this type of algorithm won't score well at all.

The code for the proposed new scoring function is below:

import empyrical as ep
import numpy as np

def volatility_adjusted_daily_return(trailing_returns):
    """
    Normalize the last daily return in trailing_returns by the annualized
    volatility of trailing_returns.
    """
    todays_return = trailing_returns[-1]
    # Volatility is floored at 2%.
    volatility = max(ep.annual_volatility(trailing_returns), 0.02)
    score = todays_return / volatility

    return score

def compute_score(returns):
    """
    Compute the score of a backtest from its daily returns (a pandas Series).
    """
    daily_scores = returns.rolling(63).apply(volatility_adjusted_daily_return)

    avg_yearly_score = np.sum(daily_scores) / (len(daily_scores) / 252.0)

    print('Avg Yearly Score: %f' % avg_yearly_score)

So basically it is a volatility-adjusted return measure with volatility floored at 2%. This implies a minimum return of 2%: with volatility floored at 2%, a strategy must return at least 2% a year for its average yearly score to reach the benchmark of 1. This satisfies my proposed correction no. (2) above, defining a minimum return.
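To see the floor in action, here is a simplified version of the scoring (my own condensation: full-sample volatility in place of the 63-day rolling window, constant made-up return streams):

```python
import numpy as np

def new_score(daily_returns, floor=0.02):
    # Each day's return divided by (floored) annualized volatility,
    # summed and averaged per year of data.
    vol = max(np.std(daily_returns) * np.sqrt(252), floor)
    daily_scores = np.asarray(daily_returns) / vol
    return daily_scores.sum() / (len(daily_returns) / 252.0)

scalper = np.full(252, 0.0003e-2 / 252)  # ~0.0003% per year, near-zero volatility
steady = np.full(252, 0.04 / 252)        # 4% per year, also very low volatility

print(new_score(scalper))  # ~0.00015: the floor crushes the scalper's score
print(new_score(steady))   # ~2.0: returns above the 2% floor score well above 1
```

The scalper that topped the old leaderboard now scores essentially zero, while a strategy clearing the implied 2% hurdle scores comfortably above 1.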

However, without factoring in my proposed correction no. (1), the inclusion of a risk-free-rate adjustment to returns, you can run into a problem in the future. If we define the risk-free rate as the 3-month US Treasury Bill, currently at around 1.05%, it might seem negligible today, but over, say, the last 30 years it ranged from a high of around 14% to a low of 0.2%, averaging about 5-6%. This means that even if you establish your minimum required return by flooring the volatility measure at 2%, in times when the risk-free rate is above 2% your benchmark falls apart. I have asked Jamie twice, and reiterated to Thomas, the question about the rationale for flooring volatility at 2%, but never got a response.
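A back-of-the-envelope illustration of that failure mode, using hypothetical numbers (a 5% T-bill yield, roughly the 30-year average mentioned above):

```python
risk_free = 0.05      # hypothetical T-bill yield, above the 2% floor
annual_return = 0.04  # a low-volatility strategy earning 4% per year
floored_vol = 0.02    # the proposed volatility floor

score = annual_return / floored_vol
print(score)  # 2.0: comfortably above a benchmark score of 1

excess_over_tbills = annual_return - risk_free
print(excess_over_tbills)  # negative: yet it underperforms risk-free T-bills
```

The strategy scores well under the new scheme while actually losing to the risk-free rate, which is exactly the gap a proper risk-free adjustment would close.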