Request: real world strategy scoring metric

Disclaimer: I'm primarily interested in this topic as a curious party only.

Currently the Q's Open Contest is scored on a ranking system. The numbers that ultimately make up the rank are meaningless outside of the contest. Therefore one cannot readily perform calculations that reflect how one's strategy might be slotted in the leaderboard without actually submitting a strategy. In addition I would point out, and please excuse my crudity, that the highest ranked crap -- is still crap. (Not that the strategies in the contest are "crap", mind you.) But relative ranking tells one nothing about the true merits of a strategy.

I, for one, and again this is only for research purposes, would like to see the Quantopian Wizards come up with something that could be used as a general purpose measurement that is not dependent on a sinking ship full of rats (it may be the fastest, smartest rat at the top of the mast, but it's still a rat). (Not that the strategies in the contest could be considered vermin, mind you.)

In the spirit of attempting to divine such a measurement I offer this crude spreadsheet. It's worthless as is, but an intelligent agent (not me of course), could take some of these numbers (and some that are missing) and perhaps conjure up some formula that could be used by anyone's strategy to give every backtest and paper trading run an instructive representation of how their strategy might rank.

As an aside, I wonder if the Contest organizers could include the number of days (in the downloadable CSV file) that any one strategy has been paper trading. That number alone is critical to the efficacy measurement.

Kindly take the above as lightheartedly as possible (in addition to its earnest request), after all it is nearly Saturday and it was time to ruffle Jonathan's feathers.

4 responses

Thanks for the question MT. Unfortunately I can't give a super detailed response, as it's a hectic Saturday for me right now.

But yes, we have thought about this. With all the other moving parts and projects we have going on it's been somewhat back-burnered, though I did spend some time a couple of months ago starting on a more generalized scoring model.

When I began researching this a few months ago, one of the ideas I had was to take each "important" metric (for the sake of discussion, let's just say the 7 contest metrics) and transform it using a logistic function, or more generally a sigmoid curve: http://en.wikipedia.org/wiki/Logistic_function

The idea is that there is some range where increases or decreases in a metric can be treated as "realistically" linear for evaluating an algo's performance. For example, it might be reasonable to view a backtest Sharpe Ratio's importance as linear between, say, 0.5 and 3.0. At extreme values, though, the number becomes asymptotically meaningless: one might say a backtest Sharpe Ratio of 9.0 is really no different than a Sharpe of 10.0, so each would be transformed by the sigmoid function to some value near 1. The same goes for bad Sharpe Ratios; perhaps any Sharpe less than 0 would be transformed to a value asymptotically close to 0.0. So the idea would be to establish a sigmoid function for each "important" metric, i.e. choose a realistic linear range per metric. And since the sigmoid transformations effectively normalize each metric to some value between 0.0 and 1.0, you can then just take the average across them for the final "score".
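As a sketch of this idea, a logistic transform with a per-metric "linear range" might look like the following (the 0.5–3.0 Sharpe range and the steepness constant are illustrative choices, not vetted parameters):

```python
import math

def sigmoid_score(value, lo, hi):
    """Squash a raw metric into (0, 1) with a logistic curve.

    The curve is centered on the midpoint of [lo, hi] and scaled so that
    the [lo, hi] range maps onto the steep, quasi-linear part of the
    S-curve; values far outside the range saturate near 0 or 1.
    """
    mid = (lo + hi) / 2.0
    # Steepness chosen so lo and hi land near the 5% / 95% points.
    k = 6.0 / (hi - lo)
    return 1.0 / (1.0 + math.exp(-k * (value - mid)))

# Example: treat Sharpe as "linear" between 0.5 and 3.0.
sigmoid_score(1.75, 0.5, 3.0)   # midpoint -> 0.5
sigmoid_score(9.0, 0.5, 3.0)    # saturates near 1
sigmoid_score(10.0, 0.5, 3.0)   # barely different from a Sharpe of 9.0
sigmoid_score(-1.0, 0.5, 3.0)   # near 0
```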

Again, this is just something I looked at for a few hours over a couple of days, several months ago, and the idea is not even close to being fully vetted. But since I found myself with a couple of minutes free this Saturday evening, I thought I'd share what I had started looking at. We would certainly love any feedback, comments, etc., that others have about this.


@ Justin,

The error function, http://en.wikipedia.org/wiki/Error_function, is also a consideration, if you are looking for a smoothly varying "S-curve."
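For comparison, a sketch of the same kind of normalization built on the error function; `mid` and `scale` are free parameters that would need tuning per metric:

```python
import math

def erf_score(value, mid, scale):
    """Map a raw metric onto (0, 1) via an erf-based S-curve.

    math.erf ranges over (-1, 1), so shift and rescale it onto (0, 1).
    """
    return 0.5 * (1.0 + math.erf((value - mid) / scale))

erf_score(1.0, 1.0, 1.0)    # at the midpoint -> 0.5
erf_score(100.0, 1.0, 1.0)  # saturates near 1
```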

In the end, the contest needs to steer algo writers toward strategies that would be useful for the Q crowd-sourced hedge fund. In the case of my winning algo, if I understand correctly, it was a catastrophic failure, since the beta is roughly 1.0. So you added a beta metric. How should it be weighted? In the limit that you have an infinite number of metrics, and you add one more, equally weighted, it won't have much impact. In other words, if you really need zero beta strategies, then you'll need to revise the weights to put more emphasis on beta.

The problem, though, especially with the backtest portion of the score, is that bias can be applied, and then tempered with shorting the index or an inverse ETF, to bring down the beta. So, you are left with only one month of live trading that can be trusted, unless somehow you can filter by the consistency metric.

Grant

A few helper thoughts on this topic:

From my experiments, for some data larger is better; for other data, smaller is better. This requires either dividing a product of the former by a product of the latter, or inverting the smaller-is-better factors. The sigmoid seems to handle this pretty well, though.

The values that go into the sigmoid function will have to be reconciled. As Justin mentioned, some values greater than X add no information (a Sharpe of 50 and a Sharpe of 500 are equally meaningless, and undoubtedly bogus). To bracket such values, some rational maximum needs to be applied. For instance:

MaxRationalSharpe = 3.0
RationalizedSharpeSigmoid = 1 / (1 + EXP(-MIN(sharpe_pt, MaxRationalSharpe)))

RationalizedDrawdownSigmoid = 2 * (1 / (1 + EXP(-maxDD_pt * 2)))

I'm not sure which direction some of these should tend toward; that is, what does corr_pt mean, exactly? (Is higher better, or lower?) And beta needs to be inverted:

BetaInverter = 2
RationalizedBetaSigmoid = 1 / (1 + EXP(-(BetaInverter - beta_spy_pt)))

I would assume that PT values need to be weighted by the number of days in paper trading, say weight = LOG(Days Paper Trading). One could give a similar value to the BT result, say weight = LOG(Backtest Equivalent Days) (a number like 10 or 20 would suffice here).

I might also add that any strategies with negative P&L in their backtest stage should be simply set to a score of ZERO. We're building pyramids here. If you built for 2 years and all you have is a hole -- STOP!

And finally, each factor needs its own weight:

consistency     1  
annRet_pt       8  
sharpe_pt       7  
stability_pt    5  
calmar_pt       4  
beta_spy_pt     3  
annVol_pt       2  
maxDD_pt        6  
corr_pt         1  

So we have an equation like this for annual pt return:

RationalizedAnnualPTSigmoid = 1 / (1 + EXP(-annRet_pt)) * LOG(DaysTrading) * PrimaryWeightForAnnualReturns

Add up all the rationalized sigmoids and divide by the weight total (37) and you'll have the PT score. Do the same for the BT factors and add them together.
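A rough Python translation of the spreadsheet formulas above, as a sketch only: metrics without an explicit formula in the post fall through to a plain sigmoid (their directions would still need reconciling, as noted), and the zero-score rule for a negative backtest P&L is included as a guard.

```python
import math

# Weights from the table above; they total 37.
WEIGHTS = {
    "consistency": 1, "annRet_pt": 8, "sharpe_pt": 7, "stability_pt": 5,
    "calmar_pt": 4, "beta_spy_pt": 3, "annVol_pt": 2, "maxDD_pt": 6,
    "corr_pt": 1,
}

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def rationalized(metric, value):
    """Per-metric 'rationalized' sigmoid, mirroring the formulas above."""
    if metric == "sharpe_pt":
        return sigmoid(min(value, 3.0))          # cap at a rational maximum
    if metric == "maxDD_pt":
        return 2.0 * sigmoid(-abs(value) * 2.0)  # drawdown: smaller is better
    if metric == "beta_spy_pt":
        return sigmoid(2.0 - value)              # inverted: low beta scores high
    return sigmoid(value)                        # default: larger is better

def pt_score(metrics, days_trading, backtest_pnl=1.0):
    """Weighted average of rationalized sigmoids, scaled by log(days traded)."""
    if backtest_pnl < 0:
        return 0.0  # negative backtest P&L: score is simply ZERO
    total = sum(WEIGHTS[m] * rationalized(m, v) for m, v in metrics.items())
    return math.log10(days_trading) * total / sum(WEIGHTS.values())

# Hypothetical metric values, purely for illustration.
example = {
    "consistency": 0.8, "annRet_pt": 0.12, "sharpe_pt": 1.5,
    "stability_pt": 0.9, "calmar_pt": 1.2, "beta_spy_pt": 0.3,
    "annVol_pt": 0.15, "maxDD_pt": -0.10, "corr_pt": 0.2,
}
score = pt_score(example, days_trading=10)
```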

Seems to work (I created a google sheet to help me play with these...)

Applying a sigmoid curve or error function may reanimate Q's Open Contest ranking system, which is in critical condition.
But let's look deeper at why it is ill.
Look at the elements.
There are three kinds of them: GOOD, BAD, and GOOD-to-BAD ratios.

GOOD
Annualized Return - the main one, and the only one you can deposit into your bank account.

I will add the Omega Ratio, which is defined as the probability-weighted ratio of gains versus losses relative to some threshold return target:

=SUM(IF(Returns > RiskFreeReturn, Returns - RiskFreeReturn,"")) / -SUM(IF(Returns < RiskFreeReturn, Returns - RiskFreeReturn,"")) 
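The same calculation sketched in Python (the sample returns are illustrative):

```python
def omega_ratio(returns, threshold=0.0):
    """Probability-weighted ratio of gains versus losses about a threshold."""
    gains = sum(r - threshold for r in returns if r > threshold)
    losses = -sum(r - threshold for r in returns if r < threshold)
    return float("inf") if losses == 0 else gains / losses

omega_ratio([0.02, -0.01, 0.03, -0.02])  # gains 0.05 / losses 0.03
```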

BAD
Annualized Volatility - the lower the better.
It may actually be interpreted as a ratio:
=1/(StDev(Returns)*SQRT(252))

A deviation from the mean upward is GOOD.
A deviation from the mean downward is BAD.
That is why I recommend using Downside Deviation or the Ulcer Index instead of Standard Deviation.

Maximum Drawdown - the lower the better.
It may actually be interpreted as a ratio:
1/Maximum Drawdown

Beta to SPY - the lower the absolute value, the better.
It may actually be interpreted as a ratio:
1/abs(Beta to SPY)

It is used in the Treynor Ratio, but my personal opinion is that the best algo should have as large a positive beta as volatility allows in a bull market, and as negative a beta as volatility allows in a bear market.

GOOD-to-BAD ratios
Sharpe Ratio
Annualized Excess Return over the Risk-Free Rate / Annualized StDev of Excess Return

A deviation from the mean upward is GOOD; a deviation downward is BAD.
That is why I recommend using Downside Deviation or the Ulcer Index instead of Standard Deviation; in other words, I recommend replacing the Sharpe Ratio with the Martin Ratio or the Sortino Ratio.
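Sketches of the two suggested denominators, Downside Deviation (used by the Sortino Ratio) and the Ulcer Index (used by the Martin Ratio), assuming returns and an equity curve as plain lists:

```python
import math

def downside_deviation(returns, mar=0.0):
    """Root mean square of returns below the minimum acceptable return (MAR);
    the denominator of the Sortino Ratio."""
    downside = [min(r - mar, 0.0) ** 2 for r in returns]
    return math.sqrt(sum(downside) / len(returns))

def ulcer_index(equity_curve):
    """Root mean square percentage drawdown from the running peak;
    the denominator of the Martin Ratio."""
    peak, sq_dd = equity_curve[0], []
    for v in equity_curve:
        peak = max(peak, v)
        sq_dd.append(((v - peak) / peak) ** 2)
    return math.sqrt(sum(sq_dd) / len(sq_dd))
```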

Calmar Ratio
The formula Q uses,

Annualized Excess Return over the Risk-Free Rate / Maximum Drawdown

is actually not the Calmar Ratio (the Calmar Ratio requires a 36-month window for its calculation).

Now look at this:
Five of the eight components have BAD in the denominator.
What happens if they follow the command "the lower the better"?
It doesn't matter what the main goal, Annualized Return, actually is; it just needs to be positive, and all the ratios will go to positive infinity.
That is the problem!
I compared 3 algos in Excel.

Algo 1 was just 1 day in the market and made 1%  

1 Month statistics
Return 0.010
StDev Return 0.001
Beta 0.010
Ulcer Index 0.001
Max Drawdown 0.001

Algo 2 was all the time in SHY (money market)  

1 Month statistics
Return 0.002
StDev Return 0.001
Beta -0.004
Ulcer Index 0.002
Max Drawdown 0.008

 Algo 3 was in the market all month and made 4%  

1 Month statistics
Return 0.040
StDev Return 0.015
Beta 0.300
Ulcer Index 0.010
Max Drawdown 0.015

All components' statistics (I included some components other than Q's, but that does not really matter):

                   Algo 1           Algo 2          Algo 3          Algo 1  Algo 2  Algo 3  
                  Annualized    Annualized  Annualized              Rank    Rank    Rank  
Return          0.12682503   0.024265768    0.601032219              2       3       1  
StDev Return    0.015874508  0.015874508    0.238117618              1       1       3  
Beta            0.01        -0.00387779     0.3                      2       1       3  
Ulcer Index     0.001        0.001968738    0.01                     1       2       3  
Max Drawdown    0.001        0.008406944    0.015                    1       2       3  
Sharpe Ratio    7.989225946  1.528599699    2.524098064              1       3       2  
Treynor Ratio  12.68250301  -6.257627387    2.003440729              1       3       2  
Martin Ratio  126.8250301   12.32554373    60.10322186               1       3       2  
Calmar ratio  126.8250301    2.886395724   40.06881457               1       3       2

                                                 Total Score         11      21     21  

Let's not talk about the winner, but the losers.
The ranking system gives both 21, but Algo 2 has a 2.4% Annualized Return while Algo 3 has 60.1%, roughly 25 times more.
Algo 3 is definitely the better of the two by any measure, but not by the ranking.

How to fix the problem?
Most trading software packages have their own single-number score indicator (WealthLab Score, Tradery Score, ...).
Why shouldn't Quantopian have one?

Let me create the QuantopVYan Score.

   Numerator (all GOOD) = ((1 + Annualized Return)**Kret - 1) * Omega Ratio

   Denominator (all BAD) = ((1 + StDev)**Kstd) * ((1 + abs(Beta))**Kbeta) * ((1 + Ulcer Index)**Kulcer) * ((1 + MDD)**Kmdd)

   QuantopVYan Score = Numerator / Denominator

Results with the same data and Kret=1, Kstd=1, Kbeta=1, Kulcer=1, Kmdd=1
(I did not include the Omega Ratio in my Excel calculation, just Annualized Return):

                       Algo 1         Algo 2         Algo 3  
   QuantopVYan Score   0.126343947    0.024041225    0.646943894  
   Rank                2              3              1  
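A rough Python re-implementation of the score, fed the annualized statistics from the components table above; the values come out slightly different from the spreadsheet output (it is not clear exactly which inputs it used), but the ranking is the same: Algo 3, then Algo 1, then Algo 2.

```python
def quantop_vyan_score(ann_ret, std, beta, ulcer, mdd,
                       k_ret=1, k_std=1, k_beta=1, k_ulcer=1, k_mdd=1):
    """GOOD numerator over BAD denominator (Omega Ratio omitted,
    matching the Excel calculation described above)."""
    numerator = (1 + ann_ret) ** k_ret - 1
    denominator = ((1 + std) ** k_std * (1 + abs(beta)) ** k_beta
                   * (1 + ulcer) ** k_ulcer * (1 + mdd) ** k_mdd)
    return numerator / denominator

# Annualized statistics from the components table above.
algo1 = quantop_vyan_score(0.12682503, 0.015874508, 0.01, 0.001, 0.001)
algo2 = quantop_vyan_score(0.024265768, 0.015874508, -0.00387779,
                           0.001968738, 0.008406944)
algo3 = quantop_vyan_score(0.601032219, 0.238117618, 0.3, 0.01, 0.015)
```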

Everyone in its place.
Now one can play with the K's, and apply a sigmoid curve or error function for a better fit.