Sometimes an algorithm gets good results only because we unknowingly picked a stock that outperformed the S&P 500 during the backtesting period. The algo then looks misleadingly good compared to the benchmark (S&P 500).
To measure the algo's actual performance, I usually plot its returns against the returns of a buy-and-hold strategy on the same stock. If the algo doesn't outperform buy-and-hold, I consider its performance sub-par.
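Here is a minimal sketch of that comparison. It is not the code from the post. It assumes you already have the stock's daily prices and the algo's daily portfolio values as plain lists aligned by date, and the function names (`buy_and_hold_returns`, `algo_returns`, `beats_buy_and_hold`) are hypothetical. It has no dependency on any backtesting platform's API.

```python
def buy_and_hold_returns(prices):
    """Cumulative return from buying at the first price and holding to each later date."""
    if not prices or prices[0] <= 0:
        raise ValueError("need a non-empty price series with a positive first price")
    start = prices[0]
    return [p / start - 1.0 for p in prices]


def algo_returns(portfolio_values):
    """Cumulative return of the algo's portfolio value relative to its starting value."""
    if not portfolio_values or portfolio_values[0] <= 0:
        raise ValueError("need a non-empty value series with a positive starting value")
    start = portfolio_values[0]
    return [v / start - 1.0 for v in portfolio_values]


def beats_buy_and_hold(portfolio_values, prices):
    """True if the algo ends the period ahead of simply holding the stock."""
    return algo_returns(portfolio_values)[-1] > buy_and_hold_returns(prices)[-1]


# Hypothetical data: the stock gains 21% while the algo gains only 10%.
prices = [100.0, 110.0, 121.0]
portfolio = [10000.0, 10500.0, 11000.0]
print(beats_buy_and_hold(portfolio, prices))
```

Plotting `algo_returns(...)` and `buy_and_hold_returns(...)` on the same chart gives the kind of custom graph described here. Both series start at 0, so the starting capital doesn't matter.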
In the example below, the algo outperforms the S&P 500, but the custom graph clearly shows that it doesn't match up to a buy-and-hold strategy.
Here is the code. It can be extended to all the stocks we trade. The comparison also shows where the algo can improve. For example, the example algo doesn't buy while the stocks are skyrocketing (Jan-Apr 2012), which shows that it is not able to take advantage of bull runs.
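One way to extend the idea to every stock you trade, and to locate stretches like Jan-Apr 2012 automatically, is sketched below. This is an illustration under stated assumptions, not the original code. It assumes price lists that all cover the same dates, an equal-weight split of the starting capital across the stocks, and a fixed window length in trading days. `basket_buy_and_hold_returns` and `lagging_windows` are hypothetical helpers.

```python
def basket_buy_and_hold_returns(price_table, capital=1.0):
    """Cumulative return of an equal-weight buy-and-hold basket.

    price_table maps each symbol to its list of prices on the same dates.
    """
    symbols = list(price_table)
    if not symbols:
        raise ValueError("need at least one symbol")
    n_days = len(price_table[symbols[0]])
    if any(len(price_table[s]) != n_days for s in symbols):
        raise ValueError("all price series must cover the same dates")
    per_stock = capital / len(symbols)
    # Shares bought on the first day and never traded again.
    shares = {s: per_stock / price_table[s][0] for s in symbols}
    values = [sum(shares[s] * price_table[s][t] for s in symbols) for t in range(n_days)]
    return [v / capital - 1.0 for v in values]


def lagging_windows(algo_values, benchmark_returns, window):
    """Start indices of windows where the algo gained less than buy-and-hold.

    algo_values are portfolio values; benchmark_returns are cumulative returns
    such as those produced by basket_buy_and_hold_returns.
    """
    bench_values = [1.0 + r for r in benchmark_returns]
    flagged = []
    for start in range(len(algo_values) - window):
        end = start + window
        algo_gain = algo_values[end] / algo_values[start] - 1.0
        bench_gain = bench_values[end] / bench_values[start] - 1.0
        if algo_gain < bench_gain:
            flagged.append(start)
    return flagged


# Hypothetical data: the basket rallies early while the algo sits in cash.
bench = basket_buy_and_hold_returns({"A": [10.0, 12.0, 14.0, 14.0], "B": [50.0, 60.0, 70.0, 70.0]})
print(lagging_windows([100.0, 100.0, 100.0, 110.0], bench, window=1))
```

Equal weighting is only one choice. If the algo allocates capital differently across stocks, the benchmark weights should probably mirror that allocation so the comparison stays fair.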
Note that the buy-and-hold returns could be made more accurate by reinvesting dividend payments in additional shares. I leave that as an exercise for the reader ;)
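For anyone attempting that exercise, here is a minimal sketch of one possible approach. It assumes a list of cash dividends per share aligned with the price list (0.0 on days with no payout), that each dividend is reinvested at that day's price, and that there are no taxes, fees, or lot-size limits. `buy_and_hold_with_dividends` is a hypothetical name.

```python
def buy_and_hold_with_dividends(prices, dividends, capital=1.0):
    """Buy-and-hold portfolio values with every cash dividend reinvested.

    dividends[t] is the cash dividend per share paid on day t (0.0 if none).
    Assumes reinvestment at that day's price, fractional shares, no taxes or fees.
    """
    if len(prices) != len(dividends):
        raise ValueError("prices and dividends must cover the same dates")
    if not prices or prices[0] <= 0:
        raise ValueError("need a non-empty price series with a positive first price")
    shares = capital / prices[0]
    values = []
    for price, dividend in zip(prices, dividends):
        if dividend:
            # Convert the cash payout into extra shares at today's price.
            shares += shares * dividend / price
        values.append(shares * price)
    return values


# Hypothetical data: a $1 dividend on day 1 buys one extra share.
print(buy_and_hold_with_dividends([10.0, 10.0, 20.0], [0.0, 1.0, 0.0], capital=100.0))
```

Dividing these values by `capital` and subtracting 1 gives a cumulative-return series that can be plotted against the algo in the same way as the plain buy-and-hold curve.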
(feel free to correct anything)