Are they better off as individual strategies, or more robust when combined?
For the contest, combined is likely stronger. At the fund level, combined signals are also advantageous due to the higher Sharpe, stability, and lower volatility. However, under the new structure for the Q Fund, I would think Quantopian would want to combine the individual algos themselves.
penalized by the volatility floor in the score calculation
I must have missed this -- I didn't realize the contest has a volatility floor. This further supports my critique that the contest favors algorithms with intermittent winning streaks over algorithms like yours with stability approaching 1.0.
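For what it's worth, here's a toy sketch of why a floor like that penalizes very stable, low-vol algos. I don't know the exact contest formula, so the functional form (annualized return over max(vol, floor)) and the 2% floor value are assumptions for illustration only:

```python
import numpy as np

def floored_score(daily_returns, vol_floor=0.02):
    """Annualized return divided by annualized volatility, with the
    volatility clipped from below. The form and the 2% floor are
    guesses, not the actual contest score."""
    daily_returns = np.asarray(daily_returns)
    ann_return = np.mean(daily_returns) * 252
    ann_vol = np.std(daily_returns) * np.sqrt(252)
    return ann_return / max(ann_vol, vol_floor)

# A perfectly stable low-vol stream: 0.02% per day, zero realized vol.
stable = np.full(252, 0.0002)
print(floored_score(stable))  # divided by the floor, not by true vol
```

Anything with true volatility below the floor gets scored as if it were riskier than it actually is, so stability beyond that point buys no additional score.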
I wonder whether it's better to increase your position concentration at the final combination stage (by dropping the positions with the lowest weights and then re-normalizing the remaining weights) rather than individually on each style composite. It seems like it should be, no?
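A rough sketch of the final-stage version, assuming the combined portfolio is a pandas Series of weights (the `keep_frac` parameter and the ticker names are made up for illustration):

```python
import pandas as pd

def concentrate(weights, keep_frac=0.8):
    """Drop the smallest-|weight| positions and re-scale so gross
    exposure is unchanged. keep_frac (fraction of names retained)
    is an illustrative knob, not anything from the contest rules."""
    ranked = weights.abs().sort_values(ascending=False)
    n_keep = max(1, int(round(len(ranked) * keep_frac)))
    kept = weights.loc[ranked.index[:n_keep]]
    # preserve the original gross leverage after dropping names
    return kept * weights.abs().sum() / kept.abs().sum()

w = pd.Series({'AAPL': 0.30, 'MSFT': 0.25, 'XOM': -0.25,
               'GE': -0.15, 'T': 0.05})
w_conc = concentrate(w, keep_frac=0.8)
print(w_conc)  # drops T, rescales the remaining four names
```

The intuition for doing it once at the end: every style composite still contributes its full cross-sectional ranking to the combination before anything is thrown away, whereas concentrating each composite first discards information prior to netting.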
Better to build simple models with sustainable SRs in the range 1.25 - 1.50 that generalize, in my opinion.
I agree it's better to build robust signals and avoid overfitting. However, experts in these forums, and in lectures and books, often claim that you should be exceedingly skeptical of strategies with high Sharpe ratios. I don't buy it. If you combine enough "simple models with sustainable SRs in the range of 1.25 - 1.50", you'll soon enough arrive at SRs of 4.0 - 5.0+. I think one should be equally--or rather even more--skeptical of low Sharpe ratios. Happening upon a weak spurious correlation is orders of magnitude easier than arriving at a strong one. I don't see how a Sharpe ratio of 1.25 in and of itself will be any more sustainable than 4.5.
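The arithmetic behind that claim is just the sqrt(N) diversification effect: N uncorrelated return streams, each with Sharpe s, combine to roughly s * sqrt(N). A quick simulation with made-up iid streams (purely illustrative, not real strategy returns):

```python
import numpy as np

rng = np.random.default_rng(0)
n_days, n_signals = 252 * 20, 10
daily_edge = 1.35 / np.sqrt(252) * 0.01  # target annualized SR ~1.35

# Ten independent return streams, 1% daily vol, same small edge.
streams = rng.normal(loc=daily_edge, scale=0.01, size=(n_days, n_signals))
combo = streams.mean(axis=1)  # equal-weight combination

def sr(r):
    return r.mean() / r.std() * np.sqrt(252)

print(round(sr(streams[:, 0]), 2))  # single signal: ~1.35
print(round(sr(combo), 2))          # combined: ~1.35 * sqrt(10), ~4.3
```

In reality the streams are never perfectly uncorrelated, so the lift is smaller than sqrt(N), but the point stands: a 4+ composite SR can be built entirely from "sustainable" 1.25 - 1.5 pieces.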
So what I'm getting at is, I don't think Joakim's 4.0+ Sharpe ratio in this super-composite should raise eyebrows any more than the 2.5 Sharpe Ratio algorithms it is comprised of. Each of those style composites is probably made up of three or four simple factors, each closer to the 1.25 - 1.5 SR range. Ultimately the question hinges on whether those individual factors are robust or overfit. It sounds like Joakim followed a pretty good process, but as you point out he may also be inclined to juice his in-sample SR.
Somewhere there's a thin line between juicing in-sample Sharpe ratio and making improvements that are somewhat predictive. If out-of-sample performance improves at all, is it not worth it?
Changing the rebalance time to the close is a good start. What other wrenches can he throw in the gears to test robustness?
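A couple of candidates, sketched on a made-up persistent signal (everything below is synthetic and illustrative): lag execution by a day, and charge an extra turnover cost. A robust edge should survive these in attenuated form; a fragile fit tends to collapse:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 252 * 10

# Persistent (AR(1)) signal, so a one-day lag attenuates rather than
# destroys the edge -- as real, slowly-decaying alpha would behave.
eps = rng.normal(size=n)
signal = np.zeros(n)
for t in range(1, n):
    signal[t] = 0.7 * signal[t - 1] + eps[t]

# Next-day returns carry a small loading on the signal plus noise.
returns = 0.001 * signal + rng.normal(scale=0.01, size=n)

def sharpe(r):
    return r.mean() / r.std() * np.sqrt(252)

pnl = signal * returns                    # baseline backtest
lagged = signal[:-1] * returns[1:]        # wrench 1: execute a day late
# wrench 2: extra turnover penalty (0.0005 per unit of position change)
costs = pnl - np.abs(np.diff(signal, prepend=0.0)) * 0.0005

for name, r in [('baseline', pnl), ('lagged', lagged), ('with costs', costs)]:
    print(f'{name}: SR {sharpe(r):.2f}')
```

Other wrenches in the same spirit: perturb the factor weights, drop random slices of the universe, shift the backtest start date, and see whether the SR degrades gracefully or falls off a cliff.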