The basic idea in the OLMAR paper (http://arxiv.org/pdf/1206.4626.pdf) is that each update changes the portfolio as little as possible, subject to the new portfolio's predicted return reaching a threshold (epsilon). This is all captured in mathematical elegance in Section 4.2 of the paper (with the comment "Note that we adopt expected return rather than expected log return"). As I see it, there are a few assumptions baked in. The first is that the convex objective function is appropriate (the first term under "Optimization Problem: OLMAR"). The second is that the inequality constraint correctly captures reversion to the mean across all securities in the portfolio (and that the value of epsilon can be fixed, rather than updated on a walk-forward basis, for example). There is also the "BAH(OLMAR)" assumption, discussed later in the paper, which effectively factors out the choice of a specific trailing window length for computing the mean price and smooths out the algo performance.
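For concreteness, here is a minimal sketch of a single update step as I read Section 4.2: predict each price relative as (trailing mean price) / (latest price), take the smallest step that satisfies the epsilon constraint, and project back onto the simplex. The function names, the array layout, and the default eps are my own choices for illustration, not anything taken from the paper's code.

```python
import numpy as np

def simplex_projection(v):
    """Euclidean projection of v onto {b : b >= 0, sum(b) == 1}."""
    u = np.sort(v)[::-1]                      # sort descending
    css = np.cumsum(u)
    k = np.arange(1, v.size + 1)
    rho = np.nonzero(u * k > (css - 1.0))[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1.0)
    return np.maximum(v - theta, 0.0)

def olmar_step(b, prices, eps=10.0):
    """One OLMAR-style update (illustrative sketch).

    b      : current weights, shape (n_assets,), non-negative, summing to 1
    prices : trailing window of prices, shape (window, n_assets), latest row last
    eps    : reversion threshold (the default here is an assumption)
    """
    # Predicted price relative: trailing mean price over the latest price.
    x_pred = prices.mean(axis=0) / prices[-1]
    dev = x_pred - x_pred.mean()
    denom = dev @ dev
    # Step size is zero when the constraint b . x_pred >= eps already holds.
    lam = 0.0 if denom == 0.0 else max(0.0, (eps - b @ x_pred) / denom)
    return simplex_projection(b + lam * dev)

b0 = np.array([0.5, 0.5])
window = np.array([[10.0, 10.0],
                   [10.0, 10.0],
                   [10.0,  8.0]])
b1 = olmar_step(b0, window)
```

In this toy window the second asset has dropped below its trailing mean, so the update moves weight toward it. With a small eps (say 1.0) the constraint is already satisfied and the weights are left alone, which is why the choice of epsilon matters so much.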
You might dig into the OLMAR paper references, along with subsequent literature. Here's an extensive survey:
http://arxiv.org/pdf/1212.2129v2.pdf
I'm not sure if any of this is relevant to the Q hedge fund effort. Intuitively, mean reversion, if it is happening, should have both long and short components, right? One thought is to reformulate the optimization so that the universe of stocks would be divided into longs and shorts (with enough variance, as Andrew points out), and then an OLMAR-like optimization could be applied separately to each sub-universe, long and short. For it to make sense in the Q hedge fund, I figure this would need to scale up to 50-100 stocks that would support $5M to $25M of capital. This is probably just re-inventing the wheel, since I have to think that in the hedge fund world, every type of long-short slicing and dicing and optimization has been examined ad nauseam, but maybe the various approaches haven't been covered exhaustively in the open literature.
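To make the long/short idea a bit more concrete, here is a rough sketch under assumptions that are entirely mine: rank stocks by predicted price relative (trailing mean / latest price), put the top half in the long sub-universe and the bottom half in the short one, run the same OLMAR-like update within each half, and give each side half the gross exposure. Using the reciprocal of the predicted relative on the short side is just a crude stand-in for "expected gain from shorting"; none of this comes from the paper.

```python
import numpy as np

def project_to_simplex(v):
    """Euclidean projection of v onto {b : b >= 0, sum(b) == 1}."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    k = np.arange(1, v.size + 1)
    rho = np.nonzero(u * k > (css - 1.0))[0][-1]
    return np.maximum(v - (css[rho] - 1.0) / (rho + 1.0), 0.0)

def pa_update(b, x_pred, eps):
    """OLMAR-like step: smallest change to b with b . x_pred >= eps, then project."""
    dev = x_pred - x_pred.mean()
    denom = dev @ dev
    lam = 0.0 if denom == 0.0 else max(0.0, (eps - b @ x_pred) / denom)
    return project_to_simplex(b + lam * dev)

def long_short_weights(prices, eps=10.0):
    """Hypothetical long/short split; returns signed weights with gross exposure 1."""
    n = prices.shape[1]
    if n < 2:
        raise ValueError("need at least two assets to form both sides")
    x_pred = prices.mean(axis=0) / prices[-1]
    order = np.argsort(x_pred)                 # ascending predicted relative
    short_idx, long_idx = order[: n // 2], order[n // 2:]
    w = np.zeros(n)
    # Long side: stocks trading furthest below their trailing mean.
    b_long = np.full(long_idx.size, 1.0 / long_idx.size)
    w[long_idx] = 0.5 * pa_update(b_long, x_pred[long_idx], eps)
    # Short side: mirror the prediction so a larger value means a better short.
    b_short = np.full(short_idx.size, 1.0 / short_idx.size)
    w[short_idx] = -0.5 * pa_update(b_short, 1.0 / x_pred[short_idx], eps)
    return w

window = np.array([[10.0, 10.0, 10.0, 10.0],
                   [10.0, 10.0, 10.0, 10.0],
                   [ 8.0,  9.0, 11.0, 12.0]])
w = long_short_weights(window)
```

The result is dollar-neutral by construction (longs sum to +0.5, shorts to -0.5). Whether a median split is a sensible way to define the two sub-universes, or whether each side has enough variance to be worth trading, is exactly the open question.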