The first one: for maximum drawdown, you take your returns, assume they're i.i.d. (I think), sample with replacement, compute the maximum drawdown on each resample, and repeat a bunch of times to approximate the sampling distribution. I know this resampling stuff can get hairy, so you might want to talk to an expert first to see what's feasible. A rough sketch of what I mean is below.
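Here's a minimal sketch of that bootstrap in plain numpy. The function names are mine, the toy returns are made up, and note that the i.i.d. assumption ignores any autocorrelation in the returns (a block bootstrap would be the safer choice if that matters):

```python
import numpy as np

def max_drawdown(returns):
    """Largest peak-to-trough decline of the cumulative equity curve."""
    equity = np.cumprod(1.0 + returns)           # cumulative wealth
    running_peak = np.maximum.accumulate(equity) # best value seen so far
    drawdowns = 1.0 - equity / running_peak      # fractional drop from peak
    return drawdowns.max()

def bootstrap_max_drawdown(returns, n_boot=10_000, seed=0):
    """i.i.d. bootstrap: resample returns with replacement and recompute
    the max drawdown each time to approximate its sampling distribution."""
    rng = np.random.default_rng(seed)
    returns = np.asarray(returns)
    samples = rng.choice(returns, size=(n_boot, len(returns)), replace=True)
    return np.array([max_drawdown(s) for s in samples])

# toy usage: a year of made-up daily returns
daily = np.random.default_rng(1).normal(0.0005, 0.01, 252)
dist = bootstrap_max_drawdown(daily, n_boot=2000)
print(np.percentile(dist, [5, 50, 95]))  # rough confidence bounds
```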
The second one is hard for me to follow since I don't know what sort of models we're talking about. But generally, once you fit a model you get the parameter estimates, and you don't go back and change them; if you're changing them by hand, you never really fit anything. I might be missing something, but this seems like it could slide into data snooping, which gives us an overly optimistic picture of how well our algos perform. On the third thing: yeah, model selection is important, but that wasn't what I had in mind.
Maybe I'll put together a batch transform example that periodically selects the best ARIMA model, then uses it to trade the next x days out. There are more elegant ways to capture this quasi-periodicity (not totally sure I'm using that word correctly :) ), but periodic refitting of an ARIMA is probably the easiest to follow. Still learning Python, but I'm guessing the functions that choose models for us return model objects with the selection criteria built in as attributes, which takes care of model/algo comparison for you, I think. See the sketch below.
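Something like this, maybe. It's a standalone sketch with statsmodels rather than the Quantopian batch transform itself, and the function names, grid sizes, and window lengths are all mine: pick the ARIMA order by lowest AIC (the fitted results object really does carry `aic`/`bic` as attributes), then refit on a trailing window every few steps and forecast ahead.

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

def select_arima(series, max_p=3, max_q=3, d=0):
    """Fit ARIMA(p, d, q) over a small grid and keep the lowest-AIC fit.
    The results object's .aic attribute is the built-in selection criterion."""
    best_fit, best_aic = None, np.inf
    for p in range(max_p + 1):
        for q in range(max_q + 1):
            try:
                fit = ARIMA(series, order=(p, d, q)).fit()
            except Exception:
                continue  # some orders fail to converge; just skip them
            if fit.aic < best_aic:
                best_fit, best_aic = fit, fit.aic
    return best_fit

def rolling_refit_forecasts(series, window=250, horizon=5):
    """Every `horizon` steps, refit on the trailing `window` observations
    and forecast the next `horizon` steps -- the periodic-refit idea."""
    forecasts = []
    for start in range(window, len(series) - horizon, horizon):
        fit = select_arima(series[start - window:start])
        forecasts.append(fit.forecast(steps=horizon))
    return np.concatenate(forecasts)
```

In an actual algo you'd wrap the refit step in whatever periodic hook the platform gives you; the point is just that the selection and the forecasting are separate, repeatable steps.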
edit: looks like someone is already headed that way: https://www.quantopian.com/posts/arma-timing-out-and-r2py I know much less about computer stuff than you guys, but doesn't it hurt you if everyone is fitting models all day on your servers? Maybe there's a way to offload some of the work onto client machines? I'm way out of my depth on this one, though.