Hi Grant,
I want to explain the current behavior and answer your questions, but I want to say from the beginning that we are open to advice for revisions to the behavior.
There are a few things that will hopefully make the system design more clear, and also make it easier to answer your questions.
Source events (trades, data rows from Fetcher CSV files, splits, and dividends) are all merged into a single stream of events, sorted by datetime (we have a tie-breaking scheme to ensure deterministic ordering).
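To make the merge concrete, here is a minimal sketch in Python. The event shape and the tie-break ranks are illustrative assumptions, not our actual internals; the only real requirement is a fixed ranking so that events with identical datetimes always sort the same way.

```python
import heapq
from datetime import datetime

# Hypothetical tie-break ranks: any fixed ranking works, as long as it
# makes the ordering deterministic when datetimes collide.
TIE_BREAK = {"split": 0, "dividend": 1, "trade": 2}

def sort_key(event):
    dt, kind, _payload = event
    return (dt, TIE_BREAK[kind])

def merged_stream(*sources):
    """Merge per-source event lists into one stream, ordered by
    (datetime, tie-break rank) so the result is deterministic."""
    return heapq.merge(*(sorted(src, key=sort_key) for src in sources),
                       key=sort_key)

trades = [(datetime(2013, 1, 2, 9, 31), "trade", {"sid": 1, "price": 10.0})]
splits = [(datetime(2013, 1, 2, 9, 31), "split", {"sid": 1, "ratio": 0.5})]

# The split sorts ahead of the trade at the same timestamp.
stream = list(merged_stream(trades, splits))
```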
Each individual event is fed first through our slippage model and then through our performance tracker. Those components are designed to accept single events in a long series. Because the data is just a series of events, if a stock doesn't trade, the components simply don't receive an event and nothing happens. Also, all the internals receive the data before the algorithm does, so the data can't possibly be altered in a way that affects the simulation results.
In an attempt to simplify algo coding, we decided to aggregate the trade events from across your security selection (whether a manual selection in code or a set_universe call) before calling your algorithm. The benefit is that your algorithm is invoked at most once per bar (day or minute), and your algo code doesn't need to perform any bookkeeping to aggregate the most recent events. That's why the data parameter to handle_data is a dict-like structure keyed by sid.
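A rough sketch of that aggregation step, with a hypothetical dict-of-dicts standing in for our internal trade events:

```python
from datetime import datetime

def aggregate_bar(data, events):
    """Fold this bar's trade events into the dict-like data parameter,
    keyed by sid. A sid with no event this bar keeps its old entry."""
    for event in sorted(events, key=lambda e: e["datetime"]):
        data[event["sid"]] = event
    return data

data = {}
bar_events = [
    {"sid": 24, "datetime": datetime(2013, 1, 2, 9, 31, 0), "price": 10.0},
    {"sid": 24, "datetime": datetime(2013, 1, 2, 9, 31, 30), "price": 10.1},
    {"sid": 5061, "datetime": datetime(2013, 1, 2, 9, 31, 5), "price": 27.0},
]
aggregate_bar(data, bar_events)
# handle_data(context, data) would now be invoked exactly once for this bar,
# with data holding only the latest event per sid.
```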
The design question was what to do when one or more of the stocks in your universe doesn't trade in a given bar. We considered three alternatives:
- remove the non-trading sid from the data parameter's keys
- leave the key and set the value to None
- leave the key and retain the most recently received trade event in the value
We opted for the third, mainly because each value in data carries a timestamp. This was a pretty early design decision, and I assumed that having the last bar would not cause trouble because algos could filter by date.
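So an algo that only wants sids that actually traded this bar can filter on that timestamp. A quick sketch, with illustrative field names:

```python
from datetime import datetime

def traded_this_bar(data, bar_dt):
    """Keep only the entries whose timestamp matches the current bar;
    carried-forward entries still show their older datetime."""
    return {sid: bar for sid, bar in data.items()
            if bar["datetime"] == bar_dt}

data = {
    24: {"datetime": datetime(2013, 1, 3), "price": 10.0},    # traded today
    5061: {"datetime": datetime(2013, 1, 2), "price": 27.0},  # stale entry
}
fresh = traded_this_bar(data, datetime(2013, 1, 3))
```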
We've always described the behavior as "forward filling", which has led to some misconceptions about the old trade data being used in simulation internals like slippage and performance. The truth is this was simply a convenience we added for the algo API.
Hopefully things are at least starting to make sense. Let me answer your questions directly:
We aren't exactly forward filling. We're just not updating the entry for a stock that didn't trade in the bar. The date on each bar distinguishes old bars from new ones in the same data parameter. That said, I'd like to consider replacing old bars with zero-volume bars whose OHLC equals the prior close (a degenerate bar), especially if we get confirmation from other community members that this is preferred. Honestly, I like it because I increasingly feel the reiterated bar is a "lie" about the current state of the algo's world.
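For concreteness, the degenerate bar I have in mind would look something like this (a sketch, not a committed design):

```python
from datetime import datetime

def degenerate_bar(prior_bar, bar_dt):
    """Placeholder for a sid that didn't trade this bar: zero volume,
    OHLC pinned to the prior close, stamped with the current bar time."""
    close = prior_bar["close"]
    return {"datetime": bar_dt, "open": close, "high": close,
            "low": close, "close": close, "volume": 0}

bar = degenerate_bar({"close": 12.5}, datetime(2013, 1, 3))
```

The zero volume makes the non-trade explicit, and the timestamp is honest about which bar the entry belongs to.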
It would, but that's why we have an event-based simulation that processes a single event at a time. Missing data is literally and figuratively a non-event. Orders are only filled starting with the first bar after the order is placed.
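In sketch form, under the assumption that a fill requires a real trade event after the order's placement time:

```python
from datetime import datetime

def first_fill(order_dt, trade_events):
    """Return the first real trade event after the order was placed, or
    None. A real slippage model would also adjust fill price and size."""
    for event in sorted(trade_events, key=lambda e: e["datetime"]):
        if event["datetime"] > order_dt:
            return event
    return None

# The stock doesn't trade for days after the order is placed; the order
# simply waits for the next real event rather than filling on stale data.
events = [
    {"datetime": datetime(2013, 1, 2), "price": 10.0},
    {"datetime": datetime(2013, 1, 5), "price": 10.4},
]
fill = first_fill(datetime(2013, 1, 2), events)
```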
Good question. The answer is no. More generally, if none of the securities in your universe trade in a given minute, handle_data is not called for that minute. In the internals, we've found that handling zero events in a given minute makes coding significantly more difficult. We could consider invoking handle_data every minute the market is open, no matter what; we would send a degenerate bar (as described in #1 above) each minute for any stock that didn't trade. Thoughts?
You are right that our description isn't precise. get_datetime returns the time in simulation at which handle_data was called. The return of get_datetime is exactly equal to max([trade.datetime for trade in data.itervalues()]) -- i.e. it is the timestamp of the event that triggered this call to handle_data. It is entirely possible that several minutes or days elapse between calls to handle_data, and this would be fully reflected in both the datetime properties of the trade bars and by get_datetime.
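The invariant is easy to state in code; here is a sketch (using .values() rather than the Python 2 .itervalues(), and illustrative field names):

```python
from datetime import datetime

def get_datetime(data):
    """Simulation time at the moment handle_data is called: the max
    timestamp across the bars in data (a sketch of the invariant, not
    the real implementation)."""
    return max(bar["datetime"] for bar in data.values())

data = {
    24: {"datetime": datetime(2013, 1, 3, 9, 31)},     # triggered this call
    5061: {"datetime": datetime(2013, 1, 2, 15, 59)},  # stale, older bar
}
now = get_datetime(data)
# now matches the timestamp of the newest bar, however long ago the
# previous call to handle_data happened.
```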
Not right now, no. As I explained above, the aggregation of trades into the data parameter has no effect on order submission or filling.
The slippage model doesn't receive the aggregated data parameter; it receives individual events. Orders will remain unfilled until real trades are sent to the slippage model.
Thanks again for taking the time to ask these questions. I hope I've answered them sufficiently, and I look forward to more feedback!
thanks,
fawce