Thanks Thomas,
In addition to the help docs, examples of all of the use cases would be helpful, since there are lots of "bells and whistles" here:
def __init__(self,
             func=None,
             refresh_period=None,
             window_length=None,
             clean_nans=True,
             sids=None,
             fields=None,
             create_panel=True,
             compute_only_full=True):
    """Instantiate new batch_transform object.

    :Arguments:
        func : python function <optional>
            If supplied will be called after each refresh_period
            with the data panel and all args and kwargs supplied
            to the handle_data() call.
        refresh_period : int
            Interval to call batch_transform function.
        window_length : int
            How many days the trailing window should have.
        clean_nans : bool <default=True>
            Whether to (forward) fill in nans.
        sids : list <optional>
            Which sids to include in the moving window. If not
            supplied sids will be extracted from incoming
            events.
        fields : list <optional>
            Which fields to include in the moving window
            (e.g. 'price'). If not supplied, fields will be
            extracted from incoming events.
        create_panel : bool <default=True>
            If True, will create a pandas panel every refresh
            period and pass it to the user-defined function.
            If False, will pass the underlying deque reference
            directly to the function which will be significantly
            faster.
        compute_only_full : bool <default=True>
            Only call the user-defined function once the window is
            full. Returns None if window is not full yet.
    """
Specifically, I'm interested in passing the deque reference directly to vectorized numpy functions. Presumably the clean_nans option works regardless of the other switch settings, correct?
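To make the question concrete, here is a rough sketch of what I have in mind. The deque layout (one list of prices per bar, one column per sid) is purely my assumption, not something I've confirmed from the source, and window_to_array and forward_fill are hypothetical helpers of my own, not part of the batch_transform API. The forward fill just mimics what the docstring says clean_nans does.

```python
from collections import deque

import numpy as np


def window_to_array(window):
    """Stack a deque of per-bar rows into a 2-D float array (bars x sids)."""
    return np.asarray(list(window), dtype=float)


def forward_fill(arr):
    """Forward-fill NaNs down each column; leading NaNs stay NaN."""
    arr = np.asarray(arr, dtype=float)
    rows = np.arange(arr.shape[0])[:, None]
    # For each cell, the row index of the most recent non-NaN value so far
    # (0 if none has been seen yet in that column).
    last_valid = np.where(np.isnan(arr), 0, rows)
    last_valid = np.maximum.accumulate(last_valid, axis=0)
    return arr[last_valid, np.arange(arr.shape[1])]


# Assumed layout: a trailing window of 3 bars, two sids per bar.
window = deque(maxlen=3)
for row in ([10.0, 20.0], [np.nan, 21.0], [12.0, np.nan], [13.0, 22.0]):
    window.append(row)

filled = forward_fill(window_to_array(window))

# Any vectorized numpy function can then run on the whole window at once.
column_means = np.nanmean(filled, axis=0)
```

If the real deque holds event objects rather than plain rows, the conversion step would obviously need to change, which is part of why worked examples would help.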
Just curious: "under the hood," does the backtester load all of the data required to run the backtest into memory before executing (e.g. into a numpy array or similar)? If so, then it should be very efficient to index over the array and pass by reference, right? Or does the backtester need to stream data from a hard drive?
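For what it's worth, this is the kind of thing I'm picturing, assuming the history were preloaded into a single numpy array (I don't know whether the backtester actually works this way). The array shape and the trailing_window helper are made up for illustration. Basic slicing in numpy returns a view on the same memory, so handing a window to a function involves no copy.

```python
import numpy as np

# Hypothetical preloaded history: 1000 bars x 5 sids, all held in memory.
prices = np.arange(5000, dtype=float).reshape(1000, 5)


def trailing_window(data, end, length):
    """Return the `length` rows ending just before row `end` as a view."""
    return data[end - length:end]


window = trailing_window(prices, end=100, length=20)

# The slice shares memory with the full array, so nothing was copied.
no_copy = np.shares_memory(window, prices)
```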
Grant