Hello all,
I don't understand the need for data alignment, or why you wouldn't want a "rolling OHLCV DataFrame" for each timeframe/symbol.
I may be wrong, but this is how I would approach the problem.
First, in the initialize method, I would create variables to manage the history.
Let's say that each rolling DataFrame has a history depth of 65535 points and that my strategy needs to store a 1-minute timeframe and a daily timeframe for both EURUSD and EURCHF.
The internal history object could be a dict like this:
{
    'EURUSD': {
        'bid': value,
        'ask': value,
        'last': value,
        'last_vol': value,
        'last_type': ['buy'|'sell'],
        'tick': <rolling_tick_data>,
        'OHLCV': {
            '1Min': {
                'bid': <rolling_ohlc_data>,
                'ask': <rolling_ohlc_data>,
            },
            '5Min': {
                ...
            },
            ...
            '15Min': {
                ...
            },
        },
    },
    'EURCHF': {
        'bid': value,
        'ask': value,
        'last': value,
        'last_vol': value,
        'last_type': ['buy'|'sell'],
        'tick': <rolling_tick_data>,
        'OHLCV': {
            '1Min': {
                'bid': <rolling_ohlc_data>,
                'ask': <rolling_ohlc_data>,
            },
            '5Min': {
                ...
            },
            ...
            '15Min': {
                ...
            },
        },
    },
}
The initialize function would then contain lines like:
N = 65535
data.history.store('EURUSD')  # creates the 'EURUSD' key in the history dict, with all of its sub-keys
data.history.create_tick_buffer('EURUSD', N)
data.history.create_ohlcv_buffer('EURUSD', '1Min', N)
data.history.create_ohlcv_buffer('EURUSD', '1D', N)
data.history.store('EURCHF')
data.history.create_tick_buffer('EURCHF', N)
data.history.create_ohlcv_buffer('EURCHF', '1Min', N)
data.history.create_ohlcv_buffer('EURCHF', '1D', N)
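To make the proposal a bit more concrete, here is a minimal sketch of what such a history object could look like. Everything in it is hypothetical: `History`, `store`, `create_tick_buffer` and `create_ohlcv_buffer` are the names proposed above, not an existing zipline API, and a plain `collections.deque(maxlen=n)` stands in for the rolling tick/OHLC buffers.

```python
from collections import deque


class History(dict):
    """Hypothetical nested history container mirroring the dict layout above.

    None of these methods exist in zipline; they only illustrate the
    store / create_*_buffer calls proposed for initialize().
    """

    def store(self, symbol):
        # Create the symbol entry and its sub-keys with empty placeholders
        self[symbol] = {
            'bid': None,
            'ask': None,
            'last': None,
            'last_vol': None,
            'last_type': None,  # 'buy' or 'sell' once ticks arrive
            'tick': None,
            'OHLCV': {},
        }

    def create_tick_buffer(self, symbol, n):
        # Bounded tick history: the deque drops its oldest item once full
        self[symbol]['tick'] = deque(maxlen=n)

    def create_ohlcv_buffer(self, symbol, timeframe, n):
        # One bounded buffer per price side for this timeframe
        self[symbol]['OHLCV'][timeframe] = {
            'bid': deque(maxlen=n),
            'ask': deque(maxlen=n),
        }


N = 65535
history = History()
for symbol in ('EURUSD', 'EURCHF'):
    history.store(symbol)
    history.create_tick_buffer(symbol, N)
    history.create_ohlcv_buffer(symbol, '1Min', N)
    history.create_ohlcv_buffer(symbol, '1D', N)
```

Using bounded buffers means memory per symbol/timeframe is fixed up front by N, whatever the length of the session.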
I really think it's necessary to store both uncompressed data (such as tick data) and compressed data (such as OHLCV bars),
because you can't rely on resampling large amounts of tick data inside the 'handle_data' method.
Would you resample tick data, or even 1-minute data, just to get the daily OHLCV bar from 7 days ago?
If you do that, it will probably be very slow and also very memory-consuming!
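For comparison, this is roughly what the on-demand approach looks like with pandas `resample`, using synthetic 1-minute bars (made-up data, for illustration only). Every call re-aggregates all stored minute bars (43,200 rows for 30 days) just to read one daily bar, whereas a stored daily buffer would be a single lookup.

```python
import numpy as np
import pandas as pd

# Synthetic 1-minute bars for 30 days (illustrative data, not real quotes)
idx = pd.date_range('2013-01-01', periods=30 * 24 * 60, freq='min')
rng = np.random.default_rng(0)
close = 1.30 + rng.standard_normal(len(idx)).cumsum() * 1e-4
minute = pd.DataFrame({
    'open': close,
    'high': close,
    'low': close,
    'close': close,
    'volume': np.ones(len(idx)),
}, index=idx)

# On-demand approach: re-aggregate every stored minute into daily bars
daily = minute.resample('1D').agg({
    'open': 'first',
    'high': 'max',
    'low': 'min',
    'close': 'last',
    'volume': 'sum',
})

# Daily bar from 7 days before the latest one
bar_7_days_ago = daily.iloc[-8]
```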
An implementation of this RollingOHLCV could be
(it could probably be improved):
import numpy as np
import pandas as pd


class RollingOHLCVData:
    def __init__(self, N):
        self.size = N
        # Index runs N-1 ... 0: label 0 (the last row) is the current bar,
        # label k is the bar k periods back
        self.df = pd.DataFrame(np.nan, index=np.arange(N - 1, -1, -1),
                               columns=['open', 'high', 'low', 'close', 'volume'])
        self.flag_first_append = True

    def append(self, price, volume, new_candle=False):
        if new_candle or self.flag_first_append:
            self.flag_first_append = False
            # Move every bar one step back in time (the oldest one drops off)
            # and free label 0 for the new candle
            self.df = self.df.shift(-1)
            self.df.loc[0] = [price, price, price, price, volume]
        else:
            # Update the current candle in place (.at avoids chained assignment)
            self.df.at[0, 'close'] = price
            if price > self.df.at[0, 'high']:
                self.df.at[0, 'high'] = price
            if price < self.df.at[0, 'low']:
                self.df.at[0, 'low'] = price
            self.df.at[0, 'volume'] += volume

    def __getitem__(self, column):
        # Allows bars['close'][k] to read the close k bars back
        return self.df[column]

    def __repr__(self):
        return self.df.to_string()
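One possible improvement: `shift(-1)` copies all N rows every time a candle opens, which adds up with N = 65535. A ring buffer only moves a write position. Below is a sketch of that alternative on a plain NumPy array; `RingOHLCV` is a hypothetical name, and reading bars back as dicts is just one choice of interface.

```python
import numpy as np


class RingOHLCV:
    """Fixed-size OHLCV ring buffer: self[k] returns the bar k periods back.

    Hypothetical alternative to RollingOHLCVData: no rows are copied when a
    new candle starts, only the write position moves.
    """

    FIELDS = ('open', 'high', 'low', 'close', 'volume')

    def __init__(self, n):
        self.size = n
        self.data = np.full((n, 5), np.nan)
        self.pos = -1    # row of the current (newest) bar
        self.count = 0   # number of bars written so far, capped at n

    def append(self, price, volume, new_candle=False):
        if new_candle or self.count == 0:
            # Advance the write position, overwriting the oldest bar when full
            self.pos = (self.pos + 1) % self.size
            self.count = min(self.count + 1, self.size)
            self.data[self.pos] = (price, price, price, price, volume)
        else:
            bar = self.data[self.pos]    # a view, so the updates are in place
            bar[1] = max(bar[1], price)  # high
            bar[2] = min(bar[2], price)  # low
            bar[3] = price               # close
            bar[4] += volume             # volume

    def __getitem__(self, k):
        # k = 0 is the current bar, k = 1 the previous one, ...
        if not 0 <= k < self.count:
            raise IndexError(k)
        row = self.data[(self.pos - k) % self.size]
        return dict(zip(self.FIELDS, row))


bars = RingOHLCV(3)
bars.append(1.30, 10)
bars.append(1.31, 5)
bars.append(1.29, 5)
bars.append(1.30, 1, new_candle=True)
previous = bars[1]  # the candle that was just completed
```

The trade-off is that you lose the DataFrame conveniences (to_string, column operations) on the raw buffer, so it may be worth converting to a DataFrame only when a strategy actually needs one.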
In handle_data we could then access the current close price
with data.history['EURUSD']['OHLCV']['1Min']['bid']['close'][0]
and the close price k bars back with data.history['EURUSD']['OHLCV']['1Min']['bid']['close'][k]
What should happen if no tick arrives during a 1-minute bar?
I think we should carry the previous bar's close price forward and set the volume to 0
(we could also add a flag column to the rolling DataFrame for that case).
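Here is a sketch of that rule applied to a batch of bars with pandas: missing minutes get open = high = low = close = previous close, zero volume, and a boolean flag column. `fill_missing_bars` and the `no_tick` column name are hypothetical; in a live rolling buffer the same rule would be applied when the next bar opens.

```python
import pandas as pd


def fill_missing_bars(bars, freq='min'):
    """Insert empty bars for periods with no ticks (hypothetical helper).

    Empty bars carry the previous close as open/high/low/close, get zero
    volume, and are marked True in a 'no_tick' flag column.
    """
    full_index = pd.date_range(bars.index[0], bars.index[-1], freq=freq)
    out = bars.reindex(full_index)
    out['no_tick'] = out['close'].isna()
    # Carry the last known close forward into the empty bars
    prev_close = out['close'].ffill()
    for col in ('open', 'high', 'low', 'close'):
        out[col] = out[col].fillna(prev_close)
    out['volume'] = out['volume'].fillna(0)
    return out


idx = pd.to_datetime(['2013-01-01 10:00', '2013-01-01 10:01', '2013-01-01 10:03'])
bars = pd.DataFrame({
    'open': [1.30, 1.31, 1.29],
    'high': [1.31, 1.32, 1.30],
    'low': [1.29, 1.30, 1.28],
    'close': [1.31, 1.30, 1.29],
    'volume': [10.0, 5.0, 7.0],
}, index=idx)
filled = fill_missing_bars(bars)  # adds the empty 10:02 bar
```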
That's just an idea...
Another idea (though maybe only after implementing what I described above) would be
to use a kind of "lazy" structure that is created depending on what you need.
For example, if you receive a tick for EURUSD and a line in your strategy's
handle_data requests the open price from 2 bars before the current bar,
zipline could say "Oh my dear!!! I haven't stored this data... Let's build a structure
able to store what they are requesting! It will be ready for the next event!"
In that case, accessing data with '[' ']' may not be the best idea, and we should provide
an accessor for it... or maybe we could overload the [] operator by implementing __getitem__.
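A rough sketch of that lazy idea, using `__getitem__` as the accessor: the first request for a (symbol, field) pair that isn't tracked registers a buffer and returns None, and the buffer fills from the next event on. `LazyHistory`, `on_bar` and the `(symbol, field, bars_back)` key are all hypothetical, not zipline API.

```python
from collections import deque


class LazyHistory:
    """Hypothetical lazily-populated history (not a zipline API).

    Requesting a (symbol, field) pair that is not stored yet registers a
    buffer for it and returns None; the buffer fills from the next event on.
    """

    def __init__(self, depth):
        self.depth = depth
        self.buffers = {}

    def __getitem__(self, key):
        # key = (symbol, field, bars_back), e.g. ('EURUSD', 'open', 2)
        symbol, field, bars_back = key
        buf = self.buffers.get((symbol, field))
        if buf is None:
            # Not tracked yet: start storing it so it is ready next time
            self.buffers[(symbol, field)] = deque(maxlen=self.depth)
            return None
        if bars_back >= len(buf):
            return None  # not enough history collected yet
        return buf[-1 - bars_back]

    def on_bar(self, symbol, bar):
        # Feed every tracked field of this symbol from the incoming bar dict
        for (sym, field), buf in self.buffers.items():
            if sym == symbol:
                buf.append(bar[field])


history = LazyHistory(depth=65535)
first = history['EURUSD', 'open', 2]  # None: the buffer is only created now
for price in (1.30, 1.31, 1.32):
    history.on_bar('EURUSD', {'open': price, 'close': price})
later = history['EURUSD', 'open', 2]  # open price 2 bars before the current one
```

The price of laziness is that the very first request can't be answered, so a strategy using it would have to tolerate None until enough bars have been seen.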
W4C