The "Working With Multiple Data Frequencies" section of the docs notes:
When pulling in external data, you need to be careful about the data
frequency to prevent look-ahead bias. If you are fetching daily data,
but are running a minutely-based algorithm, the daily row will be
fetched at the beginning of the day, instead of the end of day. To
guard against the bias, you need to use the post_func function.
I'm trying to accurately backtest an algo which fetches a CSV with signal data. The CSV has several rows per date (date-stamped only, with no time of day), including historical data. When live trading, another system will add a few rows of data just before midnight Eastern Time. This means that the next day, the algo will grab the CSV and only have data up through the previous day. For example, when the CSV gets updated by the external system on 7/28/2015 at 11:58pm ET, the new rows will carry the datestamp "7/28/2015". The next day, 7/**29**/2015, the algo will get the CSV and only see signal data up to the 28th.
When backtesting, the CSV already contains signal data for every trading day, not just the days up to the current trading day. My question is: When backtesting (minutely, not daily), does each trading day use CSV data up to and including the rows with the current trading day's date? Or only up to but not including the trading date?
The "Using Fetcher to create a custom universe" section of the docs implies each day's custom universe will be autofilled from past data if no data is available for the trading day. In live trading, this will be the case every day (only yesterday's data is available). In backtesting, there are no missing days of data, so I'm not sure how exactly look-ahead bias is prevented.
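To make the two possibilities concrete, here is a minimal plain-Python sketch of the behavior I want the backtest to have. It is not Quantopian code: `shift_to_availability_date` and `visible_rows` are hypothetical helpers, and the "on or before the trading day" rule in `visible_rows` is only my assumption about how the backtester might match rows to days. If that assumption holds, moving every datestamp forward by one day (presumably something a `post_func` could do) would make the backtest see the same rows as live trading.

```python
from datetime import date, timedelta


def shift_to_availability_date(rows, lag_days=1):
    """Return copies of rows with 'date' moved forward by lag_days calendar days.

    Hypothetical helper: a row stamped 7/28 is only readable on 7/29,
    so it is re-stamped with the day it actually becomes available.
    """
    return [dict(row, date=row["date"] + timedelta(days=lag_days)) for row in rows]


def visible_rows(rows, trading_day):
    """Rows visible on trading_day, ASSUMING an 'on or before' matching rule."""
    return [row for row in rows if row["date"] <= trading_day]


# Toy CSV contents: several rows per date, including a row for the 29th
# that only exists because this is a backtest over historical data.
raw = [
    {"date": date(2015, 7, 27), "signal": "A"},
    {"date": date(2015, 7, 28), "signal": "B"},
    {"date": date(2015, 7, 28), "signal": "C"},
    {"date": date(2015, 7, 29), "signal": "D"},
]

shifted = shift_to_availability_date(raw)
seen_on_29th = [r["signal"] for r in visible_rows(shifted, date(2015, 7, 29))]
print(seen_on_29th)  # ['A', 'B', 'C']
```

Without the shift, the same rule would also expose row "D" on the 29th, which is exactly the look-ahead I'm worried about. A calendar-day shift puts Friday's rows on Saturday, which should still be fine under the "on or before" rule because they would first become visible on Monday.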