Hi Viridian,
Your question about weekend data is a good one. That pipeline actually bins its data into 'sessions' rather than 'days' (despite the fact that we call them 'days' everywhere). Each session runs from 45 minutes before market open on one trading day (called the 'pipeline cutoff time') to 45 minutes before market open on the next trading day. For example, if you run a pipeline on a Monday with a factor whose window length is 1 (i.e. .latest), the pipeline will retrieve data for the latest session, which is actually the previous Friday at 8:45am (assuming the US_EQUITIES domain) to the Monday at 8:45am. So if your factor is using something like daily close price, it will get the most recent pricing data from that session, which will be the close price on Friday. It's worth noting that each element in the lookback window arrays you get in a pipeline factor corresponds to one of these sessions, which is why you only see 5 elements per week.
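To make the session binning concrete, here's a rough Python sketch. This is not Quantopian's actual implementation: it assumes the 8:45am cutoff of the US_EQUITIES domain, treats Mon-Fri as trading days, and ignores market holidays.

```python
from datetime import datetime, time, timedelta

CUTOFF = time(8, 45)  # 45 minutes before an assumed 9:30am market open

def next_trading_day(d):
    """Next Mon-Fri date after `d` (holidays ignored in this sketch)."""
    d = d + timedelta(days=1)
    while d.weekday() >= 5:  # skip Saturday (5) and Sunday (6)
        d = d + timedelta(days=1)
    return d

def session_for(ts):
    """Return the trading day on which data timestamped `ts` is surfaced.

    A session runs from 8:45am on one trading day up to (but not
    including) 8:45am on the next trading day, and everything in it is
    surfaced when the pipeline runs on that next trading day.
    """
    d = ts.date()
    if ts.time() >= CUTOFF and d.weekday() < 5:
        # at/after the cutoff on a trading day: belongs to the session
        # surfaced on the *next* trading day
        return next_trading_day(d)
    # before the cutoff, or on a weekend: surfaced on the next trading
    # day at or after `d`
    while d.weekday() >= 5:
        d = d + timedelta(days=1)
    return d

# A Friday close price and anything dated over the weekend all land in
# the session a Monday pipeline run sees as its 'latest' value:
print(session_for(datetime(2020, 1, 10, 16, 0)))  # Friday 4pm  -> 2020-01-13
print(session_for(datetime(2020, 1, 11, 12, 0)))  # Saturday    -> 2020-01-13
print(session_for(datetime(2020, 1, 13, 8, 0)))   # Mon 8:00am  -> 2020-01-13
print(session_for(datetime(2020, 1, 13, 9, 0)))   # Mon 9:00am  -> 2020-01-14
```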
Things get a little more interesting if you use a dataset that is expected to have records dated on the weekend. There aren't many examples like that on Q, but the sentiment datasets and the Insider Transactions dataset are two examples of this. I'll start by focusing on the sentiment datasets because the Insider Transactions dataset gets a little more complicated!
The sentiment datasets sometimes get sentiment scores on the weekend. Just like with every other dataset, pipeline slots every data point into a 'session'. However, when there are multiple data points in one session, pipeline always surfaces the one with the most recent asof_date. If an asset has 3 sentiment scores come in with asof_dates on Friday, Saturday, and Sunday, the Sunday score will be slotted into the array for the Friday @ 8:45am --> Monday @ 8:45am session.
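To illustrate the 'most recent asof_date wins' behavior, here's a small pandas sketch. The column names and scores are made up for illustration; this is not the dataset's real schema or pipeline's actual implementation:

```python
import pandas as pd

# Hypothetical raw sentiment records for one asset over a weekend.
records = pd.DataFrame({
    "asset": ["AAPL", "AAPL", "AAPL"],
    # Fri, Sat, Sun -- all three fall into the same
    # Friday 8:45am --> Monday 8:45am session
    "asof_date": pd.to_datetime(["2020-01-10", "2020-01-11", "2020-01-12"]),
    "sentiment": [0.2, -0.1, 0.5],
})

# Surface the record with the most recent asof_date per asset:
latest = records.sort_values("asof_date").groupby("asset").last()
print(latest.loc["AAPL", "sentiment"])  # the Sunday score: 0.5
```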
For Insider Transactions, we do things a little differently since the dataset is actually implemented as a DataSetFamily. The best way to learn how weekend data is handled in the case of Insider Transactions is to read the bottom section of the notebook posted in this thread (titled "Calendar days vs. trading days"). Because there can be multiple transactions per day for a single asset, we needed a new type of API to let people aggregate the data into a single value per asset per day, since that is the format that pipeline computations expect as input.
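The idea behind that aggregation can be sketched outside pipeline with plain pandas (hypothetical rows and column names; the real mechanism is the DataSetFamily API described in the notebook):

```python
import pandas as pd

# Hypothetical insider-transaction rows: several filings per asset per
# calendar day. Columns are illustrative, not the real schema.
txns = pd.DataFrame({
    "asset": ["XYZ", "XYZ", "XYZ", "ABC"],
    "asof_date": pd.to_datetime(
        ["2020-01-11", "2020-01-11", "2020-01-12", "2020-01-11"]
    ),
    "shares": [100, -40, 250, 75],
})

# One possible aggregation: net shares transacted per asset per day.
# Pipeline needs exactly one value per asset per day (really, per
# session), so something must collapse duplicates before they reach a
# factor.
daily = txns.groupby(["asset", "asof_date"])["shares"].sum()
print(daily.loc[("XYZ", pd.Timestamp("2020-01-11"))])  # 100 + (-40) = 60
```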
Looking ahead a bit, we'd like to add support to pipeline for expressing custom aggregations over data that doesn't naturally fit the single-value-per-asset-per-day model (really, this should be defined as single value per asset per session). I don't know exactly when this will happen, and to be honest, I think it's more likely to land for future datasets than for those that are already integrated, but I figured it was worth mentioning now to show that this is a problem we're thinking about but haven't solved yet! I'm curious: is there a particular dataset you are looking at where you expect custom aggregation of weekend data to be helpful?
Thanks for the great question!
Regarding your original observation: the pipeline cutoff times I mentioned above are when data for a session gets 'locked in'. For instance, if you try to run a pipeline on Saturday, the most recent session is the Friday @ 8:45am --> Monday @ 8:45am session, which is still in progress. It's possible (and in some cases, likely) that new data will come in for that session; after 8:45am on Monday, any new data goes into the next session. To be honest, I'm surprised that you were able to run a pipeline on Saturday with an end date on the next Monday (I'm assuming 'next' here based on your observation, so let me know if that assumption is wrong). We should probably fire a warning if you run a pipeline in a session that hasn't ended yet, to avoid the non-deterministic results you observed. I'll file an issue noting that the current behavior is confusing. Thanks for reporting it.
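Such a warning could look something like this hypothetical check. It assumes the 8:45am US_EQUITIES cutoff and is not how pipeline actually validates end dates:

```python
from datetime import datetime, time
import warnings

CUTOFF = time(8, 45)  # the assumed pipeline cutoff time

def check_end_date(end_date, now):
    """Warn if a pipeline's end date falls in a session that hasn't
    ended yet, since new data could still arrive for that session.

    Data for the session surfaced on `end_date` is locked in at 8:45am
    on `end_date` itself (the pipeline cutoff time).
    """
    lock_in = datetime.combine(end_date, CUTOFF)
    if now < lock_in:
        warnings.warn(
            "Pipeline end date %s is in a session that has not ended "
            "yet; results may change if re-run later." % end_date
        )
        return False
    return True

# Running on a Saturday with the next Monday as the end date: the
# Friday --> Monday session is still open, so results aren't locked in.
ok = check_end_date(datetime(2020, 1, 13).date(), now=datetime(2020, 1, 11, 12, 0))
print(ok)  # False
```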
Let me know if any of the above was confusing or if you have any further questions. It's a tricky topic!