Hi.
I read both of these well-written explanations of self-serve data and live data:
https://www.quantopian.com/posts/quantopian-partner-data-how-is-it-collected-processed-and-surfaced
https://www.quantopian.com/help#self_serve_data
Alas, it's not 100% clear to me how my contest algos should expect to receive and process new (future) data when it becomes available. Perhaps it was the part about base tables and deltas that threw me off.
As an example, let's say I have a dataset set up correctly in my datasets: it's linked to a published Google Sheets .csv (as per the docs), and my algo knows how to grab the historical data using pipeline.
The dataset looks like this:
date,symbol,my_value
5/31/2018,GCP,1
5/31/2018,EXPE,2
5/31/2018,PMT,3
5/31/2018,DOW,4
5/31/2018,SHO,5
5/31/2018,INGR,6
5/31/2018,QGEN,7
5/31/2018,AWK,8
Now, my algo goes live in the contest. It's 06/25/2018. At the end of the month I update the data file with similar, but different, data.
For example:
date,symbol,my_value
6/30/2018,GCP,5
6/30/2018,EXPE,9
6/30/2018,IBM,3
6/30/2018,CAT,4
6/30/2018,TWTR,5
6/30/2018,INGR,1
6/30/2018,QGEN,17
6/30/2018,AWK,82
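To make the two arrangements I'm asking about concrete, here's a quick pandas sketch (pandas and the abbreviated two-symbol files are just my illustration; I don't know what Quantopian's ingestion does server-side):

```python
import io
import pandas as pd

# May file (the original base table) and the June update, abbreviated
# to two symbols each; the column layout matches the published sheet.
may = pd.read_csv(io.StringIO(
    "date,symbol,my_value\n5/31/2018,GCP,1\n5/31/2018,INGR,6\n"))
june = pd.read_csv(io.StringIO(
    "date,symbol,my_value\n6/30/2018,GCP,5\n6/30/2018,INGR,1\n"))

# Arrangement 1: append the June rows under the May rows,
# so the file accumulates history month over month.
appended = pd.concat([may, june], ignore_index=True)

# Arrangement 2: replace the file contents with the June rows only.
replaced = june

print(appended.to_csv(index=False))
print(replaced.to_csv(index=False))
```

In arrangement 1 the sheet keeps growing (4 rows here); in arrangement 2 only the latest month survives (2 rows). That's the difference I'd like clarified.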
My first question is how the new data should be arranged in the data file. Should the new rows be appended to the bottom of the existing list? Can I just replace the old data with the new data? Does it matter?
On 07/01/2018 the algo ingests the updated data. What will come through the pipeline?
In any case, is it correct to say that the 'date' column will remain paired with its corresponding 'symbol' and 'my_value' like a regular row? If so, I can just sort the data on the 'date' column, newest to oldest, and do a groupby (etc.) to get the most recent batch of data.
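Something like this sort-then-groupby, sketched in pandas on a made-up two-symbol combined file (the actual pipeline output format is exactly what I'm unsure about):

```python
import io
import pandas as pd

# Hypothetical combined file: May rows followed by June rows.
csv = io.StringIO(
    "date,symbol,my_value\n"
    "5/31/2018,GCP,1\n"
    "5/31/2018,INGR,6\n"
    "6/30/2018,GCP,5\n"
    "6/30/2018,INGR,1\n"
)
df = pd.read_csv(csv, parse_dates=["date"])

# Sort newest-first, then keep the first (most recent) row per symbol.
latest = (df.sort_values("date", ascending=False)
            .groupby("symbol", as_index=False)
            .first())
print(latest)
```

If rows stay intact, this should hand back only the June batch (GCP=5, INGR=1).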
Am I on the right track?
Thanks in advance.
Bryan