Hi @Thomas, thank you for your feedback! To answer your questions:
> How easy is it to use these scripts for different data sources on Quandl and outside of Quandl?
If you looked at the `aws-latest` or `local-only` branch, the tool was initially built to combine short interest and pricing data. I'd like to eventually turn it into an all-purpose data collector for different data sources. Another source that came to mind was scraping EDGAR data. Does this kind of data already exist somewhere, or should I attempt to build it?
To add a different data source, you'd just need to add another DAG with a structure similar to `short_analysis_dag`, then reuse and update the `combine_dag` found in the branches I mentioned above.
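As a rough illustration, a new-source DAG could look like the sketch below, assuming Airflow 1.10-style imports. The DAG id, task names, and callables (e.g. an EDGAR scraper) are hypothetical placeholders rather than the actual contents of `short_analysis_dag`:

```python
from datetime import datetime, timedelta
from airflow import DAG
from airflow.operators.python_operator import PythonOperator

# Hypothetical callables; the real ones would mirror whatever
# short_analysis_dag does for its source (download, clean, write CSV).
def download_edgar(**kwargs):
    """Fetch the raw data for the new source."""

def clean_edgar(**kwargs):
    """Normalize it into the CSV shape combine_dag expects."""

default_args = {
    "owner": "airflow",
    "start_date": datetime(2019, 1, 1),
    "retries": 1,
    "retry_delay": timedelta(minutes=5),
}

with DAG("edgar_analysis_dag", default_args=default_args,
         schedule_interval="@daily", catchup=False) as dag:
    download = PythonOperator(task_id="download_edgar",
                              python_callable=download_edgar,
                              provide_context=True)
    clean = PythonOperator(task_id="clean_edgar",
                           python_callable=clean_edgar,
                           provide_context=True)
    download >> clean
```

The `combine_dag` would then pick up this DAG's output alongside the existing sources.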
> There is not too much information on what to do with the Quantopian file after it is created. Do you plan to add more docs around that? A notebook that runs on Quantopian to read the data in and do some simple analysis? I think that would go a long way.
I created this tool as part of my upcoming online course on Python for Finance. That notebook already exists, but currently only as part of the course. I'd like to build that kind of documentation at some point, but right now the deadline for the course is extremely tight (I have to push for beta in April, hence lots of caffeine and little sleep).
> Is it possible to close the loop from the automatic daily downloading of the data from Quandl to the automatic updating of the self-serve data on Quantopian?
I hope I understand this question correctly (please let me know if I don't): did you mean just leaving the system running so that the Self-Serve data on Quantopian is updated automatically? The system does exactly that. It updates a CSV file in an S3 bucket every day at 00:00, and Quantopian pulls it in during the day through the Live Data feature (thank you for building that feature, by the way; it's a dream come true!). The reason the documentation talks about turning the EC2 server off and on is cost saving. I probably should have built this with AWS Lambda and Step Functions instead of Apache Airflow, but I am currently more comfortable with the latter. Personally, I only turn the system on when I need to use it in Quantopian (again, for cost saving).
I am currently working on this project only part-time, but were it to become a full(er)-time project, I have a couple of interesting features in mind to make adding new data sources more flexible.