Hi Quantopian Community,
My name is Matt and I'm a research analyst here at Quantopian. My goal is to get compelling and accessible research into the hands of our community. So without further ado, here's the first of many posts. This one focuses on how to generate data for testing hypotheses.
In many cases, gathering data is the hardest part of running an analysis. Understanding your sample selection, universe, and variables lays the foundation for the rest of your research. So whether you've been following the Quantpedia Series or want to learn how to conduct your own research, this tutorial will show you how to use Pipeline to extract the data you need.
The motivation for this post came from a research paper that I'm currently analyzing here at Quantopian. My research is an out-of-sample (OOS) implementation of Milian's paper, "Overreacting to a History of Underreaction", in which Milian examines the well-known post-earnings-announcement drift effect (PEAD for short). He suggests that PEAD has reversed in past years because of overcrowding among arbitrageurs invested in PEAD strategies. Specifically, he finds that the firms with the biggest positive earnings surprise at one announcement are the ones with significantly negative returns shortly after the subsequent earnings announcement.
While the results of my OOS implementation will soon be published in the Quantpedia Series, this post takes you through the exact steps I used to meet the data requirements necessary to perform my analysis. Specifically, I show you how to:
- Run and query large batches of data through Pipeline
- Filter the Pipeline for specific time frames (corporate actions, earnings announcements, etc.)
- Use the Pipeline to generate forward-looking returns
- Categorize securities into deciles, within each calendar quarter, based on their previous earnings surprise
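
To give a feel for the last two steps before you open the notebook, here is a minimal sketch in plain pandas. It is not the Pipeline code from the notebook: the prices and earnings surprises are synthetic, and the tickers, column names (`sid`, `quarter`, `surprise`), and helper functions are hypothetical names chosen for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Synthetic daily prices for three hypothetical tickers (not real data).
dates = pd.bdate_range("2015-01-01", periods=10)
prices = pd.DataFrame(
    100 * np.cumprod(1 + rng.normal(0, 0.01, size=(10, 3)), axis=0),
    index=dates,
    columns=["AAA", "BBB", "CCC"],
)


def forward_returns(prices, days):
    """Simple return from each row to the row `days` trading days later."""
    return prices.shift(-days) / prices - 1


# The last 5 rows are NaN because no future price exists for them yet.
fwd_5d = forward_returns(prices, 5)

# Synthetic earnings surprises: 40 hypothetical firms in each of two quarters.
surprises = pd.DataFrame({
    "sid": np.tile(np.arange(40), 2),
    "quarter": np.repeat(["2015Q1", "2015Q2"], 40),
    "surprise": rng.normal(0, 1, size=80),
})


def decile_within_group(values):
    """Rank values into deciles 1 (lowest) through 10 (highest)."""
    return pd.qcut(values, 10, labels=False) + 1


# Deciles are assigned separately within each calendar quarter.
surprises["decile"] = (
    surprises.groupby("quarter")["surprise"].transform(decile_within_group)
)
```

Grouping by quarter before cutting means a firm's decile reflects how its surprise compares with other announcements in the same quarter, rather than with the whole sample.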
Clone the notebook to get started.