Building the Foundations for Hypothesis Testing

Hi Quantopian Community,

My name is Matt and I'm a research analyst here at Quantopian. My goal is to get compelling and accessible research into the hands of our community. So without further ado, here's the first of many posts. This one focuses on how to generate the data you need for testing hypotheses.

In many cases, gathering data is the hardest part of running an analysis. Understanding what sample selection, universe, and variables you're working with will set the foundation for the rest of your research. So whether you've been following the Quantpedia Series or want to learn how to conduct your own research, this tutorial will show you how to use Pipeline to extract the data you need.

The motivation for this post came from a research paper that I'm currently analyzing here at Quantopian. This research is an out-of-sample (OOS) implementation of Milian's paper, "Overreacting to a History of Underreaction," in which Milian examines the well-known post-earnings-announcement drift effect (PEAD for short). There, he suggests that PEAD has been reversed in past years due to the overcrowding of arbitrageurs invested in PEAD strategies. He finds that the firms with the biggest positive earnings announcement surprises are the ones that had significant negative returns shortly after the subsequent earnings announcement.

While the results of my OOS implementation will soon be published in the Quantpedia Series, this post takes you through the exact steps I used to meet the data requirements necessary to perform my analysis. Specifically, I show you how to:

Run and query large batches of data through Pipeline
Filter the Pipeline for specific time frames (corporate actions, earnings announcements, etc.)
Use Pipeline to generate forward-looking returns
Categorize securities into deciles based on the previous earnings surprise in each calendar quarter
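
To make these steps concrete, here is a minimal sketch in plain pandas. It is not the notebook's code and does not call Pipeline at all. The helper names (split_date_range, forward_returns, quarterly_deciles) and the column names ('date', 'surprise') are assumptions made for illustration only. It shows one way to break a long date range into smaller batches, compute forward returns from a price table, and bucket earnings surprises into deciles within each calendar quarter.

```python
import pandas as pd


def split_date_range(start, end, n_splits):
    """Split [start, end] into n_splits contiguous, non-overlapping chunks.

    Hypothetical helper: each (chunk_start, chunk_end) pair could be passed
    to a separate, smaller query instead of one very large one.
    """
    edges = pd.date_range(start, end, periods=n_splits + 1).normalize()
    chunks = []
    for i in range(n_splits):
        # After the first chunk, start one day past the previous chunk's end
        # so that no date is queried twice.
        chunk_start = edges[i] if i == 0 else edges[i] + pd.Timedelta(days=1)
        chunks.append((chunk_start, edges[i + 1]))
    return chunks


def forward_returns(prices, days):
    """Return the simple return from each row to the row `days` later.

    Assumes `prices` is a DataFrame indexed by date with one column per
    security. The last `days` rows are NaN because no future price exists.
    """
    return prices.shift(-days) / prices - 1.0


def quarterly_deciles(surprises):
    """Assign each row a decile (1 = lowest surprise, 10 = highest).

    Assumes `surprises` has a datetime 'date' column and a numeric
    'surprise' column. Deciles are computed separately per calendar quarter.
    """
    quarter = surprises["date"].dt.to_period("Q")
    return surprises.groupby(quarter)["surprise"].transform(
        lambda s: pd.qcut(s, 10, labels=False, duplicates="drop") + 1
    )
```

For example, split_date_range("2015-01-01", "2015-12-31", 4) yields four back-to-back date windows that together cover the year. Grouping by calendar quarter before calling pd.qcut keeps a security's decile relative to other announcements in the same quarter, rather than to the whole sample.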

Clone the notebook to get started.

Matt,

When attempting to run:

positions_data = split_run_pipeline(positions_pipeline, START, END, SPLITS)

I'm getting a ValueError that I can't seem to resolve. Any idea what I can do to address this error?

Here's the error in all of its glory:

ValueError: Bad response: Computation failed with message:
AssertionError:
File "/home/databazaar/.venv/lib/python3.4/site-packages/blaze/server/server.py", line 643, in compserver
**compute_kwargs),
File "/home/databazaar/.venv/lib/python3.4/site-packages/multipledispatch/dispatcher.py", line 164, in __call__
return func(*args, **kwargs)
File "/home/databazaar/.venv/lib/python3.4/site-packages/blaze/compute/core.py", line 412, in compute
result = top_then_bottom_then_top_again_etc(expr3, d4, **kwargs)
File "/home/databazaar/.venv/lib/python3.4/site-packages/blaze/compute/core.py", line 189, in top_then_bottom_then_top_again_etc
return top_then_bottom_then_top_again_etc(expr3, scope4, **kwargs)
File "/home/databazaar/.venv/lib/python3.4/site-packages/blaze/compute/core.py", line 189, in top_then_bottom_then_top_again_etc
return top_then_bottom_then_top_again_etc(expr3, scope4, **kwargs)
File "/home/databazaar/.venv/lib/python3.4/site-packages/blaze/compute/core.py", line 153, in top_then_bottom_then_top_again_etc
return compute_down(expr, *leaf_data, **kwargs)
File "/home/databazaar/.venv/lib/python3.4/site-packages/multipledispatch/dispatcher.py", line 164, in __call__
return func(*args, **kwargs)
File "/home/databazaar/.venv/src/databazaar/databazaar/utils/throttler.py", line 95, in compute_throttler
return_type='native',
File "/home/databazaar/.venv/lib/python3.4/site-packages/multipledispatch/dispatcher.py", line 164, in __call__
return func(*args, **kwargs)
File "/home/databazaar/.venv/lib/python3.4/site-packages/blaze/compute/core.py", line 412, in compute
result = top_then_bottom_then_top_again_etc(expr3, d4, **kwargs)
File "/home/databazaar/.venv/lib/python3.4/site-packages/blaze/compute/core.py", line 158, in top_then_bottom_then_top_again_etc
expr2, scope2 = bottom_up_until_type_break(expr, scope, **kwargs)
File "/home/databazaar/.venv/lib/python3.4/site-packages/blaze/compute/core.py", line 301, in bottom_up_until_type_break
for i in inputs])
File "/home/databazaar/.venv/lib/python3.4/site-packages/blaze/compute/core.py", line 301, in
for i in inputs])
File "/home/databazaar/.venv/lib/python3.4/site-packages/blaze/compute/core.py", line 325, in bottom_up_until_type_break
**kwargs)}
File "/home/databazaar/.venv/lib/python3.4/site-packages/multipledispatch/dispatcher.py", line 164, in __call__
return func(*args, **kwargs)
File "/home/databazaar/.venv/lib/python3.4/site-packages/blaze/compute/sql.py", line 1038, in compute_up
assert names == expr._child.fields