This is the second post in a two-part series. For a walkthrough of the dataset construction, see the Data Cleaning notebook.
Thesis
Companies generally do not make major changes to their 10-K and 10-Q filings. When they do, the changes predict significant underperformance in the following quarter. We find alpha in shorting the companies with the largest text changes in their filings and buying the companies with the smallest.
Background
Publicly listed companies in the U.S. are required by law to file "10-K" and "10-Q" reports with the Securities and Exchange Commission (SEC). These reports provide both qualitative and quantitative descriptions of the company's performance, from revenue numbers to qualitative risk factors.
When companies file 10-Ks and 10-Qs, they are required to disclose certain pieces of information. For example, companies are required to report information about "significant pending lawsuits or other legal proceedings". As such, 10-Ks and 10-Qs often hold valuable insights into a company's performance.
These insights, however, can be difficult to access. The average 10-K was 42,000 words long in 2013; put in perspective, that's roughly one-fifth of the length of Moby-Dick. Beyond the sheer length, dense language and heavy boilerplate can further obscure the meaning for many investors.
The good news? We might not need to read companies' 10-Ks and 10-Qs from cover to cover in order to derive value from the information they contain. Specifically, Lauren Cohen, Christopher Malloy and Quoc Nguyen argue in their recent paper that we can simply analyze textual changes in 10-Ks and 10-Qs to predict companies' future stock returns. (For an overview of this paper from Lauren Cohen himself, see the Lazy Prices interview from QuantCon 2018.)
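To make "textual changes" concrete, here is a minimal sketch of how two consecutive filings might be compared. It assumes simple word-level tokenization and uses two common measures, Jaccard and cosine similarity; the function names and sample sentences are hypothetical and are not taken from the notebook or the Data Cleaning pipeline.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def jaccard_similarity(old_text, new_text):
    """Fraction of distinct words shared by both documents (0 to 1)."""
    old_words, new_words = set(tokenize(old_text)), set(tokenize(new_text))
    union = old_words | new_words
    if not union:
        return 1.0  # assumption: two empty documents count as identical
    return len(old_words & new_words) / len(union)


def cosine_similarity(old_text, new_text):
    """Cosine of the angle between the two word-count vectors (0 to 1)."""
    old_counts, new_counts = Counter(tokenize(old_text)), Counter(tokenize(new_text))
    dot = sum(count * new_counts[word] for word, count in old_counts.items())
    old_norm = math.sqrt(sum(c * c for c in old_counts.values()))
    new_norm = math.sqrt(sum(c * c for c in new_counts.values()))
    if old_norm == 0 or new_norm == 0:
        return 0.0  # assumption: an empty document is dissimilar to everything
    return dot / (old_norm * new_norm)


# Hypothetical excerpts from the same section of two consecutive filings.
last_filing = "We face risks from competition and regulation."
this_filing = "We face risks from competition, regulation and pending litigation."

print(jaccard_similarity(last_filing, this_filing))
print(cosine_similarity(last_filing, this_filing))
```

A score near 1 means the filing barely changed from the previous one; a lower score flags a larger rewrite, which is the signal the thesis proposes to short.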
In this investigation, we attempt to replicate their results on Quantopian.
The notebook attached below presents an analysis of the "similarity score" alpha factor. We begin by retrieving our alpha factor data with Pipeline; after transforming the data into the appropriate format for Alphalens, we run a full tearsheet that details mean returns, turnover, normality, and more.
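As a rough illustration of the kind of quantile comparison such a tearsheet reports, the sketch below buckets stocks by similarity score and computes the spread between the top and bottom buckets in plain Python. It is not the Pipeline or Alphalens code from the notebook; the tickers, scores, and returns are made-up numbers, and the function names are hypothetical.

```python
from statistics import mean


def assign_quantiles(scores, n_quantiles=5):
    """Map each ticker to a bucket from 1 (lowest score) to n_quantiles (highest).

    Assumes there are at least n_quantiles tickers so that no bucket is empty.
    """
    ranked = sorted(scores, key=scores.get)
    return {
        ticker: rank * n_quantiles // len(ranked) + 1
        for rank, ticker in enumerate(ranked)
    }


def long_short_spread(scores, forward_returns, n_quantiles=5):
    """Mean forward return of the top bucket minus that of the bottom bucket."""
    buckets = assign_quantiles(scores, n_quantiles)
    top = [forward_returns[t] for t, q in buckets.items() if q == n_quantiles]
    bottom = [forward_returns[t] for t, q in buckets.items() if q == 1]
    return mean(top) - mean(bottom)


# Invented example data: similarity scores and next-quarter returns.
similarity = {"AAA": 0.95, "BBB": 0.90, "CCC": 0.80, "DDD": 0.60, "EEE": 0.40}
next_quarter_return = {"AAA": 0.03, "BBB": 0.02, "CCC": 0.01, "DDD": 0.00, "EEE": -0.02}

# Long the most similar filings (top bucket), short the least similar (bottom bucket).
print(long_short_spread(similarity, next_quarter_return))
```

Under the thesis, a positive spread is what we would hope to see: companies whose filings changed least outperform those whose filings changed most.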