
Analyzing Alpha in 10-Ks and 10-Qs (Alphalens Study)

THESIS:

Major text changes in 10-K and 10-Q filings over time indicate significant decreases in future returns. We find alpha in shorting the companies with the largest text changes in their filings and buying the companies with the smallest text changes in their filings.

Publicly listed companies in the U.S. are required by law to file "10-K" and "10-Q" reports with the Securities and Exchange Commission (SEC). These reports provide both qualitative and quantitative descriptions of the company's performance, from revenue numbers to qualitative risk factors.

When companies file 10-Ks and 10-Qs, they are required to disclose certain pieces of information. For example, companies are required to report information about "significant pending lawsuits or other legal proceedings". As such, 10-Ks and 10-Qs often hold valuable insights into a company's performance.

These insights, however, can be difficult to access. The average 10-K was 42,000 words long in 2013; to put that in perspective, that's roughly one-fifth the length of Moby-Dick. Beyond sheer length, dense language and heavy boilerplate can further obscure the filings' true meaning for many investors.

The good news? We might not need to read companies' 10-Ks and 10-Qs from cover to cover in order to derive value from the information they contain. Specifically, Lauren Cohen, Christopher Malloy, and Quoc Nguyen argue in their recent paper, Lazy Prices, that we can simply analyze textual changes in 10-Ks and 10-Qs to predict companies' future stock returns. For an overview of this paper from one of the authors, see the Lazy Prices interview from QuantCon 2018.

To understand how the dataset used in this post was created, be sure to see the Data Processing notebook.
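
As a rough sketch of the two similarity measures behind the dataset, here's a simplified illustration (it assumes plain whitespace tokenization; the Data Processing notebook handles the real parsing and cleaning of the raw filings):

from collections import Counter
import math

def jaccard_similarity(doc_a, doc_b):
    # Jaccard: size of the word-set intersection over the size of the union
    a, b = set(doc_a.split()), set(doc_b.split())
    return len(a & b) / float(len(a | b))

def cosine_similarity(doc_a, doc_b):
    # Cosine: dot product of the word-count vectors over the product of their norms
    a, b = Counter(doc_a.split()), Counter(doc_b.split())
    dot = float(sum(a[w] * b[w] for w in a))
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b)

Both measures equal 1.0 for identical documents and fall toward 0 as the texts diverge, so a low score flags a filing that changed substantially from its predecessor.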

In [3]:
from quantopian.research import run_pipeline
from quantopian.pipeline import Pipeline
from quantopian.pipeline.filters import QTradableStocksUS
import alphalens

1. Loading Data from Self-Serve Data

In this step, we import the data from a local .csv file via the Self-Serve Data feature.

To do this, we begin with the local .csv file generated by the Data Processing notebook. We then upload it under the Self-Serve Data tab on the Account > Data page; this makes it available for import into a research notebook or pipeline.

For more on importing data using Self-Serve, check out the examples in this forum post.
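
For reference, the uploaded file is a long-format CSV with one row per asset per date. A hypothetical excerpt (the column names mirror the dataset fields used below; the rows are purely illustrative):

date,symbol,jaccard_score,cosine_score
2014-02-03,AAPL,0.8412,0.9103
2014-02-03,MSFT,0.7954,0.8821
2014-02-04,IBM,0.7231,0.8515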

In [4]:
from quantopian.pipeline.data.user_5b102ae91141120040958556 import lazyprices3_90d

2. Formatting Factor Values

The data we uploaded is in a tabular form, with one row per asset per day. However, Alphalens requires that we provide data in a specific format with specific labels. Fortunately, Pipeline will do all the "dirty work" for us.

In this step, we'll use Pipeline to put our data in a form that can be ingested by Alphalens.

In [5]:
def make_pipeline():
    # Most recent similarity scores from the uploaded dataset
    jaccard_score = lazyprices3_90d.jaccard_score.latest
    cosine_score = lazyprices3_90d.cosine_score.latest

    # Restrict to the QTradableStocksUS universe, dropping assets with missing scores
    screen = (QTradableStocksUS() & jaccard_score.notnull() & cosine_score.notnull())

    return Pipeline(
        columns={'jaccard_score': jaccard_score, 'cosine_score': cosine_score},
        screen=screen,
    )
In [6]:
data = run_pipeline(make_pipeline(), '2013-01-01', '2018-05-01')

3. Get Pricing Data

Since an alpha factor is supposed to predict the returns of an asset, we need a record of actual prices to evaluate how well our factor performed. In this step, we get pricing data for the assets in our dataset.

In [11]:
# Get list of relevant assets
assets = data.index.levels[1]
In [12]:
# Get pricing data for those assets
pricing_end_date = '2018-08-01' # Pricing end date should be later so we can get forward returns
prices = get_pricing(assets,
                     start_date='2013-01-01',
                     end_date=pricing_end_date,
                     fields='open_price')

4. Run Alphalens

Now that we have both our alpha factor and pricing datasets, we're ready to run our Alphalens study.

Since we have both Jaccard and cosine similarity scores, we'll run two separate Alphalens tearsheets.

4a. Jaccard Similarity Factor

Before creating a tearsheet, we'll use get_clean_factor_and_forward_returns to get our data in the correct format to be ingested by Alphalens.

Note on parameters:

The periods parameter in get_clean_factor_and_forward_returns sets the horizons (in trading days) over which we assess the performance of our alpha factor. Here, we'll look at longer horizons as well as shorter ones, since new 10-Ks and 10-Qs only arrive quarterly or annually and the original paper evaluates a three-month holding period.

The quantiles parameter allows us to set the number of bins into which we divide our assets based on their factor values. Since the original paper uses 5 quantiles to estimate portfolio performance, we'll also use 5 quantiles.

In [8]:
jaccard_factor = data[['jaccard_score']]

Shorter periods (1, 5, 10 days)

In [32]:
factor_data_j1 = alphalens.utils.get_clean_factor_and_forward_returns(
    jaccard_factor,
    prices=prices,
    quantiles=5,
    periods=(1, 5, 10),
)
Dropped 0.0% entries from factor data: 0.0% in forward returns computation and 0.0% in binning phase (set max_loss=0 to see potentially suppressed Exceptions).
max_loss is 35.0%, not exceeded: OK!
In [33]:
alphalens.tears.create_full_tear_sheet(factor_data_j1, by_group=False);
Quantiles Statistics
min max mean std count count %
factor_quantile
1 0.225926 0.745363 0.625227 0.058582 294577 20.023097
2 0.659586 0.797164 0.704151 0.030120 294080 19.989315
3 0.705064 0.831345 0.745011 0.029979 294068 19.988499
4 0.745190 0.861789 0.784330 0.028935 294075 19.988975
5 0.785714 1.000000 0.840791 0.031515 294386 20.010114
Returns Analysis
1D 5D 10D
Ann. alpha 0.023 0.023 0.022
beta -0.004 -0.023 -0.027
Mean Period Wise Return Top Quantile (bps) 1.338 1.161 1.015
Mean Period Wise Return Bottom Quantile (bps) -0.791 -0.695 -0.704
Mean Period Wise Spread (bps) 2.170 1.873 1.723
Information Analysis
1D 5D 10D
IC Mean 0.004 0.007 0.009
IC Std. 0.041 0.042 0.042
Risk-Adjusted IC 0.106 0.161 0.208
t-stat(IC) 3.058 4.667 6.032
p-value(IC) 0.002 0.000 0.000
IC Skew -0.054 0.018 -0.109
IC Kurtosis 0.180 -0.193 -0.262
Turnover Analysis
1D 5D 10D
Quantile 1 Mean Turnover 0.013 0.054 0.096
Quantile 2 Mean Turnover 0.022 0.089 0.151
Quantile 3 Mean Turnover 0.025 0.101 0.164
Quantile 4 Mean Turnover 0.023 0.094 0.154
Quantile 5 Mean Turnover 0.015 0.062 0.108
1D 5D 10D
Mean Factor Rank Autocorrelation 0.992 0.963 0.93

Midrange periods (1, 2, 3 months)

In [11]:
factor_data_j2 = alphalens.utils.get_clean_factor_and_forward_returns(
    jaccard_factor,
    prices=prices,
    quantiles=5,
    periods=(20, 40, 60),
)
Dropped 0.0% entries from factor data: 0.0% in forward returns computation and 0.0% in binning phase (set max_loss=0 to see potentially suppressed Exceptions).
max_loss is 35.0%, not exceeded: OK!
In [12]:
alphalens.tears.create_full_tear_sheet(factor_data_j2, by_group=False);
Quantiles Statistics
min max mean std count count %
factor_quantile
1 0.225926 0.745363 0.625249 0.058598 294497 20.021905
2 0.659586 0.797164 0.704165 0.030122 294021 19.989544
3 0.705731 0.831345 0.745027 0.029980 294014 19.989068
4 0.745230 0.861789 0.784344 0.028932 294017 19.989272
5 0.785714 1.000000 0.840801 0.031511 294325 20.010212
Returns Analysis
20D 40D 60D
Ann. alpha 0.025 0.026 0.026
beta -0.040 -0.057 -0.060
Mean Period Wise Return Top Quantile (bps) 18.899 19.663 18.713
Mean Period Wise Return Bottom Quantile (bps) -15.490 -14.392 -14.657
Mean Period Wise Spread (bps) 34.396 33.724 33.079
Information Analysis
20D 40D 60D
IC Mean 0.012 0.018 0.021
IC Std. 0.043 0.039 0.039
Risk-Adjusted IC 0.285 0.446 0.540
t-stat(IC) 8.263 12.909 15.636
p-value(IC) 0.000 0.000 0.000
IC Skew -0.100 0.217 -0.176
IC Kurtosis -0.324 -0.155 -0.116
Turnover Analysis
20D 40D 60D
Quantile 1 Mean Turnover 0.170 0.298 0.378
Quantile 2 Mean Turnover 0.250 0.413 0.516
Quantile 3 Mean Turnover 0.265 0.437 0.547
Quantile 4 Mean Turnover 0.249 0.409 0.507
Quantile 5 Mean Turnover 0.188 0.328 0.411
20D 40D 60D
Mean Factor Rank Autocorrelation 0.867 0.746 0.676

Longest periods (1.5, 3, 4.5 months)

In [34]:
factor_data_j3 = alphalens.utils.get_clean_factor_and_forward_returns(
    jaccard_factor,
    prices=prices,
    quantiles=5,
    periods=(30, 60, 90),
)
Dropped 3.4% entries from factor data: 3.4% in forward returns computation and 0.0% in binning phase (set max_loss=0 to see potentially suppressed Exceptions).
max_loss is 35.0%, not exceeded: OK!
In [35]:
alphalens.tears.create_full_tear_sheet(factor_data_j3, by_group=False);
Quantiles Statistics
min max mean std count count %
factor_quantile
1 0.225926 0.745363 0.625607 0.058730 284462 20.022637
2 0.659586 0.797164 0.704506 0.030495 283987 19.989203
3 0.705731 0.831345 0.745427 0.030366 283983 19.988921
4 0.745230 0.861789 0.784718 0.029259 283983 19.988921
5 0.785714 1.000000 0.841041 0.031631 284287 20.010319
Returns Analysis
30D 60D 90D
Ann. alpha 0.025 0.025 0.022
beta -0.049 -0.061 -0.062
Mean Period Wise Return Top Quantile (bps) 27.898 27.322 24.926
Mean Period Wise Return Bottom Quantile (bps) -25.347 -24.141 -19.310
Mean Period Wise Spread (bps) 52.827 50.880 43.530
Information Analysis
30D 60D 90D
IC Mean 0.015 0.021 0.021
IC Std. 0.043 0.039 0.040
Risk-Adjusted IC 0.347 0.528 0.520
t-stat(IC) 9.879 15.051 14.831
p-value(IC) 0.000 0.000 0.000
IC Skew 0.136 -0.155 -0.274
IC Kurtosis -0.188 -0.149 -0.359
Turnover Analysis
30D 60D 90D
Quantile 1 Mean Turnover 0.245 0.390 0.455
Quantile 2 Mean Turnover 0.347 0.533 0.601
Quantile 3 Mean Turnover 0.368 0.564 0.629
Quantile 4 Mean Turnover 0.345 0.523 0.587
Quantile 5 Mean Turnover 0.270 0.424 0.475
30D 60D 90D
Mean Factor Rank Autocorrelation 0.799 0.665 0.612

4b. Cosine Similarity Factor

We'll put our cosine score factor through the same process.

In [9]:
cosine_factor = data[['cosine_score']]
Shorter periods (1, 5, 10 days)

In [16]:
factor_data_c1 = alphalens.utils.get_clean_factor_and_forward_returns(
    cosine_factor,
    prices=prices,
    quantiles=5,
    periods=(1, 5, 10),
)
Dropped 0.0% entries from factor data: 0.0% in forward returns computation and 0.0% in binning phase (set max_loss=0 to see potentially suppressed Exceptions).
max_loss is 35.0%, not exceeded: OK!
In [17]:
alphalens.tears.create_full_tear_sheet(factor_data_c1, by_group=False);
Quantiles Statistics
min max mean std count count %
factor_quantile
1 0.430380 0.854624 0.769615 0.044902 294577 20.023097
2 0.796210 0.887623 0.826529 0.020148 294075 19.988975
3 0.827838 0.907965 0.853852 0.019162 294075 19.988975
4 0.854240 0.925837 0.879042 0.017803 294073 19.988839
5 0.880126 1.000000 0.913292 0.018437 294386 20.010114
Returns Analysis
1D 5D 10D
Ann. alpha 0.023 0.023 0.023
beta -0.006 -0.026 -0.029
Mean Period Wise Return Top Quantile (bps) 1.311 1.129 0.984
Mean Period Wise Return Bottom Quantile (bps) -0.886 -0.773 -0.768
Mean Period Wise Spread (bps) 2.242 1.927 1.763
Information Analysis
1D 5D 10D
IC Mean 0.004 0.007 0.009
IC Std. 0.041 0.042 0.042
Risk-Adjusted IC 0.105 0.160 0.207
t-stat(IC) 3.036 4.640 5.997
p-value(IC) 0.002 0.000 0.000
IC Skew -0.055 0.019 -0.108
IC Kurtosis 0.185 -0.190 -0.263
Turnover Analysis
1D 5D 10D
Quantile 1 Mean Turnover 0.013 0.054 0.095
Quantile 2 Mean Turnover 0.022 0.089 0.151
Quantile 3 Mean Turnover 0.025 0.101 0.164
Quantile 4 Mean Turnover 0.023 0.094 0.154
Quantile 5 Mean Turnover 0.015 0.062 0.108
1D 5D 10D
Mean Factor Rank Autocorrelation 0.992 0.963 0.93

Midrange periods (1, 2, 3 months)

In [18]:
factor_data_c2 = alphalens.utils.get_clean_factor_and_forward_returns(
    cosine_factor,
    prices=prices,
    quantiles=5,
    periods=(20, 40, 60),
)
Dropped 0.0% entries from factor data: 0.0% in forward returns computation and 0.0% in binning phase (set max_loss=0 to see potentially suppressed Exceptions).
max_loss is 35.0%, not exceeded: OK!
In [19]:
alphalens.tears.create_full_tear_sheet(factor_data_c2, by_group=False);
Quantiles Statistics
min max mean std count count %
factor_quantile
1 0.430380 0.854624 0.769630 0.044914 294497 20.021905
2 0.796210 0.887623 0.826539 0.020149 294017 19.989272
3 0.827915 0.907965 0.853862 0.019163 294020 19.989476
4 0.854252 0.925837 0.879051 0.017802 294015 19.989136
5 0.880126 1.000000 0.913298 0.018435 294325 20.010212
Returns Analysis
20D 40D 60D
Ann. alpha 0.026 0.027 0.026
beta -0.042 -0.058 -0.061
Mean Period Wise Return Top Quantile (bps) 18.365 19.264 18.333
Mean Period Wise Return Bottom Quantile (bps) -16.370 -14.340 -14.254
Mean Period Wise Spread (bps) 34.889 33.359 32.353
Information Analysis
20D 40D 60D
IC Mean 0.012 0.018 0.021
IC Std. 0.043 0.040 0.039
Risk-Adjusted IC 0.284 0.443 0.538
t-stat(IC) 8.207 12.836 15.573
p-value(IC) 0.000 0.000 0.000
IC Skew -0.097 0.220 -0.172
IC Kurtosis -0.331 -0.155 -0.123
Turnover Analysis
20D 40D 60D
Quantile 1 Mean Turnover 0.170 0.298 0.378
Quantile 2 Mean Turnover 0.249 0.412 0.515
Quantile 3 Mean Turnover 0.265 0.437 0.547
Quantile 4 Mean Turnover 0.248 0.409 0.506
Quantile 5 Mean Turnover 0.188 0.328 0.411
20D 40D 60D
Mean Factor Rank Autocorrelation 0.867 0.746 0.676

Longest periods (1.5, 3, 4.5 months)

In [13]:
factor_data_c3 = alphalens.utils.get_clean_factor_and_forward_returns(
    cosine_factor,
    prices=prices,
    quantiles=5,
    periods=(30, 60, 90),
)
Dropped 3.4% entries from factor data: 3.4% in forward returns computation and 0.0% in binning phase (set max_loss=0 to see potentially suppressed Exceptions).
max_loss is 35.0%, not exceeded: OK!
In [14]:
alphalens.tears.create_full_tear_sheet(factor_data_c3, by_group=False);
Quantiles Statistics
min max mean std count count %
factor_quantile
1 0.430380 0.854624 0.769891 0.045011 284462 20.022637
2 0.796210 0.887623 0.826763 0.020395 283983 19.988921
3 0.827915 0.907965 0.854116 0.019408 283989 19.989343
4 0.854252 0.925837 0.879279 0.018000 283981 19.988780
5 0.880126 1.000000 0.913437 0.018503 284287 20.010319
Returns Analysis
30D 60D 90D
Ann. alpha 0.026 0.026 0.023
beta -0.050 -0.062 -0.064
Mean Period Wise Return Top Quantile (bps) 27.245 26.756 24.398
Mean Period Wise Return Bottom Quantile (bps) -25.482 -23.400 -19.011
Mean Period Wise Spread (bps) 52.488 49.667 42.807
Information Analysis
30D 60D 90D
IC Mean 0.015 0.021 0.021
IC Std. 0.043 0.039 0.040
Risk-Adjusted IC 0.344 0.526 0.519
t-stat(IC) 9.811 14.979 14.783
p-value(IC) 0.000 0.000 0.000
IC Skew 0.140 -0.150 -0.265
IC Kurtosis -0.189 -0.156 -0.366
Turnover Analysis
30D 60D 90D
Quantile 1 Mean Turnover 0.244 0.390 0.455
Quantile 2 Mean Turnover 0.347 0.531 0.602
Quantile 3 Mean Turnover 0.368 0.564 0.629
Quantile 4 Mean Turnover 0.345 0.523 0.587
Quantile 5 Mean Turnover 0.270 0.424 0.475
30D 60D 90D
Mean Factor Rank Autocorrelation 0.799 0.665 0.613

A few notes about these tearsheets:

  • In the "Cumulative Return by Quantile" plots, we want to see the top and bottom quantile "fingers" move across the plot without crossing. They stay clearly separated over all periods, indicating that our factor does a good job of separating high- and low-returning stocks.
  • In the "IC Normal Dist Q-Q" plots, we want to see an S-shaped curve, indicating a distribution with fatter tails than a Normal (desirable, since the stocks with extreme factor values are the ones we want to long/short). We do see reasonably S-shaped curves over all periods.
  • For the top and bottom quantiles, mean turnover looks reasonable -- hovering around roughly 30-40% at the longer horizons, which is well within the contest guideline of 5-65%.

How does this compare to the paper's findings? The original paper found a spread of 31 bps in excess return between the 1st and 5th quantile for the cosine similarity score over a three-month holding period, and a spread of 53 bps for the Jaccard similarity score.

Keep in mind that the mean period-wise return calculated by Alphalens is a rate of return, so it's difficult to compare the Alphalens numbers exactly with the original results. That said, we see spreads of roughly 20-50 bps between the top and bottom quantiles (depending on the factor and period), so our results seem generally in line with the paper's findings.

The next step? Put it in an algorithm and see how it performs in real-world conditions.
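
As a starting point, a minimal sketch of such an algorithm on Quantopian might look like the following (it assumes the same make_pipeline() defined above is copied into the algorithm; the weekly schedule and constraint bounds are illustrative, not tuned):

import quantopian.optimize as opt
from quantopian.algorithm import attach_pipeline, pipeline_output, order_optimal_portfolio

def initialize(context):
    attach_pipeline(make_pipeline(), 'lazy_prices')
    # New scores only arrive with new filings, so a weekly rebalance should suffice
    schedule_function(rebalance, date_rules.week_start(), time_rules.market_open())

def rebalance(context, data):
    scores = pipeline_output('lazy_prices')['jaccard_score']
    # Demean so we long high-similarity names (small text changes)
    # and short low-similarity names (large text changes)
    objective = opt.MaximizeAlpha(scores - scores.mean())
    constraints = [
        opt.MaxGrossExposure(1.0),
        opt.DollarNeutral(),
        opt.PositionConcentration.with_equal_bounds(-0.01, 0.01),
    ]
    order_optimal_portfolio(objective, constraints)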