Notebook

In this notebook, we will use the political campaign contributions data processed earlier to construct an alpha factor, then inspect the performance of that alpha factor using Alphalens.

1. Get Data from Data Upload

In this step, we import the data via the Self-Serve Data feature. We begin with the local .csv file generated by the Data Processing notebook, then upload it under the Self-Serve Data tab on the Account > Data page. The Ticker column is our Primary Asset, the Date column is our Primary Date, and the Count and Sum columns are number types.

The campaign contributions dataset should now appear in your list of datasets. Click on the dataset name to view the exact statement you'll need to import your dataset; keep in mind that this will be different for each user based on your user ID and dataset name.
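
For reference, a .csv file matching this schema might look like the rows below (tickers and dollar amounts here are hypothetical, shown only to illustrate the expected layout):

Date,Ticker,Count,Sum
2017-04-03,ABC,3,5400.00
2017-04-03,XYZ,1,1000.00
2017-04-04,ABC,2,2500.00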

For more on importing data using Self-Serve, check out the examples in this forum post.

In [1]:
# Import data
# Uncomment the next line to browse the data interactively
#from quantopian.interactive.data.user_5b042c4ed2a9dc00498b7d14 import contribs2018 as contribs_interactive
from quantopian.pipeline.data.user_5b042c4ed2a9dc00498b7d14 import contribs2018 as contribs # for pipeline

2. Get Alpha Factor Data

The data we uploaded is in a tabular form, with one row per asset per day. However, Alphalens requires that we provide data in a specific format with specific labels. Fortunately, Pipeline will do all the "dirty work" for us.

In this step, we'll use Pipeline to put our data in a form that can be ingested by Alphalens.

In [2]:
from quantopian.pipeline import Pipeline
from quantopian.research import run_pipeline
from quantopian.pipeline import filters
from quantopian.pipeline import factors
import pandas as pd
In [3]:
# Set up Pipeline
def make_pipeline():
    
    # Alternative: score = contribs.count.latest.zscore()
    # Smooth the daily contribution count over half a trading year (126 trading days)
    score = factors.SimpleMovingAverage(
        inputs=[contribs.count],
        window_length=126
    )

    # Filter out NaNs and 0s
    screen_null = score.notnull()
    screen_zeros = (score != 0.0)

    contribs_pipe = Pipeline(
        columns={'Score': score},
        screen=screen_null & screen_zeros
    )
    
    return contribs_pipe
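
The commented-out line above hints at an alternative construction: instead of using the smoothed count directly, we could standardize it cross-sectionally so that scores are comparable from day to day. A minimal sketch (not used in the rest of this notebook):

def make_zscore_pipeline():
    # Same smoothed count as above
    smoothed = factors.SimpleMovingAverage(
        inputs=[contribs.count],
        window_length=126
    )

    # Demean and standardize across assets on each day
    score = smoothed.zscore()

    return Pipeline(
        columns={'Score': score},
        screen=smoothed.notnull() & (smoothed != 0.0)
    )
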
In [4]:
start_date = '2017-04-01'
end_date = '2018-02-10'
contribs_output = run_pipeline(make_pipeline(), start_date, end_date)
In [5]:
contribs_output.head()
Out[5]:
                                               Score
2017-04-03 00:00:00+00:00  Equity(2 [ARNC])    0.031746
                           Equity(41 [ARCB])   0.015873
                           Equity(62 [ABT])    0.015873
                           Equity(128 [ADM])   0.079365
                           Equity(161 [AEP])   0.095238

3. Get Pricing Data

Since an alpha factor is supposed to predict asset performance, we need records of how those assets actually performed in order to evaluate it. In this step, we get pricing data for the assets in our dataset.

In [6]:
# Get list of relevant assets
assets = contribs_output.index.levels[1]
In [7]:
# Get pricing data for those assets
pricing_end_date = '2018-07-01' # Pricing end date should be later so we can get forward returns
prices = get_pricing(assets,
                     start_date=start_date,
                     end_date=pricing_end_date,
                     fields='open_price')
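
To see why the pricing window has to extend past the factor window: an N-day forward return at date t is computed from the price N trading days after t, so our longest period (120 days, set below) needs roughly six months of prices beyond the last factor date. A rough pandas equivalent of what Alphalens computes internally (illustration only):

# 20-trading-day forward return at date t: (price at t+20) / (price at t) - 1
fwd_20d = prices.pct_change(20).shift(-20)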

4. Run Alphalens

Now that we have both our alpha factor and pricing datasets, we're ready to run our Alphalens study.

In [8]:
import alphalens

Before creating a tearsheet, we'll use get_clean_factor_and_forward_returns to get our data in the correct format to be ingested by Alphalens.

Note on parameters:

The periods parameter in get_clean_factor_and_forward_returns allows us to set the periods over which we assess the performance of our alpha factor (in days). Here, we'll use longer periods, since political processes tend to be longer-term phenomena.

The quantiles parameter allows us to set the number of bins into which we divide our assets based on their factor values. Here, we'll use a smaller number of bins because our data doesn't take a very wide range of values.

In [9]:
factor_data = alphalens.utils.get_clean_factor_and_forward_returns(
    contribs_output,
    prices=prices,
    quantiles=3,
    periods=(20, 60, 120)
)
Dropped 17.0% entries from factor data: 17.0% in forward returns computation and 0.0% in binning phase (set max_loss=0 to see potentially suppressed Exceptions).
max_loss is 35.0%, not exceeded: OK!
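
If you want to peek at what Alphalens expects, the returned factor_data is a DataFrame indexed by (date, asset), with one forward-return column per period plus the raw factor value and its assigned quantile (the period labels you'll see in the tearsheet, such as 21D/62D/124D, are computed by Alphalens from the spacing of the pricing data):

# Peek at the combined dataset: MultiIndex of (date, asset), one forward-
# return column per period, plus 'factor' and 'factor_quantile' columns
factor_data.head()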

Now, let's take a look at the tearsheet! (This computation is expensive, so it usually takes a few minutes to run.)

In [10]:
alphalens.tears.create_full_tear_sheet(factor_data, by_group=False);
Quantiles Statistics

factor_quantile       min       max      mean       std  count    count %
1                0.007937  0.078947  0.026276  0.015971  18641  34.748812
2                0.055556  0.263158  0.121814  0.047253  17434  32.498835
3                0.138889  7.000000  0.695829  0.730547  17570  32.752353
Returns Analysis

                                                   21D     62D    124D
Ann. alpha                                       0.053   0.060  -0.034
beta                                            -0.161  -0.174   0.258
Mean Period Wise Return Top Quantile (bps)      14.667  18.110   6.105
Mean Period Wise Return Bottom Quantile (bps)   25.722   9.521  22.557
Mean Period Wise Spread (bps)                   -0.665  11.284  -9.362
Information Analysis

                      21D     62D    124D
IC Mean             0.027   0.055   0.031
IC Std.             0.088   0.048   0.053
Risk-Adjusted IC    0.309   1.137   0.591
t-stat(IC)          4.215  15.508   8.057
p-value(IC)         0.000   0.000   0.000
IC Skew             0.017  -0.630  -0.157
IC Kurtosis        -0.682  -0.451  -0.761
Turnover Analysis

                             21D    62D   124D
Quantile 1 Mean Turnover   0.195  0.460  0.762
Quantile 2 Mean Turnover   0.237  0.471  0.719
Quantile 3 Mean Turnover   0.145  0.303  0.521

                                     21D    62D   124D
Mean Factor Rank Autocorrelation   0.961  0.905  0.723

A few notes about this tearsheet:

  • In the "Cumulative Return by Quantile" plots, we want to see 3 distinct "fingers" moving across the plot without crossing. In the middle period (62D), we see that the 3rd quantile is consistently distinct from the first and second. This indicates that stocks with high factor values tend to generate higher returns over this period (good!). However, the first and second quantiles cross a couple times, which indicates that our factor doesn't do a good job of identifying the lowest-returning stocks. The third quantile also does not seem to perform as well in the 21D and 124D periods.
  • In the "IC Normal Dist Q-Q" plots, we want to see an S-shaped curve that indicates a Normal distribution with fat tails (since high/low factor values are the stocks that we want to long/short). We see slightly S-shaped curves in the plots for all three periods.
  • The mean turnover looks a little on the higher side, especially for the two longer periods.

In general, it looks like there's still some work to be done here. We could modify this factor in a few ways:

  • Improve our data. As noted in the Data Processing notebook, there is some uncertainty in the way that we map political campaign contributions to tickers. Removing some of this uncertainty would result in higher-quality data, which could potentially improve our alpha factor.
  • Improve our factor construction. Currently, we use the rolling 60-day sum of the count and sum of political campaign contributions. However, there are many other factors we could generate from our political campaign contributions data. For example, we could create a factor that incorporates both the count and sum of contributions (see the sketch after this list), or somehow incorporate into our metric the identity of the political candidate to whom the company is donating.
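
As an example of the second idea, here is a sketch of a combined factor that blends the standardized count and sum signals (this assumes the uploaded dataset also exposes the Sum column as contribs.sum, mirroring contribs.count):

def make_combined_pipeline():
    # Standardize each smoothed signal so they are on comparable scales
    count_score = factors.SimpleMovingAverage(
        inputs=[contribs.count], window_length=126
    ).zscore()
    sum_score = factors.SimpleMovingAverage(
        inputs=[contribs.sum], window_length=126  # assumes a 'sum' column exists
    ).zscore()

    # Equal-weighted blend of the two standardized signals
    combined = 0.5 * count_score + 0.5 * sum_score

    return Pipeline(
        columns={'Score': combined},
        screen=combined.notnull()
    )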

5. Conclusion

In this exploration, we took raw data from the FEC and used it to construct an alpha factor. Then, we ran an Alphalens study on our factor. Ultimately, it looks like our factor has potential, but there's still some work to be done before incorporating it into a full strategy!

For more on analyzing factors with Alphalens, check out this lecture; for more on Self-Serve data, check out this forum post.