Notebook

In this notebook, we will use the political campaign contributions data processed earlier to construct an alpha factor, then inspect the performance of that alpha factor using Alphalens.

1. Get Data from Data Upload

In this step, we import the data via the Self-Serve Data feature. We begin with the local .csv file generated by the Data Processing notebook, then upload it under the Self-Serve Data tab on the Account > Data page. The Ticker column is our Primary Asset, the Date column is our Primary Date, and the Count and Sum columns are number types.

The campaign contributions dataset should now appear in your list of datasets. Click on the dataset name to view the exact statement you'll need to import your dataset; keep in mind that this will be different for each user based on your user ID and dataset name.
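
For reference, a .csv file matching this schema might look like the rows below (tickers and dollar amounts here are hypothetical, shown only to illustrate the expected layout):

Date,Ticker,Count,Sum
2017-04-03,ABC,3,5400.00
2017-04-03,XYZ,1,1000.00
2017-04-04,ABC,2,2500.00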

For more on importing data using Self-Serve, check out the examples in this forum post.

In [1]:
# Import data
# Uncomment the next line to browse the data interactively
#from quantopian.interactive.data.user_5b042c4ed2a9dc00498b7d14 import contribs2018 as contribs_interactive
from quantopian.pipeline.data.user_5b042c4ed2a9dc00498b7d14 import contribs2018 as contribs # for pipeline

2. Get Alpha Factor Data

The data we uploaded is in a tabular form, with one row per asset per day. However, Alphalens requires that we provide data in a specific format with specific labels. Fortunately, Pipeline will do all the "dirty work" for us.

In this step, we'll use Pipeline to put our data in a form that can be ingested by Alphalens.

In [2]:
from quantopian.pipeline import Pipeline
from quantopian.research import run_pipeline
from quantopian.pipeline import filters
from quantopian.pipeline import factors
import pandas as pd
In [3]:
# Set up Pipeline
def make_pipeline():
    
    # Alternative: score = contribs.count.latest.zscore()
    # Smooth the daily contribution count over half a trading year (126 trading days)
    score = factors.SimpleMovingAverage(
        inputs=[contribs.count],
        window_length=126
    )

    # Filter out NaNs and 0s
    screen_null = score.notnull()
    screen_zeros = (score != 0.0)

    contribs_pipe = Pipeline(
        columns={'Score': score},
        screen=screen_null & screen_zeros
    )
    
    return contribs_pipe
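
The commented-out line above hints at an alternative construction: instead of using the smoothed count directly, we could standardize it cross-sectionally so that scores are comparable from day to day. A minimal sketch (not used in the rest of this notebook):

def make_zscore_pipeline():
    # Same smoothed count as above
    smoothed = factors.SimpleMovingAverage(
        inputs=[contribs.count],
        window_length=126
    )

    # Demean and standardize across assets on each day
    score = smoothed.zscore()

    return Pipeline(
        columns={'Score': score},
        screen=smoothed.notnull() & (smoothed != 0.0)
    )
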
In [4]:
start_date = '2017-04-01'
end_date = '2018-02-10'
contribs_output = run_pipeline(make_pipeline(), start_date, end_date)
In [5]:
contribs_output.head()
Out[5]:
                                               Score
2017-04-03 00:00:00+00:00  Equity(2 [ARNC])    0.031746
                           Equity(41 [ARCB])   0.015873
                           Equity(62 [ABT])    0.015873
                           Equity(128 [ADM])   0.079365
                           Equity(161 [AEP])   0.095238

3. Get Pricing Data

Since an alpha factor is supposed to predict asset performance, we need records of how those assets actually performed in order to evaluate it. In this step, we get pricing data for the assets in our dataset.

In [6]:
# Get list of relevant assets
assets = contribs_output.index.levels[1]
In [7]:
# Get pricing data for those assets
pricing_end_date = '2018-07-01' # Pricing end date should be later so we can get forward returns
prices = get_pricing(assets,
                     start_date=start_date,
                     end_date=pricing_end_date,
                     fields='open_price')
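
To see why the pricing window has to extend past the factor window: an N-day forward return at date t is computed from the price N trading days after t, so our longest period (120 days, set below) needs roughly six months of prices beyond the last factor date. A rough pandas equivalent of what Alphalens computes internally (illustration only):

# 20-trading-day forward return at date t: (price at t+20) / (price at t) - 1
fwd_20d = prices.pct_change(20).shift(-20)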

4. Run Alphalens

Now that we have both our alpha factor and pricing datasets, we're ready to run our Alphalens study.

In [8]:
import alphalens

Before creating a tearsheet, we'll use get_clean_factor_and_forward_returns to get our data in the correct format to be ingested by Alphalens.

Note on parameters:

The periods parameter in get_clean_factor_and_forward_returns allows us to set the periods over which we assess the performance of our alpha factor (in days). Here, we'll use longer periods, since political processes tend to be longer-term phenomena.

The quantiles parameter allows us to set the number of bins into which we divide our assets based on their factor values. Here, we'll use a smaller number of bins because our data doesn't take a very wide range of values.

In [9]:
factor_data = alphalens.utils.get_clean_factor_and_forward_returns(
    contribs_output,
    prices=prices,
    quantiles=3,
    periods=(20, 60, 120)
)
Dropped 17.0% entries from factor data: 17.0% in forward returns computation and 0.0% in binning phase (set max_loss=0 to see potentially suppressed Exceptions).
max_loss is 35.0%, not exceeded: OK!
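
If you want to peek at what Alphalens expects, the returned factor_data is a DataFrame indexed by (date, asset), with one forward-return column per period plus the raw factor value and its assigned quantile (the period labels you'll see in the tearsheet, such as 21D/62D/124D, are computed by Alphalens from the spacing of the pricing data):

# Peek at the combined dataset: MultiIndex of (date, asset), one forward-
# return column per period, plus 'factor' and 'factor_quantile' columns
factor_data.head()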

Now, let's take a look at the tearsheet! (This computation is expensive, so it usually takes a few minutes to run.)

In [10]:
alphalens.tears.create_full_tear_sheet(factor_data, by_group=False);
Quantiles Statistics

factor_quantile       min       max      mean       std  count    count %
1                0.007937  0.078947  0.026276  0.015971  18641  34.748812
2                0.055556  0.263158  0.121814  0.047253  17434  32.498835
3                0.138889  7.000000  0.695829  0.730547  17570  32.752353
Returns Analysis

                                                   21D     62D    124D
Ann. alpha                                       0.053   0.060  -0.034
beta                                            -0.161  -0.174   0.258
Mean Period Wise Return Top Quantile (bps)      14.667  18.110   6.105
Mean Period Wise Return Bottom Quantile (bps)   25.722   9.521  22.557
Mean Period Wise Spread (bps)                   -0.665  11.284  -9.362
Information Analysis

                      21D     62D    124D
IC Mean             0.027   0.055   0.031
IC Std.             0.088   0.048   0.053
Risk-Adjusted IC    0.309   1.137   0.591
t-stat(IC)          4.215  15.508   8.057
p-value(IC)         0.000   0.000   0.000
IC Skew             0.017  -0.630  -0.157
IC Kurtosis        -0.682  -0.451  -0.761
Turnover Analysis

                             21D    62D   124D
Quantile 1 Mean Turnover   0.195  0.460  0.762
Quantile 2 Mean Turnover   0.237  0.471  0.719
Quantile 3 Mean Turnover   0.145  0.303  0.521

                                     21D    62D   124D
Mean Factor Rank Autocorrelation   0.961  0.905  0.723

A few notes about this tearsheet:

  • In the "Cumulative Return by Quantile" plots, we want to see 3 distinct "fingers" moving across the plot without crossing. In the middle period (62D), we see that the 3rd quantile is consistently distinct from the first and second. This indicates that stocks with high factor values tend to generate higher returns over this period (good!). However, the first and second quantiles cross a couple times, which indicates that our factor doesn't do a good job of identifying the lowest-returning stocks. The third quantile also does not seem to perform as well in the 21D and 124D periods.
  • In the "IC Normal Dist Q-Q" plots, we want to see an S-shaped curve that indicates a Normal distribution with fat tails (since high/low factor values are the stocks that we want to long/short). We see slightly S-shaped curves in the plots for all three periods.
  • The mean turnover looks a little on the higher side, especially for the two longer periods.

In general, it looks like there's still some work to be done here. We could modify this factor in a few ways:

  • Improve our data. As noted in the Data Processing notebook, there is some uncertainty in the way that we map political campaign contributions to tickers. Removing some of this uncertainty would result in higher-quality data, which could potentially improve our alpha factor.
  • Improve our factor construction. Currently, we use the rolling 60-day sum of the count and sum of political campaign contributions. However, there are many other factors we could generate from our political campaign contributions data. For example, we could create a factor that incorporates both the count and sum of contributions (see the sketch after this list), or somehow incorporate into our metric the identity of the political candidate to whom the company is donating.
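
As an example of the second idea, here is a sketch of a combined factor that blends the standardized count and sum signals (this assumes the uploaded dataset also exposes the Sum column as contribs.sum, mirroring contribs.count):

def make_combined_pipeline():
    # Standardize each smoothed signal so they are on comparable scales
    count_score = factors.SimpleMovingAverage(
        inputs=[contribs.count], window_length=126
    ).zscore()
    sum_score = factors.SimpleMovingAverage(
        inputs=[contribs.sum], window_length=126  # assumes a 'sum' column exists
    ).zscore()

    # Equal-weighted blend of the two standardized signals
    combined = 0.5 * count_score + 0.5 * sum_score

    return Pipeline(
        columns={'Score': combined},
        screen=combined.notnull()
    )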

5. Conclusion

In this exploration, we took raw data from the FEC and used it to construct an alpha factor. Then, we ran an Alphalens study on our factor. Ultimately, it looks like our factor has potential, but there's still some work to be done before incorporating it into a full strategy!

For more on analyzing factors with Alphalens, check out this lecture; for more on Self-Serve data, check out this forum post.