In this notebook, we will use the political campaign contributions data processed earlier to construct an alpha factor, then inspect the performance of that alpha factor using Alphalens.
In this step, we import the data from a local .csv file via the Self-Serve Data feature. To do this, we begin with the local .csv file generated by the Data Processing notebook. We then upload it under the Self-Serve Data tab on the Account > Data page. The Ticker column is our Primary Asset, the Date column is our Primary Date, and the Count and Sum columns are number types.
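For reference, here is a minimal sketch of what the uploaded file should look like; the tickers, dates, and values below are made up purely for illustration.
# Illustrative only: the tickers, dates, and values are fabricated
import pandas as pd

sample = pd.DataFrame({
    'Ticker': ['AAPL', 'AAPL', 'XOM'],
    'Date':   ['2017-04-03', '2017-04-04', '2017-04-03'],
    'Count':  [12, 8, 5],
    'Sum':    [2400.0, 1500.0, 950.0],
})
print(sample.to_csv(index=False))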
The campaign contributions dataset should now appear in your list of datasets. Click on the dataset name to view the exact import statement you'll need; keep in mind that this will be different for each user, based on your user ID and dataset name.
For more on importing data using Self-Serve, check out the examples in this forum post.
# Import data
#from quantopian.interactive.data.user_5b042c4ed2a9dc00498b7d14 import campcontribs2018 as contribs_interactive # Uncomment this to import data to view
from quantopian.pipeline.data.user_5b042c4ed2a9dc00498b7d14 import contribs2018 as contribs # for pipeline
The data we uploaded is in a tabular form, with one row per asset per day. However, Alphalens requires that we provide data in a specific format with specific labels. Fortunately, Pipeline will do all the "dirty work" for us.
In this step, we'll use Pipeline to put our data in a form that can be ingested by Alphalens.
from quantopian.pipeline import Pipeline
from quantopian.research import run_pipeline
from quantopian.pipeline import filters
from quantopian.pipeline import factors
import pandas as pd
# Set up Pipeline
def make_pipeline():
    #score = contribs.count.latest.zscore()

    # Weight per day: smooth the daily contribution count over roughly half a trading year
    score = factors.SimpleMovingAverage(
        inputs=[contribs.count],
        window_length=252 // 2  # window_length must be an integer number of days
    )

    # Filter out NaNs and 0s
    screen_null = score.notnull()
    screen_zeros = (score != 0.0)

    contribs_pipe = Pipeline(
        columns={'Score': score},
        screen=screen_null & screen_zeros
    )

    return contribs_pipe
start_date = '2017-04-01'
end_date = '2018-02-10'
contribs_output = run_pipeline(make_pipeline(), start_date, end_date)
contribs_output.head()
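As a quick sanity check (using the names defined above), you can count how many assets receive a score on each day and make sure the coverage looks reasonable.
# Number of scored assets per day -- a quick coverage check
assets_per_day = contribs_output['Score'].groupby(level=0).count()
print(assets_per_day.describe())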
Since an alpha factor is supposed to predict the performance of an asset, we'll need to get records of the actual performance of the asset in order to examine the performance of our alpha factor. In this step, we get pricing data for the assets in our dataset.
# Get list of relevant assets
assets = contribs_output.index.levels[1]
# Get pricing data for those assets
pricing_end_date = '2018-07-01' # Pricing end date should be later so we can get forward returns
prices = get_pricing(assets,
                     start_date=start_date,
                     end_date=pricing_end_date,
                     fields='open_price')
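The result should be a DataFrame with one row per trading day and one column per asset; a quick look at its shape and first few rows is an easy sanity check.
# Sanity check: (trading days, number of assets)
print(prices.shape)
prices.head()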
Now that we have both our alpha factor and pricing datasets, we're ready to run our Alphalens study.
import alphalens
Before creating a tearsheet, we'll use get_clean_factor_and_forward_returns to get our data in the correct format to be ingested by Alphalens.
Note on parameters:
The periods parameter in get_clean_factor_and_forward_returns allows us to set the periods over which we assess the performance of our alpha factor (in days). Here, we'll use longer periods, since political processes tend to be longer-term phenomena.
The quantiles parameter allows us to set the number of bins into which we divide our assets based on their factor values. Here, we'll use a smaller number of bins because our data doesn't take a very wide range of values.
factor_data = alphalens.utils.get_clean_factor_and_forward_returns(
    contribs_output['Score'],  # pass the factor as a (date, asset)-indexed Series
    prices=prices,
    quantiles=3,
    periods=(20, 60, 120)
)
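The returned factor_data is indexed by (date, asset), with one column of forward returns per period alongside the factor value and its quantile; a quick peek confirms the structure (exact column names depend on your Alphalens version).
# Inspect the merged factor / forward-returns table
print(factor_data.columns.tolist())
factor_data.head()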
Now, let's take a look at the tearsheet! (Since this is so computationally expensive, it usually takes a few minutes to run.)
alphalens.tears.create_full_tear_sheet(factor_data, by_group=False);
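If the full tearsheet is too slow while you iterate, Alphalens also ships narrower tear sheets that produce only one family of plots; for example, the returns and information-coefficient analyses can be run on their own.
# Lighter-weight alternatives to the full tearsheet
alphalens.tears.create_returns_tear_sheet(factor_data);
alphalens.tears.create_information_tear_sheet(factor_data);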
A few notes about this tearsheet:
In general, it looks like there's still some work to be done here. We could modify this factor in a few ways; some illustrative possibilities are sketched below.
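For example (these are hypothetical variations, not part of the original study; in particular, the contribs.sum column name is assumed from the Sum column uploaded earlier):
# Illustrative variations on the factor, to be used inside make_pipeline()

# 1. Cross-sectionally normalize the smoothed count so scores are comparable across days
score_z = factors.SimpleMovingAverage(
    inputs=[contribs.count],
    window_length=252 // 2
).zscore()

# 2. Weight by contribution dollars instead of contribution counts
#    (assumes the uploaded Sum column is exposed as contribs.sum)
score_dollars = factors.SimpleMovingAverage(
    inputs=[contribs.sum],
    window_length=252 // 2
)

# 3. Use a shorter smoothing window to react faster to new filings
score_fast = factors.SimpleMovingAverage(
    inputs=[contribs.count],
    window_length=63
)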
In this exploration, we took raw data from the FEC and used it to construct an alpha factor. Then, we ran an Alphalens study on our factor. Ultimately, it looks like our factor has potential, but there's still some work to be done before incorporating it into a full strategy!
For more on analyzing factors with Alphalens, check out this lecture; for more on Self-Serve data, check out this forum post.