Self-Serve Data is a new feature on Quantopian that allows you to bring your own data to the platform and access it directly in Research and the IDE via Pipeline.
Integration with Pipeline means you can upload a signal to the platform and use it with the rest of the tools in the Quantopian ecosystem, such as Alphalens and Pyfolio. Accessing your data in Pipeline also means you can use your signal in an algorithm, submit that algorithm to the contest, and, by extension, enter the allocation process.
This notebook will explore how you can leverage Self-Serve Data to upload a dataset, analyze it with Alphalens, and use it in a contest algorithm.
If you haven't already done so, download the campaign contributions dataset, upload it to your account, and name the dataset campaign_contributions. For guidance, check out section 'II. Upload Via Self-Serve' of the notebook in this forum post.
After a dataset is processed, it will have a corresponding information page at https://www.quantopian.com/data/user_[user_ID]/[dataset_name]. In this case, the page can be found at https://www.quantopian.com/data/user_[user_ID]/campaign_contributions. The information page provides a sample Pipeline into which your data is already incorporated as a factor; this example pipeline contains a factor called 'my_dataset'.
Pipeline Factors can be combined both with other Factors and with scalar values via any of the built-in mathematical operators (+, -, *, etc.). Factors created from uploaded data work the same way.
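To build intuition for how operator-based combination behaves, here is a minimal sketch using plain pandas Series as stand-ins for Pipeline factor columns (the tickers and values are made up; each Pipeline Factor is conceptually a per-asset column of values like these):

```python
import pandas as pd

# Hypothetical per-asset values for two factors on a single day.
momentum = pd.Series({"AAPL": 0.8, "MSFT": 0.2, "TSLA": -0.5})
value = pd.Series({"AAPL": -0.1, "MSFT": 0.6, "TSLA": 0.3})

# Built-in arithmetic operators combine the columns elementwise, and
# scalars broadcast, just as they do for Pipeline Factors:
combined = 0.5 * momentum + 0.5 * value
shifted = combined + 1.0
```

The same expression written against actual Pipeline Factors would produce a new Factor that Pipeline evaluates per asset each day.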
To show this, let's modify the sample Pipeline and analyze a combined factor called 'Score', built with the SimpleMovingAverage Factor. Let's create this Pipeline:
# First, import your uploaded dataset:
## from quantopian.pipeline.data.user_[user_ID] import [dataset name]
from quantopian.pipeline.data.user_[user_ID] import campaign_contributions
from quantopian.pipeline import Pipeline
from quantopian.pipeline.data import Fundamentals
from quantopian.research import run_pipeline
from quantopian.pipeline.factors import BusinessDaysSincePreviousEvent, SimpleMovingAverage
from quantopian.pipeline.filters import QTradableStocksUS
import pandas as pd
# Set up the Pipeline:
def make_pipeline():
    base_universe = QTradableStocksUS()

    # Factor for the number of business days since the data was last updated.
    days_since_last_update = BusinessDaysSincePreviousEvent(
        inputs=[campaign_contributions.asof_date.latest]
    )

    # A pipeline screen that ensures uploaded data is less than
    # 3 business days old.
    has_recent_update = (days_since_last_update < 3)
    universe = (has_recent_update & base_universe)

    # Factor for the 6-month (~126 trading days) simple moving average
    # of the campaign contributions count:
    score = SimpleMovingAverage(
        inputs=[campaign_contributions.count],
        window_length=126
    )

    # Factor for sector code:
    sector = Fundamentals.morningstar_sector_code.latest

    # Filter out NaNs and 0s:
    screen_null = score.notnull()
    screen_zeros = (score != 0.0)

    pipe = Pipeline(
        columns={
            'Score': score,
            'Sector': sector
        },
        screen=screen_null & screen_zeros & universe
    )
    return pipe
# Define a time range over which to run the pipeline:
start_date = '2017-01-01'
end_date = '2017-12-31'
results = run_pipeline(make_pipeline(), start_date, end_date)
results.head()
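run_pipeline returns a DataFrame indexed by (date, asset) pairs. As a rough illustration of that shape, here is a small mock with made-up values and a cross-sectional summary computed from it:

```python
import pandas as pd

# A tiny mock of run_pipeline output: rows indexed by (date, asset).
idx = pd.MultiIndex.from_product(
    [pd.to_datetime(["2017-01-03", "2017-01-04"]), ["AAPL", "MSFT"]],
    names=["date", "asset"],
)
mock_results = pd.DataFrame(
    {"Score": [10.0, 12.0, 11.0, 13.0], "Sector": [311, 311, 311, 311]},
    index=idx,
)

# Cross-sectional mean Score per day:
daily_mean = mock_results["Score"].groupby(level="date").mean()
```

This kind of groupby over the date level is a quick sanity check on the factor's daily cross-section before handing the column to Alphalens.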
We'll now analyze the 'Score' factor with Alphalens, an open-source tool for analyzing the predictive ability of alpha factors. For more information on Alphalens, check out the Factor Analysis lecture.
The following is the typical workflow for analyzing a Factor with Alphalens:
# Import Alphalens:
import alphalens as al
# Retrieve the list of assets from the Pipeline output:
asset_list = results.index.levels[1].unique()

# Define the time range over which to analyze the factor:
start_date = '2017-01-01'
end_date = '2018-05-31'

# Obtain pricing data for the asset list as input to Alphalens:
prices = get_pricing(
    asset_list,
    start_date=start_date,
    end_date=end_date,
    fields='open_price'
)
# Define sector labels for factor analysis grouping:
MORNINGSTAR_SECTOR_CODES = {
    -1: 'Misc',
    101: 'Basic Materials',
    102: 'Consumer Cyclical',
    103: 'Financial Services',
    104: 'Real Estate',
    205: 'Consumer Defensive',
    206: 'Healthcare',
    207: 'Utilities',
    308: 'Communication Services',
    309: 'Energy',
    310: 'Industrials',
    311: 'Technology',
}
# First, categorize the factor by sector code and compute forward returns.
# Forward returns are the returns we would have received for holding each
# security over the periods (in days) ending on each date, as specified by
# the periods parameter.
factor_data = al.utils.get_clean_factor_and_forward_returns(
    results['Score'],
    prices=prices,
    groupby=results['Sector'],
    binning_by_group=True,
    groupby_labels=MORNINGSTAR_SECTOR_CODES,
    quantiles=5,
    periods=(10, 21, 63)
)
# Use composed factor data to create full tearsheet
al.tears.create_full_tear_sheet(factor_data, by_group=True);
In the Alphalens tearsheet above, which analyzes our 'Score' Factor, we see mediocre returns across all quintiles (see the Period Wise Return by Factor Quantile plot). Since this factor is just a moving average of the campaign contributions count, perhaps we can improve its predictive value by incorporating additional information.
For a more detailed review of this factor, refer to the notebook attached to this post by Lucy Wu.
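One common way to fold in additional information is to standardize each signal cross-sectionally and sum the z-scores, so that no signal dominates purely because of its scale. Here is a minimal pandas sketch with made-up signal values; on the platform, the Factor method zscore() plays the role of the hypothetical helper below:

```python
import pandas as pd

# Hypothetical per-asset values for two signals on a single day:
contributions = pd.Series({"AAPL": 120.0, "MSFT": 80.0, "TSLA": 40.0})
sentiment = pd.Series({"AAPL": 0.2, "MSFT": 0.5, "TSLA": -0.1})

def zscore(s: pd.Series) -> pd.Series:
    """Standardize a cross-section to zero mean and unit (sample) std."""
    return (s - s.mean()) / s.std()

# Equal-weight combination of the standardized signals:
improved = zscore(contributions) + zscore(sentiment)
```

Whether such a combination actually improves predictive value is an empirical question, which is exactly what re-running the Alphalens workflow on the new factor would answer.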
Self-Serve Data allows you both to upload historical data and to live-update your data on a regular basis. Because of this live-updating capability, your data can be used in algorithms you submit to the daily Quantopian Contest.
When you add a self-serve dataset to your account, you are asked if you want to set up a nightly update process for the dataset:
If you want to send live updates to your dataset, you need to set up an FTP server or host a file somewhere (like Dropbox or Google Sheets) and keep it up to date. Files are checked for new data on a nightly basis. You can read more about live-updating datasets in the help documentation.
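The hosted file is simply the same CSV layout as your original upload, extended with the newest rows. A minimal sketch of producing such a file with pandas follows; the column names here mirror the campaign_contributions example and are assumptions about your particular schema:

```python
import io
import pandas as pd

# Hypothetical rows for the latest trading day, matching the layout of the
# originally uploaded file (a primary date column, a symbol column, and one
# or more value columns):
update = pd.DataFrame({
    "date": ["2018-06-01", "2018-06-01"],
    "symbol": ["AAPL", "MSFT"],
    "count": [12, 7],
})

# Serialize to CSV; in practice you would write this to the hosted file
# (FTP, Dropbox, Google Sheets, etc.) that gets checked nightly.
buf = io.StringIO()
update.to_csv(buf, index=False)
csv_text = buf.getvalue()
```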
If a live connection is set up, the file posted at the host will be downloaded overnight after each trading day, between 7 and 10 AM UTC (Tue-Fri), and compared against the existing dataset records. You can learn more about how this works by reading through the Self-Serve Data - How Does It Work? notebook.
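Conceptually, comparing the downloaded file against existing records amounts to an anti-join: keep only the rows not already ingested. A rough pandas sketch of that idea (an illustration, not Quantopian's actual implementation):

```python
import pandas as pd

# Rows already in the dataset:
existing = pd.DataFrame({
    "date": ["2018-05-31"], "symbol": ["AAPL"], "count": [10],
})

# Tonight's downloaded file: repeats the old row and adds a new one.
downloaded = pd.DataFrame({
    "date": ["2018-05-31", "2018-06-01"],
    "symbol": ["AAPL", "AAPL"],
    "count": [10, 12],
})

# Anti-join: keep only rows that are not already present.
merged = downloaded.merge(
    existing, on=["date", "symbol", "count"], how="left", indicator=True
)
new_rows = merged[merged["_merge"] == "left_only"].drop(columns="_merge")
```

Only the genuinely new rows would then need to be appended to the dataset's history.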
Once you've uploaded your dataset and configured live updates, clone the template algorithm in this thread. Follow the TO-DOs to incorporate your data and develop the algorithm.
Refer to the Writing a Contest Algorithm tutorial to learn more about the contest criteria. Use the notebook in Lesson 11 to test whether your algorithm meets all of the criteria.