
Analyzing a Signal and Creating a Contest Algorithm with Self-Serve Data

Self-Serve Data is a new feature on Quantopian that allows you to bring your own data to the platform and access it directly in research and the IDE via Pipeline.

Integration with Pipeline means you can upload a signal to the platform and use it with the rest of the tools in the Quantopian ecosystem, like Alphalens and Pyfolio. Accessing your data in Pipeline also means you can use your signal in an algorithm and submit it to the contest and, by extension, the allocation process.

This notebook will explore how you can leverage Self-Serve Data to do the following:

  1. Analyze an Uploaded Signal Using Alphalens.
  2. Use Your Data in the Quantopian Contest.

If you haven't already done so, download the campaign contributions dataset, upload it to your account, and name the dataset campaign_contributions. For guidance, check out section 'II. Upload Via Self-Serve' of the notebook in this forum post.

1. Analyze an Uploaded Signal Using Alphalens

I. Construct a Pipeline Factor with Uploaded Data

After a dataset is processed, it will have a corresponding information page at https://www.quantopian.com/data/user_[user_ID]/[dataset_name]. In this case, the page can be found at https://www.quantopian.com/data/user_[user_ID]/campaign_contributions. The information page provides a sample Pipeline that already incorporates your data as a factor; in the example, the factor is called 'my_dataset'.

[Screenshot: Data Page]

Pipeline Factors can be combined both with other Factors and with scalar values via any of the built-in mathematical operators (+, -, *, etc.). Factors created from uploaded data work the same way.
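For instance, here is a minimal sketch of factor arithmetic (illustrative only, and assuming the same placeholder dataset import used later in this notebook):

# Illustrative sketch, not part of the original notebook.
from quantopian.pipeline.data.user_[user_ID] import campaign_contributions

count = campaign_contributions.count.latest

doubled = count * 2           # factor-scalar arithmetic
summed = doubled + count      # factor-factor arithmetic
normalized = summed.zscore()  # factors also expose methods like zscore()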

To show this, let's make the following changes to this sample Pipeline and analyze a combined factor called 'Score':

  1. Construct a 'Score' factor that takes the 6-month simple moving average of campaign contributions, passing the contribution count into the built-in SimpleMovingAverage Factor.
  2. Create another Factor column in the Pipeline called 'Sector' that maps each asset to its corresponding sector code. Use the free Fundamentals dataset to obtain sector codes.

Let's create this Pipeline:

In [1]:
# First, import your uploaded dataset:
## from quantopian.pipeline.data.user_[user_ID] import [dataset name]
from quantopian.pipeline.data.user_[user_ID] import campaign_contributions
In [2]:
from quantopian.pipeline import Pipeline
from quantopian.pipeline.data import Fundamentals
from quantopian.research import run_pipeline
from quantopian.pipeline.factors import BusinessDaysSincePreviousEvent, SimpleMovingAverage
from quantopian.pipeline.filters import QTradableStocksUS
import pandas as pd
In [3]:
# Set up the Pipeline:
def make_pipeline():
    
    base_universe = QTradableStocksUS()
    
    # Factor for number of business days since data was last updated.
    days_since_last_update = BusinessDaysSincePreviousEvent(
        inputs=[campaign_contributions.asof_date.latest]
    )
    
    # A pipeline screen that ensures uploaded data are not 3 or more business days old.
    has_recent_update = (days_since_last_update < 3)
    universe = (has_recent_update & base_universe)
    
    # Factor for 6-month simple moving average of campaign contributions count:
    score = SimpleMovingAverage(
        inputs=[campaign_contributions.count],
        window_length=126  # ~6 months of trading days (252 / 2)
    )
    
    # Factor for sector code:
    sector = Fundamentals.morningstar_sector_code.latest
    
    # Filter out NaNs and 0s
    screen_null = score.notnull()
    screen_zeros = (score != 0.0)
    
    pipe = Pipeline(
        columns={
            'Score': score,
            'Sector': sector
        }, 
        screen=screen_null & screen_zeros & universe
    )
    
    return pipe
In [4]:
# Define a time range over which to run the pipeline:
start_date = '2017-01-01'
end_date = '2017-12-31'
results = run_pipeline(make_pipeline(), start_date, end_date)
In [5]:
results.head()
Out[5]:
                                                    Score  Sector
2017-02-02 00:00:00+00:00  Equity(5763 [FCFS])   0.045455     103
2017-02-03 00:00:00+00:00  Equity(5763 [FCFS])   0.043478     103
                           Equity(28378 [AAWW])  0.043478     310
2017-02-06 00:00:00+00:00  Equity(5763 [FCFS])   0.041667     103
                           Equity(28378 [AAWW])  0.041667     310

II. Analyze the Factor

We'll now analyze the 'Score' factor with Alphalens, an open-source tool for analyzing the predictive ability of alpha factors. For more information on Alphalens, check out the Factor Analysis lecture.

The following is the typical workflow for analyzing a Factor with Alphalens:

In [6]:
# Import Alphalens:
import alphalens as al
In [7]:
# Retrieve list of assets from Pipeline output:
asset_list = results.index.levels[1].unique()

# Define the time range over which to analyze the factor. The end date
# extends beyond the Pipeline range so that forward returns can be computed:
start_date = '2017-01-01'
end_date = '2018-05-31'
In [8]:
# Obtain pricing information on list of assets for input to alphalens:
prices = get_pricing(
    asset_list,
    start_date=start_date,
    end_date=end_date,
    fields='open_price'
)
In [9]:
# Define sector labels for factor analysis grouping:
MORNINGSTAR_SECTOR_CODES = {
     -1: 'Misc',
    101: 'Basic Materials',
    102: 'Consumer Cyclical',
    103: 'Financial Services',
    104: 'Real Estate',
    205: 'Consumer Defensive',
    206: 'Healthcare',
    207: 'Utilities',
    308: 'Communication Services',
    309: 'Energy',
    310: 'Industrials',
    311: 'Technology',
}
In [10]:
# First, we get our factor categorized by sector code and calculate the forward returns.
# The forward returns are the returns we would have received for holding each security
# for the periods beginning on the given date, as specified by the periods parameter.
factor_data = al.utils.get_clean_factor_and_forward_returns(
    results['Score'],
    prices=prices,
    groupby=results['Sector'],
    binning_by_group=True,
    groupby_labels=MORNINGSTAR_SECTOR_CODES,
    quantiles=5,
    periods=(10, 21, 63)
)
Dropped 4.5% entries from factor data: 0.0% in forward returns computation and 4.5% in binning phase (set max_loss=0 to see potentially suppressed Exceptions).
max_loss is 35.0%, not exceeded: OK!
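For intuition, the N-day forward return of an asset on date t is simply price(t+N) / price(t) - 1. Here is a toy illustration (not part of the original notebook) using a plain pandas Series:

import pandas as pd

# Hypothetical daily prices for a single asset.
prices = pd.Series([100.0, 101.0, 103.0, 102.0, 105.0])
horizon = 2

# N-day forward return: the return from holding the asset for the next
# `horizon` periods, aligned to the entry date.
forward_returns = prices.shift(-horizon) / prices - 1
# forward_returns[0] == 103.0 / 100.0 - 1 == 0.03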
In [11]:
# Use composed factor data to create full tearsheet
al.tears.create_full_tear_sheet(factor_data, by_group=True);
Quantiles Statistics

factor_quantile       min       max      mean       std  count    count %
1.0              0.007937  0.214286  0.022151  0.019603  12293  23.837040
2.0              0.015873  0.473684  0.065137  0.041250   9495  18.411510
3.0              0.017241  0.617647  0.140479  0.080439   9560  18.537550
4.0              0.055556  1.934211  0.298883  0.194850   9687  18.783813
5.0              0.033708  7.000000  0.910428  0.882403  10536  20.430087

Returns Analysis

                                                  10D     21D     63D
Ann. alpha                                      0.044   0.031   0.013
beta                                           -0.078  -0.080  -0.063
Mean Period Wise Return Top Quantile (bps)     17.372  18.977  18.645
Mean Period Wise Return Bottom Quantile (bps)  -6.080  -5.884  -4.402
Mean Period Wise Spread (bps)                  24.212  19.979  14.147

[Tear sheet plots omitted.]

Information Analysis

                      10D     21D     63D
IC Mean             0.038   0.036   0.021
IC Std.             0.181   0.165   0.155
Risk-Adjusted IC    0.210   0.217   0.135
t-stat(IC)          3.168   3.267   2.038
p-value(IC)         0.002   0.001   0.043
IC Skew             2.656   3.264  -4.597
IC Kurtosis        11.237  16.191  26.068

Turnover Analysis

                            10D    21D    63D
Quantile 1 Mean Turnover  0.223  0.343  0.602
Quantile 2 Mean Turnover  0.299  0.438  0.664
Quantile 3 Mean Turnover  0.322  0.487  0.732
Quantile 4 Mean Turnover  0.296  0.448  0.639
Quantile 5 Mean Turnover  0.185  0.272  0.442

                                    10D   21D    63D
Mean Factor Rank Autocorrelation  0.969  0.94  0.877

So, what did we learn?

In the Alphalens tearsheet above, which analyzes our 'Score' Factor, we notice modest mean returns across all quintiles (see the Period Wise Return by Factor Quantile plot). Since this factor is driven by the moving average of campaign contribution counts, perhaps we can improve it by incorporating additional information to increase its predictive value; one hypothetical direction is sketched below.
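As a purely illustrative sketch (not from the original post), one common refinement is to demean the score within each sector, so the signal reflects contribution activity relative to sector peers rather than sector-wide levels. This assumes the built-in Sector classifier and the dataset import from earlier:

from quantopian.pipeline import Pipeline
from quantopian.pipeline.classifiers.fundamentals import Sector
from quantopian.pipeline.factors import SimpleMovingAverage
from quantopian.pipeline.filters import QTradableStocksUS

def make_refined_pipeline():
    universe = QTradableStocksUS()

    score = SimpleMovingAverage(
        inputs=[campaign_contributions.count],
        window_length=126
    )

    # Z-score the raw score within each sector so each stock is ranked
    # against its sector peers rather than the whole universe.
    refined = score.zscore(groupby=Sector(), mask=universe)

    return Pipeline(columns={'Refined': refined}, screen=universe)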

For a more detailed review of this factor, refer to the notebook attached to this post by Lucy Wu.

2. Use Your Data in the Quantopian Contest

Self-Serve Data allows you both to upload historical data and to live-update your data on a regular basis. Because of this live-updating capability, your data can be used in algorithms you submit to the daily Quantopian Contest.

Set up live uploads

When you add a self-serve dataset to your account, you are asked if you want to set up a nightly update process for the dataset:

[Screenshot: Set Up Live Data]

If you want to send live updates to your dataset, you need to set up an FTP server or host a file somewhere (like Dropbox or Google Sheets) and keep it up to date. Files are checked for new data on a nightly basis. You can read more about live-updating datasets in the help documentation.

If a live connection is set up, the file posted at the host will be downloaded overnight after each trading day, between 7 and 10 AM UTC (Tue-Fri), and compared against the existing dataset records. You can learn more about how this works by reading through the Self-Serve Data - How Does It Work? notebook.
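As a rough illustration (the file name and columns below are assumptions; your file must match the schema of your uploaded dataset), a nightly job might simply append the latest records to the hosted CSV that Quantopian polls:

import pandas as pd

# Hypothetical new records for the latest trading day.
new_rows = pd.DataFrame({
    'date':   ['2018-06-01'],
    'symbol': ['XYZ'],
    'count':  [12],
})

# Append to the hosted file; Quantopian downloads and diffs it overnight.
new_rows.to_csv('campaign_contributions.csv', mode='a', header=False, index=False)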

Use your data in a contest algorithm

Once you've uploaded your dataset and configured live updates, clone the template algorithm in this thread. Follow the TO-DOs to incorporate your data and develop the algorithm.
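For orientation, here is a minimal, hypothetical skeleton (not the linked template) showing how an uploaded signal typically flows into a contest-style algorithm via the standard Quantopian algorithm and optimize APIs:

import quantopian.algorithm as algo
import quantopian.optimize as opt
from quantopian.pipeline import Pipeline
from quantopian.pipeline.factors import SimpleMovingAverage
from quantopian.pipeline.filters import QTradableStocksUS
# Placeholder import, as elsewhere in this notebook:
from quantopian.pipeline.data.user_[user_ID] import campaign_contributions

def initialize(context):
    algo.attach_pipeline(make_pipeline(), 'signal_pipe')
    schedule_function(rebalance, date_rules.week_start(), time_rules.market_open())

def make_pipeline():
    universe = QTradableStocksUS()
    score = SimpleMovingAverage(
        inputs=[campaign_contributions.count],
        window_length=126
    )
    return Pipeline(
        columns={'score': score.zscore(mask=universe)},
        screen=universe
    )

def rebalance(context, data):
    signal = algo.pipeline_output('signal_pipe')['score']

    # Target weights proportional to the signal, subject to
    # contest-style leverage, neutrality, and concentration constraints.
    algo.order_optimal_portfolio(
        objective=opt.MaximizeAlpha(signal),
        constraints=[
            opt.MaxGrossExposure(1.0),
            opt.DollarNeutral(),
            opt.PositionConcentration.with_equal_bounds(-0.05, 0.05),
        ],
    )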

Refer to the Writing a Contest Algorithm tutorial to learn more about the contest criteria. Use the notebook in Lesson 11 to test whether your algorithm meets all of the criteria.