
Ranking Universes by Factors

By Delaney Granizo-Mackenzie and Gilbert Wassermann

Part of the Quantopian Lecture Series:

Notebook released under the Creative Commons Attribution 4.0 License. Please do not remove this attribution.

One common technique in quantitative finance is ranking stocks by some score. The score can be whatever you come up with, but it will often be a combination of fundamental factors and price-based signals. One example could be the following (a minimal sketch appears after the list):

  1. Score each stock as 0.5 x that stock's PE ratio + 0.5 x its 30-day price momentum
  2. Rank stocks based on that score
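
As a sketch of that scoring and ranking step, assuming hypothetical `pe_ratio` and `momentum_30d` Series already indexed by equity:

import pandas as pd

# Hypothetical factor values for a tiny universe; in practice these would
# come from a pipeline or another data source.
pe_ratio = pd.Series({'AAPL': 14.0, 'ABT': 9.1, 'ADBE': 101.0})
momentum_30d = pd.Series({'AAPL': 0.02, 'ABT': 0.005, 'ADBE': 0.05})

# 1. Score stocks with the weighted combination of the two factors
score = 0.5 * pe_ratio + 0.5 * momentum_30d

# 2. Rank stocks based on that score (1 = lowest score)
ranks = score.rank()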

These ranking systems can be used to construct long-short equity strategies. The Long-Short Equity Lecture is recommended reading before this Lecture.

In order to develop a good ranking system, we need to first understand how to evaluate ranking systems. We will show a demo here.

WARNING:

This notebook does analysis over thousands of equities and hundreds of timepoints. The resulting memory usage can crash the research server if you are running other notebooks. Please shut down other notebooks in the main research menu before running this notebook. You can tell if other notebooks are running by checking the color of the notebook symbol. Green indicates running, grey indicates not.

In [1]:
import numpy as np
import statsmodels.api as sm
import scipy.stats as stats
import scipy
from statsmodels import regression
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd

Getting Data

The first thing we're going to do is get monthly values for the Market Cap, P/E Ratio, and Monthly Returns of every equity. Monthly Returns measures the return accrued over a month of trading: divide the last close price by the first close price and subtract 1. In the pipeline below we approximate a month with a 21-trading-day window.
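
As a quick illustration of that formula, assuming a hypothetical Series of daily close prices for one month:

import pandas as pd

# Hypothetical daily closes for one month of trading
closes = pd.Series([100.0, 101.5, 99.8, 103.2, 104.0])

# Monthly return = last close / first close - 1
monthly_return = closes.iloc[-1] / closes.iloc[0] - 1
print(monthly_return)  # 0.04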

In [2]:
from quantopian.pipeline import Pipeline
from quantopian.pipeline.data import morningstar
from quantopian.pipeline.data.builtin import USEquityPricing
from quantopian.pipeline.factors import CustomFactor, Returns
        
def make_pipeline():
    """
    Create and return our pipeline.
    
    We break this piece of logic out into its own function to make it easier to
    test and modify in isolation.
    """
    
    pipe = Pipeline(
        columns = {
            'Market Cap' : morningstar.valuation.market_cap.latest,
            'PE Ratio' : morningstar.valuation_ratios.pe_ratio.latest,
            'Monthly Returns': Returns(window_length=21),
        })
    
    return pipe

pipe = make_pipeline()

Let's take a look at the data to get a quick sense of what we have. This may take a while.

In [3]:
from quantopian.research import run_pipeline

start_date = '2014-01-01'
end_date = '2015-02-01'

data = run_pipeline(pipe, start_date, end_date)

# remove NaN values
data = data.dropna()

# show data
data
Out[3]:
Market Cap Monthly Returns PE Ratio
2014-01-02 00:00:00+00:00 Equity(2 [ARNC]) 1.027890e+10 0.112041 35.2113
Equity(21 [AAME]) 8.480670e+07 0.024938 9.0009
Equity(24 [AAPL]) 5.003170e+11 0.017885 13.9860
Equity(31 [ABAX]) 8.013110e+08 0.119128 42.9185
Equity(39 [DDC]) 1.144070e+09 0.062963 62.8931
Equity(52 [ABM]) 1.542840e+09 0.036242 21.4133
Equity(53 [ABMD]) 1.127980e+09 -0.047755 196.0784
Equity(62 [ABT]) 5.904880e+10 0.004980 9.0744
Equity(64 [ABX]) 1.920510e+10 0.133848 10.2987
Equity(66 [AB]) 2.040750e+09 -0.062390 16.3934
Equity(67 [ADSK]) 1.020840e+10 0.107638 41.6667
Equity(69 [ACAT]) 7.566230e+08 0.027968 18.5185
Equity(76 [TAP]) 9.683650e+09 0.068710 19.5695
Equity(84 [ACET]) 6.057550e+08 0.163660 20.7039
Equity(88 [ACI]) 8.661020e+08 0.052133 14.7493
Equity(99 [ACO]) 1.010940e+09 0.114501 44.4444
Equity(100 [IEP]) 1.378990e+10 -0.121558 16.3666
Equity(106 [ACU]) 4.711050e+07 0.017065 12.1212
Equity(110 [ACXM]) 2.499930e+09 0.125038 49.7512
Equity(112 [ACY]) 2.508720e+07 -0.025525 4.0193
Equity(114 [ADBE]) 2.817770e+10 0.054606 101.0101
Equity(117 [AEY]) 2.587410e+07 0.039011 15.5763
Equity(122 [ADI]) 1.499860e+10 0.059542 22.4719
Equity(128 [ADM]) 2.648330e+10 0.062668 21.4133
Equity(153 [AE]) 2.413310e+08 0.192967 8.3264
Equity(154 [AEM]) 4.784550e+09 0.032069 36.7647
Equity(157 [AEG]) 1.695770e+10 0.076050 16.3132
Equity(161 [AEP]) 2.293190e+10 0.002144 19.8413
Equity(162 [AEPI]) 2.888010e+08 0.065914 26.8817
Equity(166 [AES]) 1.081570e+10 0.002073 16.2338
... ... ... ... ...
2015-02-02 00:00:00+00:00 Equity(47858 [NMS]) 6.352770e+07 0.075804 10.6995
Equity(47873 [OMAM]) 1.854000e+09 -0.067692 74.7579
Equity(47875 [VBTX]) 1.237870e+08 -0.064286 18.4815
Equity(47876 [MOLG]) 1.302750e+08 -0.372549 29.9010
Equity(47883 [DPLO]) 1.336120e+09 -0.102920 171.6673
Equity(47884 [PLAY]) 1.146320e+09 0.051592 87.8222
Equity(47888 [FCAU]) 1.627660e+10 0.143104 14.2440
Equity(47894 [KIQ]) 2.109430e+08 -0.228376 42.2230
Equity(47898 [GWB]) 1.199980e+09 -0.112478 11.4336
Equity(47904 [SRSC]) 9.647810e+08 -0.030177 7.6246
Equity(47913 [XENE]) 2.444860e+08 -0.113924 28.1404
Equity(47921 [KEYS]) 5.652700e+09 -0.012718 14.3484
Equity(47923 [KE]) 3.170760e+08 -0.149042 14.2148
Equity(47935 [ABCW]) 3.094400e+08 0.008735 4.1449
Equity(47949 [APTO]) 5.733000e+07 -0.156303 3.3200
Equity(47980 [BOOT]) 5.036520e+08 0.108791 923.0552
Equity(48002 [FFWM]) 1.465090e+08 -0.017641 16.5062
Equity(48019 [TBK]) 2.346070e+08 -0.039882 9.1637
Equity(48090 [NDRM]) 1.966550e+08 -0.238976 0.1525
Equity(48091 [VA]) 1.559080e+09 -0.223353 17.8565
Equity(48103 [STOR]) 2.611000e+09 0.062500 44.1732
Equity(48124 [WF]) 5.441790e+09 -0.127706 11.2718
Equity(48126 [HABT]) 8.303110e+08 0.021336 47.1418
Equity(48129 [UBS]) 6.324900e+10 -0.021688 16.6622
Equity(48131 [NEFF]) 2.372420e+08 -0.176889 7.5635
Equity(48139 [CPHR]) 3.714800e+08 -0.046610 13.9360
Equity(48220 [LC]) 7.025730e+09 -0.257708 860.1009
Equity(48252 [AVOL]) 1.582620e+09 -0.010644 14.0303
Equity(48255 [MPG]) 1.234820e+09 0.053026 21.7141
Equity(48258 [JRVR]) 6.139030e+08 -0.063867 9.1169

1191654 rows × 3 columns

Now, we need to take each of these individual factors, clean them to remove NaN values and aggregate them for each month.

In [4]:
cap_data = data['Market Cap'].transpose().unstack() # extract series of data
cap_data = cap_data.T.dropna().T # remove NaN values
cap_data = cap_data.resample('M', how='last') # use last instance in month to aggregate

pe_data = data['PE Ratio'].transpose().unstack()
pe_data = pe_data.T.dropna().T
pe_data = pe_data.resample('M', how='last')

month_data = data['Monthly Returns'].transpose().unstack()
month_data = month_data.T.dropna().T
month_data = month_data.resample('M', how='last')
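
Note that `resample('M', how='last')` uses the older pandas API available in this environment; the `how` keyword has since been removed. In more recent pandas versions the equivalent aggregation (a sketch, assuming a newer install) would be:

# Equivalent monthly aggregation in newer pandas versions
cap_data = cap_data.resample('M').last()
pe_data = pe_data.resample('M').last()
month_data = month_data.resample('M').last()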

The next step is to figure out which equities we have data for. Data sources are never perfect, and stocks go in and out of existence through mergers, acquisitions, and bankruptcies. We'll make a list of the stocks common to all three factor data sets and then filter each set down to just those stocks.

In [5]:
common_equities = cap_data.T.index.intersection(pe_data.T.index).intersection(month_data.T.index)

Now we will make sure that each time series is computed over an identical set of securities. We also offset the series so that each month's factor values are paired with the following month's returns: the factor data drops its last month and the returns data drops its first.

In [6]:
cap_data_filtered = cap_data[common_equities][:-1]
month_forward_returns = month_data[common_equities][1:]
pe_data_filtered = pe_data[common_equities][:-1]
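
To see why the `[:-1]` and `[1:]` offsets line up each month's factor with the following month's return, here is a minimal sketch with three hypothetical monthly observations:

import pandas as pd

months = pd.to_datetime(['2014-01-31', '2014-02-28', '2014-03-31'])
factor = pd.Series([1.0, 2.0, 3.0], index=months)      # factor value at each month end
returns = pd.Series([0.01, 0.02, 0.03], index=months)  # return earned over each month

# Drop the last factor observation and the first return observation so that
# position i of the factor lines up with the return realized over the next month.
factor_aligned = factor[:-1]    # Jan and Feb factor values
forward_returns = returns[1:]   # Feb and Mar returns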

Here is the filtered market cap data over all equities for the first 5 months, as an example.

In [7]:
cap_data_filtered.head()
Out[7]:
Equity(2 [ARNC]) Equity(24 [AAPL]) Equity(31 [ABAX]) Equity(39 [DDC]) Equity(52 [ABM]) Equity(53 [ABMD]) Equity(62 [ABT]) Equity(64 [ABX]) Equity(66 [AB]) Equity(67 [ADSK]) ... Equity(45642 [SGM]) Equity(45643 [MGNX]) Equity(45645 [FTD]) Equity(45656 [GLPI]) Equity(45667 [VEEV]) Equity(45734 [COMM]) Equity(45818 [LGIH]) Equity(45840 [ABEV]) Equity(45867 [EROS]) Equity(45902 [WBAI])
2014-01-31 00:00:00+00:00 1.138480e+10 5.007410e+11 890067000.0 1.222390e+09 1.587410e+09 1.063340e+09 5.933860e+10 2.053280e+10 2.047110e+09 1.135200e+10 ... 438077000.0 6.906220e+08 612898000.0 4.105630e+09 3.999880e+09 3.518360e+09 319757000.0 7.328360e+10 484315000.0 6.471560e+08
2014-02-28 00:00:00+00:00 1.232730e+10 4.465300e+11 849092000.0 1.236860e+09 1.487790e+09 1.094600e+09 5.656900e+10 2.245450e+10 2.143040e+09 1.161840e+10 ... 390404000.0 1.002320e+09 583713000.0 3.076480e+09 3.999880e+09 3.332500e+09 361699000.0 1.024360e+11 409772000.0 1.376580e+09
2014-03-31 00:00:00+00:00 1.265200e+10 4.694000e+11 844236000.0 1.212170e+09 1.577000e+09 1.122470e+09 6.138330e+10 2.373560e+10 2.323390e+09 1.192270e+10 ... 374427000.0 9.614970e+08 587730000.0 3.376150e+09 4.405140e+09 4.502220e+09 352979000.0 1.127730e+11 568013000.0 1.456300e+09
2014-04-30 00:00:00+00:00 1.507210e+10 4.625330e+11 864924000.0 1.142370e+09 1.605490e+09 1.122470e+09 5.783970e+10 2.076610e+10 2.403060e+09 1.117300e+10 ... 382931000.0 7.653020e+08 587730000.0 3.376150e+09 3.332920e+09 4.595390e+09 358170000.0 1.160860e+11 700535000.0 1.213040e+09
2014-05-31 00:00:00+00:00 1.579070e+10 5.082920e+11 903410000.0 1.071720e+09 1.513320e+09 9.456180e+08 5.818510e+10 2.034680e+10 2.447330e+09 1.090940e+10 ... 325981000.0 5.507410e+08 574106000.0 4.107600e+09 2.397950e+09 4.968820e+09 305638000.0 1.135800e+11 697484000.0 1.042090e+09

5 rows × 3581 columns

Because we're dealing with ranking systems, at several points we're going to want to rank our data. Let's check how our data looks when ranked to get a sense for this.

In [8]:
cap_data_filtered.rank().head()
Out[8]:
Equity(2 [ARNC]) Equity(24 [AAPL]) Equity(31 [ABAX]) Equity(39 [DDC]) Equity(52 [ABM]) Equity(53 [ABMD]) Equity(62 [ABT]) Equity(64 [ABX]) Equity(66 [AB]) Equity(67 [ADSK]) ... Equity(45642 [SGM]) Equity(45643 [MGNX]) Equity(45645 [FTD]) Equity(45656 [GLPI]) Equity(45667 [VEEV]) Equity(45734 [COMM]) Equity(45818 [LGIH]) Equity(45840 [ABEV]) Equity(45867 [EROS]) Equity(45902 [WBAI])
2014-01-31 00:00:00+00:00 1.0 4.0 4.0 9.0 10.0 7.0 4.0 7.0 1.0 3.0 ... 13.0 7.0 7.0 12.0 11.5 2.0 4.0 1.0 2.0 2.0
2014-02-28 00:00:00+00:00 2.0 1.0 2.0 10.0 5.0 8.0 1.0 12.0 2.0 4.0 ... 12.0 13.0 4.0 1.0 11.5 1.0 8.0 4.0 1.0 12.0
2014-03-31 00:00:00+00:00 3.0 3.0 1.0 6.0 9.0 9.5 6.0 13.0 3.0 5.0 ... 9.0 11.0 5.5 3.5 13.0 8.0 6.0 9.0 3.0 13.0
2014-04-30 00:00:00+00:00 4.0 2.0 3.0 2.0 11.0 9.5 2.0 8.0 4.0 2.0 ... 11.0 8.0 5.5 3.5 5.0 10.0 7.0 13.0 5.0 9.0
2014-05-31 00:00:00+00:00 5.0 5.0 5.0 1.0 6.0 1.0 3.0 6.0 6.0 1.0 ... 5.0 2.0 2.0 13.0 1.0 12.0 2.0 10.0 4.0 6.0

5 rows × 3581 columns

Looking at Correlations Over Time

Now that we have the data, let's do something with it. Our first analysis will be to measure the monthly Spearman rank correlation coefficient between Market Cap and month-forward returns. In other words, how predictive of the next month's returns is ranking your universe by market cap?
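
As a refresher, for rankings with no ties the Spearman rank correlation between two variables is

$$\rho = 1 - \frac{6 \sum_i d_i^2}{n(n^2 - 1)}$$

where $d_i$ is the difference between the two ranks of observation $i$ and $n$ is the number of observations. `stats.spearmanr` below computes this coefficient (handling ties) and also returns a p-value.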

In [10]:
scores = np.zeros(13)
pvalues = np.zeros(13)
for i in range(13):
    score, pvalue = stats.spearmanr(cap_data_filtered.iloc[i], month_forward_returns.iloc[i])
    pvalues[i] = pvalue
    scores[i] = score
    
plt.bar(range(1,14),scores)
plt.hlines(np.mean(scores), 1, 14, colors='r', linestyles='dashed')
plt.xlabel('Month')
plt.xlim((1, 14))
plt.legend(['Mean Correlation over All Months', 'Monthly Rank Correlation'])
plt.ylabel('Rank correlation between Market Cap and 30-day forward returns');

We can see that the average correlation is positive, but varies a lot from month to month.

Let's look at the same analysis, but with PE Ratio.

In [11]:
scores = np.zeros(13)
pvalues = np.zeros(13)
for i in range(13):
    score, pvalue = stats.spearmanr(pe_data_filtered.iloc[i], month_forward_returns.iloc[i])
    pvalues[i] = pvalue
    scores[i] = score
    
plt.bar(range(1,14),scores)
plt.hlines(np.mean(scores), 1, 14, colors='r', linestyles='dashed')
plt.xlabel('Month')
plt.xlim((1, 14))
plt.legend(['Mean Correlation over All Months', 'Monthly Rank Correlation'])
plt.ylabel('Rank correlation between PE Ratio and 30-day forward returns');

The correlation of PE Ratio with 30-day forward returns appears to be near 0 on average. It's important to note that this analysis is monthly and covers only the start of 2014 through the start of 2015. Different factors are predictive on different timeframes and frequencies, so the fact that PE Ratio doesn't appear predictive here doesn't necessarily mean it should be thrown out as a factor. Beyond its usefulness in predicting returns, it can be used for risk exposure analysis as discussed in the Factor Risk Exposure Lecture.

Basket Returns

The next step is to compute the returns of baskets taken out of our ranking. If we rank all equities and then split them into $n$ groups, what would the mean return of each group be? We can answer this question in the following way. The first step is to create a function that gives us the mean return in each basket for a given month and ranking factor.

In [12]:
def compute_basket_returns(factor_data, forward_returns, number_of_baskets, month):

    data = pd.concat([factor_data.iloc[month-1],forward_returns.iloc[month-1]], axis=1)
    # Rank the equities on the factor values
    data.columns = ['Factor Value', 'Month Forward Returns']
    data.sort('Factor Value', inplace=True)
    
    # How many equities per basket
    equities_per_basket = int(np.floor(len(data.index) / number_of_baskets))

    basket_returns = np.zeros(number_of_baskets)

    # Compute the returns of each basket
    for i in range(number_of_baskets):
        start = i * equities_per_basket
        if i == number_of_baskets - 1:
            # Put any leftover equities in the last basket when the count doesn't divide evenly
            end = len(data.index)
        else:
            end = i * equities_per_basket + equities_per_basket
        # Actually compute the mean returns for each basket
        basket_returns[i] = data.iloc[start:end]['Month Forward Returns'].mean()
        
    return basket_returns

The first thing we'll do with this function is compute the basket returns for each month and then average them. This should give us a sense of the relationship over a longer timeframe.

In [13]:
number_of_baskets = 10
mean_basket_returns = np.zeros(number_of_baskets)
for m in range(1, 14):
    basket_returns = compute_basket_returns(cap_data_filtered, month_forward_returns, number_of_baskets, m)
    mean_basket_returns += basket_returns

mean_basket_returns /= 13    

# Plot the returns of each basket
plt.bar(range(number_of_baskets), mean_basket_returns)
plt.ylabel('Returns')
plt.xlabel('Basket')
plt.legend(['Returns of Each Basket']);

Spread Consistency

Of course, that's just the average relationship. To get a sense of how consistent it is, and whether or not we would want to trade on it, we should look at it over time. Here we'll look at the basket returns month by month for the year; a sketch of the resulting top-minus-bottom spread follows the plots. We can see a lot of variation, and further analysis should be done to determine whether Market Cap is tradeable.

In [14]:
f, axarr = plt.subplots(3, 4)
for month in range(1, 13):
    basket_returns = compute_basket_returns(cap_data_filtered, month_forward_returns, 10, month)

    r = int(np.floor((month-1) / 4))
    c = int((month-1) % 4)
    axarr[r, c].bar(range(number_of_baskets), basket_returns)
    axarr[r, c].xaxis.set_visible(False) # Hide the axis labels so the plots aren't super messy
    axarr[r, c].set_title('Month ' + str(month))
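
The panels above show every basket's mean return per month. If we care specifically about the spread a long-short strategy would capture, a minimal sketch (reusing `compute_basket_returns` from above) is to take the difference between the top and bottom baskets each month:

# Top-minus-bottom basket spread for each month
plt.figure()
spreads = np.zeros(12)
for month in range(1, 13):
    basket_returns = compute_basket_returns(cap_data_filtered, month_forward_returns, 10, month)
    spreads[month-1] = basket_returns[-1] - basket_returns[0]

plt.bar(range(1, 13), spreads)
plt.xlabel('Month')
plt.ylabel('Top basket minus bottom basket return');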

We'll repeat the same analysis for PE Ratio.

In [15]:
number_of_baskets = 10
mean_basket_returns = np.zeros(number_of_baskets)
for m in range(1, 14):
    basket_returns = compute_basket_returns(pe_data_filtered, month_forward_returns, number_of_baskets, m)
    mean_basket_returns += basket_returns

mean_basket_returns /= 13  

# Plot the returns of each basket
plt.bar(range(number_of_baskets), mean_basket_returns)
plt.ylabel('Returns')
plt.xlabel('Basket')
plt.legend(['Returns of Each Basket']);
In [16]:
f, axarr = plt.subplots(3, 4)
for month in range(1, 13):
    basket_returns = compute_basket_returns(pe_data_filtered, month_forward_returns, 10, month)

    r = int(np.floor((month-1) / 4))
    c = int((month-1) % 4)
    axarr[r, c].bar(range(10), basket_returns)
    axarr[r, c].xaxis.set_visible(False) # Hide the axis labels so the plots aren't super messy
    axarr[r, c].set_title('Month ' + str(month))

Sometimes Factors are Just Other Factors

Oftentimes a new factor will be discovered that seems to induce spread, but it turns out to be just a new, and potentially more complicated, way to compute a well-known factor. Consider, for instance, the case in which you have poured tons of resources into developing a new factor and it looks great. But how do you know it's not just another factor in disguise?

To check for this, there are many analyses that can be done.

Correlation Analysis

One of the most intuitive checks is to look at the correlation of the two factors over time. We'll plot that here.

In [17]:
scores = np.zeros(13)
pvalues = np.zeros(13)
for i in range(13):
    score, pvalue = stats.spearmanr(cap_data_filtered.iloc[i], pe_data_filtered.iloc[i])
    pvalues[i] = pvalue
    scores[i] = score
    
plt.bar(range(1,14),scores)
plt.hlines(np.mean(scores), 1, 14, colors='r', linestyles='dashed')
plt.xlabel('Month')
plt.xlim((1, 14))
plt.legend(['Mean Correlation over All Months', 'Monthly Rank Correlation'])
plt.ylabel('Rank correlation between Market Cap and PE Ratio');

We'll also look at the p-values, because the correlations may not be that meaningful by themselves.

In [18]:
scores = np.zeros(13)
pvalues = np.zeros(13)
for i in range(13):
    score, pvalue = stats.spearmanr(cap_data_filtered.iloc[i], pe_data_filtered.iloc[i])
    pvalues[i] = pvalue
    scores[i] = score
    
plt.bar(range(1,14), pvalues)
plt.xlabel('Month')
plt.xlim((1, 14))
plt.legend(['Monthly Rank Correlation p-value'])
plt.ylabel('p-value of rank correlation between Market Cap and PE Ratio');

There is interesting behavior, and further analysis would be needed to determine whether a relationship exists. As another check, we can take the first month of data, rank every equity on both factors and on returns, accumulate each equity's return rank into a grid indexed by its rank on the two factors, downsample the grid, and plot it as a heatmap. If the two factor rankings were essentially the same, the mass would concentrate along the diagonal.

In [19]:
pe_dataframe = pd.DataFrame(pe_data_filtered.iloc[0])
pe_dataframe.columns = ['F1']
cap_dataframe = pd.DataFrame(cap_data_filtered.iloc[0])
cap_dataframe.columns = ['F2']
returns_dataframe = pd.DataFrame(month_forward_returns.iloc[0])
returns_dataframe.columns = ['Returns']

data = pe_dataframe.join(cap_dataframe).join(returns_dataframe)

data = data.rank(method='first')

heat = np.zeros((len(data), len(data)))

for e in data.index:
    F1 = int(data.loc[e]['F1'])
    F2 = int(data.loc[e]['F2'])
    R = data.loc[e]['Returns']
    heat[F1-1, F2-1] += R
    
heat = scipy.signal.decimate(heat, 40)
heat = scipy.signal.decimate(heat.T, 40).T

p = sns.heatmap(heat, xticklabels=[], yticklabels=[])
# p.xaxis.set_ticks([])
# p.yaxis.set_ticks([])
p.xaxis.set_label_text('F2 Rank') # columns of heat are indexed by the F2 (market cap) rank
p.yaxis.set_label_text('F1 Rank') # rows of heat are indexed by the F1 (PE ratio) rank
p.set_title('Sum Rank of Returns vs Factor Ranking');

How to Choose a Ranking System

The ranking system is the secret sauce of many strategies. Choosing a good ranking system, or factor, is not easy and is the subject of much research. We'll discuss a few starting points here.

Clone and Tweak

Choose a factor that is commonly discussed and see if you can modify it slightly to regain an edge. Oftentimes, factors that are public will have no signal left, as they have been completely arbitraged out of the market. However, sometimes they lead you in the right direction of where to go.

Pricing Models

Any model that predicts future returns can be a factor. The predicted future return becomes the factor value and can be used to rank your universe. You can take any complicated pricing model and transform it into a ranking, as in the sketch below.
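
For example, a minimal sketch of turning a model's output into a ranking, assuming a hypothetical `predicted_returns` Series indexed by equity:

import pandas as pd

# Hypothetical model output: predicted one-month-forward return for each equity
predicted_returns = pd.Series({'AAPL': 0.021, 'ABT': -0.004, 'ADBE': 0.013})

# The prediction itself is the factor value; rank the universe on it
factor_ranks = predicted_returns.rank(ascending=False)  # 1 = highest predicted return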

Price Based Factors (Technical Indicators)

Price based factors take information about the historical price of each equity and use it to generate the factor value. Examples could be 30-day momentum, or volatility measures.
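
As a minimal sketch, assuming a hypothetical DataFrame `prices` of daily closes with dates as the index and one column per equity, these could be computed as:

# Approximate 30-day momentum: total return from 30 trading days ago to the latest close
momentum_30d = prices.iloc[-1] / prices.iloc[-30] - 1

# 30-day volatility: standard deviation of daily returns over the same window
volatility_30d = prices.pct_change().iloc[-30:].std()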

Reversion vs. Momentum

It's important to note that some factors bet that prices, once moving in a direction, will continue to do so. Some factors bet the opposite. Both are valid models on different time horizons and assets, and it's important to investigate whether the underlying behavior is momentum or reversion based.

Fundamental Factors (Value Based)

These factors use combinations of fundamental values, as we did in this notebook. Fundamental values contain information that is tied to real-world facts about a company, so in many ways they can be more robust than prices.

The Arms Race

Ultimately, developing predictive factors is an arms race in which you are trying to stay one step ahead. Factors get arbitraged out of markets and have a lifespan, so it's important that you are constantly doing work to determine how much decay your factors are experiencing, and what new factors might be used to take their place.

This presentation is for informational purposes only and does not constitute an offer to sell, a solicitation to buy, or a recommendation for any security; nor does it constitute an offer to provide investment advisory or other services by Quantopian, Inc. ("Quantopian"). Nothing contained herein constitutes investment advice or offers any opinion with respect to the suitability of any security, and any views expressed herein should not be taken as advice to buy, sell, or hold any security or as an endorsement of any security or company. In preparing the information contained herein, Quantopian, Inc. has not taken into account the investment needs, objectives, and financial circumstances of any particular investor. Any views expressed and data illustrated herein were prepared based upon information, believed to be reliable, available to Quantopian, Inc. at the time of publication. Quantopian makes no guarantees as to their accuracy or completeness. All information is subject to change and may quickly become unreliable for various reasons, including changes in market conditions or economic circumstances.