In [6]:
import numpy as np
import pandas as pd
import seaborn as sns

from scipy import stats
from sklearn.ensemble import RandomForestClassifier
from sklearn.cross_validation import cross_val_score

import random
import matplotlib.pyplot as plt

import datetime
import pytz

import statsmodels.api as sm
#import statsmodels.formula.api as smf

RECREATING THE CNN FEAR AND GREED INDEX

CNN's "Fear and Greed Index" is a popular metric for measuring the general mood of investors. As we saw in Gus's popular community post (https://www.quantopian.com/posts/using-the-cnn-fear-and-greed-index-as-a-trading-signal), use of the Fear and Greed index as a signal in trading algorithms varies. Mean reversion strategies buy stocks when the crowd is afraid. Momentum strategies buy when the crowd is greedy.

Unfortunately, CNN doesn't release the raw data on the sources they use in their Fear and Greed Index. They don't even provide a CSV of index values, let alone an API, that would allow the index to be used as an algorithm signal.

My hope is to eventually evaluate the Fear and Greed Index as a signal, but I don't want to rely on hacky scraping of a chart on CNN.com. Instead, I'd like to recreate the index myself.

Using a combination of Quantopian price data and external data sources, I have attempted to recreate the CNN Fear and Greed Index.

FEAR AND GREED FACTORS:

Copied from http://money.cnn.com/investing/about-fear-greed-tool/index.html and http://money.cnn.com/data/fear-and-greed/

  1. Stock Price Breadth: The volume of shares trading in stocks on the rise versus those declining.

  2. Stock Price Strength: The number of stocks hitting 52-week highs and lows on the New York Stock Exchange

  3. Junk Bond Demand: The spread between yields on investment grade bonds and junk bonds

  4. Market Volatility: The VIX (VIX), which measures volatility

  5. Put and Call Options: The put/call ratio, which compares the trading volume of bullish call options relative to the trading volume of bearish put options

  6. Safe Haven Demand: The difference in returns for stocks versus Treasuries

  7. Stock Price Momentum: The S&P 500 (SPX) versus its 125-day moving average

Stock Price Breadth

I will use a simple Advance/Decline statistic (# of Advancing Stocks - # of Declining Stocks) as my stock price breadth metric.

I use a current list of Russell 1000 constituents as my universe of stocks. (Note: using the present-day Russell list introduces survivorship bias and a host of other problems. This is definitely not best practice, but in the interest of time, I must trudge on.)

In [7]:
russle = local_csv('russle1000.csv')
In [8]:
russle = russle.rename(columns={'As of 06/27/2014 Russell Indexes.': 'symbol'})
In [9]:
russle.symbol = russle.symbol[1:].apply(lambda x: x.split(' ')[-1])
In [10]:
russle = russle[~russle.symbol.isin(['Ticker', 'Indexes.'])]
russle = russle.dropna()
russle.reset_index(inplace=True, drop=True)
In [11]:
#This is a bit of a hack. You can call get_pricing on a big list of stocks. However, some of the stocks
#in this list are listed under different tickers in Quantopian's price data. For time's sake, I will use
#this loop to drop those stocks out and avoid an exception.
print len(russle)
russle_price = pd.DataFrame()
for sym in russle.symbol: 
    try:
        price = get_pricing(sym, fields='price', frequency='daily', start_date='2008-01-01', end_date='2015-03-31')
        russle_price[sym] = price
    except:
        print sym
    
1026
data files aren't distributed with source.
Fetching data from Yahoo Finance.
data files aren't distributed with source.
Fetching data from data.treasury.gov
ATK
ASBC
BRK.B
BF.B
CBSO
CMCSA
CWH
DISCA
FCE.A
GOOGL
HUB.B
LINTA
LVNTA
NU
ONNN
STRZA
WAG
JW.A
In [12]:
russle_delta = russle_price.pct_change()
#Mark each stock as advancing (+1) or declining (-1); unchanged or missing days fall into the -1 bucket
russle_breadth = russle_delta.apply(lambda x: x.apply(lambda y: 1 if y > 0 else -1))
total_breadth = russle_breadth.sum(axis=1)
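
Note that CNN's definition of breadth is volume based (volume in advancing stocks versus declining stocks), while my version simply counts advancers and decliners. A volume-weighted variant would look something like the sketch below (I haven't run this, and it assumes get_pricing returns 'volume' for these symbols):

russle_volume = pd.DataFrame()
for sym in russle_price.columns:
    russle_volume[sym] = get_pricing(sym, fields='volume', frequency='daily',
                                     start_date='2008-01-01', end_date='2015-03-31')

#Volume in advancing stocks minus volume in declining stocks each day
up_volume = russle_volume[russle_delta > 0].sum(axis=1)
down_volume = russle_volume[russle_delta < 0].sum(axis=1)
volume_breadth = up_volume - down_volume
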
In [294]:
total_breadth.plot()
plt.title('# of Advancing Stocks - # of Declining Stocks in the Russell 1000')
Out[294]:
<matplotlib.text.Text at 0x7f731b404d90>
In [295]:
#I use an exponentially weighted moving average (EWMA) to smooth the data
month_rolling_breadth = pd.stats.moments.ewma(total_breadth, 10, min_periods=10)
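
(A quick aside: pd.stats.moments.ewma and the other pd.stats.moments functions used throughout this notebook were removed in later pandas releases. If you run this outside the research environment on a newer pandas, the equivalent should be the .ewm() / .rolling() methods, e.g.:)

month_rolling_breadth = total_breadth.ewm(com=10, min_periods=10).mean()
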
In [297]:
month_rolling_breadth.plot()
plt.title('# of Advancing Stocks - # of Declining Stocks in the Russell 1000 (10 Day EWMA)')
Out[297]:
<matplotlib.text.Text at 0x7f731b79f850>

Stock Price Strength

I measure stock price strength by taking the difference between the number of 52 week highs and 52 week lows for a given day. Rather than pull in price data for every stock on the NYSE, I scraped the 52 week high/low counts from WSJ. (http://online.wsj.com/mdc/public/page/2_3021-newhinyse-newhighs-20150319.html?mod=mdc_pastcalendar)

While the Quantopian Research Environment doesn't currently support the requests or BeautifulSoup libraries, you can perform this kind of scraping locally. I've included my code below in case you want to give it a go yourself.

In [442]:
#We are scraping this site:
# http://online.wsj.com/mdc/public/page/2_3021-newhinyse-newhighs-20150319.html?mod=mdc_pastcalendar
# I use the Put/Call ratio csv used later in this notebook to get the dates for our scrape.

# datesold = pd.read_csv('/Users/acampbell/Downloads/equitypc.csv', skiprows=[0,1])
# datesold = datesold['DATE']

# dates = datesold.apply(lambda x: pd.to_datetime(x))
# dates = dates.apply(lambda x: x.strftime("%Y%m%d"))

# highs = []
# lows = []
# newdates = []
# for date in dates[797:]:
#     url = 'http://online.wsj.com/mdc/public/page/2_3021-newhinyse-newhighs-' + date + '.html?mod=mdc_pastcalendar'
#     x = urllib2.urlopen(url) # Opens URLS
#     htmlSource = x.read()
#     x.close()
#     soup = bs4.BeautifulSoup(htmlSource)
#     table = soup.find_all('table')[2]
#     rows = table.find_all(attrs={'colspan':'6'})
#     high = int(rows[0].string.split(' ')[3])
#     low = int(rows[1].string.split(' ')[3])
#     print high
#     print low
    
#     highs.append(high)
#     lows.append(low)
#     newdates.append(date)
    
# df = pd.DataFrame({'dates': newdates, 'highs': highs, 'lows': lows})

# df['date'] = datesold[797:].reset_index().DATE
# df[['date', 'highs', 'lows']].to_csv('52_day_highs_lows.csv')
In [18]:
highlows = local_csv('52_day_highs_lows.csv', date_column='date')
In [19]:
highlows = highlows.highs - highlows.lows
In [298]:
highlows.plot()
plt.title('Number of 52 Week Highs Minus 52 Week Lows on the NYSE')
Out[298]:
<matplotlib.text.Text at 0x7f731b3643d0>

Junk Bond Demand

Here, I am using the Bank of America High Yield index and AA bond index to calculate a spread. I couldn't find a reliable investment grade index time series covering a long enough span, but I believe the AA index will serve as a good proxy. I pulled the data from the Federal Reserve Bank of St. Louis's lovely FRED website.

https://research.stlouisfed.org/fred2/series/BAMLC0A2CAAEY

https://research.stlouisfed.org/fred2/series/BAMLH0A0HYM2EY

Note: Based on the two main credit rating agencies, high-yield bonds carry a rating below 'BBB' from S&P, and below 'Baa' from Moody's.

In [21]:
hy = local_csv('high_yield_BofA.csv', date_column='observation_date', skiprows=10)
ig = local_csv('AA_yield_BofA.csv',  skiprows=10, date_column='observation_date')
In [22]:
yield_spread = pd.DataFrame({'igyield': ig.BAMLC0A2CAAEY, 'highyield': hy.BAMLH0A0HYM2EY}) 
In [23]:
print yield_spread.highyield.isnull().sum()
print yield_spread.igyield.isnull().sum()
0
0
In [24]:
yield_spread['spread'] = yield_spread.highyield - yield_spread.igyield
In [300]:
yield_spread[['highyield', 'igyield']].plot()
plt.title('High Yield and Investment Grade Bond Yields')
plt.ylabel('%')
Out[300]:
<matplotlib.text.Text at 0x7f731b0e8650>
In [301]:
yield_spread.spread.plot()
plt.title('High Yield - Investment Grade Spread')
plt.ylabel('Yield Spread %')
Out[301]:
<matplotlib.text.Text at 0x7f731b026810>

Market Volatility

I use the VIX as my measure of market volatility. I pulled data from Yahoo finance.

Available here: http://finance.yahoo.com/echarts?s=%5EVIX+Interactive#

In [29]:
vix = local_csv('YAHOO-INDEX_VIX .csv', date_column='Date')
In [30]:
vix = vix['Adjusted Close']
In [31]:
vix.plot()
plt.title('VIX Index')
Out[31]:
<matplotlib.text.Text at 0x7f7328b9a550>

Put/Call Options

I pulled put/call ratio data from the Chicago Board of Options Exchange (CBOE).

If you head over to their site, you will see that there are a couple of different option ratios to choose from. The index p/c ratio measures index options, which are used by large money managers to hedge portfolios of stocks. Thus the total (equity and index options) put/call ratio can distort the measurement of the behavior of the speculative crowd. A better gauge is the CBOE's equity-only put/call ratio.

http://www.cboe.com/data/putcallratio.aspx

In [32]:
putcall = local_csv('equitypc.csv', skiprows=2, date_column='DATE')
In [33]:
putcall = putcall['P/C Ratio']
In [34]:
putcall = pd.stats.moments.ewma(putcall, 5, min_periods=5)
In [35]:
putcall.plot()
plt.title('Equity-only Put/Call Ratio')
Out[35]:
<matplotlib.text.Text at 0x7f73289dc350>

Safe Haven Demand

Demand for safe assets is measured by the difference between the 30 day stock market return and the 30 day return on Treasuries.

For convenience's sake, I use the Treasury ETF TLT as a proxy for the value of Treasuries and SPY as a proxy for the market.

In [36]:
spy = get_pricing('SPY', frequency='daily', fields='price', start_date='2009-01-01', end_date='2015-03-10')
treas = get_pricing('TLT', frequency='daily', fields='price', start_date='2009-01-01', end_date='2015-03-10')
In [37]:
returns = pd.DataFrame() 
returns['spyret'] = pd.Series(data=((spy[30:].values - spy[:-30].values)/spy[:-30].values), 
                              index=spy[30:].index)
returns['treasret'] = pd.Series(data=((treas[30:].values - treas[:-30].values)/treas[:-30].values),
                                index=treas[30:].index)
spy_treasury_spread = returns['spyret'] - returns['treasret']
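
(As an aside, pct_change gives the same 30 day returns more concisely; the only difference is that the first 30 rows come back as NaN instead of being dropped.)

returns_alt = pd.DataFrame({'spyret': spy.pct_change(periods=30),
                            'treasret': treas.pct_change(periods=30)})
spy_treasury_spread_alt = (returns_alt['spyret'] - returns_alt['treasret']).dropna()
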
In [38]:
spy_treasury_spread.head()
Out[38]:
2009-02-17 00:00:00+00:00   -0.054295
2009-02-18 00:00:00+00:00   -0.072156
2009-02-19 00:00:00+00:00   -0.078064
2009-02-20 00:00:00+00:00   -0.069850
2009-02-23 00:00:00+00:00   -0.112303
dtype: float64
In [303]:
returns.plot()
plt.title('30 Day Return from SPY vs. TLT')
plt.ylabel('Return')
Out[303]:
<matplotlib.text.Text at 0x7f731aeaaf10>
In [304]:
spy_treasury_spread.plot()
plt.title('SPY minus Treasury Bond 30 Day Return Spread')
Out[304]:
<matplotlib.text.Text at 0x7f731ae09b10>

Stock Market Momentum

The S&P 500 (here, SPY) price divided by its 125 day moving average.

In [41]:
spy_roll = pd.stats.moments.rolling_mean(spy, 125, min_periods=125)
In [42]:
spy_momentum = (spy/spy_roll).dropna()
In [306]:
pd.DataFrame({'Spy 125 Day Rolling': spy_roll, 'Spy Spot': spy}).plot()
plt.title('SPY')
plt.ylabel('Price')
Out[306]:
<matplotlib.text.Text at 0x7f731abc2990>
In [43]:
spy_momentum.plot()
plt.title('SPY Spot/ SPY 125 Day Mean')
Out[43]:
<matplotlib.text.Text at 0x7f7328ff4090>

Combining Our Metrics

At last, the time has come to combine our metrics and generate a Fear and Greed Index of our own.

In [310]:
# Need to give these series a timezone so they can be combined with other TZ aware series 
yield_spread = yield_spread.tz_localize('UTC')
highlows = highlows.tz_localize('UTC')
putcall = putcall.tz_localize('UTC')
vix = vix.tz_localize('UTC')
In [311]:
feargreedraw = pd.DataFrame({'stock_momentum': spy_momentum, 'stock_strength': highlows, 'stock_breadth': total_breadth,
                          'putcall_ratio': putcall, 'junk_bond_demand': yield_spread.spread, 'market_volatility': vix,
                          'safe_haven_demand': spy_treasury_spread})
In [312]:
feargreedraw.isnull().sum()
Out[312]:
junk_bond_demand     1264
market_volatility      87
putcall_ratio        1309
safe_haven_demand    1888
stock_breadth        1590
stock_momentum       1982
stock_strength       2102
dtype: int64
In [313]:
feargreed = feargreedraw.dropna()

We need to flip the sign of our VIX and yield spread data so that an increase in either lowers our Fear and Greed Index (a higher Fear and Greed reading indicates greater investor confidence).

In [314]:
feargreed.market_volatility = feargreed.market_volatility * -1
feargreed.junk_bond_demand = feargreed.junk_bond_demand * -1 
In [315]:
print len(feargreed)
feargreed.head()
1302
Out[315]:
junk_bond_demand market_volatility putcall_ratio safe_haven_demand stock_breadth stock_momentum stock_strength
2010-01-04 00:00:00+00:00 -5.14 -20.04 0.567646 0.072161 460 1.079785 321
2010-01-05 00:00:00+00:00 -5.13 -19.35 0.564705 0.084267 -72 1.080629 401
2010-01-06 00:00:00+00:00 -5.00 -19.16 0.553921 0.100282 -20 1.079580 450
2010-01-07 00:00:00+00:00 -4.87 -19.06 0.541601 0.089670 -14 1.081789 339
2010-01-08 00:00:00+00:00 -4.88 -18.13 0.528001 0.098631 -8 1.083286 378
In [316]:
#I am unsure of how to best scale my factors so they are equally weighted and produce a final score
#that is comparable to the CNN index's range. 

#This is a hacky first attempt
feargreed_scaled = pd.DataFrame()
for col in feargreed.columns:
    feargreed_scaled[col] = (feargreed[col]/feargreed[col][feargreed.index[0]])*12
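
One cleaner alternative would be to min-max scale each factor to a 0-100 range over the sample and then average them, which keeps the score on the same 0-100 scale CNN uses (note this uses the full-sample min and max, so it has its own look-ahead bias); a sketch:

feargreed_minmax = (feargreed - feargreed.min()) / (feargreed.max() - feargreed.min()) * 100
feargreed_minmax['mine_minmax'] = feargreed_minmax.mean(axis=1)
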
In [317]:
#Here I take the sum of the parts to get my total index value.
feargreed_scaled['mine'] = feargreed_scaled.sum(axis=1)
In [318]:
feargreed_scaled.head()
Out[318]:
junk_bond_demand market_volatility putcall_ratio safe_haven_demand stock_breadth stock_momentum stock_strength mine
2010-01-04 00:00:00+00:00 12.000000 12.000000 12.000000 12.000000 12.000000 12.000000 12.000000 84.000000
2010-01-05 00:00:00+00:00 11.976654 11.586826 11.937826 14.013098 -1.878261 12.009373 14.990654 74.636171
2010-01-06 00:00:00+00:00 11.673152 11.473054 11.709849 16.676355 -0.521739 11.997714 16.822430 79.830814
2010-01-07 00:00:00+00:00 11.369650 11.413174 11.449402 14.911601 -0.365217 12.022264 12.672897 73.473769
2010-01-08 00:00:00+00:00 11.392996 10.856287 11.161896 16.401844 -0.208696 12.038908 14.130841 75.774077
In [426]:
feargreed_scaled[feargreed_scaled.columns[:7]].plot(linewidth=.8)
plt.title('My Fear and Greed Index Components with a Lame Attempt at Equal Weighting')
Out[426]:
<matplotlib.text.Text at 0x7f7319634b50>

How does our index compare to CNN's index?

I pulled the CSV that Gus scraped from CNN's Fear and Greed chart:

https://gist.githubusercontent.com/gusgordon/7615f5b91f3cba1e7ff5/raw/261a3213b20f7cc7d2ee52be2cdc81c49f69a4de/gistfile1.txt

I plot my metric and CNN's to get a sense for how closely I mimicked their process.

In [320]:
FG_index = local_csv('Fear_and_Greed_index.csv', date_column='date')
In [321]:
FG_index = FG_index.value
In [322]:
feargreed_scaled['CNN'] = FG_index
In [427]:
feargreed_scaled[['CNN', 'mine']].dropna().plot(linewidth=1.2)
plt.title('CNN Fear and Greed Index vs. My Homebrew Fear and Greed Index')
Out[427]:
<matplotlib.text.Text at 0x7f731920b810>

My metric seems to have a lot of noise. Let's try a 4 day EWMA for smoothing.

In [325]:
feargreed_scaled['mine_smooth'] = pd.stats.moments.ewma(feargreed_scaled.mine, 4, min_periods=4)
In [423]:
feargreed_scaled[['CNN', 'mine_smooth']].dropna().plot(linewidth=1.5)
plt.title('CNN Fear and Greed Index vs. My Homebrew Fear and Greed Index (4 Day EWMA)')
Out[423]:
<matplotlib.text.Text at 0x7f7319218350>

Interestingly, the CNN index appears to track the spread between treasuries and stocks pretty closely. This "safe haven demand" metric may be the most useful of the bunch.

In [328]:
feargreed_scaled[['CNN', 'safe_haven_demand']].dropna().plot(alpha=.8)
plt.title('CNN Fear and Greed Index vs. Spread Between Stock and Treasury Returns')
Out[328]:
<matplotlib.text.Text at 0x7f731a6f51d0>

Much of the Fear and Greed Index commentary points to the importance of where a particular metric sits relative to where it has been in recent history. To capture this, I take the Z-score of each observation. To avoid look-ahead bias, the Z-score needs to be computed on a rolling basis.

In [329]:
#Partially adapted from http://vincent.is/finding-trending-things/
def rolling_zscore(data, decay=0.9):
    #Lower decay = more importance of recent points in mean calculation
    avg = float(data[0])
    squared_average = float(data[0] ** 2)

    def add_to_history(point, average, sq_average):
        average = average * decay + point * (1 - decay)
        sq_average = sq_average * decay + (point ** 2) * (1 - decay)
        return average, sq_average

    def calculate_zscore(average, sq_average, value):
        #Z-score of 'value' against the decayed history summarized by average and sq_average
        std = round(np.sqrt(sq_average - average ** 2))
        if std == 0:
            #Guard against a zero (rounded) standard deviation
            return value - average

        return (value - average) / std
    
    zscores = []
    for point in data[1:]:
        zscores.append(calculate_zscore(avg, squared_average, point))
        avg, squared_average = add_to_history(point, avg, squared_average)

    return zscores
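
For what it's worth, the update above is just an exponentially weighted mean and mean-square (avg = decay*avg + (1-decay)*x), with std = sqrt(E[x^2] - E[x]^2). On a newer pandas you should be able to get roughly the same rolling Z-score from the built-in ewm machinery, shifted one bar so the current observation isn't scored against itself; a sketch:

def rolling_zscore_pd(x, decay=0.95):
    #Exponentially weighted mean/std of everything up to (not including) the current bar
    mean = x.ewm(alpha=1 - decay, adjust=False).mean().shift(1)
    std = x.ewm(alpha=1 - decay, adjust=False).std().shift(1)
    return (x - mean) / std
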
In [443]:
feargreed_rolling_z = pd.DataFrame()
for col in feargreed.columns[:7]:
    feargreed_rolling_z[col] = rolling_zscore(feargreed[col], decay=.95)
    
feargreed_rolling_z = feargreed_rolling_z.set_index(feargreed.index[1:])

feargreed_rolling_z['mine'] = feargreed_rolling_z.sum(axis=1)
feargreed_rolling_z['CNN'] = FG_index
feargreed_rolling_z['mine'] = feargreed_rolling_z.mine*12 + 50
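
The *12 + 50 rescaling above is arbitrary. Another option would be to average the seven Z-scores and push the average through a normal CDF, which maps the score onto a 0-100 scale directly; a rough sketch (untested):

from scipy.stats import norm

avg_z = feargreed_rolling_z[feargreed_rolling_z.columns[:7]].mean(axis=1)
feargreed_rolling_z['mine_cdf'] = norm.cdf(avg_z) * 100
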
In [444]:
feargreed_rolling_z[feargreed_rolling_z.columns[:7]].plot(linewidth=.8)
plt.ylim(-4,4)
Out[444]:
(-4, 4)
In [445]:
feargreed_rolling_z[['CNN', 'mine']].dropna().plot()
plt.title('CNN Fear and Greed Index vs. My Homebrewed Fear and Greed Index')
Out[445]:
<matplotlib.text.Text at 0x7f72fb6b3290>
In [446]:
feargreed_rolling_z['mine_smooth'] = pd.stats.moments.ewma(feargreed_rolling_z.mine, 4, min_periods=4)
In [447]:
feargreed_rolling_z[['CNN', 'mine_smooth']].dropna().plot()
plt.title('CNN Fear and Greed Index vs. My Homebrew Fear and Greed Index (4 day EWMA of rolling Z-score)')
plt.ylabel('Index Value')
Out[447]:
<matplotlib.text.Text at 0x7f72fb711a50>

Comments are more than welcome. I am still unsure of my method of equally weighting metrics. But it is interesting that I was largely able to replicate CNN's work!

Preliminary assessment of the Fear and Greed Index as a trading signal

In [439]:
feargreed_rolling_z['spy'] = spy.pct_change()
feargreed_rolling_z = feargreed_rolling_z.dropna()
In [440]:
predict = pd.DataFrame({'fear_greed': feargreed_rolling_z.mine_smooth.values[:-1],
                       'pricedelta': feargreed_rolling_z.spy.values[1:]})
predict.head()
Out[440]:
fear_greed pricedelta
0 63.909328 -0.027889
1 57.487470 0.006188
2 60.197901 0.010716
3 65.917676 -0.012630
4 63.125362 -0.000538
In [441]:
plt.scatter(predict.fear_greed, predict.pricedelta)
z = np.polyfit(predict.fear_greed, predict.pricedelta, 1)
p = np.poly1d(z)
plt.plot(predict.fear_greed, p(predict.fear_greed), 'r-')
plt.ylabel('SPY Percent Price Change')
plt.xlabel('Previous Day Fear and Greed Index')
Out[441]:
<matplotlib.text.Text at 0x7f72fbb04b90>

This negative slope suggests the Fear and Greed Index is a good signal for mean reversion algos. When the crowd is greedy (high FG value), prices have a tendency to fall.
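
To put a rough number on that relationship (scipy.stats is already imported above as stats):

corr, pval = stats.pearsonr(predict.fear_greed, predict.pricedelta)
print corr, pval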

I have tried some basic regression models and machine learning classifiers to assess the predictive power of my index. Nothing worth sharing yet... I'll be sure to update in this thread!
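
For reference, the kind of basic regression I have in mind is something like the sketch below (statsmodels is already imported as sm at the top of the notebook); the coefficient and p-value on fear_greed give a first read on predictive power:

X = sm.add_constant(predict.fear_greed)
ols_model = sm.OLS(predict.pricedelta, X).fit()
print ols_model.summary()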

In [ ]: