Notebook

The influence of COVID-19 cases on companies according to their geographic revenue

This analysis loads country-level COVID-19 confirmed-case and death counts sourced from the ECDC via Our World in Data. We then load FactSet's Geographic Revenue Exposure (GeoRev) data, which tells us how much revenue each company derives from different regions (e.g., Europe). From this data we build regional long/short factors over the QTradableStocksUS (QTU) universe and examine how the returns of these factors line up with COVID-19 infection numbers in the corresponding regions.

This notebook should serve only as an example of the type of analysis we are looking for; by no means should you feel limited to GeoRev.

A couple of notes on what a good analysis looks like:

  • Explains each data processing step
  • Shows interim results
  • Labels x- and y-axes of plots
  • Has final plot that brings it all together
In [2]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import empyrical as ep
import alphalens as al

from quantopian.pipeline import Pipeline
from quantopian.pipeline.data.factset import GeoRev
from quantopian.pipeline.domain import US_EQUITIES
from quantopian.research import run_pipeline
from quantopian.pipeline.filters import QTradableStocksUS
from quantopian.pipeline.data.factset import RBICSFocus
In [1]:
# Load COVID-19 data. We periodically update the file in your research
# folder. You can also download an updated file from 
# https://ourworldindata.org/coronavirus-source-data and put 
# it into your data directory in research
covid = local_csv('covid19_cases.csv', date_column='date')
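Note that `local_csv` is specific to Quantopian's research environment. Outside it, the same file can be loaded with plain pandas; a minimal sketch using an inline sample with the same column layout as the Our World in Data export (the sample values here are illustrative):

```python
import io
import pandas as pd

# Inline stand-in for covid19_cases.csv, matching the OWID column layout.
sample = io.StringIO(
    "date,location,new_cases,new_deaths,total_cases,total_deaths\n"
    "2019-12-31,China,27,0,27,0\n"
    "2020-01-01,China,0,0,27,0\n"
)

# parse_dates + index_col reproduce what date_column='date' does in local_csv.
covid = pd.read_csv(sample, parse_dates=['date'], index_col='date')
print(covid['total_cases'].iloc[-1])  # cumulative count is carried forward
```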
In [3]:
# Map selected countries to their region
region_country = {
    'United States': 'NORTH AMERICA',
    'Canada': 'NORTH AMERICA',
    'Austria': 'EUROPE',
    'Bulgaria': 'EUROPE',
    'Croatia': 'EUROPE',
    'Denmark': 'EUROPE',
    'Estonia': 'EUROPE',
    'Finland': 'EUROPE',
    'France': 'EUROPE',
    'Germany': 'EUROPE',
    'Greece': 'EUROPE',
    'Iceland': 'EUROPE',
    'Hungary': 'EUROPE',
    'Italy': 'EUROPE',
    'Netherlands': 'EUROPE',
    'Poland': 'EUROPE',
    'Portugal': 'EUROPE',    
    'Romania': 'EUROPE',
    'Spain': 'EUROPE',
    'Serbia': 'EUROPE',    
    'Switzerland': 'EUROPE',
    'Sweden': 'EUROPE',
    'United Kingdom': 'EUROPE',
    'Australia': 'ASIA-PACIFIC',
    'Bangladesh': 'ASIA-PACIFIC',
    'Bhutan': 'ASIA-PACIFIC',
    'China': 'ASIA-PACIFIC',
    'Indonesia': 'ASIA-PACIFIC',
    'Japan': 'ASIA-PACIFIC',
    'Malaysia': 'ASIA-PACIFIC',
    'Myanmar': 'ASIA-PACIFIC',
    'New Zealand': 'ASIA-PACIFIC',
    'Singapore': 'ASIA-PACIFIC',
    'South Korea': 'ASIA-PACIFIC',
    'Taiwan': 'ASIA-PACIFIC',
    'Vietnam': 'ASIA-PACIFIC',
}

covid['region'] = covid['location'].map(region_country)

covid_region = covid.reset_index().groupby(['date', 'region']).sum().reset_index(level='region')
In [4]:
covid_region.head()
Out[4]:
region new_cases new_deaths total_cases total_deaths
date
2019-12-31 00:00:00+00:00 ASIA-PACIFIC 27 0 27 0
2019-12-31 00:00:00+00:00 EUROPE 0 0 0 0
2019-12-31 00:00:00+00:00 NORTH AMERICA 0 0 0 0
2020-01-01 00:00:00+00:00 ASIA-PACIFIC 0 0 27 0
2020-01-01 00:00:00+00:00 EUROPE 0 0 0 0
In [5]:
thresh = 25000
covid_region.groupby('region')['total_cases'].plot(legend=True);
plt.axhline(thresh, ls='--', color='0.5', label='Chosen threshold');
plt.ylabel('# confirmed cases'); plt.title('COVID-19 growth in different regions'); plt.legend(loc=0);
In [6]:
# Compute the first date on which the threshold was crossed in each region
covid_region_date = covid_region.loc[lambda x: x.total_cases > thresh].reset_index().groupby('region').first()['date']
covid_region_date
Out[6]:
region
ASIA-PACIFIC    2020-02-06 00:00:00+00:00
EUROPE          2020-03-13 00:00:00+00:00
NORTH AMERICA   2020-03-22 00:00:00+00:00
Name: date, dtype: datetime64[ns, UTC]
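The filter-then-`first()` pattern above finds the earliest crossing per region; a region that never exceeds the threshold simply drops out of the result. A self-contained sketch of the same logic on toy data:

```python
import pandas as pd

covid_region = pd.DataFrame({
    'date': pd.to_datetime(['2020-01-01', '2020-01-02', '2020-01-01', '2020-01-02']),
    'region': ['A', 'A', 'B', 'B'],
    'total_cases': [10, 30, 5, 15],
}).set_index('date')

thresh = 25
# Keep only rows above the threshold, then take the first (earliest) per region.
first_cross = (covid_region[covid_region.total_cases > thresh]
               .reset_index()
               .groupby('region')
               .first()['date'])
print(first_cross)  # region B never crosses, so it is absent
```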
In [7]:
# Revenue exposure to North America, Asia / Pacific and Europe.
GeoRevNA = GeoRev.slice('NORTH AMERICA')
GeoRevAP = GeoRev.slice('ASIA-PACIFIC')
GeoRevEU = GeoRev.slice('EUROPE')

# Most recent revenue exposure.
rev_exposure_NA = GeoRevNA.est_pct.latest
rev_exposure_AP = GeoRevAP.est_pct.latest
rev_exposure_EU = GeoRevEU.est_pct.latest

# We are not using RBICS sectors for this analysis,
# but we include them here in case you want to use them.
sector = RBICSFocus.l1_name.latest

# Add all factors to a pipeline and run it.
pipe = Pipeline(
    columns={
        'rev_exposure_NA': rev_exposure_NA.rank().zscore(),
        'rev_exposure_AP': rev_exposure_AP.rank().zscore(),
        'rev_exposure_EU': rev_exposure_EU.rank().zscore(),
        'sector': sector,
    },
    domain=US_EQUITIES,
    screen=(QTradableStocksUS() & rev_exposure_NA.notnull() & rev_exposure_AP.notnull() & rev_exposure_EU.notnull()),
)

# Run the pipeline for a single date; GeoRev data is only updated annually.
df = run_pipeline(pipe, '2018-01-01', '2018-01-01')
df = df.reset_index(level=0, drop=True) # drop date index as it's just a single day
print(df.head())

Pipeline Execution Time: 7.62 Seconds
                   rev_exposure_AP  rev_exposure_EU  rev_exposure_NA  \
Equity(2 [HWM])          -0.431457         0.978334        -0.803243   
Equity(24 [AAPL])         1.142255         0.564753        -1.286297   
Equity(31 [ABAX])        -0.596019         0.466349        -0.510013   
Equity(52 [ABM])         -1.355218        -1.339148        -0.151785   
Equity(53 [ABMD])        -0.969395        -0.111239        -0.238942   

                              sector  
Equity(2 [HWM])          Industrials  
Equity(24 [AAPL])         Technology  
Equity(31 [ABAX])         Healthcare  
Equity(52 [ABM])   Business Services  
Equity(53 [ABMD])         Healthcare  
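The `.rank().zscore()` transform normalizes each exposure cross-sectionally: rank the raw values, then standardize the ranks. Outside Pipeline this can be approximated in plain pandas (a sketch, not Quantopian's exact implementation; Pipeline's `zscore` may use a different degrees-of-freedom convention for the standard deviation):

```python
import pandas as pd

# Toy cross-section of raw revenue exposures for four assets.
exposure = pd.Series([0.1, 0.5, 0.9, 0.3], index=['A', 'B', 'C', 'D'])

# Rank first (robust to outliers), then z-score the ranks.
ranks = exposure.rank()
z = (ranks - ranks.mean()) / ranks.std()
print(z.round(3))
```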
In [8]:
df = df.drop('sector', axis='columns')
In [9]:
df.head()
Out[9]:
rev_exposure_AP rev_exposure_EU rev_exposure_NA
Equity(2 [HWM]) -0.431457 0.978334 -0.803243
Equity(24 [AAPL]) 1.142255 0.564753 -1.286297
Equity(31 [ABAX]) -0.596019 0.466349 -0.510013
Equity(52 [ABM]) -1.355218 -1.339148 -0.151785
Equity(53 [ABMD]) -0.969395 -0.111239 -0.238942
In [10]:
# Get close prices over the COVID data's date range (returns are computed later).
prices = get_pricing(
  symbols=df.index,
  start_date=covid.index[0],
  end_date=covid.index[-1], 
  fields='close_price',
)
In [11]:
# Broadcast the single-day cross-section across all pricing dates.
# This effectively forward-fills the 2018 exposures.
factor = pd.concat({dt: df for dt in prices.index})
factor.head()
Out[11]:
rev_exposure_AP rev_exposure_EU rev_exposure_NA
2019-12-31 00:00:00+00:00 Equity(2 [HWM]) -0.431457 0.978334 -0.803243
Equity(24 [AAPL]) 1.142255 0.564753 -1.286297
Equity(31 [ABAX]) -0.596019 0.466349 -0.510013
Equity(52 [ABM]) -1.355218 -1.339148 -0.151785
Equity(53 [ABMD]) -0.969395 -0.111239 -0.238942
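The `pd.concat` trick above stacks an identical copy of the single-day DataFrame under each date, with the dict keys becoming the outer level of a (date, asset) MultiIndex. A tiny self-contained illustration:

```python
import pandas as pd

# One-day cross-section of factor values for two assets.
df = pd.DataFrame({'x': [1.0, 2.0]}, index=['AAPL', 'MSFT'])
dates = pd.to_datetime(['2020-01-01', '2020-01-02'])

# Dict keys become the outer index level: every date gets the same copy.
factor = pd.concat({dt: df for dt in dates})
print(factor.shape)
```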
In [12]:
# Use alphalens to compute factor returns for each column
factor_returns = {}
for col in factor.columns:
    factor_data = al.utils.get_clean_factor_and_forward_returns(
        factor[col], prices, periods=[1])
    factor_returns[col] = al.performance.factor_returns(factor_data)['1D']

factor_returns = pd.DataFrame(factor_returns)

factor_returns.head()
Dropped 10.3% entries from factor data: 10.3% in forward returns computation and 0.0% in binning phase (set max_loss=0 to see potentially suppressed Exceptions).
max_loss is 35.0%, not exceeded: OK!
Dropped 10.3% entries from factor data: 10.3% in forward returns computation and 0.0% in binning phase (set max_loss=0 to see potentially suppressed Exceptions).
max_loss is 35.0%, not exceeded: OK!
Dropped 10.3% entries from factor data: 10.3% in forward returns computation and 0.0% in binning phase (set max_loss=0 to see potentially suppressed Exceptions).
max_loss is 35.0%, not exceeded: OK!
Out[12]:
rev_exposure_AP rev_exposure_EU rev_exposure_NA
date
2019-12-31 00:00:00+00:00 0.005583 0.002659 -0.005307
2020-01-02 00:00:00+00:00 -0.002485 -0.003441 0.003255
2020-01-03 00:00:00+00:00 -0.001496 -0.000700 0.001221
2020-01-06 00:00:00+00:00 0.003154 0.001807 -0.003991
2020-01-07 00:00:00+00:00 0.001746 0.002252 0.000951
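Conceptually, `al.performance.factor_returns` builds a long/short portfolio each day with weights proportional to the demeaned factor (long high exposure, short low), scaled so absolute weights sum to one, and dots those weights with forward returns. Alphalens supports further options (grouping, equal weighting) not shown here; a minimal single-day sketch of the basic demeaned-weight scheme:

```python
import pandas as pd

# One day's factor values and next-day returns for three assets.
factor = pd.Series([1.0, 0.0, -1.0], index=['A', 'B', 'C'])
fwd_ret = pd.Series([0.02, 0.00, -0.01], index=['A', 'B', 'C'])

# Demean, then scale so absolute weights sum to 1 (dollar-neutral long/short).
demeaned = factor - factor.mean()
weights = demeaned / demeaned.abs().sum()   # A: +0.5, B: 0.0, C: -0.5

# Factor return for the day: weighted sum of forward returns.
day_return = (weights * fwd_ret).sum()
print(round(day_return, 4))  # 0.5*0.02 + (-0.5)*(-0.01) = 0.015
```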
In [13]:
factor_returns.columns = ['Asia/Pacific factor returns', 'Europe factor returns', 'North America factor returns']

Ideally, you should summarize your findings in one succinct, largely self-explanatory plot.

In [14]:
fig, axs = plt.subplots(nrows=2, sharex=True);

covid_region.groupby('region')['total_cases'].plot(ax=axs[0], legend=True);
axs[0].axhline(thresh, ls='--', color='0.5', label='Chosen threshold');
axs[0].set(ylabel='# confirmed cases', title='COVID-19 growth in different regions'); axs[0].legend(loc=0);

ep.cum_returns(factor_returns).plot(ax=axs[1])
colors = ['b', 'g', 'r']
for c, (region, date) in zip(colors, covid_region_date.iteritems()):
    axs[1].axvline(date, label=region, ls='--', color=c)
    
axs[1].legend(loc=2); 
axs[1].set(title='GeoRev-weighted factor returns in response to COVID-19 cases crossing {} cases'.format(thresh),
           ylabel='Cumulative returns');

As you can see, confirmed COVID-19 cases crossing our chosen threshold in a region does not appear to have an obvious influence on the returns of companies with revenue exposure to that region.