1. Look for an idea
- academia / intuition
2. Replicate or create results
Plausibility check
- Holding period / signal decay
- costs / slippage
- feasibility at capital level
- capacity
- accept or go to 1.
4. Design an algorithm & Backtest
- Repeat step 3
5. Make it your own
- Apply your bread & butter
- Repeat step 3
6. Paper trade
- length of paper trading depends on turnover
- Repeat step 3
7. Live trade with low stakes
- Repeat step 3
- Compare live results with backtested estimations
- slippage / transaction costs
- non-fills: are they an issue?
8. Add to the portfolio and allocate capital
9. Go to 1.
For this demo I start from an academic paper out of MIT:
"Developing high-frequency equities trading models," by Leandro Rafael Infantino and Savion Itzhaki
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from statsmodels.stats.stattools import jarque_bera
def pplot(df, **plot_kwargs):
    """Plot a DataFrame or Series against integer timesteps
    instead of its minute-level DatetimeIndex."""
    try:
        return pd.DataFrame(df.values, columns=df.columns).plot(**plot_kwargs)
    except AttributeError:
        # A Series has no .columns attribute.
        return pd.Series(df.values).plot(**plot_kwargs)
Rather than exactly replicating the results from the paper, I will use a basket of liquid ETFs that spans the market.
FUNDS = symbols([
'SPY', # SPDR S&P 500, 'Large Cap Blend Equities'
'MDY', # SPDR MidCap Trust Series I, 'Mid Cap Blend Equities'
'EWI', # iShares MSCI Italy Capped ETF, 'Europe Equities'
'EWS', # iShares MSCI Singapore ETF, 'Asia Pacific Equities'
'EWO', # iShares MSCI Austria Capped ETF, 'Europe Equities'
'EWD', # iShares MSCI Sweden ETF, 'Europe Equities'
'EWL', # iShares MSCI Switzerland Capped ETF, 'Europe Equities'
'EWQ', # iShares MSCI France ETF, 'Europe Equities'
'EWP', # iShares MSCI Spain Capped ETF, 'Europe Equities'
'EWG', # iShares MSCI Germany ETF, 'Europe Equities'
'EWM', # iShares MSCI Malaysia ETF, 'Emerging Markets Equities'
'EWK', # iShares MSCI Belgium Capped ETF, 'Europe Equities'
'EWW', # iShares MSCI Mexico Capped ETF, 'Latin America Equities'
'EWN', # iShares MSCI Netherlands ETF, 'Europe Equities'
'EWC', # iShares MSCI Canada ETF, 'Foreign Large Cap Equities'
'EWH', # iShares MSCI Hong Kong ETF, 'China Equities'
'EWJ', # iShares MSCI Japan ETF, 'Japan Equities'
'EWU', # iShares MSCI United Kingdom ETF, 'Europe Equities'
'EWA', # iShares MSCI Australia ETF, 'Asia Pacific Equities'
'DIA', # SPDR Dow Jones Industrial Average ETF, 'Large Cap Value Equities'
'XLP', # Consumer Staples Select Sector SPDR, 'Consumer Staples Equities'
'XLY', # Consumer Discretionary Select Sector SPDR,'Consumer Discretionary Equities'
'XLB', # Materials Select Sector SPDR, 'Materials'
'XLU', # Utilities Select Sector SPDR, 'Utilities Equities'
'XLF', # Financial Select Sector SPDR, 'Financials Equities'
'XLE', # Energy Select Sector SPDR, 'Energy Equities'
'XLK', # Technology Select Sector SPDR, 'Technology Equities'
'XLV', # Health Care Select Sector SPDR, 'Health & Biotech Equities'
'XLI', # Industrial Select Sector SPDR, 'Industrials Equities'
'QQQ', # PowerShares QQQ, 'Large Cap Growth Equities'
'EWY', # iShares MSCI South Korea Capped ETF, 'Asia Pacific Equities'
'IVV', # iShares Core S&P 500 ETF, 'Large Cap Blend Equities'
'IYW', # iShares U.S. Technology ETF, 'Technology Equities'
'IWB', # iShares Russell 1000 ETF, 'Large Cap Blend Equities'
'IWD', # iShares Russell 1000 Value ETF, 'Large Cap Value Equities'
'IVW', # iShares S&P 500 Growth ETF, 'Large Cap Growth Equities'
'IYF', # iShares U.S. Financials ETF, 'Financials Equities'
'IWV', # iShares Russell 3000 ETF, 'All Cap Equities'
'IYZ', # iShares U.S. Telecommunications ETF, 'Communications Equities'
'IWF', # iShares Russell 1000 Growth ETF, 'Large Cap Growth Equities'
'IWM', # iShares Russell 2000 ETF, 'Small Cap Blend Equities'
'IVE', # iShares S&P 500 Value ETF, 'Large Cap Value Equities'
'IJH', # iShares Core S&P Mid-Cap ETF, 'Mid Cap Blend Equities'
'IJR', # iShares Core S&P Small-Cap ETF, 'Small Cap Blend Equities'
'IDU', # iShares U.S. Utilities ETF, 'Utilities Equities'
'IYE', # iShares U.S. Energy ETF, 'Energy Equities'
'IYC', # iShares U.S. Consumer Services ETF, 'Consumer Discretionary Equities'
'IYY', # iShares Dow Jones U.S. ETF, 'All Cap Equities'
'IYM', # iShares U.S. Basic Materials ETF, 'Materials'
'IYG', # iShares U.S. Financial Services ETF, 'Financials Equities'
'IYK', # iShares U.S. Consumer Goods ETF, 'Consumer Staples Equities'
'IYH', # iShares U.S. Healthcare ETF, 'Health & Biotech Equities'
'IYJ', # iShares U.S. Industrials ETF, 'Industrials Equities'
'EWT', # iShares MSCI Taiwan ETF, 'Asia Pacific Equities'
'EWZ', # iShares MSCI Brazil Capped ETF, 'Latin America Equities'
'IWN', # iShares Russell 2000 Value ETF, 'Small Cap Value Equities'
'IJK', # iShares S&P Mid-Cap 400 Growth ETF, 'Mid Cap Growth Equities'
'IJS', # iShares S&P Small-Cap 600 Value ETF, 'Small Cap Value Equities'
'IJT', # iShares S&P Small-Cap 600 Growth ETF, 'Small Cap Growth Equities'
'IJJ', # iShares S&P Mid-Cap 400 Value ETF, 'Mid Cap Value Equities'
'IWO', # iShares Russell 2000 Growth ETF, 'Small Cap Growth Equities'
'IBB', # iShares Nasdaq Biotechnology ETF, 'Health & Biotech Equities'
'SHY', # iShares 1-3 Year Treasury Bond ETF, 'Government Bonds'
'IEF', # iShares 7-10 Year Treasury Bond ETF, 'Government Bonds'
'TLT', # iShares 20+ Year Treasury Bond ETF, 'Government Bonds'
'LQD', # iShares iBoxx $ Investment Grade Corporate Bond ETF, 'Corporate Bonds'
'IYR', # iShares U.S. Real Estate ETF, 'Real Estate'
'ICF', # iShares Cohen & Steers REIT ETF, 'Real Estate'
'RWR', # SPDR DJ Wilshire REIT ETF, 'Real Estate'
])
prices = get_pricing(FUNDS,
start_date='2010-12-10',
end_date='2011-01-04',
frequency='minute',
fields='price',
handle_missing='ignore').ffill()
prices = prices.dropna()
log_prices = np.log(prices)
log_rets = log_prices.diff().dropna()
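Log returns are used because they are time additive, a property the compounding logic below relies on. A quick check on a toy price path (hypothetical numbers, not from the dataset):
# The sum of one-period log returns equals the log return over the window.
toy_prices = pd.Series([100.0, 101.0, 99.5, 102.0])
toy_log_rets = np.log(toy_prices).diff().dropna()
assert np.isclose(toy_log_rets.sum(), np.log(toy_prices.iloc[-1] / toy_prices.iloc[0]))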
fig = pplot(
log_prices / log_prices.iloc[0],
title='Normalized Log Asset Prices',
legend=False,
)
fig.set_xlabel("Timesteps")
def get_PCA_spreads(prices, n_components=10, compound_periods=1):
    """
    Returns the spread between the log returns
    and the model estimate.

    :Params:
        prices: pandas.DataFrame
            pricing data used to fit the model.
        n_components: integer
            number of principal components used.
        compound_periods: integer
            log returns are summed over this many periods.
            Uses the fact that log returns are time additive,
            e.g. compound_periods=5 corresponds to using
            5-minute compounded returns.
    :Returns:
        pandas.DataFrame
            actual returns - model estimates
    """
    pca = PCA(n_components=n_components)
    X = np.log(prices).diff().iloc[1:]
    if compound_periods > 1:
        # Rolling sums of log returns give multi-period compounded
        # returns; drop the NaN rows at the head of the first window.
        X = pd.rolling_sum(X, compound_periods).iloc[compound_periods - 1:]
    # Demean the returns before fitting the PCA.
    M = X.mean()
    Y = X - M
    model = pca.fit(Y)
    psi = model.components_.T
    # Factor scores: the demeaned returns projected onto the
    # principal components (same as model.transform(Y)).
    D = pd.DataFrame(np.dot(psi.T, Y.T).T, index=Y.index)
    # Regress each asset's demeaned returns on the factor scores.
    regressions = {}
    for asset in Y:
        regressions[asset] = pd.ols(y=Y[asset], x=D)
    B = pd.DataFrame({s: regressions[s].beta
                      for s in regressions}).drop('intercept')
    # The model's return estimate; the residual is the spread.
    S = D.dot(B)
    S_est = (S + M).dropna()
    spreads = (X - S_est).dropna()
    return spreads
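Note that pd.ols and pd.rolling_sum only exist in the older pandas that ships with the research environment. Outside it, the rolling sum becomes X.rolling(compound_periods).sum(), and the per-asset regressions can be sketched with numpy; ols_betas below is a hypothetical stand-in, not part of the original code:
def ols_betas(Y, D):
    # Regress each asset's demeaned returns on the factor scores D,
    # with an explicit intercept column that is dropped afterwards,
    # mirroring the pd.ols step above.
    X_mat = np.column_stack([np.ones(len(D)), D.values])
    betas = {}
    for asset in Y:
        coefs = np.linalg.lstsq(X_mat, Y[asset].values, rcond=None)[0]
        betas[asset] = coefs[1:]  # drop the intercept term
    return pd.DataFrame(betas)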
n_components = 5
compound_periods = 1  # one-minute returns, no multi-period compounding
spreads = get_PCA_spreads(prices,
                          n_components=n_components,
                          compound_periods=compound_periods)
fig = pplot(
spreads.cumsum(),
legend=False,
title="Cumulative Spread Return: (Return - model estimate)"
)
fig.set_xlabel('Timesteps (minutes)')
fig.set_ylabel('Cumulative returns')
spreads.T.sum().hist(bins=150)
print jarque_bera(spreads.T.sum())
# Not even close to normally distributed, p-value = 0.0
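jarque_bera returns the tuple (JB statistic, p-value, skewness, kurtosis), so unpacking it makes the rejection of normality explicit:
jb, jb_pvalue, skew, kurtosis = jarque_bera(spreads.T.sum())
print "JB: {}, p-value: {}, skew: {}, kurtosis: {}".format(
    jb, jb_pvalue, skew, kurtosis)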
The trading rule bets against the most recent spread. Its one-step return is

$-\operatorname{sign}(S_{t-1}) \cdot R_t$

where $S_t$ is the spread at time $t$ and $R_t$ is the asset's log return at time $t$.
def make_sign_frame(signals, stepsize=1, idx0=0):
    """
    Returns a DataFrame with the same shape as signals
    where the sign of each element has been copied.

    :Params:
        signals: Series/DataFrame
            signals centered around 0.
        stepsize: integer
            stepsize=N copies the sign of every Nth
            time step; remaining values are forward filled.
        idx0: integer
            the starting index used; previous elements
            will be NaNs.
    """
    signs = np.nan * signals
    signs.iloc[idx0::stepsize] = np.copysign(1, signals.iloc[idx0::stepsize])
    return signs.ffill()
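A quick sanity check on toy spread values (hypothetical numbers): the direction is minus the sign of the spread, and the shift to the next bar happens below.
toy_spread = pd.Series([0.5, -0.2, 0.1, -0.3])
print -make_sign_frame(toy_spread)  # -> -1, 1, -1, 1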
directions = -make_sign_frame(spreads)
# Shift the signal one bar so each trade only uses prior information.
predictions = (log_rets * directions.shift(1))
fig = pplot(
predictions.cumsum(),
legend=False,
title="Asset Returns Using One Step Ahead Prediction"
)
fig.set_xlabel('Timesteps (minutes)')
fig.set_ylabel('Cumulative Asset Return')
fig = pplot(
predictions.T.sum().cumsum(),
legend=False,
title="Strategy Returns Using One Step Ahead Prediction"
)
fig.set_xlabel('Timesteps (minutes)')
fig.set_ylabel('Cumulative Strategy Return')
# Refit the model on a rolling 500-minute window. Each window
# excludes the current bar, so the spread stamped at time t uses
# only data available before t.
rolling_spreads = {}
for i in xrange(500, len(prices)):
    _prices = prices.iloc[i - 500: i]
    rolling_spreads[prices.index[i]] = get_PCA_spreads(_prices).iloc[-1]
rolling_spreads = pd.DataFrame(rolling_spreads).T
directions = -make_sign_frame(rolling_spreads)
# No shift is needed here: the rolling windows already exclude the current bar.
predictions = (log_rets * directions)
fig = pplot(
predictions.cumsum(),
legend=False,
title="Asset returns (immediate execution)"
)
fig.set_xlabel('Timesteps (minutes)')
fig.set_ylabel('Cumulative Asset Return')
directions = -make_sign_frame(rolling_spreads)
predictions = (log_rets * directions).T.sum()
fig = pplot(
predictions.cumsum(),
title="Strategy returns (immediate execution)"
)
def get_turnover_from_weights(weights):
    """
    Returns the daily strategy turnover given
    a DataFrame of portfolio target weights.
    """
    # Half the gross change in weights: the average of buys and sells.
    return weights.diff().abs().T.sum() / 2

def get_proportional_signal_weights(signals, leverage=1.0):
    """
    Returns a set of portfolio weights proportional
    to the signal values.
    """
    return leverage * (signals.T / signals.T.abs().sum()).T
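A toy check of the two helpers (hypothetical weights): moving 10% of the book from one asset to another counts as 10% turnover under the buys-and-sells-averaged convention.
toy_weights = pd.DataFrame({'A': [0.5, 0.6], 'B': [0.5, 0.4]})
print get_turnover_from_weights(toy_weights)  # day 1 -> 0.1
toy_signals = pd.DataFrame({'A': [2.0], 'B': [-1.0]})
print get_proportional_signal_weights(toy_signals)  # -> A: 2/3, B: -1/3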
results = {}
turnover = {}
for i in range(1, 21):
    # Refresh the signal every i bars, i.e. hold positions for i minutes.
    directions = -make_sign_frame(rolling_spreads, i)
    results[i] = (log_rets * directions).T.sum()
    weights = get_proportional_signal_weights(directions)
    turnover[i] = get_turnover_from_weights(weights)
results = pd.DataFrame(results)
turnover = pd.DataFrame(turnover)
fig = pplot(
results.cumsum(),
title="Returns Vs. Holding Period"
)
fig.set_xlabel('Timesteps (minutes)')
fig.set_ylabel('Cumulative Returns')
What happens as we wait longer to place orders?
results_delayed = {}
directions = -make_sign_frame(rolling_spreads, 1)
for i in range(0, 21):
    # Same one-minute signal, but executed i minutes late.
    results_delayed[i] = (log_rets * directions.shift(i)).T.sum()
results_delayed = pd.DataFrame(results_delayed)
fig = pplot(
results_delayed.cumsum(),
title="Returns Vs. Execution Delay"
)
fig.set_xlabel('Timesteps (minutes)')
fig.set_ylabel('Cumulative Returns')
def annual_return(returns, style='compound'):
    """Determines the annual returns of a strategy.

    Parameters
    ----------
    returns : pd.Series
        Daily returns of the strategy, noncumulative.
    style : str, optional
        - If 'compound', then return will be calculated in geometric
          terms: (1 + mean(all_daily_returns))^252 - 1.
        - If 'calendar', then return will be calculated as
          ((last_value - start_value) / start_value) / num_of_years.
        - Otherwise, return is simply mean(all_daily_returns) * 252.

    Returns
    -------
    float
        Annual returns.
    """
    if returns.size < 1:
        return np.nan
    if style == 'calendar':
        num_years = len(returns) / 252.0
        # Cumulative value path starting from 100.
        df_cum_rets = 100 * (1 + returns).cumprod()
        start_value = df_cum_rets.iloc[0]
        end_value = df_cum_rets.iloc[-1]
        return ((end_value - start_value) / start_value) / num_years
    if style == 'compound':
        return pow((1 + returns.mean()), 252) - 1
    else:
        return returns.mean() * 252
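A quick numeric comparison of the two main styles, assuming a hypothetical 10 bps mean daily return:
mean_daily = 0.001
print pow(1 + mean_daily, 252) - 1  # compound: ~0.286
print mean_daily * 252              # simple:    0.252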
def annual_volatility(returns):
    """
    Determines the annual volatility of a strategy.

    Parameters
    ----------
    returns : pd.Series
        Daily returns of the strategy, noncumulative.

    Returns
    -------
    float
        Annual volatility.
    """
    if returns.size < 2:
        return np.nan
    return returns.std() * np.sqrt(252)
def sharpe_ratio(returns, returns_style='compound'):
    """
    Determines the Sharpe ratio of a strategy.

    Parameters
    ----------
    returns : pd.Series
        Daily returns of the strategy, noncumulative.
    returns_style : str, optional
        See annual_return's style.

    Returns
    -------
    float
        Sharpe ratio.

    Note
    ----
    See https://en.wikipedia.org/wiki/Sharpe_ratio for more details.
    """
    numer = annual_return(returns, style=returns_style)
    denom = annual_volatility(returns)
    if denom > 0.0:
        return numer / denom
    else:
        return np.nan
Minute bar data is used here while the annualization above assumes daily returns, so these are not true Sharpe ratios, but the resulting figures are still very informative for comparing holding periods; one way to annualize properly is sketched after the plot below.
sharpes = results.apply(sharpe_ratio)
fig = sharpes.plot(title="Sharpe Ratio Vs. Holding Period (No Slippage)")
fig.set_xlabel('Holding period (minutes)')
fig.set_ylabel('Sharpe Ratio')
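If a properly annualized figure is needed, one option is to aggregate the minute-bar strategy returns to daily sums before applying the daily-based helpers above. A minimal sketch, assuming results carries a minute-level DatetimeIndex:
# Sum the minute-bar log returns within each session to get daily
# returns, then annualize with the 252-day convention as before.
daily_results = results.groupby(results.index.date).sum()
daily_sharpes = daily_results.apply(sharpe_ratio)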
def adjust_for_slippage(returns, turnover, basis_points=20):
    """
    Adjusts a return series for slippage based on the
    portfolio turnover rate and an assumed number
    of basis points of slippage.

    Parameters
    ----------
    returns : pd.Series
        backtest daily returns series.
    turnover : pd.Series or float
        strategy turnover rate as a series or value.
    basis_points : int/float
        slippage assumption of average market impact.

    Returns
    -------
    Series
        slippage-adjusted returns.
    """
    slippage = 0.0001 * basis_points
    return returns - slippage * turnover
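A quick worked example with hypothetical numbers: at 20 bps of slippage, a strategy turning over 3x its book per day gives up 60 bps of return per day under this convention.
print adjust_for_slippage(pd.Series([0.01]), 3.0, basis_points=20)
# 0.01 - (0.0001 * 20) * 3.0 = 0.004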
slippages = np.linspace(0, 60)
adjusted_sharpes = pd.DataFrame({
s: adjust_for_slippage(results, turnover, s).apply(sharpe_ratio)
for s in slippages
})
fig = adjusted_sharpes.T.plot(title="Sharpe Ratios with different Slippage Assumptions")
fig.set_xlabel('Basis Points of Slippage')
fig.set_ylabel('Sharpe Ratio')
backtest = get_backtest("55c20ef679265f0c6713ddd8")
def get_backtest_turnover(transactions, portfolio_values):
    """
    Returns the daily portfolio turnover, defined as the
    average value of buys and sells over the ending
    portfolio value.

    Parameters
    ----------
    transactions : pd.DataFrame
        The BacktestResult.transactions attribute.
    portfolio_values : pd.Series
        End-of-day portfolio values for the backtest.

    Returns
    -------
    Series
        daily portfolio turnover.
    """
    prices = transactions.price
    amounts = transactions.amount
    sizes = prices * amounts.abs()
    daily_value = sizes.groupby(sizes.index).sum()
    # Divide by 2 because turnover is the average of buys and sells.
    turnover = daily_value / portfolio_values / 2.
    return turnover
turnover = get_backtest_turnover(
backtest.transactions,
backtest.daily_performance.ending_portfolio_value
)
returns = backtest.daily_performance.returns
turnover.plot()
print "Average Daily Turnover is {} times the portfolio value!".format(round(turnover.mean()))
with_slippage = pd.DataFrame({
s: adjust_for_slippage(returns, turnover, s)
for s in slippages
})
fig = with_slippage.apply(sharpe_ratio).plot(title='Backtest Sharpe with Slippage')
fig.set_xlabel('Basis Points of Slippage')
fig.set_ylabel('Sharpe Ratio')
Commission costs can be lumped in with slippage, but it is still useful to see how much you can pay per share before the costs outweigh your PNL.
def get_transaction_commission(txns, per_share=0.005):
    """Returns the total daily commission paid, assuming a
    fixed commission per share traded."""
    amounts = txns.amount.abs().groupby(txns.amount.index).sum()
    return amounts * per_share
commissions = pd.DataFrame({
i: get_transaction_commission(backtest.transactions, i)
for i in np.linspace(0, 0.001)
})
# Subtract along the date index so each commission column is deducted from PNL.
adj_pnl = commissions.rsub(backtest.daily_performance.pnl, axis=0)
fig = adj_pnl.mean().plot(title="Average Daily PNL Vs. Commission/Share Paid")
fig.set_xlabel('Commission Per Share ($)')
fig.set_ylabel('Average Daily PNL')
As you can see, HFT algorithms are not feasible for retail traders, who have to pay significant per-share commissions to their brokerages.
Quantopian's research environment opens up new possibilities for analyzing trading strategies: we can easily tinker with and visualize data, and run sensitivity analyses to determine a strategy's feasibility.
I hope you've found this informative and that it helps you analyze your own strategies intelligently.
David Edwards