The currently proposed 'slippage model' is likely badly flawed and will produce large inaccuracies on thinly traded, highly volatile stocks and in certain market regimes.
Below is a thought experiment showing why:
STOCK A:
Average trailing 5,000 minute volume. Has traded an average of $100,000 every minute for the past 5,000 minutes.
Volatility of the trading range. Has traded within +/-5% of $100,000 in every minute over those 5,000 minutes.
Trading range over lookback period. Has traded within +/-0.2% of $10 for every tick in those 5,000 minutes.
Current trading volume. Is trading $95,000 in the current minute.
Broad market VIX at 15 and steady for months.
STOCK B:
Average trailing 5,000 minute volume. Has traded an average of $1,000 every minute for the past 5,000 minutes.
Volatility of the trading range. Had not traded more than $100 in 90% of the minutes, but has had a huge volume spike in the past 10 minutes.
Trading range over lookback period was in the $10 range, but has been rapidly falling every minute since the volume spike.
Current trading volume. Is trading $100,000 in the current minute.
Broad market VIX at 25 and spiking.
STOCK C:
Average trailing 5,000 minute volume. Has traded an average of $1,000 every minute for the past 5,000 minutes.
Volatility of the trading range. Had not traded more than $100 in 90% of the minutes, but has had a huge volume spike in the past 10 minutes.
Trading range over lookback period was in the $1 range, but has been rapidly rising every minute since the volume spike.
Current trading volume. Is trading $100,000 in the current minute.
Broad market VIX at 15 and steady for months.
Imagine I’m trying to ‘sell’ these stocks.
I would guess that slippage is likely to be ‘negative’ with Stock C (I can make money by selling the stock for a higher price than in the current minute), very low for Stock A and very high for Stock B. What do these ‘boundary conditions’ tell us?
Now, how would we rank these stocks in terms of the ‘slippage’ likely on trying to buy?
Stock C will have huge slippage. Stock B may have negative slippage, and Stock A will likely be very low and symmetrical on the buy and sell sides.
So, all this is simply to show that looking only at the volume in the current minute is likely to produce VERY inaccurate slippage calculations.
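To make that concrete, here is a minimal sketch of a model that only sees current-minute dollar volume. The function `volume_only_slippage`, its `impact` constant and the $10,000 order size are all made up for illustration (this is not the actual proposed model). The point is only that any model of this shape gives Stocks B and C the same number, and the same number for a buy as for a sell, even though the thought experiment says those cases should look completely different.

```python
def volume_only_slippage(order_dollars, minute_dollar_volume, impact=0.1):
    """Hypothetical model: slippage fraction depends only on the order's
    share of the current minute's dollar volume."""
    participation = order_dollars / minute_dollar_volume
    return impact * participation ** 2

# Current-minute dollar volume from the thought experiment above.
current_minute_volume = {"A": 95_000, "B": 100_000, "C": 100_000}
order = 10_000  # assumed order size, in dollars

estimates = {
    name: volume_only_slippage(order, volume)
    for name, volume in current_minute_volume.items()
}

for name, estimate in estimates.items():
    # The model has no notion of side, trend or history, so this one
    # number is used for buys and sells alike.
    print(f"Stock {name}: {estimate:.4%}")
```

A model like this ranks all three stocks as roughly equally cheap to trade, which is the opposite of what the boundary conditions suggest.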
To be highly accurate, we need to model what the overall market is doing. At a minimum, we need to look at some trailing multiday volume, price, volatility and trend/direction of the underlying - with some additional penalty for periods of very high overall broad market volatility.
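As a rough sketch of what those minimum inputs might look like, the helper below summarizes trailing daily bars into average dollar volume, volatility and trend. The name `trailing_features`, the 20-day default and the choice of log-return standard deviation as the volatility measure are my assumptions, not anything the current model does.

```python
import math
import statistics

def trailing_features(closes, volumes, lookback=20):
    """Summarize the last `lookback` daily bars (hypothetical helper).

    closes and volumes are equal-length lists, oldest bar first.
    """
    if len(closes) != len(volumes) or len(closes) <= lookback or lookback < 2:
        raise ValueError("need more than `lookback` aligned bars and lookback >= 2")
    window_closes = closes[-(lookback + 1):]   # one extra close to form returns
    window_volumes = volumes[-lookback:]
    log_returns = [math.log(b / a) for a, b in zip(window_closes, window_closes[1:])]
    dollar_volumes = [c * v for c, v in zip(window_closes[1:], window_volumes)]
    return {
        "avg_dollar_volume": statistics.fmean(dollar_volumes),
        "volatility": statistics.stdev(log_returns),
        # Positive when price rose over the window, negative when it fell.
        "trend": window_closes[-1] / window_closes[0] - 1.0,
    }

rising = trailing_features([1.00, 1.02, 1.05, 1.04, 1.10, 1.15], [1000] * 6, lookback=5)
falling = trailing_features([10.0, 9.9, 9.7, 9.8, 9.4, 9.0], [1000] * 6, lookback=5)
print(rising["trend"] > 0, falling["trend"] < 0)
```

The sign of `trend` is the piece that would let a model treat buys and sells differently, for example penalizing buys into a rising stock and sells into a falling one.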
If our model doesn’t take all of these into account, doesn’t draw on a huge data set, and doesn’t differentiate buy-side and sell-side slippage based on a variety of market action, it is likely HIGHLY INACCURATE, and we should be HIGHLY cautious in relying on it. In practice that means very high turnover systems will have HUGE inaccuracies in their backtests, especially high turnover systems in lower liquidity stocks and in stocks that have really high vol, really rapid recent changes in price, or very uneven trading ranges.
That really leaves a few options:
a) Get the data sets needed to build highly accurate models
b) Make a simple model that is VERY conservative until we can do a).
a. For this simple model, longer-term lookbacks on the underlying matter. These models should likely include the following:
i. VOLATILITY MODIFIED LOOKBACK PERIOD VWAP.
1. The average of three values for the stock: the 50 day average price*volume (VWAP), the 10 day VWAP, and the minimum 1 day VWAP value over the trailing 30 days (higher values lower slippage).
ii. Price volatility of the underlying over X-period lookback (lower values lower slippage).
iii. RECENT PRICE ACTION.
1. Some form of EMA over the most recent 30 minutes of trading volume (higher values lower slippage).
iv. Extra ‘slippage’ penalty during periods of market volatility with VIX over some threshold, say 25. Zero if VIX is lower than this.
It’s likely that the above four can just be put into a ‘slippage calculator.’
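Here is one way such a calculator could be sketched. Everything numeric in it is a placeholder: `base_bps`, the three weights and `vix_penalty_bps` are invented values that would have to be fit to real fill data, and the additive form is just the simplest thing that respects the directions listed above (more liquidity lowers slippage, more volatility raises it, VIX over the threshold adds a flat penalty). Note that "VWAP" here follows the list above and means price*volume in dollars. The sketch does not differentiate buy side from sell side.

```python
from dataclasses import dataclass, replace

@dataclass
class SlippageInputs:
    vwap_50d: float           # 50 day average of daily price*volume, in dollars
    vwap_10d: float           # 10 day average of daily price*volume, in dollars
    min_vwap_1d_30d: float    # smallest 1 day price*volume in the trailing 30 days
    price_volatility: float   # e.g. stdev of daily returns over the lookback
    recent_volume_ema: float  # EMA of per-minute dollar volume, last 30 minutes
    vix: float

def ema(values, span):
    """Exponential moving average with alpha = 2 / (span + 1), oldest value first."""
    alpha = 2.0 / (span + 1)
    result = values[0]
    for v in values[1:]:
        result = alpha * v + (1.0 - alpha) * result
    return result

def estimate_slippage_bps(order_dollars, x, base_bps=5.0, liquidity_weight=1e4,
                          recent_weight=10.0, vol_weight=500.0,
                          vix_threshold=25.0, vix_penalty_bps=10.0):
    """Hypothetical conservative estimate in basis points; all weights are placeholders."""
    # i. Blend of long, medium and worst-day liquidity: higher lowers slippage.
    liquidity = (x.vwap_50d + x.vwap_10d + x.min_vwap_1d_30d) / 3.0
    slippage = base_bps
    slippage += liquidity_weight * order_dollars / liquidity
    # ii. Price volatility: lower lowers slippage.
    slippage += vol_weight * x.price_volatility
    # iii. Recent volume EMA: higher lowers slippage.
    slippage += recent_weight * order_dollars / x.recent_volume_ema
    # iv. Flat penalty only when VIX is over the threshold, zero otherwise.
    if x.vix > vix_threshold:
        slippage += vix_penalty_bps
    return slippage

calm = SlippageInputs(5_000_000, 4_000_000, 1_500_000, 0.02,
                      ema([90_000, 100_000, 95_000], span=30), vix=15)
stressed = replace(calm, vix=30)
print(estimate_slippage_bps(10_000, calm), estimate_slippage_bps(10_000, stressed))
```

Keeping the weights as keyword arguments is deliberate: the defaults can be set conservatively high while still letting users override them in their own sims.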
If you lack accurate data and can only use 2 of these and a 'guesstimate', I would say that a) and b) likely matter more than the others. These seem more likely to be getting at how the algos of ‘market makers’ and various market participants are making their ‘fill order’ decisions. Not having a ‘volatility component’ seems a real mistake.
Until a long period of real data exists, covering 1 full market cycle with periods of huge vol (so at least 5 years), the defaults for Quantopian should be conservatively high with the ability for users to manually set them in their own sims and trading systems.
People can do what they want in their private trading, but for the fund the above seem to make sense.