Equivolume Bars/Hurst Exponents

Just the result of this afternoon's research into whether there's any modelling benefit in trying to use equivolume bars in Quantopian. The idea is generally that markets move according to a "volume clock" or "trading time", and so a lot of the stats we'd like to calculate work better when sampling at equi-volume or equi-trade-count intervals, rather than homogeneously in time.
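The construction itself is simple: accumulate minute volume and close a bar each time a volume budget is filled. A minimal sketch, assuming minute data arrives as a pandas DataFrame with `close` and `volume` columns (the column names and data layout are my assumption, not the notebook's actual code):

```python
import numpy as np
import pandas as pd

def equivolume_bars(minute_df, volume_per_bar):
    """Aggregate minute rows into bars of roughly equal traded volume.

    minute_df: DataFrame with 'close' and 'volume' columns (assumed layout).
    volume_per_bar: target cumulative volume per bar.
    """
    cum = minute_df["volume"].cumsum()
    # Bar id = how many volume thresholds have been crossed so far.
    bar_id = (cum // volume_per_bar).astype(int)
    grouped = minute_df.groupby(bar_id)
    return pd.DataFrame({
        "close": grouped["close"].last(),   # last minute's close in the bucket
        "volume": grouped["volume"].sum(),  # actual volume in the bucket
    })

# Toy day: drifting price, "volume smile" (heavy open and close).
minutes = 390
vol = np.concatenate([np.linspace(300, 100, minutes // 2),
                      np.linspace(100, 300, minutes - minutes // 2)])
df = pd.DataFrame({"close": 100 + np.arange(minutes) * 0.01, "volume": vol})
bars = equivolume_bars(df, volume_per_bar=df["volume"].sum() / 6)
```

Note that the bars end up unevenly spaced in clock time: short near the open and close, long over lunch, which is exactly the point.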

This is me trying to create equivolume bars in Quantopian, then testing them for better normality (not really) and seeing whether the sampling affects the Hurst exponent (not much). It might still be useful for cross-sectional mean-reversion; I am working on that.
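For reference, one common quick estimator of the Hurst exponent is the variance-of-lagged-differences method (the notebook's actual implementation may differ; this is just a standard sketch):

```python
import numpy as np

def hurst_exponent(series, max_lag=100):
    """Estimate the Hurst exponent from the scaling of lagged differences.

    For a process with Hurst exponent H, std(x[t+lag] - x[t]) ~ lag**H,
    so H is the slope of log(std) versus log(lag).
    """
    lags = np.arange(2, max_lag)
    tau = [np.std(series[lag:] - series[:-lag]) for lag in lags]
    slope, _ = np.polyfit(np.log(lags), np.log(tau), 1)
    return slope

rng = np.random.default_rng(0)
rw = np.cumsum(rng.standard_normal(5000))  # random walk: H should be near 0.5
h = hurst_exponent(rw)
```

Values near 0.5 indicate a random walk, above 0.5 trending, below 0.5 mean-reverting.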


Thanks for the post Simon. Very interesting.

Why does the volume versus time curve have a "smile" shape? I figure it must be mostly driven by automated trading algos by big institutions, right? So, collectively, why would they program their algos to trade more at the beginning and end of the day?

Also, it would seem that volatility should be taken into account, too, in determining equal-volume bars. Maybe in fitting the "smile" the fit needs to be weighted in some fashion by the volatility?

Wouldn't it be more useful to capture where the money is going? If the volume is just Stock A for Stock B, and A & B are basically the same, who cares? But if it is Stock A to cash, then the associated trade volume is more interesting, no?

I don't think it has anything to do with algos, volume throughout the day has had this pattern for years. Personally, I believe it's to do with the fact that morning is the first chance to trade on things that happened overnight, and afternoon is the last chance to trade on things you think will happen overnight. They are inherently more active, not to mention that many people use market-on-open and market-on-close orders, which themselves lead to ancillary hedging orders and so on.

You could also do a similar analysis with volatility or range as the proxy for trading activity, accumulating bar data until a certain range has been traveled. This is the essence of point-and-figure charting I guess. Here I used trading volume as the proxy for trading activity.
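That variant might look something like the following sketch, using cumulative absolute price travel as the activity proxy (my reconstruction of the idea, not the notebook's code):

```python
import numpy as np
import pandas as pd

def range_bars(prices, travel_per_bar):
    """Close a bar each time cumulative absolute price movement
    ('range traveled') crosses the next multiple of travel_per_bar."""
    prices = np.asarray(prices, dtype=float)
    travel = np.abs(np.diff(prices, prepend=prices[0])).cumsum()
    bar_id = (travel // travel_per_bar).astype(int)
    return pd.Series(prices).groupby(bar_id).last()  # each bar's closing price

# Zigzag price path: every step travels 1 point, so 2 points of travel per bar.
bars = range_bars([0, 1, 0, 1, 0, 1, 0, 1], travel_per_bar=2.0)
```

As with volume bars, choppy periods produce more bars per unit of clock time than quiet periods.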

My goal with this is to do mean-reversion trading on bars which are as comparable to one another as possible. The next step, for me, is to calculate the aggregated bars using not the closing price of the final minute bar, which is a bit random, but the VWAP of that bar, and check out the statistics of that time series. There is still the problem of overnight gaps; I am not sure what to do about that, and ideas are welcome. I am wondering if I should make a synthetic bar from 16:00 -> 09:30, which, if the aggregation period is roughly 1/5th of a day, might make sense. This would correspond to the hypothesis that stock prices are some hidden process which we sample throughout the day, but which is still going on at night even though we cannot observe it.
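Computing the VWAP of each aggregated bar is straightforward once every minute carries a bar label. A sketch, where `bar_id` stands for whatever labeling the aggregation produced (names are mine, not from the notebook):

```python
import pandas as pd

def bar_vwap(minute_df, bar_id):
    """VWAP per aggregated bar: sum(price * volume) / sum(volume) per label."""
    pv = (minute_df["close"] * minute_df["volume"]).groupby(bar_id).sum()
    return pv / minute_df["volume"].groupby(bar_id).sum()

# Two bars of two minutes each.
df = pd.DataFrame({"close": [10.0, 20.0, 30.0, 30.0],
                   "volume": [1.0, 3.0, 2.0, 2.0]})
vwap = bar_vwap(df, pd.Series([0, 0, 1, 1]))  # bar 0: (10*1+20*3)/4 = 17.5
```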

FWIW if you could track the flows of money from stock to stock, that would be a golden ticket. Not sure how you would do that though...?

Simon,

Perhaps irrelevant, but here's a heatmap that indicates (kinda surprisingly) that even for a big ol' ETF like SPY, there are extreme volume events, relative to a 390-minute trailing window. Naively, I'd think that SPY would stay within a more narrow z-score range.
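For anyone trying to reproduce the heatmap, the per-minute z-score against a trailing 390-minute window is presumably constructed something like this (a guess at the construction, not Grant's actual code):

```python
import pandas as pd

def trailing_volume_zscore(volume, window=390):
    """Z-score of each minute's volume against the trailing one-day window.

    Note: a perfectly constant window gives std = 0 and hence NaN.
    """
    mean = volume.rolling(window).mean()
    std = volume.rolling(window).std()
    return (volume - mean) / std

# Flat volume with a single extreme event stands out sharply.
vol = pd.Series([100.0] * 500)
vol.iloc[450] = 1000.0
z = trailing_volume_zscore(vol)
```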

Also, if you stare at the heatmap in just the right way, an image of the Mona Lisa will appear. : )

Grant

The afternoon ones are probably macro news/Fed announcements?

The 10am spike is a persistent one, I am not sure off-hand what drives that.

Continuing to work on this - I decided to try and see what happens when I aggregate bars based on accumulated variance. Seems legit, but I am still having trouble deciding how to handle overnight gaps when making intraday systems.
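Accumulating squared returns until a variance budget is hit can be sketched the same way as the volume version (again my reconstruction, not the notebook's code):

```python
import numpy as np

def variance_bar_ids(returns, var_per_bar):
    """Label each minute with a bar id; a new bar starts whenever cumulative
    squared return crosses the next multiple of var_per_bar."""
    cum = np.cumsum(np.asarray(returns, dtype=float) ** 2)
    return (cum // var_per_bar).astype(int)

# Quiet first half, volatile second half: bars get shorter in clock time
# exactly when variance arrives faster.
rets = np.array([0.1] * 10 + [1.0] * 10)
ids = variance_bar_ids(rets, var_per_bar=2.0)
```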

If I were to make a day-trading system, there might be an argument for using open->close returns; then things are pretty easy, though it would still be tough (but not impossible) to stitch together multiple days for fitting spreads or whatnot. On the other hand, if the overnight gap is exactly the spread divergence we want to fade, it sort of makes sense to try to include it, though that starts to force us to certain aggregation sizes (i.e. 65- or 78-minute bars, 6 or 5 bars per day) just to absorb the burst of overnight variance.

Am I making any sense? It's late...

"I am still having trouble deciding how to handle overnight gaps when making intraday systems"

One has to figure that big-time traders think in terms of 24/7/365, and try to develop models that fill in the overnight/weekend/holiday gaps in some fashion by plugging into other data feeds. With only 390 minutes per day to work with, you are basically dealing with a relatively low-duty-cycle burst of information (~30% duty at best). In my tinkering, I typically over-smooth the data, using something like:

prices = prices.ewm(span=390).mean()  # pd.ewma(prices, span=390) in older pandas

So, at least for overnight gaps, the smoothing sorta washes out the gap (if the market is closed for 3 days, maybe not so much). There's probably a more sophisticated way of smoothing that better accounts for the gaps.
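One option along those lines, if a newer pandas is available: `ewm` accepts a `times` argument, so the decay is driven by elapsed wall-clock time and the overnight gap automatically discounts stale prices (a sketch; the 60-minute halflife is an arbitrary choice of mine):

```python
import numpy as np
import pandas as pd

# Two trading days of minute closes with an overnight gap in between.
idx = pd.date_range("2023-01-02 09:31", periods=390, freq="min").append(
    pd.date_range("2023-01-03 09:31", periods=390, freq="min"))
prices = pd.Series(np.linspace(100.0, 101.0, 780), index=idx)

# Time-aware EWMA: weights decay with actual elapsed time, so the ~17.5h
# overnight gap nearly resets the average at the next day's open.
smoothed = prices.ewm(halflife=pd.Timedelta("60min"), times=idx).mean()
```

Whether "resetting" across the gap is desirable depends on the earlier question: if the overnight move is the signal you want to fade, this throws it away.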

As a side note to the Q team, my sense is that it would be nice if you could give users individual control over the data ingestor (or offer options other than OHLCV bars). Then, one could take in all trade data each minute, and spit out whatever would be best for a given strategy. Is this at all feasible? Spitting out OHLCV bars is kinda crude when something more sophisticated could be done (presumably all in code, with no hardware changes).

I think the overnight gap is contributing too much variance for this weighting, so I might try rebucketing with mean absolute deviation instead. Basically, what I want is some resampled time series where each bar is more comparable to its predecessor/successor, given that trading action is not stationary in time. I'm trying to reduce some of the intraday seasonality in volatility by doing non-homogeneous aggregations to more evenly distribute "action" between bars.

But the overnight gap is so large compared with any subdivision of a single day that it skews everything.

Perhaps I need to break this down even further, with different buckets for different days of the week, so that we might expect something like the following:

Monday morning, first bar: 1 minute long, to handle all the post-weekend action
...
Wednesday morning, first bar: 10 minutes long
Wednesday afternoon, last bar: 10 minutes long
...
Friday afternoon, last bar: 5 minutes long