Trading volume data inconsistency

There may be a bug in the most commonly used Quantopian data, at least for some stocks, or else there is a gap in my understanding of how to use these data.

I've been trying to understand why I get so many partially filled orders with some strategies.
This naturally led me to compare my trade sizes with the historical daily volumes.
I do not understand the apparent inconsistency between the data presented by pipeline and the data presented outside of pipeline.

Here are some observations:
1) There is likely an inconsistency between the price and volume data used by pipeline features such as AverageDailyVolume() and the data used to place and manage orders.
2) The daily volumes reported through pipeline and through the normal interface do not match. The mismatch is large and is neither a constant offset nor a constant multiple.
3) Neither of the internally reported volumes matches the data published on yahoo.com.

Attached is a backtest of a single stock (MTEX) from 7/1/2003 through 7/31/2003.
The following transactions were made:
7/1: 2963 shares ordered; 2963 purchased.
7/8: -2963 shares ordered; 2963 sold.
7/15: 5184 shares ordered; 1537 purchased, 3647 not filled.
7/22: -1537 shares ordered; 1537 sold.
7/29: 5509 shares ordered; 2239 purchased, 3270 not filled.

Here is where my confusion begins.
The logged messages for total trading activity on the above trading days are below.
These show that 6500 shares were traded on 7/15 and 3800 on 7/29.
Why, then, were my trades limited to 1537 and 2239 shares, respectively?

2003-07-02 yesterdays_news:110 INFO yesterday MTEX traded 4600 shares at $ 7.41
2003-07-09 yesterdays_news:110 INFO yesterday MTEX traded 16400 shares at $ 6.00
2003-07-16 yesterdays_news:110 INFO yesterday MTEX traded 6500 shares at $ 6.62
2003-07-23 yesterdays_news:110 INFO yesterday MTEX traded 12300 shares at $ 8.49
2003-07-30 yesterdays_news:110 INFO yesterday MTEX traded 3800 shares at $ 8.06

Investigating further, I found that the daily volume data from pipeline and from the standard, non-pipeline interface do not match.
Note: these data are for the trading date before the logging date, so they don't line up with the data above.
2003-07-01 periodic_rebalance:93 INFO Vpipe = 638965 Vquant = 29170
2003-07-08 periodic_rebalance:93 INFO Vpipe = 246630 Vquant = 46400
2003-07-15 periodic_rebalance:93 INFO Vpipe = 111950 Vquant = 3900
2003-07-22 periodic_rebalance:93 INFO Vpipe = 332663 Vquant = 10200
2003-07-29 periodic_rebalance:93 INFO Vpipe = 61250 Vquant = 6200

From yahoo.com I found the actual trading volumes shown below.
The pipeline and non-pipeline values are shown for comparison.

2003-06-30 V(yahoo) = 64,900 Vpipe = 638965 Vquant = 29170
2003-07-01 V(yahoo) = 46,300 Vpipe not logged Vquant = 4600
2003-07-07 V(yahoo) = 26,400 Vpipe = 246630 Vquant = 46400
2003-07-08 V(yahoo) = 40,000 Vpipe not logged Vquant = 16400
2003-07-14 V(yahoo) = 13,000 Vpipe = 111950 Vquant = 3900
2003-07-15 V(yahoo) = 7,700 Vpipe not logged Vquant = 6500
2003-07-21 V(yahoo) = 43,500 Vpipe = 332663 Vquant = 10200
2003-07-22 V(yahoo) = 62,100 Vpipe not logged Vquant = 12300
2003-07-28 V(yahoo) = 6,200 Vpipe = 61250 Vquant = 6200 (6200 matches yahoo)
2003-07-29 V(yahoo) = 9,600 Vpipe not logged Vquant = 3800

This problem persists into the present with MTEX, as seen in these data from 7/1/2016 through 7/31/2016:

2016-07-06 periodic_rebalance:93 INFO Vpipe = 666 Vquant = 0
2016-07-12 periodic_rebalance:93 INFO Vpipe = 1926 Vquant = 893
2016-07-19 periodic_rebalance:93 INFO Vpipe = 34112 Vquant = 600
2016-07-26 periodic_rebalance:93 INFO Vpipe = 4474 Vquant = 161


Hi Peter,

First off, the discrepancy between the Pipeline and data.history data results from the way data.history works when used during the trading day. As explained in the documentation, when data.history is called during the trading day for '1d' bars, its last returned bar is a partial bar for the current day. So when you call yesterdays_news right after market open and look at the last daily bar returned by data.history, you get a partial bar that contains data only for the one minute that has elapsed so far on the current day. The volume figure you get there is just the volume for that one minute.

To properly get data for the previous day, you have to use the second-to-last bar given by data.history. So here's how you get yesterday's volume:

data.history(context.test_stock, "volume", 2, '1d')[-2]  
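A minimal sketch of that indexing, with a plain Python list standing in for the Series that data.history returns (the stand-in and the helper name `yesterdays_volume` are hypothetical, not Quantopian API). The two values are the ones logged on 2003-07-15 above: 111950 from pipeline for the prior full day, and 3900, which presumably was the one-minute partial bar for the current day.

```python
from typing import Sequence


def yesterdays_volume(daily_volumes: Sequence[int]) -> int:
    """Return the volume of the last *complete* daily bar.

    Assumes bars are ordered oldest to newest and that, when called during
    market hours, the final element is today's partial bar.
    """
    if len(daily_volumes) < 2:
        raise ValueError("need yesterday's bar plus today's partial bar")
    return daily_volumes[-2]


# Values logged on 2003-07-15: prior full day, then today's first-minute bar.
bars = [111950, 3900]
print(bars[-1])                 # 3900: only the minute elapsed so far today
print(yesterdays_volume(bars))  # 111950: yesterday's full-day volume
```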

So if you fix that code on lines 90, 107, and 108, you'll see the Pipeline/data.history discrepancy go away. As for your trade quantities being limited, our default slippage model limits trades to 2.5% of volume, and you'll find that the order amounts that were filled are about 2.5% of the volume for that day.
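One plausible reason the fills are only *about* 2.5% of the daily volume: assuming the limit is applied to each minute bar separately (as zipline's `VolumeShareSlippage` does with its default `volume_limit=0.025`) and fractional shares are dropped, the total fill can land a little under 2.5% of the day's total. Volume traded before the order is placed also cannot count toward it. The helper `simulate_capped_fill` and the minute volumes are hypothetical, for illustration only.

```python
import math
from typing import Sequence


def simulate_capped_fill(order_qty: int,
                         minute_volumes: Sequence[int],
                         volume_limit: float = 0.025) -> int:
    """Fill an order minute by minute, taking at most volume_limit of each bar.

    Assumptions (illustrative, not Quantopian's code): the cap applies to each
    minute bar on its own, unused capacity does not carry over, and fractional
    shares are dropped.
    """
    filled = 0
    for vol in minute_volumes:
        remaining = order_qty - filled
        if remaining <= 0:
            break
        per_bar_cap = math.floor(volume_limit * vol)  # whole shares only
        filled += min(per_bar_cap, remaining)
    return filled


# Made-up session of five minute bars totalling 590 shares.
minutes = [120, 80, 200, 40, 150]
print(round(0.025 * sum(minutes), 2))       # 14.75: 2.5% of the daily total
print(simulate_capped_fill(5000, minutes))  # 14: 3 + 2 + 5 + 1 + 3
```

Rounding down in every bar is what pulls the total below the daily 2.5% figure; with many thin minute bars the gap grows.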

Finally, the discrepancy with Yahoo: Note that MTEX had a 1:10 reverse split on 2012-01-17. While our volume values reflect the actual share volume from 2003, Yahoo's volume values are split-adjusted, since they're intended to be used from the perspective of today. So Yahoo's volume values are about 1/10 of ours.
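Restating an old bar in today's share basis multiplies volume and divides price by the same factor, so dollar volume is unchanged. A minimal sketch, assuming the split is given as "new shares for old shares" (7-for-1 is `(7, 1)`; MTEX's 1:10 reverse split is `(1, 10)`); `split_adjust` is a hypothetical helper and the $6.00 price is a placeholder. The volume is the unadjusted 2003-06-30 pipeline figure.

```python
from typing import Tuple


def split_adjust(price: float, volume: float,
                 new_shares: int, old_shares: int) -> Tuple[float, float]:
    """Restate a pre-split bar in post-split shares.

    A new_shares-for-old_shares split scales volume by new/old and price by
    old/new, leaving price * volume (dollar volume) unchanged.
    """
    return price * old_shares / new_shares, volume * new_shares / old_shares


# 1:10 reverse split: 10 old shares become 1 new share.
adj_price, adj_volume = split_adjust(6.00, 638965, 1, 10)
print(adj_price)   # 60.0
print(adj_volume)  # 63896.5, on the same basis as Yahoo's 64,900
```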

I hope this explains everything; let me know if you have more questions.


Nathan,
Thanks for taking the time to give such a thorough response.
Your fixes did resolve the difference between the pipeline and nonpipeline data.

Two concerns remain:

First, the 2.5% limit in the slippage model did not work exactly as I expected.
I modified my messages to display 2.5% of the day's trade volume.
The actual limits imposed are slightly less than 2.5% of the reported daily volume.

2003-07-15 WARN Your order for 5184 shares of MTEX has been partially filled. 1537 shares were successfully purchased. 3647 shares were not filled by the end of day and were canceled.
2003-07-16 yesterdays_news:126 INFO yesterday MTEX traded 66200 shares so max trade size is 1655
==> 1537 vs 1655 is about 7% less than expected

2003-07-29 WARN Your order for 5509 shares of MTEX has been partially filled. 2239 shares were successfully purchased. 3270 shares were not filled by the end of day and were canceled.
2003-07-30 yesterdays_news:126 INFO yesterday MTEX traded 96623 shares so max trade size is 2416
==> 2239 vs 2416 is about 7% less than expected
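The two shortfalls can be checked directly from the logged numbers. A small sketch; `shortfall_pct` is a hypothetical helper and 0.025 is the 2.5% limit discussed above.

```python
def shortfall_pct(filled: int, daily_volume: int, volume_limit: float = 0.025) -> float:
    """Percent by which a fill falls short of volume_limit * daily_volume."""
    cap = volume_limit * daily_volume
    return 100.0 * (1.0 - filled / cap)


for day, filled, volume in [("2003-07-15", 1537, 66200), ("2003-07-29", 2239, 96623)]:
    print(day, f"{shortfall_pct(filled, volume):.1f}%")
# 2003-07-15 7.1%
# 2003-07-29 7.3%
```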

Second, the 10:1 split explains the general behavior (Quantopian volumes are ~10x Yahoo volumes).
The ratio is not consistent, however.
Google Finance volumes are very similar to the Yahoo data and slightly higher; perhaps they include trades outside the regular session. Either way, the Quantopian/Google ratios would be similar to the Quantopian/Yahoo ratios shown here.
I'm puzzled as to why the ratio varies so much:

2003-06-30 V(yahoo) = 64,900 Vpipe = 638965 ==> ratio = 9.85, V(google) = 65,056
2003-07-07 V(yahoo) = 26,400 Vpipe = 246630 ==> ratio = 9.34, V(google) = 26,503
2003-07-14 V(yahoo) = 13,000 Vpipe = 111950 ==> ratio = 8.61, V(google) = 13,315
2003-07-21 V(yahoo) = 43,500 Vpipe = 332663 ==> ratio = 7.65, V(google) = 43,576
2003-07-28 V(yahoo) = 6,200 Vpipe = 61250 ==> ratio = 9.88, V(google) = 6,275
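Recomputing the ratios from the values above makes the spread explicit; a clean 1:10 reverse split alone would predict a constant ratio of 10. The numbers are copied from the table; nothing else is assumed.

```python
rows = [
    ("2003-06-30", 64_900, 638_965),
    ("2003-07-07", 26_400, 246_630),
    ("2003-07-14", 13_000, 111_950),
    ("2003-07-21", 43_500, 332_663),
    ("2003-07-28", 6_200, 61_250),
]

# Quantopian (pipeline) volume divided by Yahoo volume for each day.
ratios = {day: pipe / yahoo for day, yahoo, pipe in rows}
for day, r in ratios.items():
    print(day, f"{r:.2f}", f"({r - 10:+.2f} vs. a clean 10x)")
print("spread:", f"{max(ratios.values()) - min(ratios.values()):.2f}")
```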

Nathan,
If I understand you correctly, your statement
"While our volume values reflect the actual share volume from 2003"
means that Quantopian's historical volume and price data are not split-adjusted.

Yes. I am puzzled too. You need to adjust both.

I am confused about Nathan's comment about split adjustment of data.

I ran a quick test that indicates all is behaving in a split adjusted manner.

Backtest:
Buys 1000 shares of AAPL on 6/2/2014 and holds them until a 5% return is achieved.

Background:
AAPL had a 7-to-1 split over the weekend between 6/6/2014 and 6/9/2014
I picked AAPL because if there were a problem with it, others would have yelled loudly in June 2014.

Logged data results:
The logged data show that this split is captured in the reported price and volume history data of Quantopian
2014-06-05 my_rebalance:36 INFO yesterday AAPL traded 8964095 shares at 644.82
2014-06-06 my_rebalance:36 INFO yesterday AAPL traded 7841964 shares at 647.35
2014-06-09 my_rebalance:36 INFO yesterday AAPL traded 67368599 shares at 92.23
2014-06-10 my_rebalance:36 INFO yesterday AAPL traded 64380213 shares at 93.70

Displayed data results:
The data displayed in the backtest's Daily Positions panel show that the adjustment is applied as well:
2014-06-05 1000 shares at $647.35
2014-06-06 1000 shares at $647.57
2014-06-09 6999 shares at $93.70
2014-06-10 6999 shares at $94.25

What happened to one share? 6999 vs. 7000?

Price-based decision results:
2014-06-02 1000 shares purchased
2014-06-11 699 shares sold
If the price data were not split-adjusted, the 6/11/2014 sale would not have been triggered.

For Pete's sake (and that was not my first choice of expletives), how Quantopian handles adjustments has been repeatedly explained in great detail. It is more correct than what Yahoo et al. do.

The key is that they make no adjustment that was not known at the time. At each simulation time point, however, all historical data requested at that point are adjusted backward using the adjustments known as of that point.
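A minimal sketch of that point-in-time rule, with hypothetical `Bar`, `Split`, and `history_as_of` names (this is not Quantopian's implementation). It uses the 2014-06-05 AAPL bar from the log above (close 647.35, volume 7,841,964) and treats the 7-for-1 split as effective 2014-06-09: requested on 6/6, the bar is unadjusted; requested on 6/9, it is restated in post-split shares.

```python
from datetime import date
from typing import List, NamedTuple


class Bar(NamedTuple):
    day: date
    close: float
    volume: float


class Split(NamedTuple):
    effective: date  # first session traded on the new share basis
    ratio: float     # new shares per old share (7.0 for a 7-for-1 split)


def history_as_of(bars: List[Bar], splits: List[Split], as_of: date) -> List[Bar]:
    """Bars dated on or before as_of, restated using only splits known by as_of.

    A split is applied to a bar only if the split is already effective on
    as_of and the bar predates it, so no future information leaks in.
    """
    out = []
    for bar in bars:
        if bar.day > as_of:
            continue
        factor = 1.0
        for split in splits:
            if split.effective <= as_of and bar.day < split.effective:
                factor *= split.ratio
        out.append(Bar(bar.day, bar.close / factor, bar.volume * factor))
    return out


bars = [Bar(date(2014, 6, 5), 647.35, 7_841_964)]
splits = [Split(date(2014, 6, 9), 7.0)]
print(history_as_of(bars, splits, date(2014, 6, 6))[0])  # unadjusted: split not yet known
print(history_as_of(bars, splits, date(2014, 6, 9))[0])  # close ~92.48, volume 54,893,748
```

Because the result depends on `as_of`, the same historical bar can legitimately show different values at different points in a backtest, and neither view uses information from after `as_of`.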