Split-Adjusted Prices Introduce A Look-Ahead Bias

posted

Q Meta

Perhaps the topic has already been discussed, but...

Why aren't unadjusted prices available in the backtester?

I see some problems with the current implementation:

1. Look-ahead bias

By restricting its users to split-adjusted prices only, Quantopian introduces a systematic look-ahead bias.

Indeed, when using split-adjusted stock prices, low priced stocks tend to do better than high priced stocks just because, by definition, stocks that did well over the years are more likely to experience splits than stocks that didn't. Thus, their split-adjusted prices converge to zero going back in time and the result is an artificial predictive power of stock prices (i.e., low > high).

2. Unrealistic filled prices in the blotter

In the transaction details I realized that filled prices are actually split-adjusted prices and not raw prices. Since I'm using IB's default commission model (price per share with a minimum of $1 per order), I guess I'm getting different results than in real life. Things add up quickly.

3. Filtering by actual prices is vital for some strategies

Generally, commission costs decrease with stock prices for the same dollar position. For strategies with a low average return per trade, it is important to avoid trading certain stocks when their prices are too low. E.g., at $0.005 per share, buying $100,000 of a $10 stock results in a $50 broker commission. It doubles for a $5 stock. The ability to filter out low priced stocks is vital for certain strategies.

A better approach?

In backtesting, using split-adjusted prices is fine as long as one is using the current price at the time when the decision was made. For example, the backtester should see the price of AAPL as $645.57 on 2014-06-06 and data prior to that date would be split-adjusted. On 2014-06-16, the price seen by the backtester should be $92.20 and at that time the price of 2014-06-06 would be split-adjusted (i.e., the initial price of $645.57 divided by 7).

28 responses

In addition, the API documentation is misleading since we can read:

Our US equity set is point-in-time, which is important for backtest accuracy. Since our event-based system sends trading events to you serially, your algorithm receives accurate historical data without any bias towards the present.

This statement is false - there is a bias.

In a backtest, the split is already applied retroactively to all data.

This is true, but incomplete. You should stipulate that in a backtest, future splits are also applied to the current price.

Alexis, thanks for the post. I think you're right about a couple of the limitations, but I don't agree that there is lookahead bias.

1 Lookahead bias

There is clearly lookahead bias if you can look at a price and know that there is a split coming. For instance, if you during a simulation you have an as-traded price and a dividend adjusted price and they are different, you know there is a split coming. That concept gets some discussion here. You're suggesting something a lot more subtle: "In a split-adjusted world, lower prices tend to go up, because they indicate pending splits." I have two thoughts in reaction to that. The first is, I don't see a lot of quants who are using the actual price and only the price to make purchase decisions - they're looking at market cap, or price movements, or more broadly, the price as it relates to some other factor. My second thought is that low prices are also what you see just before a company goes out of business. You can't look at a low price and expect a dividend.

2 Prices at execution

You're right, for stocks that split a lot, we're overstating commission costs in the early years of backtests. It's on the list of improvements to be made, though we haven't done it yet. The good news is that it's generally an error on the conservative side, rather than than aggressive.

3 Filtering on actual prices

I agree, this is a limitation. I don't see it too often, though. I see people using market cap, or changes in price, or gaps, or price baskets, but I don't see a lot of strategies that simply look at the naked price.

If you think #2 or #3 is limiting your strategy choices, one option is to set the commission to zero while you're doing your initial testing. You can then add it back in during later testing.

Disclaimer

The material on this website is provided for informational purposes only and does not constitute an offer to sell, a solicitation to buy, or a recommendation or endorsement for any security or strategy, nor does it constitute an offer to provide investment advisory services by Quantopian. In addition, the material offers no opinion with respect to the suitability of any security or specific investment. No information contained herein should be regarded as a suggestion to engage in or refrain from any investment-related course of action as none of Quantopian nor any of its affiliates is undertaking to provide investment advice, act as an adviser to any plan or entity subject to the Employee Retirement Income Security Act of 1974, as amended, individual retirement account or individual retirement annuity, or give advice in a fiduciary capacity with respect to the materials presented herein. If you are an individual retirement or other investor, contact your financial advisor or other fiduciary unrelated to Quantopian about whether any given investment idea, strategy, product or service described herein may be appropriate for your circumstances. All investments involve risk, including loss of principal. Quantopian makes no guarantees as to the accuracy or completeness of the views expressed in the website. The views are subject to change, and may have become unreliable for various reasons, including changes in market conditions or economic circumstances.

Dan,

Thanks for your comments and the link on dividend-adjusted prices. That was actually the next topic I wanted to discuss! :)

Regarding splits...

There is clearly lookahead bias if you can look at a price and know that there is a split coming.

It's a statistical bias - low priced stocks tend to do better than average when prices are split-adjusted vs. unadjusted.

Here is a link to a chart that comes from a DB Market Research paper: here. It illustrates the bias I am talking about that "the adjusted price has future information".

For instance, if you during a simulation you have an as-traded price and a split adjusted price and they are different, you know there is a split coming.

Correct, but this problem is only the result of how Quantopian handles splits. You adjust prices for splits once for all. For example, prices in 2003 are adjusted for splits coming in the years 2004-2015 and are fed to the backtester. This is the source of the look-ahead bias. In a true point-in-time database, it should not be possible. Splits should be treated as events (like dividends) and handled when they happen. That way, at any point in time, the backtester only sees the current price at that time (which is the unadjusted price) and all prices prior to that time are adjusted for splits and/or dividends. This is what happens in real life. The price I'm getting today is not adjusted for splits coming in 2018.

I don't see a lot of quants who are using the actual price and only the price to make purchase decisions - they're looking at market cap, or price movements, or more broadly, the price as it relates to some other factor.

I identified a look-ahead bias and am reporting it here. When it comes to backtesting, I believe it is important to remove as many biases as possible as there are just so many ways to get unreliable results out of a backtest (without even knowing it, of course).

Looking at what other quants are doing is sometimes dangerous. For example, I'm glad to hear that Quantopian is going to make dividend-adjusted prices available in the history function. Because how many quants do you think have been comparing apples to oranges without complaining (or acknowledging) about it for about 3 years since Quantopian launched its platform? Many, I believe:

When you compute returns on split-adjusted prices of a stock that pays no dividend, you get a total return stream. Not paying dividends is equivalent to reinvesting dividends for a company that pays dividends. But when you compute returns on split-adjusted prices of a company that pays dividends, but do not include them in your calculation, you don't get a total return stream. Comparing both streams is like comparing apples to oranges and that's what some quants have been doing here for a couple of years. This is especially important for momentum-based strategies that require total return streams. So it's great that this issue will be fixed in the future.

The question remains: why is the unadjusted (close) price not available? Making it available would at least solve problem #3.

Thanks!

PS: I hope all of this makes sense, but as English is not my first language if a passage is not clear please let me know!

Simon Thornington

You can get a rough unadjusted price by dividing fundamentals market cap by fundamentals shares outstanding. FYI.

You can get a rough unadjusted price by dividing fundamentals market cap by fundamentals shares outstanding. FYI.

Thanks for the idea. Unfortunately, fundamental data is not available for all securities, I believe, so that is a rather limiting option.

Why not just add an unadjusted_close column to the pricing data? It doesn't fix the look-ahead bias but allows a better evaluation of commission costs.

That link to the DB Market image is interesting. I'd like to find that full paper to see what else was being considered in there.

Alexis, the "why" question is one of those hard ones because the answer isn't straightforward. Why use split-adjusted prices? Because it makes a lot of the computation and simulation easier, at least when we started writing Zipline. Why not provide unadjusted prices, in addition? Because that leads to a worse look-ahead problem (as we both noted) that occurs when you compare adjusted and unadjusted prices.

One path, perhaps, is to do as you suggest and process splits as events. That may be something we will do in the future. It's definitely something we're doing in our forthcoming trading universe selection framework. However, rewriting the backtester to use unadjusted prices hasn't been a highly requested feature at this point.

As a side note - fundamental data is available for all companies, just not available for ETFs.

Disclaimer

I agree that handling split and dividend adjustments is not straightforward. There are multiple ways to adjust prices with different jobs in mind: total return stream assuming reinvestment of dividends, total return assuming no reinvestment of dividends, etc.

I understand that a decision that made sense at some point doesn't necessarily so later. But adding an unadjusted close column doesn't seem too hard and make things easier than using fundamental data. If someone is crazy enough to build a predictive factor out of the adjusted/unadjusted prices, please raise your hand!

As a side note - fundamental data is available for all companies, just not available for ETFs.

That's cool. I thought I've read somewhere that they were available for 5,000 companies only. I'll try Simon's suggestion. Thanks.

Tristan Rhodes

I had my first realization of the impact of split adjusted prices when I was backtesting a VXX strategy. I was using a small portfolio of $10,000 and quickly discovered that I could only buy 1 share of VXX in 2009 for ~$7,000 (current price is ~$20).

I agree that there is a statistical look-ahead bias for split adjusted prices. For instance, any strategy that filters out high prices will remove all "lemon" stocks (like VXX) that are going to drop severely and reverse split over the next years. Additionally, "superstar" stocks that are going to zoom ahead with large splits in the future will ALWAYS have a low price (Apple has a split-adjusted price of $0.91 in 2003)

At the same time, I can see how this is a complex subject. How will our algo logic handle price changes during splits? How about quantities of shares changing outside of our orders? How are indicators going to work with the drastically changing non-split adjusted prices?

So what can we do about this? At this point in time, all we can do is be aware of this bias and try to not make any decisions based on price. Commissions will be wrong as well ("lemons" will underpay commissions, and "superstars" will pay too much). We can choose to remove commission costs, but that also gives us imperfect results.

Any other ideas?

One option is to model trading as it happens in real life (ie, events stream). Splits are events. When they occur, the backtester needs to adjust open positions accordingly. If the history function is to return split-adjusted prices, then adjustments need to be made on the fly using only information available at the time (ie, no future splits). It would also make sense to adjust for dividends in the history function so that it returns a total return stream and eliminates the current bias which favors stocks that pay no dividend.

Grant Kiehne

From Dan's response above, it sounds like more information will be available once the "trading universe selection framework" API is released. It remains to be seen, but maybe the database (which I think is what it'll amount to) will contain the split info. as events so that one could "unadjust" the historical prices. Similarly, there would be data for dividends, I'm figuring.

Would it help if the backtester allowed fractional shares to be transacted, as an option? This would seem to be another unrealistic element to the simulation, since if I'm thinking about it correctly, there is an artificial "digitization" of transactions, if prices have been adjusted. I think that if the simulator uses adjusted prices, then you lose fidelity if only integer shares can be transacted (and for small amounts of capital, can end up with unrealistic "corner cases").

I stumbled across this problem inadvertently while working on a trading strategy that seemed to be producing very good results in backtests with starting dates from circa 2002 through 2006. After a while I began to suspect that my returns were not the result of any brilliance in my strategy, but were, rather, due to some type of look-ahead bias I couldn't quite understand. Eventually I narrowed it down to the problem described here. (in fact, searching to see if anyone else had come across it is how I found this article.)

I have a relatively short algorithm that demonstrates the potentially huge effect this problem has on the outcomes of various backtests. Being new to Quantopian, I do not know if it appropriate to post source code in this forum. If anyone can advise me as to the "how" and "where" of posting it, I would be glad to do so.

Luca

Hi John, you can post the backtest (together with the code) you like to share simply clicking on the "Attach" button in the upper right corner, visible when you post a reply.

Thanks, Luca.

The attached algorithm does that which, in my opinion, should not be possible. It randomly selects 12 stocks and invests about 90% of the Day 1 cash, divided equally among them. After that it just waits for any accumulated dividends to appear and when they exceed $2400 it allots equal dollar amounts of the dividends toward the purchase of more shares of the originally chosen 12. If one runs this algorithm against a backtest that starts circa 2002 it almost always beats the SPY total returns benchmark. I put some comments in the code to indicate what I think is going on.

That's a very well commented and explained example. Quite fascinating.

The good news is that the new Pipeline API makes the point moot. Pipeline uses only as-traded prices. If a split/dividend happens later, that adjustment is applied historically. But on the day you're setting your universe, there is no adjustment - it's as traded.

There are a few exclusions that you added - share classes and industry template code - that I'm not familiar enough with. I'm somewhat curious to know if they introduced some signal, too. I think it would be very interesting to reproduce this example using Pipeline and see if the result holds up.

Disclaimer

Luca

Thanks for sharing this John, it is really worth to spend some time looking at your test (by the way, great explanation in your code comments).

Luca

@Dan: Could you please clarify why Pipeline API should make the algorithm perform differently?

In the Pipeline announcement post we can read "All price data available to the pipeline API is split and dividend adjusted. ", so why should the result be different using the Pipeline API?

You say "Pipeline uses only as-traded prices." Does this mean that Pipeline itself use as-traded prices for internal calculation and then it converts them to split and dividend adjusted prices? So what kind of price does the user see?

Luca, check out this new post here. I think it has much more precise language than I used above, and should do a better job than my explanation.

Disclaimer

Hello, all. A friend of mine, Chris D., was kind enough to recast my non-Pipeline code into a Pipeline version (and also humorously called my findings "the Lake Wobegon Effect" because "all the children are above average.") The results are essentially the same as before. I now realize that my "explanation" for the effect was faulty -- it does not appear to have anything directly to do with the price quotations. The anomalous results, though, indicate there is probably either something terribly wrong with the backtest results themselves or with the parameters of my experiment. It seems to me I should not be able to regularly (85% of the time) trounce the SPY total returns by randomly selecting 12 stocks and holding them forever.

Can anyone offer an explanation for what is going on here?

Luca

@John, I don't believe the Pipeline version of your algorithm proves your thesis wrong. It correctly uses pipeline api to rank the stocks by market capital to simulate spy500, but it keeps using the spit-adjusted prices to decide how many shares to buy for each secority. That's the reason why you didn't see any different result.

I believe you are still right in the explanation of the great performance of the algorithm and thus the bias.

I modified (again) your algorithm to use Pipeline API but this time I get the security price from the Pipeline API (so they are not split-adjusted on the date we buy the stocks) and I calculate how many shares to buy using this not-adjusted price instead of the one provided from 'data'.

After some backtests I can see that the algorithm doesn't outperform SPY all the time. Also, note that the code prints the price obtained from Pipeline and the ones from 'data'. In this way, it is possible to see that when the random selection selects many stocks with pipeline price higher than price in 'data' then the algorithm outperform SPY. When there are not so many of this stocks or when the randomly selected ones have a lower pipeline price than the price in 'data', the algorithm doesn't perform better than SPY.

Luca, thank you so much for clearing this up! My most recent post will be removed once I am back in my office. I must have posted without refreshing the page first -- I didn't see your post until a couple of minutes ago.

This is a huge load off my mind. I can't thank you enough!

Alisa Deychman

You unfortunately ran into a common confusion in the backtester. There can be a big difference between daily mode and minute-mode backtesting. Daily mode is used to check the general trends of an algo, and to see if the code compiles. Minute mode is used to closely develop your strategy and ready it for live trading. I'd recommend to always use this setting.

The key difference is how often the data is sampled. In daily mode, an order is submitted one day at 4PM and filled the next day at 4PM. In minute mode, an order is submitted in one minute and filled in the next minute. This is a more accurate simulation and leaves less room for price movement. For a more detailed explanation take a look at this thread. I cloned your algo and without making any changes ran the strategy in minute mode. The results are below.

As you continue to develop strategies, consider your correlation to the general market. For the contest and hedge fund your beta should be between 0.3 and -0.3, so your algo is steady in all market conditions.

Cheers,
Alisa

Disclaimer

Luca

Ah ah ah, a lot of digging in to split-adjusted vs as-trade price theory and then it was only a matter of daily vs minute backtesting? :)

Thanks Alisa, I'll run some more tests in minute mode to compare the results.

Hi, Alisa.

Firstly, thank you very much for taking an interest in this issue! Any light you can shed is greatly appreciated.

I am still terribly confused about this, though. My algorithm (the version you ran, which preceded Luca's fix) is essentially a "buy once and hold forever" strategy with the only other buying being the re-investment of any accrued dividends, as they occur, apportioned in (roughly) equal dollar allocations toward additional purchases of shares of the original set of randomly chosen stocks. Even if the fills are occurring on a one day delay, doesn't it still boil down to the same experiment? I am not using any selection criteria at all in picking the stocks. I'm just randomly buying 12 different stocks. The price differential between the time of the dollar allocation (Day 1) and the purchase (Day 2) would generate some error in terms of the actual number of shares purchased, but I keep an approximate $100K (10%) cash reserve as a buffer to accommodate modest miscalculations. The actual magnitude of the errors would depend on the one-day price fluctuations, of course, but even a significant error would still give me a completely random starting point with the only consequence being that I might inadvertently be "borrowing" money (which is why I plot the leverage). Irrespective of the magnitude of such an error, though, it still seems like it results in a randomized selection, and so I wouldn't expect to be able to consistently beat the SPY total returns by making those kinds of random picks, investing about 90% of my cash (on Day 2, it turns out) and just holding those positions forever.

The example results you got from the "minute data" run above shows 12 randomly picked stocks beating the SPY at the outset and holding a lead throughout the entire backtest -- eventually besting it by an impressive 213% in returns versus 173% in returns. These results typify what was driving me to distraction; I never expected to regularly get such a result.

To be clear, I am not trying to be contradictory or argumentative. I truly don't understand why the results should be expected to be significantly different using minute bars as opposed to daily bars. (And I am fully prepared to believe that I have missed something completely obvious!)

Important note: I've been running Luca's fix to the Pipeline version for several hours now (still using daily bars) and so far it seems like he has indeed cracked the problem. I am seeing the kinds of varied results and crossovers between the algorithm's returns and the SPY's returns that I would expect to occur in reality. In the 10 trials I have run so far, only 5 have beaten the benchmark. (This is still a higher percentage than I would normally expect but 10 trials is a small sampling -- more to come.)

In any case, Luca's change has made a dramatic difference in the results.

Luca

@John, while I was checking my code again and running backtests in minute mode I digged the forum and found this: https://www.quantopian.com/posts/why-random-portfolios-appear-to-outperform-benchmarks-dot-dot-dot

It seems related to what we see in the algo.

Richard Prokopyshen

This may be a question for Josh at Quantopian, but I wonder if the fundamentals/morningstar data that is being sampled from in 2002/2003 is complete. Say for example morningstar data has a 2010 existence cutoff. This meaning only if the company existed in 2010 then there is data available for that company in 2002/2003 and forwards. I am thinking your sample may have survivorship bias. If you get to pick companies in 2002/2003 from a sample of those that are still around in 2010 that would explain your findings. And would also explain why you need such long runs.

Josh Payne

Hi Richard,

We've taken care to try to avoid look-ahead-bias as you've described here. If the company goes out of business in 2010 but was public as early as 2002, we'd have data for that company from 2002-2010 in the fundamentals database.

One simple test that we ran on the database was something like the algo below which just checks the total number of available companies on any particular day. If you run it, you'll see the number of companies in fundamentals tends to grow over time but there are trends where the overall count reduces as well.

def initialize(context):  
    foo = sid(24)  
    pass

def before_trading_start(context):  
    # Import any fundamental areas you want.  
    val = fundamentals.valuation  
    inc = fundamentals.income_statement

    # Query the fundmental database - the result is a pandas datapanel  
    # The first several lines are defining WHICH fields we are requesting  
    df3 = get_fundamentals(query(val.sid, # this is the Quantopian security id  
                                   val.sid.label('ce_sid'), # this is how you relabel a field  
                                   val.market_cap,  
                                   inc.net_income,  
                                   inc.net_income_as_of, # by adding _as_of to a field  
                                   # name, you can see which period the main field is reporting  
                                   )  
                                   )  
    context.df3 = df3  
    log.info(len(list(df3.columns.values)))  
    context.company_count=len(list(df3.columns.values))

# Will be called on every trade event for the securities you specify.  
def handle_data(context, data):

    if context.company_count > 0:  
        record(count=context.company_count)  
    pass

Disclaimer