thinly traded stocks - why no gaps?

I wrote a simple algorithm to see if I could find data gaps:

# Thinly traded picks from:
# http://seekingalpha.com/article/123969-five-thinly-traded-stocks-with-plenty-of-upside
def initialize(context):
    context.stocks = [sid(3722), sid(33054), sid(16812), sid(5107), sid(1675)]

def handle_data(context, data):
    # Log any stock with no bar this minute (legacy Python 2 Quantopian API)
    for stock in context.stocks:
        if stock not in data:
            print stock

However, the log output is:

This backtest didn't generate any logs.

Is it possible that all five "thinly traded stocks" traded every minute over the entire backtest? Or does the backtester skip ticks when data is missing? Or is the backtester filling in missing data?

Any insights?

Grant

24 responses

Hi Grant,

When a stock doesn't trade in a minute or day, we forward fill. However, each item in data has a datetime property, which you can compare to the algorithm's current time. If the event's datetime matches the get_datetime function's return value, the bar is current. If the timestamp is less than get_datetime's return, the bar is old. The attached backtest demonstrates this check.
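Since the attached backtest isn't visible here, the check fawce describes can be sketched in plain Python. This is a self-contained mock, not the real Quantopian API: the `Bar` class and `stale_sids` helper are hypothetical stand-ins for the bar objects in `data` (which carried a `datetime` property) and the comparison against `get_datetime()`.

```python
from datetime import datetime, timedelta

# Mock of a trade bar carrying the event's timestamp, as the legacy API did.
class Bar(object):
    def __init__(self, dt, price):
        self.datetime = dt
        self.price = price

def stale_sids(data, now):
    """Return the sids whose most recent bar predates the current bar time."""
    return [sid for sid, bar in data.items() if bar.datetime < now]

now = datetime(2013, 4, 1, 9, 31)
data = {
    'LIQUID': Bar(now, 10.0),                      # traded this minute: current
    'THIN': Bar(now - timedelta(minutes=7), 5.0),  # forward-filled, 7 min old
}
print(stale_sids(data, now))  # ['THIN']
```

An algo that wants to ignore forward-filled bars would simply skip any sid reported stale by a check like this.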

thanks,
fawce

Disclaimer

The material on this website is provided for informational purposes only and does not constitute an offer to sell, a solicitation to buy, or a recommendation or endorsement for any security or strategy, nor does it constitute an offer to provide investment advisory services by Quantopian. In addition, the material offers no opinion with respect to the suitability of any security or specific investment. No information contained herein should be regarded as a suggestion to engage in or refrain from any investment-related course of action as none of Quantopian nor any of its affiliates is undertaking to provide investment advice, act as an adviser to any plan or entity subject to the Employee Retirement Income Security Act of 1974, as amended, individual retirement account or individual retirement annuity, or give advice in a fiduciary capacity with respect to the materials presented herein. If you are an individual retirement or other investor, contact your financial advisor or other fiduciary unrelated to Quantopian about whether any given investment idea, strategy, product or service described herein may be appropriate for your circumstances. All investments involve risk, including loss of principal. Quantopian makes no guarantees as to the accuracy or completeness of the views expressed in the website. The views are subject to change, and may have become unreliable for various reasons, including changes in market conditions or economic circumstances.

Thanks Fawce,

Some questions:

  1. It appears that you also forward fill the volume (see the output of the attached backtest). Does this make sense? Shouldn't it be zero?
  2. My first impression is that the filling could really muck up the simulation of order submission and filling. I thought the idea was to only allow an order to be submitted when there is a recorded market event? And then fill the order upon the next recorded market event?
  3. In the case of a single security in the backtest, is there forward filling?
  4. Your help page says that the get_datetime function "Returns a Python datetime object with the current time in the algorithm." But, as I understand, the backtester is event-driven, so "current time" could be misinterpreted. Even in the case of multiple securities, there could be a gap in time, if all securities are missing historical data simultaneously, correct?
  5. Is there any way to turn off forward filling? If so, what is the impact on order submission/fulfillment?

Grant

Hello Fawce,

Another question: how does the slippage model interpret the filled data? Is it aware of the filling, treating the volume as zero?

Grant

Hi Grant,

I want to explain the current behavior and answer your questions, but I want to say from the beginning that we are open to advice for revisions to the behavior.

There are a few things that will hopefully make the system design more clear, and also make it easier to answer your questions.

Source events, like trades, data rows from fetcher csv files, splits, and dividends are all merged into a single stream of events, sorted by datetime (we have a tie breaking scheme to ensure deterministic sorting).

Each individual event is fed first through our slippage model and then through our performance tracker. Those components are designed to accept single events in a long series. Because the data is just a series of events, if a stock doesn't trade, the components simply don't receive an event and nothing happens. Also, all the internals receive the data before the algorithm does, so the algorithm can't possibly alter the data in a way that affects the simulation results.
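The merge of source streams described above can be sketched with `heapq.merge`: each per-source stream is already time-ordered, and a composite sort key gives a deterministic tie-break. The tuple layout and the choice of source name as tie-breaker are illustrative assumptions, not the backtester's actual scheme.

```python
import heapq
from datetime import datetime

# Two already-sorted source streams of (datetime, kind, sid) events.
trades = [(datetime(2013, 4, 1, 9, 31), 'trade', 'AAPL'),
          (datetime(2013, 4, 1, 9, 33), 'trade', 'AAPL')]
dividends = [(datetime(2013, 4, 1, 9, 31), 'dividend', 'AAPL')]

# Merge into one stream sorted by datetime; the second key element is a
# stand-in for whatever deterministic tie-breaking scheme is really used.
merged = list(heapq.merge(trades, dividends,
                          key=lambda ev: (ev[0], ev[1])))
print([kind for _, kind, _ in merged])  # ['dividend', 'trade', 'trade']
```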

In an attempt to simplify algo coding, we decided to aggregate the trade events from across your security selection (whether a manual selection in code or a set_universe call) before calling your algorithm. The benefit is that your algorithm is invoked at most once per bar (day or minute), and your algo code doesn't need to perform any bookkeeping to aggregate the most recent events. That's why the data parameter to handle_data is a dict-like structure keyed by sid.

The design question was what to do when one or more of the stocks in your universe doesn't trade in a given bar. We considered three alternatives:

  • remove the non-trading sid from the data parameter's keys
  • leave the key and set the value to None
  • leave the key and retain the most recently received trade event in the value

We opted for the third, mainly because each value in data has a timestamp. This was a pretty early design decision, and I assumed that having the last bar would not cause trouble because algos could filter by date.
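The third alternative amounts to a "latest event per sid" cache that is snapshotted each bar. A minimal sketch, with events and snapshots as plain tuples and dicts rather than real bar objects:

```python
# Stream of (minute, sid, price) trade events, already time-ordered.
events = [
    (1, 'A', 10.0),
    (1, 'B', 20.0),
    (2, 'A', 10.5),  # B doesn't trade in minute 2
]

data = {}            # latest event per sid, carried forward across bars
snapshots = []       # what handle_data would see each bar
current_minute = None
for minute, sid, price in events:
    if current_minute is not None and minute != current_minute:
        snapshots.append(dict(data))  # bar boundary: call handle_data
    data[sid] = (minute, price)       # retain the most recent event per sid
    current_minute = minute
snapshots.append(dict(data))          # final bar

print(snapshots[-1])
# {'A': (2, 10.5), 'B': (1, 20.0)} -- B's old bar is still present,
# distinguishable by its minute stamp
```

This shows why the key never disappears: the sid that didn't trade keeps its last value, and the embedded timestamp is what lets an algo filter old bars by date.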

We've always described the behavior as "forward filling", which has led to some misconceptions about the old trade data being used in simulation internals like slippage and performance. The truth is this was simply a convenience we added for the algo API.

Hopefully things are at least starting to make sense. Let me answer your questions directly:

  1. We aren't exactly forward filling. We're just not updating the entry for a stock that didn't trade in the bar. The date of the bar information distinguishes old bars from new ones in the same data parameter. That being said, I'd like to consider replacing old bars with zero-volume bars with OHLC equal to the prior close (a degenerate bar), especially if we get confirmation from other community members that this is preferred. Honestly, I like it because I increasingly feel the reiterated bar is a "lie" about the current state of the algo's world.

  2. It would, but that's why we have an event based simulation that processes a single event at a time. Missing data is literally and figuratively a non-event. Orders are only filled starting with the first bar after the order is placed.

  3. Good question. The answer is no. More generally, if none of the securities in your universe trade in a given minute, handle_data is not called for that minute. In the internals, we've found that zero events in a given minute make coding significantly more difficult. We could consider invoking handle_data every minute the market is open, no matter what. We would send a degenerate bar (as described in #1 above) every minute for any stock that didn't trade. Thoughts?

  4. You are right that our description isn't precise. get_datetime returns the time in simulation at which handle_data was called. The return of get_datetime is exactly equal to max([trade.datetime for trade in data.itervalues()]) -- i.e. it is the timestamp of the event that triggered this call to handle_data. It is entirely possible that several minutes or days elapse between calls to handle_data, and this would be fully reflected in both the datetime properties of the trade bars and by get_datetime.

  5. Not right now, no. As I explained above, the aggregation of trades into the data parameter has no effect on order submission or filling.

  6. The slippage model doesn't receive the aggregated data parameter; it receives individual events. Orders will remain unfilled until real trades are sent to the slippage model.
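Two of the points above can be made concrete in one self-contained sketch: get_datetime equals the max event timestamp in data (point 4), and a stale entry could instead be replaced by a degenerate bar with zero volume and OHLC pinned to the prior close (point 1). The `Bar` shape and helper names here are illustrative assumptions, not the real API.

```python
from datetime import datetime, timedelta

class Bar(object):
    def __init__(self, dt, open_, high, low, close, volume):
        self.datetime, self.volume = dt, volume
        self.open, self.high, self.low, self.close = open_, high, low, close

def simulation_time(data):
    # the timestamp of the event that triggered this call to handle_data
    return max(bar.datetime for bar in data.values())

def degenerate(bar, now):
    # zero-volume bar with OHLC equal to the prior close, stamped "now"
    return Bar(now, bar.close, bar.close, bar.close, bar.close, 0)

now = datetime(2013, 4, 1, 9, 40)
data = {
    'LIQUID': Bar(now, 10.0, 10.2, 9.9, 10.1, 5000),
    'THIN': Bar(now - timedelta(minutes=3), 5.0, 5.1, 4.9, 5.0, 200),
}
assert simulation_time(data) == now

# Replace any stale entry with a degenerate bar instead of the old trade.
filled = {sid: (bar if bar.datetime == now else degenerate(bar, now))
          for sid, bar in data.items()}
print(filled['THIN'].volume, filled['THIN'].close)  # 0 5.0
```

With this scheme, a volume-based indicator would correctly see zero volume for the quiet minute rather than the reiterated old trade.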

Thanks again for taking the time to ask these questions. I hope I've answered them sufficiently, and I look forward to more feedback!

thanks,
fawce

Thanks Fawce,

I appreciate your thorough response, and will read it later today when I have more time.

Best regards,

Grant

Fawce,

After a quick read, it seems to make sense. Regarding order management, if I understand correctly, the forward filling allows an order to be submitted, but the order does not get filled until there is a market (the next historical bar). I'm guessing that this is how Interactive Brokers operates: they'll accept an order any time, but it won't be filled until there is someone on the opposite side of the trade.
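That fill rule can be sketched in a few lines: an order may be placed at any bar, but it only fills at the first subsequent bar in which the stock actually trades (nonzero volume). The bar layout and helper name are illustrative, not the backtester's actual internals.

```python
# Bars as (minute, price, volume); minutes 2 and 3 have no trades.
bars = [(1, 5.0, 100), (2, 5.0, 0), (3, 5.0, 0), (4, 5.1, 50)]

def fill_minute(bars, order_minute):
    """Return the minute at which an order placed at order_minute fills."""
    for minute, _, volume in bars:
        if minute > order_minute and volume > 0:
            return minute
    return None  # still open at end of data

print(fill_minute(bars, 1))  # 4: the first later bar with actual trades
```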

Setting the volume to zero when it was actually zero makes sense, but it would be prudent to hear from others, too.

Regarding filling when there is only one security, you might consider it from the standpoint that presumably IB will accept orders even when there is no historical data. The work-around is to add a dummy sid (e.g. SPY), but this is kinda inelegant.

I'll send more feedback as it comes to me. I'd be interested in what others think, as well.

Grant

Hello Fawce,

I'm confused by your statement that "the aggregation of trades into the data parameter has no effect on order submission or filling." It seems that it does impact the submission, right? Without the aggregation, some bars would be skipped (as is the case for a single thinly traded security) and therefore orders could not be submitted. Or am I missing something?

Grant

Hello Grant,

I think - and I may be completely wrong - that because a 'brought-forward' price has an old timestamp it will never trade. If that is the case then it's not a problem for trading but may be an issue for indicators based on volume or frequency of trades.

There is an interesting consequence to this recent revelation: 'fetcher' has now become quite usable. There is no need for Daniel S., Christian B., and others to massage their data to overcome the 'fetcher' brought-forward data issue. See: https://www.quantopian.com/posts/accessing-data-via-fetcher-fund-cloning-algorithm

P.

It actually has come up a few times here in the forums, but it didn't resonate before like it has this time. It's one of those things that was obvious to us from our perspective, but it clearly isn't documented sufficiently. I'll include it in a future update.

Ironically, one of the reasons no one really notices the choice is because it makes backtesting much easier. The thing people stumble on is when the algo blows up. By carrying forward the best-available price, the algo keeps chugging along. We also thought about it quite a bit before we did it. In almost all cases, it's the right behavior: even if you can't trade it this bar, the stock has a value that needs to be in the bookkeeping, and that's what is carried forward.

This thread has been useful; it will inform improvements to the documentation and future implementation changes. Thanks, as always.


@Grant, yes thank you for clarifying. In the case where all securities in the universe do not trade, there is no event, and no call to handle_data.

Thanks Fawce and Dan,

Good move on planning to add this feature to the documentation. You might also consider posting some example code showing how to handle the various cases.

For small-time, individual traders, it might make sense to learn how to deal with thinly traded securities, since presumably they are of lesser interest to institutional investors. And they are not that uncommon...I came across the issue by tinkering around with Nasdaq 100 stocks at the minute level.

In the back of my mind, I've been trying to envision a backtester architecture that would manage missing bars better. I think one would need a kind of multi-threaded/state system, so that each security is handled separately. It sounds like you have this going on under the hood already, but perhaps it could be emulated even with a single call per market tick to handle_data for the aggregated bars.

Any consideration of allowing users the option to turn off the aggregation (but leaving it on by default)?

Grant

The aggregation of events into a single timestamp, and processing them together, is pretty inherent to the system. It's all in Zipline if you want to get deeper.

I think the key question is, what would better management of missing bars look like?

  • Being able to place orders even when the stock doesn't trade

Other than that, what is missing? I agree that a good backtester needs to handle stocks that have data every bar, and stocks that don't have data every bar. In general, I think Zipline does that.

Hi Dan,

The backtester should, with high fidelity, simulate actual trading, right? Have you gotten any feedback from the folks at Interactive Brokers (IB) to see if there are gaps between the backtester and what will transpire under live trading? Will they accept an order, even if there is no current market activity in the security? Will Quantopian hold off sending the order to IB until there is activity?

One idiosyncrasy is that there is no way to force a call to handle_data every minute the market was open, other than using a market proxy security such as SPY. The risk is probably low, but what if SPY doesn't trade for certain minutes? There could be a more sure-fire approach.

Grant

We don't have any feedback about gaps between the backtester and live trading. We need a lot more data than we have.

IB will accept an order at any point, regardless of whether there is trade data. I readily agree that we should be able to place orders even when a stock doesn't trade. When that behavior is changed, it will obviously obviate the need for adding SPY or any other mechanism - the clock will drive handle_data(), not a data source.

The point I'm trying to make is that, with the one noted exception, our backtester is pretty good at handling the real-world gaps in trading activity. I'd even go further and say that the noted exception is unlikely to be a problem; the only time it would affect your live algorithm is if your algorithm has hard-coded timestamps or activity delays, or you're using preloaded, minute-stamped Fetcher data. Otherwise your algorithm will always issue the orders you expect it to.

I'm also trying to ask if there is something that I'm missing, so that I can make sure to correct it.

Thanks Dan. I can't think of anything else not already touched on in this thread. --Grant

Hi Dan,

Your comment "When that behavior is changed, it will obviously obviate the need for adding SPY or any other mechanism - the clock will drive the handle_data(), not a data source" stuck with me. Are you planning to add a clock?

Grant

Hello Grant,

I think Dan is saying orders will be placed on the next minute bar even if there isn't a new price at that bar. And that puts me in mind of a very interesting question which actually wasn't addressed at the time:

"If everyone trades wall clock minute cut off there will be burst of volume of trades either to buy or sell. This information might be able to exploited by games if the bars are not offset by some off set. The strategy should be able to provide offset from the wall clock on which bars are to be used. This risk will be there when there are large number of Quantopians trading."

See: https://www.quantopian.com/posts/ability-to-specify-offset-of-bars

(I'm going to confess to having a massive conceptual difficulty here. I have no problem with backtesting on minute bars. I have no problem with charting (outside of Quantopian) on minute bars. But I have a difficulty with the idea of trading minute bars 'on the minute'. I don't see intra-minute as HFT and I believe IB are providing data natively at a granularity of 100ms to 150ms. I feel there will be many price moves within 60 seconds for a liquid stock i.e.

"For real HFT, IB is absolutely not going to work. The prices in IB update a max of about 7 times a second."

See: http://quant.stackexchange.com/questions/325/is-the-interactive-brokers-api-suitable-for-hft)

P.

Hello Peter,

With regard to timing and slippage, it's kind of a convoluted, confusing mess to me at this point. As I understand, the Quantopian data feed for live trading with real money will not be IB, but some other source. My sense is that there won't be any mutual Quantopian/IB synchronized wall clock, but I could be wrong. Clearly, there will be messages transmitted from IB back to the Quantopian algo, with some latency (e.g. order status). How everything will transpire in time is kinda murky. I'd expect that IB will fill orders according to their independent capability to fill them; the Quantopian data won't influence order execution at IB.

My recommendation to Quantopian would be to put together some clear diagrams that show how everything will work.

Grant

Hello Grant,

I remember now that prices come from Nanex's NxCore and that Quantopian constructs minute OHLCV bars from that feed, which means the incoming data is by definition intra-minute. IB is then the broker/intermediary, and the trade is potentially executed on one of many exchanges, as I'm sure IB's 'SMART' routing will apply.

I can't see every algo operating asynchronously, so I assume there will be a 'synchronised wall clock' (because there is only one Quantopian database) and every algo will trade, subject to latency, at the same second. I'm guessing.

P.

Grant, I'm always a fan of making things more clear, whether it's documentation or graphics or whatever.

Slippage and timing are very different, and only lightly interact with each other. I'm not sure what part of slippage is confusing at this point - let me know.

As for timing, I just don't think it's a big deal at the pace we're talking about. We're processing a bar every minute. Data comes in, orders go out, order acknowledgements come in. Later, order "fills" come in - generally before the next bar, but not always (though, there is no ambiguity - it is either filled, or not, and never "maybe" filled). The next bar comes in. Repeat.

The synchronized wall clock for all of this is the clock on the Quantopian servers, the clock on the IB servers, and the clock on the market servers. They are going to differ by many milliseconds. They will never vary by a second. They will never vary by 60 seconds!

I think it's a bit like you and I talking to each other in the same room. The speed of sound and the speed of our brains affect how fast the conversation happens. But when we are standing next to each other, the speed of sound is negligible. Same thing here. When you process an event every minute, you can reasonably expect your clocks to be synchronized, and a half-second here and there doesn't affect anything.

Thanks Dan,

I'll see if I can articulate my question better, which probably stems from my lack of understanding of the mechanics of trading.

Your description of the various wall clocks is what I'd expected, and I agree that at the minute level, it would seem that the offsets and communication latencies shouldn't be a problem.

Best regards,

Grant

Hello,

In reading through the forums I came across this topic.

Was Quantopian ever changed to work off of a clock instead of data? This is really important to me because I buy/sell many thinly traded stocks that often have minutes of no activity, yet I still have to update my indicators for those periods.

Hi Andrew,

A work-around is to include an ETF like SPY as a sid in your code (you don't have to trade it). Then, handle_data will be called for every minute SPY trades.

I suggest writing a short script, just to make sure this is still the case, since there are tweaks to the functionality sometimes.

Generally, I agree that there should be some way to force a minutely call to handle_data when the market is open.
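The effect of that work-around can be illustrated with a toy model (plain Python, not the Quantopian API): handle_data fires only on minutes where at least one sid in the universe trades, so adding a liquid proxy that trades every minute makes it fire every minute. The sid names and trade schedule here are made up.

```python
# Minutes 1..10 of a hypothetical session, and which minutes each sid trades.
minutes = list(range(1, 11))
trade_minutes = {'THIN': {2, 7}, 'SPY': set(minutes)}

def handle_data_minutes(universe):
    # handle_data is called only when some sid in the universe has an event
    return [m for m in minutes
            if any(m in trade_minutes[sid] for sid in universe)]

print(handle_data_minutes(['THIN']))         # [2, 7]
print(handle_data_minutes(['THIN', 'SPY']))  # every minute, 1 through 10
```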

Grant

Thanks Grant. That's a great idea.