Large-scale probability distribution possible in Quantopian?

I'm curious what the probability distribution is for consecutive up or down candles in a price data set for SPY. For example, if I'm looking at 5-min candles, what is the probability of seeing two up candles in a row? Three? Four? Five?

It would also be interesting to see these probabilities represented as a continuous dataset with different weights, similar to a weighted moving average.

I'd like to try to run this analysis in Quantopian, but I'm not sure if it's possible, as I'm fairly new to Quantopian.

I'm familiar with the record function, but that only allows one datapoint per key per trading day.

Any help here is greatly appreciated.

3 responses

The place to do this type of analysis is in notebooks. What is the probability of seeing two up bars in a row? Three? Four? Five? Start by fetching prices using the get_pricing method. One also needs to decide on a window, that is, how many days of prices to look at.

Attached is a notebook which steps through an analysis.

Any time one is analyzing data, it's good to have an idea of the expected results. That way one can get a 'gut feel' for whether the results seem correct when building a notebook. So, question: what is the expected probability of a single 'up bar'? Since returns typically follow a 'random walk', my guess would be 50%. Just like tossing a coin, the chance of the next bar being 'up' is 50%. So the probability of 2 up bars in a row would be 50% × 50%, or 25%. This of course is premised on our assumption that gains are random. Take a look at the notebook to see how the actuals line up with that expectation.
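Since the attached notebook isn't reproduced here (and Quantopian's data API is no longer available), here is a minimal stand-in sketch of the same calculation, using a simulated random walk in place of real 5-minute closes; with real data you would substitute actual prices for the simulated series:

```python
import numpy as np
import pandas as pd

# Simulate 5-minute closes as a random walk (stand-in for real price data).
rng = np.random.default_rng(42)
prices = pd.Series(100 + np.cumsum(rng.normal(0, 0.1, 100_000)))

# 1 = up bar, 0 = down/flat bar (first diff is NaN, so drop it).
up = (prices.diff() > 0).astype(int).iloc[1:]

def prob_n_up_in_a_row(up_bars, n):
    """Fraction of length-n windows in which every bar is up."""
    return (up_bars.rolling(n).sum() == n).iloc[n - 1:].mean()

for n in range(1, 6):
    print(n, round(prob_n_up_in_a_row(up, n), 3))  # expect roughly 0.5**n
```

For a random walk the empirical values should come out near 0.5, 0.25, 0.125, and so on, matching the coin-toss reasoning above.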

This is fantastic! Thank you Dan! Incredibly helpful!

Thanks Dan. Your notebook is important for what it demonstrates. And maybe we should accept some of its implicit conclusions.

Runs tests have been around for quite a while, probably since before Pascal. They are part of the standard battery of randomness tests.

An old 1972 book (Risques et Profits à la Bourse) hypothesized and demonstrated that revenues, profits, and other fundamental data follow about the same runs distribution as the one in your notebook.

If you extend to longer runs (gain_6 and gain_7), you will get the same distribution.

If you add years to the mix, again you will get the same distribution.

And if you increase the bar size from 5T (5 minutes) to 390T (a full trading day) and beyond, you will get the same distribution again: close to 0.50 for a run of 1, about 0.25 for a run of 2, and so on.
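The bar-size point can be sketched with pandas resampling and the same '5T' / '390T' offset aliases; this is a hypothetical illustration on simulated minute closes, not the notebook's actual code:

```python
import numpy as np
import pandas as pd

# Simulate minute-level closes with a DatetimeIndex (stand-in for real data).
rng = np.random.default_rng(1)
idx = pd.date_range("2020-01-02 09:31", periods=390 * 20, freq="T")
close = pd.Series(100 + np.cumsum(rng.normal(0, 0.05, len(idx))), index=idx)

def up_fraction(close, rule):
    """Fraction of up bars after resampling closes to the given bar size."""
    bars = close.resample(rule).last().dropna()
    return (bars.diff().iloc[1:] > 0).mean()

print(up_fraction(close, "5T"))    # near 0.50 for a random walk
print(up_fraction(close, "390T"))  # same expectation, far fewer bars
```

The expected fraction of up bars stays near 0.50 at every bar size; only the sample count (and hence the noise around 0.50) changes.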

If you use individual stocks as a proxy, you will again get about the same results. Looking at the long term (2002-2020), even though the distribution remains about the same, you can see in the probability chart that the rolling mean fluctuates around its expected value (as if biased one way or another), at times for quite a while, but nonetheless stays close to its expected value.
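That rolling-mean behavior is exactly what a fair coin produces; a minimal sketch, assuming nothing more than a truly random up/down series:

```python
import numpy as np
import pandas as pd

# Even for a fair coin, a rolling mean of 'up' outcomes drifts away from
# 0.50 for long stretches before reverting, which can look like a bias.
rng = np.random.default_rng(0)
up = pd.Series(rng.integers(0, 2, 50_000))      # 1 = up, 0 = down

rolling_p = up.rolling(1_000).mean().dropna()   # rolling P(up)
print(round(up.mean(), 3))                      # overall near 0.50
print(round(rolling_p.min(), 3), round(rolling_p.max(), 3))
```

The excursions of the rolling mean above and below 0.50 are pure sampling noise, yet each one persists for many bars, just like the apparent biases in the long-term stock charts.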

We tend to ignore the random nature of what we are dealing with and will use almost anything to try to forecast future prices. Yet the data says that what we see is almost random: not totally random, mind you, but almost. And this has its own set of consequences.

Only a game where the odds of most things are close to 0.50 can allow the diversity of raw-data interpretations we see, since even when we are wrong, we will still be right about half the time.

That is the beauty of trading systems: almost any opinion can demonstrate, in some way, that it is right about half the time, even when it is wrong.

My conclusion is: we need to design trading systems that will show some alpha even under those conditions, because those runs-test charts say we are dealing with a zero-sum game, and that has its own consequences too.