Can NaiveBayes tell us anything about Momentum Trading?

Hi all,

So I've noticed that some of the most popular algorithms shared are takes on momentum trading. As was mentioned in one of the threads, it is unclear whether momentum trading actually works. Academics have generally argued that it does not. So I decided to test if there is any data or insight to be gathered when making a trading decision today based on the past t-days' price movements.

For instance, assume that we are trying to determine whether the SPY index will go up or down tomorrow. Is knowing the direction the S&P went today (or yesterday, or the day before that, etc.) useful in making our decision?

Obviously, there are many parameters you can experiment with - I've shared here one specific implementation. Specifically, I'm using Bernoulli NaiveBayes (http://scikit-learn.org/dev/modules/naive_bayes.html). If I am trying to predict date t, I use dates t-1, t-2, t-3, t-4, and t-5 as predictors.

Further, I define t (and t-1...etc) as binary variables (hence using Bernoulli as opposed to some other Naive Bayes).
The value 1 denotes upward movement, the value 0 denotes downward movement. So, for instance, if the price of SPY was higher at close today than at close yesterday, t-1 is marked as '1'. If the price of SPY was lower yesterday at close than the day before that, then t-2 is marked as '0'. etc.
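
A minimal sketch of this encoding in plain Python. The helper names `to_movements` and `lagged_rows` and the sample prices are made up for illustration, and a flat day is treated as "down" here, which is an assumption the description above does not settle.

```python
def to_movements(closes):
    """Map closing prices to 1 (close above the previous close) or 0 (otherwise)."""
    return [1 if today > prev else 0 for prev, today in zip(closes, closes[1:])]


def lagged_rows(moves, n_lags=5):
    """Pair each day's movement with the n_lags movements before it.

    Each feature row is ordered [t-1, t-2, ..., t-n_lags].
    """
    X, y = [], []
    for t in range(n_lags, len(moves)):
        X.append([moves[t - k] for k in range(1, n_lags + 1)])
        y.append(moves[t])
    return X, y


closes = [100.0, 101.0, 100.5, 100.5, 102.0, 103.0, 102.0, 104.0]  # made-up prices
moves = to_movements(closes)
X, y = lagged_rows(moves, n_lags=5)
```

Each row of `X` holds the five previous movements and the matching entry of `y` is the movement to be predicted.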

I use a moving training window of 60 days (and the 5 previous movements for each of those days) to fit the Bernoulli NB model. I then use the latest 5 day price movements to predict what the price movement will be tomorrow.
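
The rolling fit can be sketched roughly as follows. This is a dependency-free illustration, not the code attached to this post: it hand-rolls a Bernoulli Naive Bayes with Laplace smoothing (alpha=1.0) in the spirit of scikit-learn's BernoulliNB, and it runs on synthetic coin flips rather than SPY. All function names are made up for the example.

```python
import math
import random


def fit_bernoulli_nb(X, y, alpha=1.0):
    """Estimate class priors and smoothed per-feature P(x_j = 1 | class)."""
    n_features = len(X[0])
    model = {}
    for c in (0, 1):
        rows = [x for x, label in zip(X, y) if label == c]
        if not rows:
            continue  # class absent from this window, so it cannot be predicted
        prior = len(rows) / len(X)
        p_up = [(sum(r[j] for r in rows) + alpha) / (len(rows) + 2 * alpha)
                for j in range(n_features)]
        model[c] = (prior, p_up)
    return model


def predict(model, x):
    """Return the class with the highest log posterior for feature row x."""
    def log_post(c):
        prior, p_up = model[c]
        return math.log(prior) + sum(
            math.log(p if xj else 1.0 - p) for p, xj in zip(p_up, x))
    return max(model, key=log_post)


def walk_forward(moves, n_lags=5, window=60):
    """For each day t, refit on the previous `window` labelled days, then predict t."""
    preds = {}
    for t in range(n_lags + window, len(moves)):
        X = [[moves[s - k] for k in range(1, n_lags + 1)]
             for s in range(t - window, t)]
        y = [moves[s] for s in range(t - window, t)]
        x_new = [moves[t - k] for k in range(1, n_lags + 1)]
        preds[t] = predict(fit_bernoulli_nb(X, y), x_new)
    return preds


random.seed(0)
moves = [random.randint(0, 1) for _ in range(300)]  # synthetic coin flips, not SPY
preds = walk_forward(moves)
```

The training window only ever contains days before `t`, so each prediction uses information that would have been available at the time.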

If you run the script, you will see that I am logging output each day. I'm also keeping track of the PNL (simply by calculating the spread you would have made based on your previous predicted decision, buy or sell).
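
One way this bookkeeping could look, as a sketch only. The function name, the one-share position size and the sample numbers are assumptions for illustration; a flat day counts as "down".

```python
def score_predictions(closes, predictions):
    """Running accuracy and PnL for daily up/down calls.

    predictions[i] is the call (1 = up, 0 = down) made for the move from
    closes[i] to closes[i + 1]. PnL is in price points for one share,
    long on an "up" call and short on a "down" call.
    """
    hits, pnl = 0, 0.0
    for i, call in enumerate(predictions):
        change = closes[i + 1] - closes[i]
        actual = 1 if change > 0 else 0
        hits += int(call == actual)
        pnl += change if call == 1 else -change
    return hits / len(predictions), pnl


closes = [100.0, 101.0, 100.0, 102.0, 101.5]  # made-up prices
predictions = [1, 1, 0, 0]                     # made-up calls
accuracy, pnl = score_predictions(closes, predictions)
```

Note that accuracy and PnL can disagree: a few large misses can outweigh many small hits, which is why both are worth logging.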

Most importantly, you will see that the rate of accuracy converges on around 50% as the program runs... this is rather a poor result and suggests this particular method of prediction isn't all that good. Why might this be?

  • The "naive" part of Naive Bayes: the premise is that each feature (e.g. t-1 or t-2) influences the probability that the predicted value, t, takes a particular value, but does so independently of the other features. That could be the problem. For instance, the influence of t-1 on t might also depend on the value of t-2.
  • Are binary variables the way to go? Is an upward price movement the same as any other upward price movement regardless of magnitude? Probably not...
  • Are we missing predictors? Should volume be accounted for too?
  • Can our time frames be optimized? Should our training window be shorter (implying the behaviour of the market changes more frequently) or longer? Is 5 days as predictors too many?
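
To make the independence point concrete, here is a toy illustration on an invented rule (not market data): tomorrow is up exactly when the previous two days disagree. Each lag on its own carries no information, so a model that only combines per-lag statistics, as Naive Bayes does, cannot beat a coin flip here, even though the two lags together determine the outcome.

```python
from itertools import product

# Hypothetical rule: t is up exactly when t-1 and t-2 disagree.
rows = [(a, b, a ^ b) for a, b in product((0, 1), repeat=2)] * 25  # (t-1, t-2, t)


def p_up(rows, **given):
    """Empirical P(t = 1) among rows matching the given lag values."""
    idx = {"t1": 0, "t2": 1}
    matching = [r for r in rows if all(r[idx[k]] == v for k, v in given.items())]
    return sum(r[2] for r in matching) / len(matching)


print(p_up(rows, t1=1))        # 0.5: t-1 alone says nothing
print(p_up(rows, t2=1))        # 0.5: t-2 alone says nothing
print(p_up(rows, t1=1, t2=0))  # 1.0: together they determine t
```

A classifier that can model interactions between lags, such as a decision tree, is able to pick up this kind of pattern.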

Feel free to experiment and share if you can! One improvement that I've tried, and that appears to help noticeably, is to shorten the training period to 30 days and use a Decision Tree Classifier instead (this solves the problem of independence).

Finally, this is my second post here on machine learning and I'm wondering... am I contributing? I don't really have a sense of what most users' experience with machine learning is, so if it would be more helpful for me to explain more, please let me know! I've done my best to comment my code throughout as well.

10 responses

Hello Alex,

I don't have experience with machine learning, but I'm interested in learning, as time allows. I've found the related postings on this site kinda fascinating. So, from my perspective, you are contributing. Hopefully others with more expertise can comment on your results and collaborate with you.

Regarding whether momentum trading works, it would seem to rely on identifying momentumish/bubblish stocks/market sectors. So, it is a matter of screening for winners. Apple stock is popular for backtests on this site...can anyone devise a screening algorithm that can find the next Apple?

Grant

I appreciate your posts, I just haven't yet gotten around to machine learning myself!

Hi Alex,

Thanks for sharing! We certainly appreciate shares using ML methods -- it's a very exciting field. On Quantopian I noticed a trend where algos using ML get a lot of views but often not that many replies. Maybe the threshold to comment on more sophisticated algos is higher?

To your algorithm: I think it's a fun experiment. Certainly the classification accuracy means that it didn't work, but that, I think, is to be expected. I also agree with you that the Naive Bayes assumption of independence (conditional on the class) is problematic here. One thing I'd be curious about is whether more recent time points receive higher weights in the classifier. In other words, are more recent time points more "predictive" of the current price movements?
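
One way to look at this, sketched under assumptions: a Bernoulli Naive Bayes can be rewritten as a linear classifier, and the per-lag weight is the difference between the log-odds of an "up" feature in the two classes. The function name and toy data below are made up; with real data one would compare the weight for t-1 against those for t-2 through t-5.

```python
import math


def lag_weights(X, y, alpha=1.0):
    """Per-lag weight of Bernoulli Naive Bayes written as a linear classifier.

    weight_j = logit(P(x_j = 1 | up)) - logit(P(x_j = 1 | down)), with Laplace
    smoothing. A class missing from y falls back to a probability of 0.5.
    """
    def smoothed_p(rows, j):
        return (sum(r[j] for r in rows) + alpha) / (len(rows) + 2 * alpha)

    def logit(p):
        return math.log(p / (1.0 - p))

    up = [x for x, label in zip(X, y) if label == 1]
    down = [x for x, label in zip(X, y) if label == 0]
    return [logit(smoothed_p(up, j)) - logit(smoothed_p(down, j))
            for j in range(len(X[0]))]


# Toy data: column 0 always matches the label, column 1 is constant.
X = [[1, 1], [1, 1], [0, 1], [0, 1]]
y = [1, 1, 0, 0]
weights = lag_weights(X, y)
```

A weight near zero means that lag barely moves the prediction; a large positive weight means an up day at that lag pushes the prediction toward up.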

I would also be interested to see the Decision Tree algo you mentioned. One suggestion would be to try the related Random Forests which seem to have better accuracy in many cases.

Your approach also reminded me of an older Moody paper (http://www1.icsi.berkeley.edu/~moody/JForecastMoodyWu.pdf). He uses reinforcement learning to continuously relearn the classifier and he also addresses the time-series aspect.

Finally, while I think that it's a lot of fun to play around with these methods I would not expect single stock methods that only look at the price to work very well as the autocorrelation of returns is basically zero.
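
For anyone who wants to check the autocorrelation claim on their own data, a minimal sketch follows. The prices here are a synthetic random walk standing in for real closes, and the function names are made up for the example.

```python
import random


def daily_returns(closes):
    """Simple day-over-day returns."""
    return [b / a - 1.0 for a, b in zip(closes, closes[1:])]


def autocorrelation(xs, lag=1):
    """Sample autocorrelation of xs at the given lag."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs)
    cov = sum((xs[i] - mean) * (xs[i - lag] - mean) for i in range(lag, n))
    return cov / var


random.seed(1)
closes = [100.0]
for _ in range(2000):  # synthetic random walk, not market data
    closes.append(closes[-1] * (1.0 + random.gauss(0.0, 0.01)))

print(autocorrelation(daily_returns(closes), lag=1))
```

A value close to zero means yesterday's return says very little about today's, which is the situation described above.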

Disclaimer

The material on this website is provided for informational purposes only and does not constitute an offer to sell, a solicitation to buy, or a recommendation or endorsement for any security or strategy, nor does it constitute an offer to provide investment advisory services by Quantopian. In addition, the material offers no opinion with respect to the suitability of any security or specific investment. No information contained herein should be regarded as a suggestion to engage in or refrain from any investment-related course of action as none of Quantopian nor any of its affiliates is undertaking to provide investment advice, act as an adviser to any plan or entity subject to the Employee Retirement Income Security Act of 1974, as amended, individual retirement account or individual retirement annuity, or give advice in a fiduciary capacity with respect to the materials presented herein. If you are an individual retirement or other investor, contact your financial advisor or other fiduciary unrelated to Quantopian about whether any given investment idea, strategy, product or service described herein may be appropriate for your circumstances. All investments involve risk, including loss of principal. Quantopian makes no guarantees as to the accuracy or completeness of the views expressed in the website. The views are subject to change, and may have become unreliable for various reasons, including changes in market conditions or economic circumstances.

Hi,

This is an awesome share, and I couldn't resist playing around with it. I definitely encourage you to keep posting these kinds of ideas!

What I did was first to use some of the built-in pandas facilities to simplify the code in the batch_transform. Feedback is welcome!

Secondly, I agree with Thomas' point that a stock's history is an unlikely predictor for its own performance. However, I was curious if one stock could prove to be a good predictor for a second stock. So I further modified the code to have one stock as an indicator, and a second stock as the actual investment instrument. The code now fits the Naive Bayes model using the two series.

I played around with a few stocks, and discovered one pair that had a consistently sub 50% "correct" score. So, I added a flag to say that an indicator is actually a contra-indicator. This flips buys/sells and thereby does the opposite of what the signal suggests. Lastly, I added some order calls to see how well the idea backtests. From what I can tell, the predictor is above 50% except for one period in 2006, where it completely fails.
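
A rough sketch of the two ideas, as one possible reading rather than the shared algorithm itself: features come from the indicator stock's past movements, labels come from the traded stock, and a flag inverts the resulting call. The function names and the toy series are hypothetical.

```python
def cross_lagged_rows(indicator_moves, target_moves, n_lags=5):
    """Features: the indicator's last n_lags moves. Label: the target's move on day t."""
    X, y = [], []
    for t in range(n_lags, len(target_moves)):
        X.append([indicator_moves[t - k] for k in range(1, n_lags + 1)])
        y.append(target_moves[t])
    return X, y


def to_order_side(predicted_up, contra=False):
    """+1 means buy, -1 means sell; a contra-indicator inverts the call."""
    side = 1 if predicted_up else -1
    return -side if contra else side


indicator = [1, 0, 1, 1, 0, 1]  # made-up up/down series for the indicator stock
target = [0, 0, 1, 0, 1, 1]     # made-up up/down series for the traded stock
X, y = cross_lagged_rows(indicator, target, n_lags=2)
```

Flipping a signal that is right 45% of the time yields one that is right 55% of the time on the same data, but choosing the flip after seeing the backtest is itself a fitted parameter.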

I was wondering if there is a way to detect this period of deterioration and just liquidate the portfolio until the signal is deemed trustworthy again. I was playing around with the correct score as a way of doing this, but it started to feel like pure data mining, so I stopped and took that code out. Is there any concept of confidence or something like that we could apply?

thanks again, this was a lot of fun,
fawce

Thanks for the feedback guys!

Thomas - I agree. With the autocorrelation between stock prices being zero, it does seem unlikely this would actually work. I'll upload the Decision Tree version in the next few days.

John, glad you liked it! I think using two stocks, one as a predictor and one to trade, is quite a clever application of this. It seems strange (a little counterintuitive) that the signal is consistently wrong, although I suppose if you are using two stocks it could have something to do with that.

In regards to your confidence question: unfortunately, while Naive Bayes spits out a "probability" of the classification being one way or the other, it's generally accepted that this probability is poorly calibrated and fairly meaningless as a confidence score. I think newer methods of machine learning, say SVMs, would do a better job at accurately giving a confidence score! I'll look into that as well.

Thanks btw for cleaning up the code, much more readable!

Hi Alex,

Finally somebody used a Bayesian approach here. Great work!

As a Bayesian myself, I would suggest using a discrete setting instead of the binary approach, i.e. a three-range analysis: below -0.2%, above 0.2%, and in between. So for each day t-1, t-2, etc. you would have 3 possibilities. Then you can calculate the historical probability of the equity going up the next day given the actual set-up of t-1, t-2, ...
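
A small sketch of what that could look like, using plain counting rather than a fitted model. The 0.2% band comes from the suggestion above; the function names, the two-day lookback and the sample returns are assumptions for illustration.

```python
from collections import defaultdict


def bucket(ret, band=0.002):
    """0 = below -0.2%, 1 = within the band, 2 = above +0.2%."""
    if ret < -band:
        return 0
    if ret > band:
        return 2
    return 1


def conditional_up_frequency(returns, n_lags=2, band=0.002):
    """For each pattern of the last n_lags buckets, the share of next days that rose."""
    counts = defaultdict(lambda: [0, 0])  # pattern -> [up days, total days]
    states = [bucket(r, band) for r in returns]
    for t in range(n_lags, len(returns)):
        pattern = tuple(states[t - k] for k in range(1, n_lags + 1))
        counts[pattern][0] += int(returns[t] > 0)
        counts[pattern][1] += 1
    return {p: up / total for p, (up, total) in counts.items()}


returns = [0.004, -0.001, 0.003, 0.005, -0.006, 0.001, 0.004, -0.001, 0.002]  # made up
table = conditional_up_frequency(returns, n_lags=2)
```

With 3 states and 5 lags there are 243 possible patterns, so many of them will have very few observations unless the history is long.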

This strategy could be improved to a five-range structure where the algo would invest 0 quantity if the highest probability is in the middle range, 1 quantity if the highest probability is in one of the intermediate ranges, and 2 quantities if the highest probability is in one of the outer ranges.
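
The sizing rule could be sketched like this. The magnitudes 0, 1 and 2 follow the description above; the sign convention (short when the most likely range is on the downside) is an assumption, as is the function name.

```python
def position_size(range_probs):
    """Map probabilities over 5 return ranges (lowest to highest) to a signed quantity.

    Index 2 is the middle range, 1 and 3 the intermediate ranges, 0 and 4 the
    outer ones. The sign (short for the lower ranges) is an assumption; only
    the magnitudes are given in the text.
    """
    if len(range_probs) != 5:
        raise ValueError("expected probabilities for exactly 5 ranges")
    best = max(range(5), key=lambda i: range_probs[i])
    return best - 2  # one of -2, -1, 0, +1, +2


print(position_size([0.05, 0.10, 0.20, 0.50, 0.15]))  # 1
```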

If I have some (a lot of) time, I will try to implement this strategy.

Good luck.

Btw, I would use a longer time frame. 60 days is quite small from a statistical perspective. Usually under a Bayesian assumption you would use all the information available.

Hi Hernan,

Thanks for the suggestions - if I have some time in the near future, I'll definitely attempt to implement them.

I'm curious though - on a higher level - do you think there's value in a Bayesian approach when looking at exclusively prices?

That is, do you think including other factors in a similar approach (volume, maybe, which would be easy on Quantopian) would improve results?

"Secondly, I agree with Thomas' point that a stock's history is an unlikely predictor for its own performance."

"With the autocorrelation between stock prices being zero"

In which case you guys are ignoring "momentum" entirely: the best-documented anomaly in finance.
And if trend followers manage to make a profit by running winners and cutting losers with a ratio of 40% winning trades to 60% losing trades, then there may actually be mileage in using ML to make predictions even if they are no better than random. Some academic research has also outlined prediction success of 53/45% using ML, which ought to give some grounds for optimism.
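
The arithmetic behind the 40% point, as a quick sketch with hypothetical function names: a strategy that wins only 40% of the time breaks even when its average win is 1.5 times its average loss, and is profitable above that.

```python
def expectancy(win_rate, avg_win, avg_loss):
    """Expected profit per trade; avg_loss is given as a positive number."""
    return win_rate * avg_win - (1.0 - win_rate) * avg_loss


def breakeven_payoff_ratio(win_rate):
    """Average win divided by average loss needed for zero expectancy."""
    return (1.0 - win_rate) / win_rate


ratio = breakeven_payoff_ratio(0.40)   # about 1.5
edge = expectancy(0.40, 2.0, 1.0)      # wins twice the size of losses
```

So directional accuracy alone does not decide profitability; the size of wins relative to losses matters just as much.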