Multi-armed Bandit

This notebook first introduces the exploration-exploitation problem through a contrived situation in which we have two arms, each with an associated reward probability. To make an informed decision, we simulate a fixed number of trials to assess which arm has the highest CTR. Rather than naively splitting trials evenly between the arms, the bandit approach tries to balance exploration and exploitation in order to minimize regret. Following the introduction, the notebook runs a simulation that trades $1 each day on a single stock using LinUCB1, an algorithm that assumes rewards are independent and stochastic.
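
As a rough illustration of the two-arm setup (this is not the notebook's code), the sketch below runs the standard UCB1 rule on two simulated arms and compares its regret with an even split. The reward probabilities 0.2 and 0.5, the trial count, and the function names `run_ucb1` and `expected_regret` are all assumptions made for the example.

```python
import math
import random


def run_ucb1(probs, n_trials, seed=0):
    """Play n_trials on Bernoulli arms with the standard UCB1 rule.

    probs are hypothetical click-through rates, one per arm.
    Returns (pull counts per arm, total reward per arm).
    """
    rng = random.Random(seed)
    n_arms = len(probs)
    counts = [0] * n_arms
    totals = [0.0] * n_arms

    for t in range(1, n_trials + 1):
        if t <= n_arms:
            # Pull every arm once so each has an initial estimate.
            arm = t - 1
        else:
            # Pick the arm with the highest mean plus exploration bonus.
            arm = max(
                range(n_arms),
                key=lambda a: totals[a] / counts[a]
                + math.sqrt(2.0 * math.log(t) / counts[a]),
            )
        reward = 1.0 if rng.random() < probs[arm] else 0.0
        counts[arm] += 1
        totals[arm] += reward

    return counts, totals


def expected_regret(probs, counts):
    """Expected reward lost by not always pulling the best arm."""
    best = max(probs)
    return sum((best - p) * c for p, c in zip(probs, counts))


probs = [0.2, 0.5]          # assumed CTRs for the two arms
counts, _ = run_ucb1(probs, n_trials=10_000)
print("UCB1 pulls per arm:", counts)
print("UCB1 regret:", expected_regret(probs, counts))
print("Even-split regret:", expected_regret(probs, [5_000, 5_000]))
```

With a gap this wide between the arms, UCB1 should send most pulls to the better arm, whereas the even split keeps paying for the worse one throughout.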

Disclaimer

The material on this website is provided for informational purposes only and does not constitute an offer to sell, a solicitation to buy, or a recommendation or endorsement for any security or strategy, nor does it constitute an offer to provide investment advisory services by Quantopian. In addition, the material offers no opinion with respect to the suitability of any security or specific investment. No information contained herein should be regarded as a suggestion to engage in or refrain from any investment-related course of action as none of Quantopian nor any of its affiliates is undertaking to provide investment advice, act as an adviser to any plan or entity subject to the Employee Retirement Income Security Act of 1974, as amended, individual retirement account or individual retirement annuity, or give advice in a fiduciary capacity with respect to the materials presented herein. If you are an individual retirement or other investor, contact your financial advisor or other fiduciary unrelated to Quantopian about whether any given investment idea, strategy, product or service described herein may be appropriate for your circumstances. All investments involve risk, including loss of principal. Quantopian makes no guarantees as to the accuracy or completeness of the views expressed in the website. The views are subject to change, and may have become unreliable for various reasons, including changes in market conditions or economic circumstances.

18 responses

Christopher Ames

Sorry what's CTR in this context?

Simon Thornington

I think this is trying to apply some ad network machine learning techniques to trading.

Gideon Wulfsohn

CTR is an acronym for click-through rate - apologies for not defining it!

Anthony FJ Garner

Gideon
I have had a brief look at the referenced article and understand in broad terms the purpose of the approach as regards website content, although I have not yet put in the time to understand it fully.

I have no idea whether stock prices are stochastic or not, some of the time or all of the time or none of the time. I have no idea whether stock price movement is random or deterministic.

The only thing I do know is that it appears (emphasis on the word appears) that patterns may exist in stock prices which are repeated, and which may therefore enable an algorithm to discover such patterns and use them to make probabilistic bets that would profit from their recurrence.

For the unwashed such as myself, would you be very kind and explain, in simple terms comprehensible to the man on the Clapham Omnibus, what the theory is behind your algorithm? How does it seek to predict stock price movement (if indeed that is what it does)?

Can you tell us all a great deal more?

Gideon Wulfsohn

Hey Anthony,

The goal was simply to take an approach I was familiar with from a different medium and apply it to Quantopian data. What ended up happening is that it overfit to the noise of day-to-day fluctuations in equity prices.

I found the UCL course run by David Silver (especially this lecture) to be helpful in getting up to speed on reinforcement learning, which is the branch of machine learning that the bandit algorithm resides in.

In short, the LinUCB1 algorithm in this notebook incrementally learns which stock in the defined universe is most likely to produce the largest reward on a given day (the reward here looks only at the difference between open and close prices). In turn, this leads the incremental learner to treat previous fluctuations as an indicator of future success, which on the one hand could be viewed as a crummy version of momentum, but on the other hand is best seen as overfitting to data that lacks signal.
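
To make the "incrementally learning" part concrete, here is a minimal sketch of a running-mean update, assuming the reward is simply close minus open. The tickers `AAA` and `BBB`, the prices, and the helper `update_mean` are made up for illustration and are not taken from the notebook.

```python
def update_mean(mean, count, reward):
    """Fold one new reward into a running mean without storing history."""
    count += 1
    mean += (reward - mean) / count
    return mean, count


# Hypothetical (open, close) prices per day for two made-up tickers.
daily_bars = {
    "AAA": [(10.0, 10.5), (10.5, 10.25)],
    "BBB": [(20.0, 19.0), (19.0, 19.5)],
}

estimates = {}
for ticker, bars in daily_bars.items():
    mean, count = 0.0, 0
    for open_price, close_price in bars:
        reward = close_price - open_price  # assumed reward definition
        mean, count = update_mean(mean, count, reward)
    estimates[ticker] = (mean, count)

# A purely greedy learner would pick the highest running mean.
greedy_pick = max(estimates, key=lambda ticker: estimates[ticker][0])
print(estimates, greedy_pick)
```

Each estimate is built only from past open-to-close moves, which is why the learner ends up treating previous fluctuation as if it predicted the next day.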

If I were to continue working on this approach, it would be interesting to encode a feature vector (using data from quantopian.com/data) to learn how fluctuation can be explained and exploited with respect to metadata that describes a given equity.

Another interesting thing to note is that, rather than seeking to minimize variance as a traditional trading algorithm would, LinUCB1 is inclined to choose equities with a high degree of variance, as a way of testing whether the upper confidence bound reflects real upside potential or is merely the result of not having run enough trials on the given arm.
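
The "not enough trials" effect can be sketched with the plain UCB1 bonus, sqrt(2 ln t / n). This is an assumption for illustration: the notebook's LinUCB1 bound may be computed differently, and the means and counts below are invented.

```python
import math


def ucb_score(mean_reward, pulls, total_pulls):
    """Mean reward plus the plain UCB1 exploration bonus.

    The bonus shrinks as an arm is pulled more often, so an arm with
    few trials can outrank one with a higher observed mean.
    """
    bonus = math.sqrt(2.0 * math.log(total_pulls) / pulls)
    return mean_reward + bonus


total_pulls = 202
well_tested = ucb_score(mean_reward=0.3, pulls=200, total_pulls=total_pulls)
barely_tested = ucb_score(mean_reward=0.1, pulls=2, total_pulls=total_pulls)

# The barely tested arm wins on its bonus, not on its observed mean.
print(well_tested, barely_tested)
```

Once the under-sampled arm has been pulled enough times, its bonus decays and the observed mean decides the choice.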

Anthony FJ Garner

Gideon
Thank you for such a very clear explanation. My son is currently at UCL and going into investment banking in the City of London next year. I suspect he would be better advised to attend David Silver's lectures rather than study mediaeval European literature but you never know, life seems to work in curious ways.

In any event my strong suspicion is that this sort of approach has great promise. In particular

how fluctuation can be explained and exploited with respect to metadata that describes a given equity

although we are all aware of the dangers of drawing false conclusions from coincidental correlations. In any event, the study of price alone has been my main preoccupation over the past decade and the pursuit has become sterile. Access to fundamental data such as that provided here on Quantopian is a great step forward, and my ill-informed suspicion is that general economic data, as well as social media data as related to price movement, is a fruitful avenue.

I believe I am correct in saying that Winton and similar hedge funds are busy exploring such directions and I suspect rightly so.

Tybalt Trust

Do we get to see the code? Have I missed that part?

Kindest,

Matthew

Anthony FJ Garner

Press "view notebook" and all will be revealed. Press "clone notebook" and you will get your own copy to play with in the research environment.

This is not an area where you will get swift results. At present it is more akin to scientific endeavour than a tried and trusted investment methodology. I try never to forget that fundamentals are what drive stock prices and that earnings growth and continued corporate profitability are at bottom all that counts.

Anthony FJ Garner

A lot of what goes on at this (very, very good) platform and all lesser trading and investment platforms can best be described as "math-turbation". Nassim Taleb's excellent books Fooled by Randomness and The Black Swan are an essential read in this respect.

Nonetheless I think friend Nassim has himself been "fooled". While no one in his right mind can deny the Black Swan (what wiped out the dinosaurs?), we nonetheless live in a world where decisions must be made and assumptions assumed correct until proven otherwise.

In other dull posts I have noted the cyclicality of all things. In the end we are told entropy wipes even that out and we will be left (in this universe at least) with black sterility and emptiness.

Until such time however my own suggestion to the investment community is not to lose sight of what appear to be fundamental factors driving the price of investments. Continued and growing profitability is the king of all such factors. "Continuation" is also extremely relevant in the stock market - research by Eric Crittenden shows the majority of stocks over the past century have ceased to exist and lost all value. We are fooled by basic trend following indices (the Dow, the S&P etc) into believing growth has been continuous over the past century. It has in aggregate but not in the particular.

So my own research will continue to be focussed on what I believe drives share price in the long term: business success. Universes, planets, empires, societies, species all rise and fall. The trick is to invest in them while they rise and exit when they start their fall.

That is all it's about really. I believe Gideon's suggestion and the sort of methods employed in the world of big data may well help to identify such trends in growth and the following and inevitable collapse.

Andrew Kreimer

Great work!

Anthony FJ Garner

Agreed.

I have put enormous effort into machine learning recently. What I have learned is that it is more important to get a view of the process as a whole than to dive into any one particular algo or ML sub-division.

The similarities between techniques are far greater than the disparities.

I believe from my general reading that HFT outfits and certain Quant HFs have hired squad loads of ML guys recently. My suspicion would be that it is all very exploratory at this stage. But perhaps certain aspects are contributing to profitability.

Anyone know the answer?

Anthony FJ Garner

algonell: Are these results actual or hypothetical, Andrew?

References to:

Our Secret Technology

on your website suggest that clients are not informed of the basis of recommendations made by algonell, nor of the algorithms used, or even the type or class of algo. Would this be correct?

Andrew Kreimer

Thanks for your interest Anthony,

Those are demo contests, so it's all hypothetical.
We have multiple Alphas, but still evaluate them on real accounts.

One of our algorithms:
http://www.sciencedirect.com/science/article/pii/S1877050916318816
I'm about to publish another paper that describes briefly what we actually do.

MAB is extremely interesting, I've been looking for application of it in finance, and thanks to Gideon we have a cool one :)

Anthony FJ Garner

I'm about to publish another paper that describes briefly what we actually do.

Many thanks, I will look out for this with interest.

Anthony FJ Garner

Loving the David Silver lectures. Thanks Gideon, a good find.

Deleted User

Wow. This is the first time I have seen someone apply reinforcement learning on Quantopian. I look forward to more of your work.

Anthony FJ Garner

I think RL is vastly exciting.

Deleted User

@Anthony. Yes. I recently attempted the Two Sigma challenge on Kaggle. I learnt a lot from it and from going through Silver's lectures on YouTube. I want to implement a policy gradient approach with the Sharpe ratio as the reward function and try it on Quantopian.
