Simple ML demo to port to QSTK


Dan Bikle

posted Nov 24, 2014

This is a simple ML demo I want to port to QSTK.

Dan

9 responses

Ning Liu

Nov 24, 2014

So I'm assuming SPY is the SPDR S&P 500 ETF Trust. Is that right?
Why does everybody use it as a benchmark instead of the S&P 500 stock index?

Dan Bikle

Nov 25, 2014

Yes, SPY is that trust.

I like SPY because it is easier to spell than ^GSPC.

Also, I think SPY is better because you can actually buy and sell it.

In the real world I trade the ES e-mini because commissions are low, it's liquid, and it makes my Schedule D tax form thinner.

Ning Liu

Nov 25, 2014

Thank you for the answer, Dan!

James Jack

Nov 25, 2014

Very interesting!

Why did you choose KNeighborsClassifier?

What is the Kelly bit doing, exactly?

Dan Bikle

Nov 25, 2014

@jj, my preference has been logistic regression (LR).

But then I bumped into KNN while surfing through some links I found on the Quantopian GitHub page.

So I backtested KNN on some ^GSPC Yahoo prices going back to 1950 and found that KNN offers a tiny edge over LR.

Also, I like KNN because it is easy to describe how it works.

LR internals are described well by Ng in his Coursera videos, but that is an investment of at least 30 minutes of your time.

KNN, on the other hand, can be understood just by reading its Wikipedia page and thinking about it for 5 minutes.
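To show how little there is to KNN, here is a minimal sketch in plain Python. It is not the classifier used in the backtest (that one is scikit-learn's KNeighborsClassifier); the function name knn_predict, the toy feature rows, and the "up"/"down" labels are made up for illustration.

```python
from collections import Counter
import math

def knn_predict(train_X, train_y, query, k=3):
    """Label a query point by majority vote among its k nearest training rows."""
    # Rank every training row by Euclidean distance to the query point.
    ranked = sorted(zip(train_X, train_y),
                    key=lambda row: math.dist(row[0], query))
    # Count the labels of the k closest rows and return the most common one.
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Toy, made-up data: two features per row, one label per row.
train_X = [(0.0, 0.0), (0.1, 0.2), (1.0, 1.0), (0.9, 1.1), (1.2, 0.8)]
train_y = ["down", "down", "up", "up", "up"]

prediction = knn_predict(train_X, train_y, (1.0, 0.9), k=3)
print(prediction)
```

The whole method is "find the k most similar past rows and let them vote", which is why it takes about 5 minutes to understand.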

I offer 2 Kelly params.

ct.kelly_base is a general aggressiveness dial.

ct.kelly_base == 1 means act normal.

ct.kelly_base > 1 means act aggressive.

ct.kelly_x quantifies Kelly's idea that if you have an edge, you should bet more.

ct.kelly_x == 0 means don't try to exploit the edge.

ct.kelly_x > 0 means try to exploit the edge.

The attached backtest (spy256) lets you see the effect of both params, where I set:

ct.kelly_base = 2
ct.kelly_x = 2

The first backtest (spy255) had this:

ct.kelly_base = 1
ct.kelly_x = 0
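One way the two dials could combine is sketched below. This is an assumption for illustration only, not the formula from the attached backtests: the function position_size and the way it mixes the two params are hypothetical. The only borrowed fact is that for an even-money bet with win probability p, the Kelly fraction is 2p - 1.

```python
def position_size(p_up, kelly_base=1.0, kelly_x=0.0):
    """Hypothetical sizing rule; NOT the code from the attached backtests.

    p_up       : predicted probability that the price rises
    kelly_base : general aggressiveness dial (1 means act normal)
    kelly_x    : how strongly to scale the bet with the edge (0 means ignore it)
    """
    # Kelly fraction for an even-money bet: f = 2p - 1.
    edge = 2.0 * p_up - 1.0
    # Go long when the edge is non-negative, short otherwise.
    direction = 1.0 if edge >= 0 else -1.0
    # With kelly_x == 0 the size depends only on kelly_base.
    return direction * kelly_base * (1.0 + kelly_x * abs(edge))

# Settings like spy255 (base 1, x 0) versus spy256 (base 2, x 2), for p_up = 0.6.
normal = position_size(0.6, kelly_base=1, kelly_x=0)
aggressive = position_size(0.6, kelly_base=2, kelly_x=2)
print(normal, aggressive)
```

Under this assumed rule, the first setting always bets one unit in the predicted direction, while the second starts from a doubled base and then adds to the bet as the predicted probability moves away from 0.5.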

As far as actually using these algos, I'd advise against it.

The period between 2006 and 2009 would cause most followers to abandon ship.

Dan

Matthias Waldhauer

Nov 26, 2014

The performance looks nice, but the number of zero crossings and changes in the Kelly value is low, so the sample of Kelly "events" is too small to build much trust in it.

Could it be tuned to make more Kelly-based decisions, so we can see how those decisions fare?

ML means not only applying ML methods, but also knowing when they actually work and when we are only seeing a lucky shot.

Dan Bikle

Nov 26, 2014

Yes, that is why I want to move it to QSTK so I can study it more.

A simple way to get more crossings is to reduce the number of in-sample rows.

If I did that, the proper thing to do from an ML perspective would be to reduce the number of features to avoid over-fitting.

My rule of thumb is that N features need 10^N in-sample rows.
You can see from the source code that I have 6 features, so I should have 10^6 rows.
Instead I feed it 10^3, so I'm sliding down a slippery slope towards a garbage algo.
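The arithmetic behind that rule of thumb, as a small sketch. The helper names rows_needed and max_features are made up for illustration, and the 10^N rule itself is only my rule of thumb, not a theorem.

```python
def rows_needed(n_features):
    """In-sample rows suggested by the 10^N rule of thumb."""
    return 10 ** n_features

def max_features(n_rows):
    """Largest N for which n_rows still covers 10^N rows."""
    n = 0
    while 10 ** (n + 1) <= n_rows:
        n += 1
    return n

needed = rows_needed(6)            # rows the rule asks for with 6 features
supported = max_features(10 ** 3)  # features the rule allows with 10^3 rows
shortfall = needed // 10 ** 3      # how many times too few rows I have
print(needed, supported, shortfall)
```

So by this rule, 10^3 rows support about 3 features, and using 6 features leaves me short by a factor of 1000.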

Another thing to try is to synthesize features from talib.

This is more like middle school chem-class than quantified finance.

Dan

Nyan Paing Tin

Nov 26, 2014

Hey Dan,

Thank you for sharing, this is great. Would you be kind enough to explain more about this? I am a little confused while reading the code. Sorry for being dumb.

sonic sun

May 2, 2016

@Nyan Paing Tin

The code uses KNN to classify the price trend into 2 classes.
It feeds 1400 rounds of data into the KNN classifier and then uses it to predict the probability of the current price trend. If it is a rising trend and the probability is 0.1, it uses the Kelly formula to size a position.

If you don't know what KNN is, you can look at this link: https://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm
