Machine Learning Models And Estimators

First off: THIS IS AN EDITED (albeit rather heavily) VERSION OF Simple Machine Learning Example Mk II by Gus Gordon.

I decided that this needed its own post, since the algorithm no longer really resembles the original in terms of its methodology.

The main point of this post is to outline something I haven't seen anyone else outline:

If you want repeatable, consistent results, you MUST set:

context.model.n_estimators = 100  # or more; anything past 250 will drastically slow down the backtest and won't be much more beneficial

Attached is an algorithm that implements n_estimators and only trades SPXL and XIV.

For further increases in accuracy and consistency:

context.model.min_samples_leaf = 2 #or more  
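The two settings above can be sketched in isolation like this. This is a minimal standalone sketch, assuming the model is sklearn's ExtraTreesRegressor (as used in algorithms of this family); `context.model` in the real algorithm is just such an estimator.

```python
# Standalone sketch of the two settings discussed above, assuming an
# sklearn ExtraTreesRegressor (the Quantopian `context.model` object).
from sklearn.ensemble import ExtraTreesRegressor

model = ExtraTreesRegressor()
model.n_estimators = 100    # more trees -> more consistent predictions; past ~250 the backtest mostly just gets slower
model.min_samples_leaf = 2  # larger leaves smooth the fit and reduce overfitting to noise
```

Equivalently, both can be passed to the constructor: `ExtraTreesRegressor(n_estimators=100, min_samples_leaf=2)`.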

Data From Backtest

Hey Jacob, when you run this from, say, 12/01/2010, it has a massive drawdown of about 80% but then it takes off. Is this first year a learning period, or does it just not perform well during 2010?

Yeah, before 2012 this doesn't do well:

You may notice that this algorithm actually models Intel, then trades a 3x SPY ETF and XIV. It will also work well if you model SPY, but there are points where XIV has dropped drastically; these are avoided if you model INTC instead of SPY.

Hello Jacob,

Can you elaborate more on why INTC was your choice for modelling?

Since I am not sure what some of the methods are doing it is not surprising that I am confused, however, there is one thing I am truly baffled by. Why, when I "Build Algorithm" the unmodified algo "n" times in a row, do I get "n" different results?

Is there a random function/method I am missing?

Hi Paul, most machine learning algorithms use random numbers. If you want to make your work reproducible, you need to set a seed for the random state. In sklearn you can, for example, use the following:
random_state = 0
where 0 is the seed.

BTW, extratrees = randomized decision trees
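The reproducibility point can be demonstrated directly. This is a sketch with synthetic data (the array shapes are illustrative, not from the original algorithm): two ExtraTrees fits with the same `random_state` produce identical predictions, which is exactly why the unseeded algo gives "n" different results in "n" runs.

```python
# Sketch: fixing random_state makes ExtraTreesRegressor fits repeatable.
# The synthetic data here is purely illustrative.
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor

rng = np.random.RandomState(1)
X = rng.rand(200, 5)   # 200 samples, 5 features
y = rng.rand(200)

a = ExtraTreesRegressor(n_estimators=50, random_state=0).fit(X, y).predict(X[:5])
b = ExtraTreesRegressor(n_estimators=50, random_state=0).fit(X, y).predict(X[:5])

# With the same seed, the two fits agree exactly; without random_state they generally would not.
assert np.allclose(a, b)
```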

I found that increasing the value of context.model.min_samples_leaf also improves predictability, as you state above.

My idea on how to use what you provided is to set up 4-10 diverse pairs of equities (preferably etfs) that trade inversely such as SPY & XIV and create predictors for both. Then, at the pair level, trigger buy/sell quantities of each based on their individual as well as combined "predictions."

If I can teach myself enough Python, I would also like to have it self-correct: look at the previous predictions and the performance that resulted from them, and then tweak the predictor thresholds itself.
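The pair idea above might be sketched like this. Everything here is hypothetical: the function name `pair_signal`, the threshold value, and the requirement that the two legs agree are my illustration of "trigger buy/sell based on their individual as well as combined predictions", not code from the thread.

```python
# Hypothetical sketch of combining predictions for an inversely-related pair
# (e.g. SPXL vs XIV): act only when the two legs' predicted moves agree.
def pair_signal(pred_long, pred_inverse, threshold=0.01):
    """Return an action for the pair given each leg's predicted return.

    Buy a leg only when its prediction clears the threshold AND the
    inversely-related leg confirms by predicting a move the other way.
    """
    if pred_long > threshold and pred_inverse < -threshold:
        return "buy_long_leg"
    if pred_inverse > threshold and pred_long < -threshold:
        return "buy_inverse_leg"
    return "hold"
```

With 4-10 such pairs, position sizes could then be split across whichever pairs produce a non-"hold" signal.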

For fun, I began to implement what I suggested above. Instead of just SPXL vs XIV, I included QQQ vs VIG.

The returns aren't as astronomical as the original, but there is a bit of diversification.

Since I do not know how the ExtraTreesRegressor works, and I'm not sure I would understand more than a high-level description, I wonder how the prediction decision points of .4, .01 and -0.00 were determined. Of course, the reasons I am interested are:

A) I have no idea whether they should be different for different pairs

B) I hate literal decision points that can't fluctuate based on data.

So the real question in my mind is: can the decision points be adjusted either by

1) Looking back at historical data using built-in functions.

2) Gathering predictions, and associated P&L per security, and tweaking it as we make mistakes & learn.
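Option 1 could be sketched as follows. This is an assumption on my part, not anything from the original algorithm: instead of hard-coding decision points, derive them from quantiles of the model's recent predictions, so they fluctuate with the data as the post asks for.

```python
# Hypothetical data-driven alternative to fixed decision points (.4, .01, -0.00):
# set the buy/sell thresholds from quantiles of recent model predictions.
import numpy as np

def adaptive_thresholds(past_predictions, upper_q=0.8, lower_q=0.2):
    """Return (upper, lower) thresholds from the recent prediction distribution.

    Predictions above `upper` would trigger a buy, below `lower` a sell;
    the quantile levels 0.8/0.2 are illustrative defaults.
    """
    preds = np.asarray(past_predictions, dtype=float)
    return np.quantile(preds, upper_q), np.quantile(preds, lower_q)
```

Option 2 (tweaking thresholds from realized P&L per security) would layer on top of this, e.g. by nudging the quantile levels after losing trades.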