How to Leverage the Pipeline to Conduct Machine Learning in the IDE

Hey Everybody,

We all know the Quantopian platform is incredibly powerful. But those of us who have tried to coax it into backtesting machine learning algorithms on anything other than OHLCV data also know that it isn't as flexible with fundamental data, since Morningstar Fundamentals and other datasets can't be accessed through data.history. When working with fundamental factors, it's straightforward to use the research environment to tune a static set of parameters for a machine learning algorithm and then implement it in the Backtesting IDE. It's not immediately clear, though, how to implement a machine learning algorithm in the IDE that retrains and updates its parameters over time.

A few months back, I spent a good deal of time digging through the documentation, trawling the forum, and hacking away at this, and I ultimately found a way to make it work. Despite likely giving away a bit of a trading edge, I wanted to share the skeleton of my algo from back then to save a lot of you the headaches of trying to stand something like this up.

A lot of this will repeat from Thomas Wiecki's awesome post on this material, but, at just over 100 lines of code, I hope this notebook will be a little more digestible for folks than the very robust algorithm in that thread.

Note that the performance metrics in the algorithm below aren't too relevant-- if you pick up this algorithm for your own use, you'll likely have a more intentional and sensible portfolio allocation strategy. Plus, the choice of simple OLS for the machine learning algorithm, as well as the choice of these particular factors, are just placeholders-- you'll likely select both fundamental inputs and a machine learning algorithm that make sense for your own strategy.
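
For anyone who wants to see the "retrain over time" idea in isolation, here is a minimal NumPy sketch that runs outside Quantopian. It is not the attached algorithm and does not use the Pipeline API. The function names, array shapes, window length, and synthetic data are all assumptions made for illustration. On each day it refits OLS on a trailing window of (factor, next-period return) pairs and predicts from that day's factor values.

```python
import numpy as np

def fit_ols(x, y):
    """Least-squares fit of y on x with an intercept; returns the coefficients."""
    design = np.column_stack([np.ones(len(x)), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

def predict_ols(coef, x):
    """Apply fitted coefficients (intercept first) to new rows of x."""
    design = np.column_stack([np.ones(len(x)), x])
    return design @ coef

def walk_forward_predictions(factors, fwd_returns, window):
    """
    Hypothetical helper, for illustration only.
    factors:     (n_days, n_assets, n_factors), values known at the end of day t
    fwd_returns: (n_days, n_assets), return from day t to t+1 (known only at t+1)
    On day t the model is trained on days t-window .. t-1, so no future data leaks in.
    """
    n_days, n_assets, n_factors = factors.shape
    preds = np.full((n_days, n_assets), np.nan)  # no prediction until a full window exists
    for t in range(window, n_days):
        x_train = factors[t - window:t].reshape(-1, n_factors)
        y_train = fwd_returns[t - window:t].reshape(-1)
        # Standardize each factor (column) separately, using training data only.
        mean, std = x_train.mean(axis=0), x_train.std(axis=0)
        std[std == 0] = 1.0  # guard against constant columns
        coef = fit_ols((x_train - mean) / std, y_train)
        preds[t] = predict_ols(coef, (factors[t] - mean) / std)
    return preds

# Synthetic, noiseless data (made up) so the behaviour is easy to check.
rng = np.random.default_rng(0)
n_days, n_assets, n_factors, window = 30, 20, 3, 10
factors = rng.normal(size=(n_days, n_assets, n_factors))
true_beta = np.array([0.5, -0.2, 0.1])
fwd_returns = factors @ true_beta + 0.01
preds = walk_forward_predictions(factors, fwd_returns, window)
```

In a real algorithm the per-day loop is replaced by whatever the platform calls each day, and the OLS helpers by whichever model you prefer. The point of the sketch is only the train-on-the-past, predict-on-today structure.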

Good luck!

EDIT: Bug fix-- thanks to Kelvin Ho for the catch.

4 responses

Hi Jim,

This "predict" CustomFactor is very useful, thank you very much.

However, I think there is a minor bug in the line below: called without an axis argument, np.mean(x) and np.std(x) each return a single scalar for the whole array instead of a vector with one value per factor.

x = (x - np.mean(x))/np.std(x) # demean and normalize

It would probably be better to replace that line with the two lines below, which standardize each fundamental factor separately (this needs from sklearn import preprocessing at the top of the algorithm):

scaler = preprocessing.StandardScaler().fit(x)

x = scaler.transform(x)
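
To make the difference concrete, here is a small NumPy-only sketch. The array values are made up, with assets as rows and factors as columns. It compares the whole-array version with a per-column version that uses axis=0. With default settings, StandardScaler applies the same per-column demeaning and scaling.

```python
import numpy as np

# Hypothetical data: 4 assets (rows) x 2 factors (columns) on very different scales.
x = np.array([[1.0, 1000.0],
              [2.0, 2000.0],
              [3.0, 3000.0],
              [4.0, 4000.0]])

# Original line: without an axis argument, mean and std are single scalars
# computed over every element, so the large-scale factor dominates both.
x_global = (x - np.mean(x)) / np.std(x)

# Per-factor version: axis=0 reduces over rows, giving one mean and one
# standard deviation per column (factor).
x_per_factor = (x - x.mean(axis=0)) / x.std(axis=0)

print("global:     column means", x_global.mean(axis=0), "column stds", x_global.std(axis=0))
print("per factor: column means", x_per_factor.mean(axis=0), "column stds", x_per_factor.std(axis=0))
```

Only the per-factor version leaves every column with mean 0 and standard deviation 1. In the global version the columns keep their original relative scales.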

All the best,
Kelvin

Hi Kelvin-- Good catch! I've edited the initial post to fix the scalar reduction.

Thanks!

Hi Jim

I used your great algorithm to create a "Naive Bayes High Low Return Prediction Algorithm" for my ML students, which is a sort of ML simple mean reversion algorithm. Do you know how we can include the following in your algo?

  • Calculated factors of the type "asset_to_equity_ratio = (Fundamentals.total_assets.latest / Fundamentals.common_stock_equity.latest)"; and
  • Custom factors

We tried unsuccessfully.

Hi German,

Great to see someone picking up the code and using it for ML education. Unfortunately, it's been a while since I've touched this, and I don't know how to implement custom factors here. I think I'm gonna have to pass you off to the Quantopian staff.

Jim