Meta-Labeling: Advances in Financial Machine Learning, Ch 3, pg 50.

Hi Everyone!

This is my first post on the forum and I hope that you find it a useful contribution.

Lately I have been playing around with some of the ideas from Marcos López de Prado's latest book, Advances in Financial Machine Learning, in particular the idea of meta-labeling. For one, if meta-labeling works, then it has tons of great applications across all machine learning projects.

I attached a notebook that is a small MVP of meta-labeling, applied to a much simpler problem: the MNIST data set. If the idea works well, the plan is to apply the method to other projects.

The central idea is to create a secondary ML model that learns how to use the primary exogenous model. This leads to improved performance metrics, including accuracy, precision, recall, and F1-score.
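The two-model structure can be sketched roughly as follows. This is a minimal illustration on synthetic data using scikit-learn, not the code from the attached notebook; the model choices, the 0.3 threshold, and the data are all placeholder assumptions:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Synthetic binary problem standing in for the primary task.
X, y = make_classification(n_samples=2000, n_features=20, flip_y=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Primary (exogenous) model, tuned for high recall via a deliberately low threshold.
primary = LogisticRegression(max_iter=1000).fit(X_train, y_train)
threshold = 0.3  # low so that few true positives are missed
primary_train_pred = (primary.predict_proba(X_train)[:, 1] >= threshold).astype(int)
primary_test_pred = (primary.predict_proba(X_test)[:, 1] >= threshold).astype(int)

# Meta-labels: 1 where the primary model's positive call was correct, else 0.
# The secondary model is trained only on samples the primary flagged positive.
mask = primary_train_pred == 1
meta_y = (y_train[mask] == 1).astype(int)
secondary = RandomForestClassifier(random_state=0).fit(X_train[mask], meta_y)

# Final prediction: positive only when both models agree.
test_mask = primary_test_pred == 1
final_pred = np.zeros_like(y_test)
final_pred[test_mask] = secondary.predict(X_test[test_mask])

print("primary F1:", f1_score(y_test, primary_test_pred))
print("combined F1:", f1_score(y_test, final_pred))
```

Because the secondary model can only veto positives, never add new ones, it acts purely as a false-positive filter on top of the primary model.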

To illustrate the concept I made use of the MNIST data set to train a binary classifier to identify the digit 8, from a set that includes only the digits 8 and 3. The reason for this is that a 3 looks very similar to an 8, so I expect some overlap in the data, i.e. the classes are not linearly separable. Another reason I chose the MNIST data set is that MNIST is a solved problem, so we can witness improvements in the performance metrics with ease.
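For reference, the 8-vs-3 setup can be reproduced in a few lines. This sketch uses scikit-learn's built-in `load_digits` (8x8 images) as a lightweight stand-in for the full MNIST data set; the logistic regression classifier is an assumption for illustration, not necessarily what the notebook uses:

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

# Keep only the 3s and 8s; label 1 = "eight", 0 = "three".
digits = load_digits()
mask = np.isin(digits.target, [3, 8])
X, y = digits.data[mask], (digits.target[mask] == 8).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0)

clf = LogisticRegression(max_iter=5000).fit(X_train, y_train)

# Off-diagonal entries show the 3/8 confusions the post is talking about.
print(confusion_matrix(y_test, clf.predict(X_test)))
```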

My main question is this: What exactly is meta labeling doing? I've bounced a few ideas around but I haven't got a clear answer (I'll share some of them as we progress). I'd love to hear some feedback from the community and create a conversation.

I have been sure to include a few of the important papers mentioned in the bibliography, which highlight some of the inspiration for meta-labeling.

  1. Patel, J., Shah, S., Thakkar, P. and Kotecha, K., 2015. Predicting stock market index using fusion of machine learning techniques. Expert Systems with Applications, 42(4), pp.2162-2172.
  2. Tsai, C.F. and Wang, S.P., 2009, March. Stock price forecasting by hybrid machine learning techniques. In Proceedings of the International MultiConference of Engineers and Computer Scientists (Vol. 1, No. 755, p. 60).
  3. Zhu, M., Philpotts, D., Sparks, R. and Stevenson, M.J., 2011. A hybrid approach to combining CART and logistic regression for stock ranking. Journal of Portfolio Management, 38(1), p.100.

My conclusion is that meta-labeling works as advertised. You will see in the confusion matrix that the false positives (FP) from the primary model are now being correctly identified as true negatives (TN) with the help of meta-labeling.

2 responses

I just updated the notebook to include doc strings.

I ran it locally (Python 3.6.6) and then uploaded it to the platform. You will find it easier to work with if you download it to your local machine.

Thanks for the great notebook and explanation @Jaques.

I have a question that I haven't been able to ascertain from the book. When training the primary (exogenous) model to find trading opportunities, we want to maximise recall (I see you do this by choosing a probability threshold, based on the ROC curve, which yields high recall). However, López de Prado does not say what metric should be maximised when training the secondary (meta) model. He says "Meta-labeling will increase your F1-score by filtering out the false positives, where the majority of positives have already been identified by the primary model", which may be true once you combine both models, but it does not indicate how we should train the second model.
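For what it's worth, the threshold-from-ROC step described above can be sketched like this (the synthetic data and the 0.9 target recall are assumptions for illustration):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, flip_y=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
probs = model.predict_proba(X_val)[:, 1]

# roc_curve returns one (fpr, tpr) point per candidate threshold.
fpr, tpr, thresholds = roc_curve(y_val, probs)

# Pick the first (largest) threshold whose recall (TPR) reaches the
# target, accepting a higher false positive rate in exchange.
target_recall = 0.9
idx = int(np.argmax(tpr >= target_recall))
chosen = thresholds[idx]
print(f"threshold={chosen:.3f}  recall={tpr[idx]:.3f}  fpr={fpr[idx]:.3f}")
```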

What metric do you advise maximising/minimising for the meta model? My first guess would be minimising the false positive rate (FP / (FP + TN)).
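As a concrete check of that definition, the false positive rate can be read straight off the confusion matrix (the labels and predictions below are made-up toy values):

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical meta-labels (was the primary's positive call correct?)
# and hypothetical meta-model predictions.
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_pred = np.array([1, 1, 1, 0, 0, 0, 1, 0])

# sklearn's confusion_matrix ravels in the order tn, fp, fn, tp.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
fpr = fp / (fp + tn)
print(fpr)  # 1 / (1 + 3) = 0.25
```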