Hi Everyone!
This is my first post on the forum and I hope that you find it a useful contribution.
Lately I have been playing around with some of the ideas from Marcos Lopez de Prado's latest book, Advances in Financial Machine Learning, in particular meta labeling. If meta labeling works, it should have plenty of applications across machine learning projects, not just in finance.
I attached a notebook that is a small MVP of meta labeling, applied to a much simpler problem: the MNIST data. If the idea works well, the plan is to apply the method to other projects.
The central idea is to create a secondary ML model that learns how to use the predictions of a primary, exogenous model, i.e. it learns when the primary model's calls can be trusted. The claim is that this improves performance metrics, including accuracy, precision, recall, and F1-score.
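To make the idea concrete, here is a minimal sketch in plain Python. It is not the code from the notebook: the toy data, the fixed-threshold primary model (`primary_predict`), and the one-feature secondary model (`fit_stump`) are all assumptions chosen to keep the example short. The secondary model is trained only on the samples the primary model flags as positive, with a meta label of 1 when that call was correct and 0 when it was a false positive.

```python
import random

rng = random.Random(0)

def make_data(n):
    """Toy two-feature data: class 1 is centered at (+1, +1), class 0 at (-1, -1)."""
    rows = []
    for _ in range(n):
        y = rng.randint(0, 1)
        center = 1.0 if y == 1 else -1.0
        rows.append((rng.gauss(center, 1.0), rng.gauss(center, 1.0), y))
    return rows

def primary_predict(x1):
    """Hypothetical primary model: a deliberately low threshold, so recall is high."""
    return 1 if x1 >= -0.5 else 0

def fit_stump(xs, ys):
    """Return the threshold t that maximizes accuracy of the rule `x >= t` -> 1."""
    best_t, best_acc = float("-inf"), -1.0
    for t in [float("-inf")] + sorted(set(xs)):
        acc = sum((x >= t) == (y == 1) for x, y in zip(xs, ys)) / len(ys)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t

def confusion(y_true, y_pred):
    """Count true/false positives and negatives for binary labels."""
    pairs = list(zip(y_true, y_pred))
    return {
        "tp": sum(t == 1 and p == 1 for t, p in pairs),
        "fp": sum(t == 0 and p == 1 for t, p in pairs),
        "tn": sum(t == 0 and p == 0 for t, p in pairs),
        "fn": sum(t == 1 and p == 0 for t, p in pairs),
    }

train, test = make_data(400), make_data(400)

# Meta labels exist only where the primary model said "positive":
# 1 if that call was correct, 0 if it was a false positive.
flagged = [(x2, int(y == 1)) for x1, x2, y in train if primary_predict(x1) == 1]
threshold = fit_stump([x2 for x2, _ in flagged], [meta for _, meta in flagged])

def final_predict(x1, x2):
    """Positive only if the primary model says so AND the secondary model agrees."""
    return 1 if primary_predict(x1) == 1 and x2 >= threshold else 0

y_true = [y for _, _, y in test]
primary_cm = confusion(y_true, [primary_predict(x1) for x1, _, _ in test])
final_cm = confusion(y_true, [final_predict(x1, x2) for x1, x2, _ in test])
print("primary only:   ", primary_cm)
print("with meta model:", final_cm)
```

One thing this sketch makes visible: the secondary model only filters the primary model's positives. It can turn false positives into true negatives, but it cannot recover positives the primary model missed, so any gain comes from trading a little recall for precision.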
To illustrate the concept I used the MNIST data set to train a binary classifier to identify the number 8, from a set that only includes the digits 8 and 3. The reason for this is that the number 3 looks very similar to 8, so I expect some overlap in the data, i.e. the data are not linearly separable. Another reason I chose MNIST is that it is a solved problem, so improvements in the performance metrics are easy to see.
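Building that 3-versus-8 subset could look roughly like the following. This is a sketch with a hypothetical helper name (`make_binary_subset`) and a tiny made-up stand-in for the MNIST images, so it runs without downloading anything.

```python
def make_binary_subset(images, labels, positive=8, negative=3):
    """Keep only the two digits of interest; target is 1 for `positive`, 0 for `negative`."""
    kept_images, targets = [], []
    for image, label in zip(images, labels):
        if label in (positive, negative):
            kept_images.append(image)
            targets.append(1 if label == positive else 0)
    return kept_images, targets

# Toy stand-in for MNIST: each "image" is just a short list of pixel values.
images = [[0, 1], [1, 1], [1, 0], [0, 0]]
labels = [8, 3, 5, 8]

X, y = make_binary_subset(images, labels)
print(X)  # the image labeled 5 is dropped
print(y)  # 1 marks an 8, 0 marks a 3
```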
My main question is this: What exactly is meta labeling doing? I've bounced a few ideas around but I haven't got a clear answer (I'll share some of them as we progress). I'd love to hear some feedback from the community and create a conversation.
I have included a few of the important papers mentioned in the bibliography that highlight some of the inspiration for meta labeling:
- Patel, J., Shah, S., Thakkar, P. and Kotecha, K., 2015. Predicting stock market index using fusion of machine learning techniques. Expert Systems with Applications, 42(4), pp.2162-2172.
- Tsai, C.F. and Wang, S.P., 2009, March. Stock price forecasting by hybrid machine learning techniques. In Proceedings of the International MultiConference of Engineers and Computer Scientists (Vol. 1, No. 755, p. 60).
- Zhu, M., Philpotts, D., Sparks, R. and Stevenson, M.J., 2011. A hybrid approach to combining CART and logistic regression for stock ranking. Journal of Portfolio Management, 38(1), p.100.
My conclusion is that meta labeling works as advertised. You will see in the confusion matrix that the false positives (FP) from the primary model are now correctly identified as true negatives (TN) with the help of meta labeling.
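For anyone who wants to check how a shift from FP to TN feeds through to the headline metrics, here is a small worked example. The counts are made up purely for illustration and are not the results from the notebook.

```python
def metrics(tp, fp, tn, fn):
    """Compute accuracy, precision, recall, and F1 from confusion-matrix counts."""
    total = tp + fp + tn + fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

# Hypothetical counts for a primary model on 200 samples.
before = metrics(tp=90, fp=40, tn=60, fn=10)
# Suppose the secondary model turns 30 of the 40 false positives into true negatives.
after = metrics(tp=90, fp=10, tn=90, fn=10)

for name in before:
    print(f"{name:9s} before={before[name]:.3f} after={after[name]:.3f}")
```

In this idealized case recall is unchanged because no true positives were filtered out, while accuracy, precision, and F1 all go up. In practice the secondary model will usually reject a few true positives as well.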