The perceptron is still a bit too simplistic: it traverses the data set in order and treats every misclassified point equally. SGD with a hinge loss can do a better job: it picks training points at random, gradually decreases the learning rate, and penalizes each point in proportion to how far it falls on the wrong side of the margin. Here I used an averaged SGD method, which in my tests outperformed simply taking the last weight vector after a fixed number of iterations. All other ideas are pretty similar to my previous post on the perceptron; a sketch of the update rule follows below.
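To make the description concrete, here is a minimal sketch of what averaged SGD on the hinge loss can look like. This is not my backtest code; the function name, the Pegasos-style decreasing step size `1/(lambda * t)`, and the regularization parameter are my assumptions for illustration.

```python
import numpy as np

def averaged_sgd_hinge(X, y, n_iter=10000, lam=1e-4, seed=None):
    """Sketch of averaged SGD on the hinge loss (Pegasos-style step size).

    X: (n_samples, n_features) feature matrix
    y: (n_samples,) labels in {-1, +1}
    Returns the average of all weight iterates, not just the last one.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    w_avg = np.zeros(d)
    for t in range(1, n_iter + 1):
        i = rng.integers(n)            # choose a training point at random
        eta = 1.0 / (lam * t)          # gradually decreasing learning rate
        margin = y[i] * X[i].dot(w)
        # hinge loss only penalizes points that violate the margin,
        # and the update is proportional to the violating point itself
        if margin < 1:
            w = (1 - eta * lam) * w + eta * y[i] * X[i]
        else:
            w = (1 - eta * lam) * w
        # maintain a running average of the iterates (averaged SGD)
        w_avg += (w - w_avg) / t
    return w_avg
```

The averaging step is the point of the exercise: individual SGD iterates bounce around because of the random sampling, while their running average is much more stable, which matches why the averaged predictor beat the last-iterate one in my runs.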
It turned out that the added randomness increases volatility but also raises the highest gain. I would love to hear comments and other experiments on how to improve this idea. Note again that it took me a very long time to run backtests and debug the algo simply because the computational requirements are high. Any optimization suggestions are also welcome. Thank you!
Remember: just click the "clone" button if you want to make a copy to try out and edit yourself.