Simplest Machine Learning With KNN, Benchmark QQQ

This algorithm uses a very simple machine learning technique, k-nearest neighbors (KNN) regression, to predict a stock's 3-day return from its recent price history.
It trades the stocks held in QQQ, with the 3-day return as the target. For each stock, it loads historical price data from 2013 to 2016 and builds a dataframe containing the 1-, 2-, 3-, 5- and 10-day adjusted-close price changes, each expressed as the ratio of today's price to the price N trading days earlier:

    # Each feature is today's adjusted close divided by the close N trading days ago
    df['T1'] = df['price'] / df['price'].shift(1)    # 1-day change
    df['T2'] = df['price'] / df['price'].shift(2)    # 2-day change
    df['T3'] = df['price'] / df['price'].shift(3)    # 3-day change
    df['T5'] = df['price'] / df['price'].shift(5)    # 5-day change
    df['T10'] = df['price'] / df['price'].shift(10)  # 10-day change
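As a self-contained sketch of this feature step, the snippet can be wrapped in a function and run on a synthetic price series. The function name `make_features`, the `LAGS` tuple and the synthetic prices are illustrative assumptions, not part of the original algorithm.

```python
import numpy as np
import pandas as pd

# Look-back windows from the post, in trading days (illustrative constant name).
LAGS = (1, 2, 3, 5, 10)


def make_features(prices: pd.Series) -> pd.DataFrame:
    """Add one column per lag: today's price divided by the price `lag` days ago.

    A value of 1.02 in column 'T5' means the price is 2% above its level
    five trading days earlier. Rows without enough history are NaN.
    """
    df = pd.DataFrame({"price": prices})
    for lag in LAGS:
        df[f"T{lag}"] = df["price"] / df["price"].shift(lag)
    return df


# Synthetic series that rises exactly 1% per day (not real market data).
prices = pd.Series(100 * np.cumprod(np.full(15, 1.01)))
features = make_features(prices)
print(features.dropna())
```

Only the rows with a full 10-day history survive `dropna()`, which is why a model trained on these features loses the first 10 days of each symbol's data.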

Next, add a look-ahead target column, y, holding the 3-day forward return from each day (again as a price ratio):

    df['y'] = df['price'].shift(-target_days) / df['price']  # N-day return from today, target_days = 3

Build this dataframe for each symbol and use it as the training data for that symbol's regressor.
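A minimal sketch of assembling one symbol's training set, assuming a pandas Series of adjusted closes and `target_days = 3`. The first rows lack enough history for `T10`, and the last `target_days` rows have no future price for `y`, so both are dropped. The helper name `build_training_set` and the random-walk prices are hypothetical.

```python
import numpy as np
import pandas as pd

FEATURES = ["T1", "T2", "T3", "T5", "T10"]


def build_training_set(prices: pd.Series, target_days: int = 3):
    """Return (X, y) arrays for one symbol.

    X holds the [T1, T2, T3, T5, T10] price ratios for each day;
    y holds price[t + target_days] / price[t], the look-ahead target.
    """
    df = pd.DataFrame({"price": prices})
    for lag in (1, 2, 3, 5, 10):
        df[f"T{lag}"] = df["price"] / df["price"].shift(lag)
    df["y"] = df["price"].shift(-target_days) / df["price"]
    # Drop rows missing either the 10-day look-back or the look-ahead target.
    df = df.dropna()
    return df[FEATURES].to_numpy(), df["y"].to_numpy()


rng = np.random.default_rng(42)
prices = pd.Series(100 * np.cumprod(1 + rng.normal(0, 0.01, size=30)))
X, y = build_training_set(prices)
print(X.shape, y.shape)
```

The resulting `X` and `y` can be passed to any regressor; for example, scikit-learn's `KNeighborsRegressor(n_neighbors=..., metric="chebyshev")` accepts them through `fit(X, y)`.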
When backtesting, every day after the market opens, calculate the 1-, 2-, 3-, 5- and 10-day changes for each stock and make a prediction from this data point [T1, T2, T3, T5, T10]. If the predicted y is above 1.01 (a 3-day return of more than 1%), buy the stock and hold it for 3 days; if the prediction drops below a certain threshold on any of those 3 days, sell it.
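The entry and exit rule can be sketched as a small state machine over one stock's daily predictions. The post does not give the exit threshold, so `exit_below=1.0` (a predicted 3-day return below zero) is an assumed value, and the function name `trade_signals` is hypothetical. Predictions are price ratios like `y`, so the 1% entry condition becomes `pred > 1.01`.

```python
def trade_signals(predictions, entry=1.01, exit_below=1.0, hold_days=3):
    """Map a sequence of daily predictions to 'buy'/'hold'/'sell'/'flat' actions.

    entry: buy when flat and the predicted ratio exceeds this value.
    exit_below: sell early if a later prediction falls below this (assumed) value.
    hold_days: sell at the open `hold_days` trading days after buying.
    """
    actions = []
    age = None  # trading days since entry; None while flat
    for pred in predictions:
        if age is None:
            if pred > entry:
                actions.append("buy")
                age = 0
            else:
                actions.append("flat")
        else:
            age += 1
            if age >= hold_days or pred < exit_below:
                actions.append("sell")
                age = None
            else:
                actions.append("hold")
    return actions


print(trade_signals([1.02, 1.005, 1.003, 1.001, 0.99]))
# ['buy', 'hold', 'hold', 'sell', 'flat']
```

In this sketch the sell day is treated as flat, so a new entry can only happen the following day; other choices (such as extending the hold when a fresh prediction clears 1.01) are equally plausible.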
Again, this is a very simple algorithm, but I think it is a good starting point, or perhaps a good weak learner for building more complex models. When backtesting, make sure the training period does not overlap the backtesting period, to avoid look-ahead bias.
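One way to keep the two periods apart, as a sketch assuming a date-indexed pandas DataFrame: split at a cut-off date and also drop the last `target_days` training rows, because their look-ahead target `y` uses prices from inside the test period. The function name `chronological_split` and the example dates are hypothetical.

```python
import pandas as pd


def chronological_split(data: pd.DataFrame, test_start, target_days: int = 3):
    """Split a date-indexed frame into train and test parts that share no prices."""
    before = data[data.index < test_start]
    # The target for row t uses price[t + target_days]; for the last
    # `target_days` rows before the cut-off that price is in the test period.
    train = before.iloc[:-target_days] if target_days > 0 else before
    test = data[data.index >= test_start]
    return train, test


dates = pd.bdate_range("2016-12-01", periods=20)  # business days only
data = pd.DataFrame({"price": range(100, 120)}, index=dates)
train, test = chronological_split(data, test_start=dates[10])
print(train.index[-1].date(), test.index[0].date())
```

Dropping those boundary rows costs only a few samples but removes a subtle leak that a plain date cut-off would leave in.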
For the KNN model, the number of neighbors and the distance metric are the two parameters that need fine-tuning, and my impression is that Chebyshev distance almost always works better than Euclidean distance for time series.
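To make the comparison concrete, here is a minimal NumPy sketch of KNN regression with both metrics, plus a grid search over `k` and the metric on a chronological hold-out (no shuffling, with a `gap` of rows dropped so training targets do not reach into the validation window). Chebyshev distance is the largest absolute difference across features; Euclidean distance is the square root of the sum of squared differences. The function names, candidate `k` values, 20% validation fraction and mean-squared-error score are assumptions; scikit-learn's `KNeighborsRegressor` with `metric="chebyshev"` or `metric="euclidean"`, combined with `TimeSeriesSplit`, is a library alternative.

```python
import numpy as np


def knn_predict(X_train, y_train, X_query, k=5, metric="chebyshev"):
    """Average the targets of the k training rows closest to each query row."""
    diff = np.abs(X_query[:, None, :] - X_train[None, :, :])  # shape (q, n, d)
    if metric == "chebyshev":
        dist = diff.max(axis=2)  # largest single-feature gap
    elif metric == "euclidean":
        dist = np.sqrt((diff ** 2).sum(axis=2))
    else:
        raise ValueError(f"unsupported metric: {metric}")
    nearest = np.argsort(dist, axis=1)[:, :k]  # indices of the k closest rows
    return y_train[nearest].mean(axis=1)


def tune_knn(X, y, ks=(3, 5, 10, 20), metrics=("chebyshev", "euclidean"),
             val_frac=0.2, gap=3):
    """Grid-search k and metric by mean squared error on the most recent rows."""
    split = int(len(X) * (1 - val_frac))
    # Drop `gap` rows before the split so training targets stay out of validation.
    X_tr, y_tr = X[: split - gap], y[: split - gap]
    X_val, y_val = X[split:], y[split:]
    scores = {}
    for metric in metrics:
        for k in ks:
            pred = knn_predict(X_tr, y_tr, X_val, k=k, metric=metric)
            scores[(k, metric)] = float(np.mean((pred - y_val) ** 2))
    best = min(scores, key=scores.get)
    return best, scores


# The two metrics can pick different neighbors for the same query point.
X_demo = np.array([[0.0, 2.5], [2.0, 2.0]])
y_demo = np.array([10.0, 20.0])
query = np.array([[0.0, 0.0]])
print(knn_predict(X_demo, y_demo, query, k=1, metric="chebyshev"))  # [20.]
print(knn_predict(X_demo, y_demo, query, k=1, metric="euclidean"))  # [10.]
```

Chebyshev ignores everything except the single worst-matching feature, so two days count as similar only if every lag ratio is close; whether that helps on real price data is something the grid search, not the sketch, has to show.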
This is actually my first algorithm, so please feel free to point out any errors. Thanks!

1 response

Here is another algorithm based on k-nearest neighbors that uses a number of technical indicators as features in addition to price-difference lags. The KNN regressor is retrained weekly to predict the next day's return. The sign of the prediction is used to construct a long-only portfolio of pre-selected funds. The algorithm trades daily, with a prediction threshold that attempts to reduce the number of transactions.
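A rough sketch of how such a threshold can cut turnover, under several assumptions: predictions are next-day returns (not price ratios), a fund is bought when its prediction exceeds `+threshold`, sold when it falls below `-threshold`, and otherwise left as it is; holdings are equally weighted, and weekly retraining is approximated as every 5 trading days. The function names, the 0.001 threshold and the ticker names are hypothetical.

```python
def should_retrain(trading_day: int, period: int = 5) -> bool:
    """Retrain on every `period`-th trading day (roughly weekly for period=5)."""
    return trading_day % period == 0


def update_holdings(current, predictions, threshold=0.001):
    """Return equal target weights for a long-only portfolio.

    current: set of tickers currently held.
    predictions: dict mapping ticker -> predicted next-day return.
    """
    held = set(current)
    for ticker, pred in predictions.items():
        if pred > threshold:
            held.add(ticker)  # confident positive prediction: go (or stay) long
        elif pred < -threshold:
            held.discard(ticker)  # confident negative prediction: exit
        # |pred| <= threshold: keep the current state, avoiding a trade
    weight = 1.0 / len(held) if held else 0.0
    return {ticker: weight for ticker in sorted(held)}


print(update_holdings({"FUND_A"}, {"FUND_A": 0.0005, "FUND_B": 0.002, "FUND_C": -0.002}))
# {'FUND_A': 0.5, 'FUND_B': 0.5}
```

The dead band between `-threshold` and `+threshold` is what saves transactions: a fund whose prediction hovers near zero keeps its current position instead of flipping in and out every day.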