Nice ideas and a good base for further experiments! I haven't been active for a while, so I am not sure whether my observations are correct, but maybe they are helpful for you guys:
(1) I think the prediction of the current day's price change (Y, index i+context.lookback) may be based on price changes that include the current day (X, window ending at i+context.lookback).
# For each day in our history
for i in range(context.history_range-context.lookback-1):
X.append(price_changes[i:i+context.lookback]) # Store prior price changes including the day's price change
Y.append(price_changes[i+context.lookback]) # Store the day's price change
Would Y.append(price_changes[i+context.lookback+1]) work better? Or even predicting several days ahead, not only the next one?
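One quick way to check the indexing is on synthetic data where the change on day d is simply d (plain Python; lookback and the small price_changes list are stand-ins for the context variables):

```python
# Synthetic check of the window/target indexing from the snippet above.
price_changes = list(range(10))  # change on day d is just d
lookback = 3
X, Y = [], []
for i in range(len(price_changes) - lookback - 1):
    X.append(price_changes[i:i + lookback])  # slice covers days i .. i+lookback-1
    Y.append(price_changes[i + lookback])    # target day i+lookback
print(X[0], Y[0])  # → [0, 1, 2] 3
```

So one can read off directly from the first sample whether the target day already lies outside the window or not.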
(2) In the last example, a prediction is made for the sum of price and volume changes. However, it would be sufficient to know whether the price is going up or down, so Y could be simplified.
#Old
Y.append(price_changes[i+context.lookback] + volume_changes[i+context.lookback]) # Store the day's volume change
#New
Y.append(price_changes[i+context.lookback+1]) # Store the day's price change only
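If only the direction matters, another option is a classifier on the sign of the next change instead of a regressor. A minimal sketch with scikit-learn and synthetic data (the variable names are placeholders, not from the original algorithm):

```python
# Sketch: predict only the direction (1 = up, 0 = down) of the next change.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)
price_changes = rng.normal(size=200)  # synthetic daily price changes
lookback = 5
n = len(price_changes) - lookback - 1
X = [price_changes[i:i + lookback] for i in range(n)]
Y = [1 if price_changes[i + lookback] > 0 else 0 for i in range(n)]

clf = GradientBoostingClassifier(n_estimators=50)
clf.fit(X, Y)
print(clf.predict([X[0]]))  # 0 = down, 1 = up
```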
Also, the variable names got swapped here:
rfr = GradientBoostingRegressor(learning_rate = 0.1, n_estimators = 150)
gbr = RandomForestRegressor()
(3) GBR can give better results if its parameters (e.g. the learning rate) are slightly adapted, e.g.
gbr = GradientBoostingRegressor(learning_rate = 0.01, n_estimators = 150, max_depth = 4, min_samples_split = 2)
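Instead of hand-picking those values, a small grid search can adapt them to the data. A sketch with a recent scikit-learn and synthetic data (the parameter grid is just an illustration):

```python
# Sketch: pick GBR hyperparameters via cross-validated grid search.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(1)
X = rng.normal(size=(120, 5))                       # synthetic features
Y = X[:, 0] * 0.5 + rng.normal(scale=0.1, size=120)  # synthetic target

grid = GridSearchCV(
    GradientBoostingRegressor(n_estimators=150),
    param_grid={'learning_rate': [0.01, 0.1], 'max_depth': [3, 4]},
    cv=3,
)
grid.fit(X, Y)
print(grid.best_params_)
```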
(4) The mean forecast error can be measured without hurting the final model by fitting twice: once on a train/test split for evaluation, and once on all data for trading
# Generate our models
rfr = RandomForestRegressor()
gbr = GradientBoostingRegressor(learning_rate = 0.01, n_estimators = 150, max_depth = 4, min_samples_split = 2)
# Test our models on independent test data
offset = int(len(X) * 0.8)
X_train, Y_train = X[:offset], Y[:offset]
X_test, Y_test = X[offset:], Y[offset:]
rfr.fit(X_train, Y_train)
rfr_me = math.sqrt(mean_squared_error(Y_test, rfr.predict(X_test)))
context.rfr_me[idx] = rfr_me
gbr.fit(X_train, Y_train)
gbr_me = math.sqrt(mean_squared_error(Y_test, gbr.predict(X_test)))
context.gbr_me[idx] = gbr_me
# Fit our models with all data
rfr.fit(X, Y)
gbr.fit(X, Y)
Recording the ratio of the two errors can then show that GBR is slightly better:
#record(mean_error_rfr = context.rfr_me[idx])
#record(mean_error_gbr = context.gbr_me[idx])
record(mean_error_gbr_rfr_ratio = context.gbr_me[idx]/context.rfr_me[idx])
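For reference, here is a self-contained version of the same check on synthetic data, runnable outside the backtester (record() and the context variables are replaced by a plain print):

```python
# Sketch: evaluate RMSE on a 20% hold-out, then refit on all data.
import math
import numpy as np
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 5))
Y = X[:, 0] - X[:, 1] + rng.normal(scale=0.1, size=200)

offset = int(len(X) * 0.8)
rfr = RandomForestRegressor(random_state=0)
gbr = GradientBoostingRegressor(learning_rate=0.01, n_estimators=150,
                                max_depth=4, random_state=0)
for model in (rfr, gbr):
    model.fit(X[:offset], Y[:offset])

rfr_me = math.sqrt(mean_squared_error(Y[offset:], rfr.predict(X[offset:])))
gbr_me = math.sqrt(mean_squared_error(Y[offset:], gbr.predict(X[offset:])))
print(gbr_me / rfr_me)  # ratio < 1 means GBR did better on the hold-out

# Refit on all data for actual trading
rfr.fit(X, Y)
gbr.fit(X, Y)
```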
(5) Finally, it seems that training is done on closing prices, while trading also sees the opening price, since schedule_function starts the trading every morning and:
recent_prices = data.history(security, 'price', context.lookback+1, '1d').values
Maybe training should also include the open price in the last data point of every window.
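A sketch of that idea with plain numpy and made-up numbers; current_price stands in for an intraday quote (e.g. something like data.current on Quantopian, which is an assumption on my side):

```python
# Sketch: end the feature window at "now" instead of at yesterday's close.
import numpy as np

recent_closes = np.array([100.0, 101.0, 99.5, 100.5])  # synthetic daily closes
current_price = 101.2                                  # hypothetical intraday quote

changes = np.diff(recent_closes) / recent_closes[:-1]  # close-to-close changes
last_change = (current_price - recent_closes[-1]) / recent_closes[-1]
features = np.append(changes[1:], last_change)         # window now ends intraday
print(features)
```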
Have fun,
Frank