If I want to train a supervised machine learning algorithm to predict the wins/losses of a stock based on fundamental data available at the time a position is entered, I need a good training set. To construct it, feature vectors of fundamental data have to be mapped to discrete or continuous win/loss target values. Now the question is: what is a good strategy for constructing such a dataset? Say I take a set of fundamental data for a universe of stocks as the feature vector: which target values do I map these feature vectors to? More precisely: do I take the close of the next trading day, the close of the next week or month, or an average of all of these as the target value?
Or, in pseudocode:
    for each day in history
        for each stock in universe
            feature_vector = CalculateFeatureVector(day, stock)
            target_value = CalculateTargetValue(day, stock)
            AddInstanceToTrainingSet(feature_vector, target_value)
I am looking for good strategies to implement:
    CalculateTargetValue(day, stock)
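To make the candidates concrete, here is a minimal Python sketch of the target definitions mentioned above (next-day, next-week, next-month close, or an average of them). It is only an illustration, not a recommendation. The names `forward_return`, `calculate_target_value` and `horizons` are hypothetical, `closes` is assumed to be a list of daily closing prices for one stock, 5 and 21 trading days are assumed as rough stand-ins for a week and a month, and a "win" is assumed to mean a positive average forward return.

```python
from statistics import mean


def forward_return(closes, day, horizon):
    """Simple return from the close on `day` to the close `horizon` days later.

    Returns None when the future close lies beyond the available history,
    so that such instances can be skipped instead of guessed.
    """
    if day + horizon >= len(closes):
        return None
    return closes[day + horizon] / closes[day] - 1.0


def calculate_target_value(closes, day, horizons=(1, 5, 21), discrete=False):
    """Average the forward returns over the given horizons (in trading days).

    With discrete=True the target is 1 (win) if the average is positive,
    otherwise 0 (loss); with discrete=False the average return is returned.
    """
    returns = [forward_return(closes, day, h) for h in horizons]
    if any(r is None for r in returns):
        return None  # not enough future data for this day
    average = mean(returns)
    if discrete:
        return 1 if average > 0 else 0
    return average
```

Passing a single horizon, for example `horizons=(1,)`, gives the plain next-day target, while several horizons give the averaged variant. Only prices after `day` enter the target, so the features computed for `day` must not use them either.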
I am sure this is a fairly general question, with answers that start with "It depends...". But what I am looking for here are some rules, strategies, best practices and/or examples that may help to construct supervised training sets.