Predicting Winning Sector ETF using Machine Learning SVM

Hi All,

This algo tries to predict the upcoming best-performing sector ETF using past performance. The training data for the learner is based on the past Sharpe ratio of each individual sector, ranked cross-sectionally, i.e. the features at month M are the ranked Sharpe ratios of the sectors at months M-1, M-2, ... (up to an arbitrary number of lags). The target variable (y) is the best-performing sector at month M. As such, I am trying to see if a classifier can learn from the past relative performance of sectors. Once training is done, the prediction for the following month is used and the portfolio is invested accordingly.
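A minimal sketch of how such a training set could be assembled with pandas. This is not the original algo's code: the function name `build_dataset`, the parameter `n_lags`, and the synthetic input data are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def build_dataset(monthly_scores, monthly_returns, n_lags=3):
    """Hypothetical helper: lagged cross-sectional ranks -> features,
    best-performing asset of the month -> target.

    monthly_scores:  DataFrame (months x assets) of Sharpe-like scores.
    monthly_returns: DataFrame (months x assets) of returns, same index.
    """
    # Rank the assets against each other within every month (1 = lowest).
    ranks = monthly_scores.rank(axis=1)
    # One block of columns per lag: ranks at M-1, M-2, ..., M-n_lags.
    lagged = [ranks.shift(lag).add_suffix('_lag{}'.format(lag))
              for lag in range(1, n_lags + 1)]
    X = pd.concat(lagged, axis=1)
    # Target: label of the asset with the highest return in month M.
    y = monthly_returns.idxmax(axis=1)
    # The first n_lags months have incomplete features, so drop them.
    valid = X.notna().all(axis=1)
    return X[valid], y[valid]


# Synthetic data, for illustration only.
rng = np.random.default_rng(0)
months = pd.period_range('2015-01', periods=12, freq='M')
assets = ['XLK', 'XLP', 'XLU']
scores = pd.DataFrame(rng.normal(size=(12, 3)), index=months, columns=assets)
returns = pd.DataFrame(rng.normal(scale=0.03, size=(12, 3)),
                       index=months, columns=assets)

X, y = build_dataset(scores, returns, n_lags=2)
```

With 12 months and 2 lags, `X` holds 10 rows of 6 rank features, and `y` holds the ticker of the winning asset for each of those months.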

Of course, this results in a beta =~ 1.00 strategy, and the algo barely generates any alpha. I am looking for suggestions on how this could be improved.

7 responses

Kyle M

I've always been kind of skeptical of pure lagged past returns as predictive of future returns. Have you tried adding other factors? Fast-moving technicals, or just previous quarters'/years' actuals or estimates? Those together might be more informative.

Vladimir

@luc prieur,
interesting approach.

I made some changes to make the algo stable and profitable: Alpha 0.09, Beta 0.38.

Run it on month_start()

assets = symbols('QQQ', 'SPY', 'XLP', 'XLK', 'XLY', 'XLB', 'XLI', 'XLV', 'XLU', 'TLT')

replaced 'XLE' by 'QQQ', 'XLF' by 'TLT'

sharpes =  1.0/(rets.rolling(window=FREQ).std())

It is not really a Sharpe ratio but inverse volatility; I also removed the annualization factor.

sharpes = sharpes.resample(SAMPLE).last()

replaced mean() by last()
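A small sketch of the inverse-volatility factor and of the difference between taking the last value and the mean within each month. The value of `FREQ`, the synthetic returns, and the use of a group-by on calendar month in place of `resample(SAMPLE)` are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

FREQ = 21  # assumed rolling window, in trading days

# Synthetic daily returns, for illustration only.
rng = np.random.default_rng(1)
dates = pd.bdate_range('2020-01-01', periods=120)
assets = ['QQQ', 'XLP', 'TLT']
rets = pd.DataFrame(rng.normal(0.0005, 0.01, size=(120, 3)),
                    index=dates, columns=assets)

# Inverse rolling volatility, no annualization factor.
inv_vol = 1.0 / rets.rolling(window=FREQ).std()

# Group by calendar month (stands in for resample(SAMPLE) here).
month = inv_vol.index.to_period('M')
monthly_last = inv_vol.groupby(month).last()   # value on the last day of the month
monthly_mean = inv_vol.groupby(month).mean()   # average over the month
```

`monthly_last` reflects only the most recent window at each month end, while `monthly_mean` smooths over every window that ended during the month.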

I tried unsuccessfully on a smaller list.

Can you help me to run this algo for symbols:

'QQQ','XLP','FDN','TLT','IEF'?

Tentor Testivis

@Vladimir
The problem with your set of symbols is that IEF always has the best value at the beginning of the test period. So there is only one unique label, and nothing for the model to learn or predict. These lines check the number of unique labels and, if there is only one, return that symbol:

    unique_labels = y.unique()

    if len(unique_labels) == 1:  
        asset = lookup_dict[unique_labels[0]]  
        print('only one label: {}, no prediction'.format(asset))  
        context.prediction = 0

        return asset

The plot shows whether there is a prediction (1) or not (0).

luc prieur

Thanks Vladimir, Tentor, Kyle,

Vlad: I agree with your changes. I did use Sharpe and z-score in my development, but found that 1/vol is often easier to work with, and decided to use cross-sectional ranking rather than a z-score on it.

Tentor: nice input. There are often two things to check: that multiple classes are found in the training data, and that the predictions are not all trivial, i.e. always outputting 'TLT' or something like that. The latter would potentially point to an underfitting model.
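Those two checks could be sketched roughly as follows. The helper name `check_labels_and_predictions` and the sample labels are hypothetical, not taken from the algo.

```python
import numpy as np


def check_labels_and_predictions(y_train, y_pred):
    """Hypothetical helper reporting the two sanity checks."""
    return {
        # More than one class in the training labels: something to learn.
        'multiple_training_classes': len(np.unique(y_train)) > 1,
        # More than one distinct prediction: output is not a constant.
        'non_trivial_predictions': len(np.unique(y_pred)) > 1,
    }


# Illustrative labels only.
report = check_labels_and_predictions(
    y_train=['TLT', 'QQQ', 'XLP', 'TLT'],
    y_pred=['TLT', 'TLT', 'TLT'],
)
```

Here the training labels pass the first check, but the predictions are all 'TLT', so the second check flags them as trivial.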

With your inputs in mind, I will work on having the classifier provide probabilities for its guesses, or use a regressor, so that the constructed portfolio contains more than a single ETF.

Kyle, a simple question on your input: do factors exist for ETFs? If so, can you point me to some info on that?

Vladimir

Thanks Tentor, luc

I have managed to create a sharpes factor more or less similar to the Sharpe ratio

    ann_rets = ((rets.rolling(window=FREQ).mean() + 1)**252 - 1.0)  
    ann_vol = (rets.rolling(window=FREQ).std()*252**0.5)  
    sharpes = ann_rets / ann_vol

but it has less predictive power than inverse volatility, so I left

    sharpes = 1.0 / ann_vol

Here is a backtest on just three symbols: 'QQQ', 'XLP', 'TLT'
Alpha 0.11, Beta 0.16, Max Drawdown -0.20

Fewer participants, more oxygen.

luc prieur

Nice work Vlad.

Here is an update that uses "predict_proba". So instead of the model picking the winning asset to purchase, it returns the probabilities which are in turn used as weights.
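A rough sketch of the idea, assuming a scikit-learn `SVC` fitted with `probability=True`. The synthetic features and labels are for illustration only, and the labels here are tickers directly rather than being mapped through a lookup dictionary.

```python
import numpy as np
import pandas as pd
from sklearn.svm import SVC

# Synthetic training data, for illustration only.
rng = np.random.default_rng(0)
assets = ['QQQ', 'XLP', 'TLT']
X_train = rng.normal(size=(60, 4))   # e.g. lagged rank features
y_train = np.tile(assets, 20)        # winning asset per month
x_next = rng.normal(size=(1, 4))     # features for the coming month

# probability=True enables predict_proba on the SVM.
model = SVC(probability=True, random_state=0)
model.fit(X_train, y_train)

# One probability per class, ordered like model.classes_; they sum to 1,
# so they can be used directly as long-only portfolio weights.
proba = model.predict_proba(x_next)[0]
weights = pd.Series(proba, index=model.classes_)
```

Each entry of `weights` is the model's estimated probability that the asset will be the month's winner, so the portfolio is spread across all assets instead of being concentrated in a single pick.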

luc prieur

Here is the same code but with just TLT, XLP and QQQ.

So, I guess that using the predicted probability is not as efficient...
