Predicting Winning Sector ETF using Machine Learning SVM

Hi All,

This algo tries to predict the upcoming best-performing sector ETF using past performance. The training data for the learner is based on the past Sharpe performance of each individual sector, ranked cross-sectionally, i.e. the features at month M are the ranked Sharpe ratios of the sectors at months M-1, M-2, ... (up to an arbitrary number of lags). The target variable (y) is the best-performing sector at month M. As such, I am trying to see if a classifier can learn from the past relative performance of sectors. Once the training is done, the prediction for the following month is used and the portfolio is invested accordingly.
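The feature/target layout described above can be sketched roughly as follows. This is only an illustration, not the algo's actual code: the function name `build_dataset`, the random sample data, and the choice to define "best performing" by monthly return are all assumptions.

```python
import numpy as np
import pandas as pd


def build_dataset(monthly_scores, monthly_returns, n_lags=3):
    """Lagged cross-sectional rank features plus a best-sector label (hypothetical helper)."""
    # Rank the sectors against each other within every month (1 = lowest score).
    ranks = monthly_scores.rank(axis=1)
    # Features for month M are the ranks observed at M-1, ..., M-n_lags.
    lagged = [ranks.shift(lag).add_suffix('_lag{}'.format(lag))
              for lag in range(1, n_lags + 1)]
    X = pd.concat(lagged, axis=1)
    # Target for month M: the sector with the highest return in month M (assumption).
    y = monthly_returns.idxmax(axis=1)
    # Drop the first n_lags months, whose lagged features are incomplete.
    valid = X.notna().all(axis=1)
    return X[valid], y[valid]


# Made-up data: 12 months of scores and returns for three sector ETFs.
rng = np.random.default_rng(0)
months = pd.period_range('2015-01', periods=12, freq='M')
sectors = ['XLK', 'XLP', 'XLU']
monthly_scores = pd.DataFrame(rng.normal(size=(12, 3)), index=months, columns=sectors)
monthly_returns = pd.DataFrame(rng.normal(0, 0.03, size=(12, 3)), index=months, columns=sectors)

X, y = build_dataset(monthly_scores, monthly_returns, n_lags=3)
```

Each row of `X` only contains information available before month M, so the label in `y` is never leaked into its own features.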

Of course, this results in a beta ≈ 1.00 strategy, and the algo barely generates any alpha. I am looking for suggestions on how this could be improved.

7 responses

I've always been kind of skeptical of pure lagged past returns as predictive of future returns. Have you tried adding other factors, such as fast-moving technicals, or previous quarters'/years' actuals or estimates? Those together might be more informative.

@luc prieur,
interesting approach.

I made some changes to make the algo stable and profitable: Alpha 0.09, Beta 0.38.

Run it on month_start()

assets = symbols('QQQ', 'SPY', 'XLP', 'XLK', 'XLY', 'XLB', 'XLI', 'XLV', 'XLU', 'TLT')  

replaced 'XLE' by 'QQQ', 'XLF' by 'TLT'

sharpes =  1.0/(rets.rolling(window=FREQ).std())  

This is not really a Sharpe ratio but inverse volatility; I removed the annualization factor.

sharpes = sharpes.resample(SAMPLE).last()  

replaced mean() by last()
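For readers following along, the two changed lines can be tried in isolation with a sketch like the one below. The values of `FREQ` and `SAMPLE` and the random price data are assumptions for illustration; the thread does not state the actual settings.

```python
import numpy as np
import pandas as pd

FREQ = 21      # assumed rolling window, in trading days
SAMPLE = 'MS'  # assumed monthly resampling rule (month-start labels)

# Made-up daily prices for three ETFs.
rng = np.random.default_rng(0)
dates = pd.bdate_range('2015-01-01', periods=120)
log_steps = rng.normal(0, 0.01, size=(120, 3))
prices = pd.DataFrame(100 * np.exp(np.cumsum(log_steps, axis=0)),
                      index=dates, columns=['QQQ', 'XLP', 'TLT'])

rets = prices.pct_change()

# Inverse rolling volatility, with no annualization factor.
inv_vol = 1.0 / rets.rolling(window=FREQ).std()

# last() keeps the most recent reading in each month;
# mean() would instead average all readings within the month.
monthly_last = inv_vol.resample(SAMPLE).last()
monthly_mean = inv_vol.resample(SAMPLE).mean()

# Cross-sectional rank per month, as used for the features.
ranks = monthly_last.rank(axis=1)
```

Dropping the annualization factor does not change the cross-sectional ranks, since it scales every asset by the same constant.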

I tried unsuccessfully on a smaller list.

Can you help me to run this algo for symbols:

'QQQ','XLP','FDN','TLT','IEF'?

@Vladimir
The problem with your set of symbols is that IEF always has the best value in the beginning of the test period. So there is only one unique label and nothing to learn and predict for the model. These lines check the number of unique labels and in case there is only one, return that symbol:

    unique_labels = y.unique()

    if len(unique_labels) == 1:  
        asset = lookup_dict[unique_labels[0]]  
        print('only one label: {}, no prediction'.format(asset))  
        context.prediction = 0

        return asset

The plot shows whether there is a prediction (1) or not (0).

Thanks Vladimir, Tentor, Kyle,

Vlad: I agree with your changes. I did use Sharpe and z-score in my development, but found that 1/vol is often easier to work with and decided to use cross-sectional ranking rather than z-score on it.

Tentor: nice input. There are often two things to check: that multiple classes are found in the training data, and that the predictions are not all trivial, i.e. always outputting 'TLT' or something like that. The latter would potentially point to an underfitting model.
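A minimal sketch of those two checks, assuming the training labels and the model's predictions are available as plain sequences; the helper name `check_labels_and_predictions` is hypothetical and not part of the algo:

```python
import numpy as np


def check_labels_and_predictions(y_train, y_pred):
    """Flag degenerate training labels and trivial predictions (hypothetical helper)."""
    n_train_classes = len(np.unique(y_train))
    n_pred_classes = len(np.unique(y_pred))
    return {
        # A classifier needs at least two distinct classes to learn anything.
        'trainable': n_train_classes > 1,
        # One class predicted for every sample suggests an underfitting model.
        'trivial_predictions': n_pred_classes == 1,
    }


# Example: varied training labels, but the model always answers 'TLT'.
report = check_labels_and_predictions(
    y_train=['TLT', 'QQQ', 'XLP', 'TLT'],
    y_pred=['TLT', 'TLT', 'TLT', 'TLT'],
)
```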

With your inputs in mind, I will work on the classifier providing probability on its guesses or use a regressor such that the portfolio constructed contains more than a single ETF.

Kyle, a simple question on your input: do factors exist for ETFs? If so, can you point me to some info on that?

/L

Thanks Tentor, luc

I have managed to create a sharpes factor that is more or less similar to the Sharpe ratio:

    ann_rets = ((rets.rolling(window=FREQ).mean() + 1)**252 - 1.0)  
    ann_vol = (rets.rolling(window=FREQ).std()*252**0.5)  
    sharpes = ann_rets / ann_vol  

but it has less predictive power than inverse volatility, so I left

    sharpes = 1.0 / ann_vol  

Here is a backtest on just three symbols: 'QQQ', 'XLP', 'TLT'
Alpha 0.11, Beta 0.16, Max Drawdown -0.20

Fewer participants, more oxygen.

Nice work Vlad.

Here is an update that uses "predict_proba". So instead of the model picking the winning asset to purchase, it returns the probabilities which are in turn used as weights.
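The idea of turning class probabilities into portfolio weights can be sketched as below. It is an illustration under assumptions rather than the code in the backtest: `proba_row` stands in for one row of what a scikit-learn classifier's `predict_proba` would return, `classes` for its `classes_` attribute (which gives the column order), and the helper name and numbers are made up.

```python
import numpy as np


def proba_to_weights(proba_row, classes, lookup):
    """Map one row of class probabilities to per-asset weights (hypothetical helper)."""
    proba_row = np.asarray(proba_row, dtype=float)
    # Renormalize defensively so the weights sum to exactly 1.
    weights = proba_row / proba_row.sum()
    # Column i of the probabilities corresponds to classes[i].
    return {lookup[c]: float(w) for c, w in zip(classes, weights)}


# Made-up example: three encoded labels and their probabilities.
classes = [0, 1, 2]
lookup = {0: 'QQQ', 1: 'XLP', 2: 'TLT'}
proba_row = [0.5, 0.3, 0.2]

weights = proba_to_weights(proba_row, classes, lookup)
```

With weights like these the portfolio holds every asset in proportion to the model's confidence, instead of putting everything into the single most likely winner.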

Here is the same code but with just TLT, XLP and QQQ.

So, I guess that using the predicted probability is not as efficient...