multi-factor algo template

I've attempted to replicate the Pipeline factors used in the ML algo posted here:

https://www.quantopian.com/posts/machine-learning-on-quantopian-part-3-building-an-algorithm

Note also that the ML algo is now on Github:

https://github.com/quantopian/research_public/blob/92b32ccd61f25fdfbccfc67a82217c64ae3173e3/research/ml_algo.py

The end goal is to refactor the ML algo so that various alpha combination techniques can be applied easily and compared. This is a first step in that direction.

One specific snag is that I have been unable to use the built-in MACDSignal in a custom factor. If anyone knows how to do it, help would be much appreciated (my attempt is commented out; if you run it, you'll see the error).
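One possible workaround, offered as a sketch and not as a fix for the built-in: compute the MACD signal line by hand with NumPy, which could then be called from a custom factor's compute(). The helper names ema and macd_signal are made up for illustration, 12/26/9 are the conventional MACD periods, and seeding each average with the first observation is an assumption, so the numbers need not match the built-in MACDSignal exactly.

```python
import numpy as np

def ema(x, span):
    """Exponential moving average along axis 0, with alpha = 2 / (span + 1).

    Seeded with the first observation (an assumption made for this sketch).
    """
    x = np.asarray(x, dtype=np.float64)
    alpha = 2.0 / (span + 1.0)
    out = np.empty_like(x)
    out[0] = x[0]
    for t in range(1, len(x)):
        out[t] = alpha * x[t] + (1.0 - alpha) * out[t - 1]
    return out

def macd_signal(close, fast=12, slow=26, signal=9):
    """Signal line: EMA of (fast EMA - slow EMA) of the closing prices.

    close may be 1-D (days) or 2-D (days x assets); time runs along axis 0.
    """
    macd = ema(close, fast) - ema(close, slow)
    return ema(macd, signal)

# Toy input: 60 days of steadily rising prices for a single asset.
close = np.arange(1.0, 61.0)
latest = macd_signal(close)[-1]  # the most recent signal value
```

For a window shaped (days, assets), the last row of the result would be the value to write to the factor output.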

14 responses

Some contributions: a logging preview of the data (it could also count NaNs), slightly different normalization, nanfill(), MACD, and show_opt_weights. It currently uses TargetWeights instead of MaximizeAlpha, just to see what happens. The backtest only runs to '08 here.

2005-01-04 05:45 log_pipe:277 INFO Rows: 400  Columns: 7  
                                   min                 mean                max  
 Capex to Cashflows     -3.17832165546     -0.0609523660925      3.08108326485  
     EBIT to Assets     -3.18221183985       -0.14835647347      2.39685051613  
               MACD     -4.13792045044     -0.0941518487729     0.762728988302  
Moneyflow Volume 5D     -1.79092275867       -0.12176130157      2.17061364146  
      Volatility 3m     -1.30869878045       0.436766827585      2.85606751559  
               beta     0.236772337938        1.41296277251      3.28745561795  
     combined_alpha     -19.5841117298       0.264473410066      21.3321623585  
2005-01-04 05:45 log_pipe:292 INFO _ _ _   Capex to Cashflows   _ _ _  
    ... Capex to Cashflows highs  
                      Capex to Cashflows  EBIT to Assets      MACD  \  
Equity(21110 [APCS])            3.081083       -0.195858 -0.042299  
Equity(16511 [KMX])             3.081083        0.741732  0.061259  
Equity(15129 [FDS])             3.081083        2.396851  0.569937  
Equity(3706 [HTLD])             3.081083        0.953252  0.081986

                      Moneyflow Volume 5D  Volatility 3m      beta  \  
Equity(21110 [APCS])             0.683083       0.804737  1.375408  
Equity(16511 [KMX])              0.165734       0.367713  1.493836  
Equity(15129 [FDS])             -0.057370      -0.319507  1.081853  
Equity(3706 [HTLD])              0.185720       0.234778  1.422592

......

show_opt_weights() can be filtered for sids.

2005-07-11 07:30 show_opt_weights:322 INFO Close  
2005-07-11 07:30 show_opt_weights:329 INFO 0.00140 => 0  ACI  
2005-07-11 07:30 show_opt_weights:329 INFO 0.00178 => 0  TVTY  
2005-07-11 07:30 show_opt_weights:329 INFO -0.00144 => 0  BCR  
2005-07-11 07:30 show_opt_weights:329 INFO 0.00270 => 0  BPOP  
2005-07-11 07:30 show_opt_weights:331 INFO     40 more  
2005-07-11 07:30 show_opt_weights:340 INFO Open  
2005-07-11 07:30 show_opt_weights:347 INFO 0 => 0.00155  AES  
2005-07-11 07:30 show_opt_weights:347 INFO 0 => 0.00277  AIG  
2005-07-11 07:30 show_opt_weights:347 INFO 0 => -0.00206  ARRO  
2005-07-11 07:30 show_opt_weights:347 INFO 0 => -0.00275  ATW  
2005-07-11 07:30 show_opt_weights:349 INFO     49 more  
2005-07-11 07:30 show_opt_weights:360 INFO Change  
2005-07-11 07:30 show_opt_weights:362 INFO -0.005070 => -0.005000  VRX  
2005-07-11 07:30 show_opt_weights:362 INFO -0.005060 => -0.005000  PRGO  
2005-07-11 07:30 show_opt_weights:362 INFO -0.005148 => -0.005000  KERX  
2005-07-11 07:30 show_opt_weights:362 INFO -0.005272 => -0.005000  RCPI  
2005-07-11 07:30 show_opt_weights:362 INFO -0.005399 => -0.005000  ENCY  
2005-07-11 07:30 show_opt_weights:362 INFO -0.005351 => -0.005000  WYNN  
2005-07-11 07:30 show_opt_weights:362 INFO -0.005147 => -0.005000  CYBX  
2005-07-11 07:30 show_opt_weights:362 INFO -0.005059 => -0.005000  SSRM  
2005-07-11 07:30 show_opt_weights:362 INFO -0.005142 => -0.005000  ICOS  

Thanks, I’ll take a look when I get the chance.

@Grant Catching up a bit, what is the purpose of preprocess? Can you give an example?

I think you pasted the wrong link? :)

Sorry about that. I corrected the link above, and here it is:

https://www.quantopian.com/posts/alpha-combination-via-clustering#5d0c93ccdcf6b7004165d874

Thanks.

Regarding the framework itself, it seems it would be beneficial to have:

  • different weights for different factors
  • the ability to use different factors for the long and short portions

I have not seen frameworks generally do this.

Thanks Vladimir,

Having different weights for different factors is straightforward, and I've done it. One way is to return a tuple in the make_features() function above, which contains both the factor and its weight.
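A minimal sketch of the tuple idea, assuming a hypothetical features dict that maps each factor name to a (values, weight) pair. The make_features() from the algo is not reproduced here; zscore and combine_weighted are illustrative names, and the factor values are invented.

```python
import numpy as np

def zscore(x):
    """Z-score that ignores NaNs, then maps NaNs to a neutral zero."""
    x = np.asarray(x, dtype=np.float64)
    z = (x - np.nanmean(x)) / np.nanstd(x)
    return np.nan_to_num(z)

def combine_weighted(features):
    """Weighted average of z-scored factors.

    features: dict of name -> (factor_values, weight); a hypothetical layout.
    """
    total_weight = sum(weight for _, weight in features.values())
    combined = sum(weight * zscore(values) for values, weight in features.values())
    return combined / total_weight

# Invented factor values for four stocks; "value" gets three times the weight.
features = {
    "value": (np.array([1.0, 2.0, 3.0, 4.0]), 3.0),
    "momentum": (np.array([4.0, 1.0, np.nan, 2.0]), 1.0),
}
alpha = combine_weighted(features)
```

Dividing by the total weight keeps the combined alpha on the same scale no matter how the weights are chosen.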

Long-only and short-only (or biased long or short) factors would take some thinking on how to do the normalization prior to combination. If you know how to do this, please share your technique.

I've implemented it outside the Quantopian contest algo structure (which is pretty easy), but not within it, which is why I was asking.

Doing a simple .rank() for the long factor and a negative .rank() for the short factor, and then combining them, might work?

Kind of an interesting topic that I hadn't considered. I suppose that if one could find enough long and short factors, they could be combined, and would net out to no exposure, without de-meaning the alpha vectors prior to combination.
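A toy sketch of the .rank() suggestion, using a plain NumPy rank helper as a stand-in for Pipeline's .rank() (tie handling here is an assumption) and invented factor values. Because both rank vectors run over 1..N, their difference sums to zero, so the combination comes out centered without a separate de-meaning step.

```python
import numpy as np

def rank(x):
    """Ordinal ranks 1..N, ties broken by position (an assumption)."""
    x = np.asarray(x, dtype=np.float64)
    order = np.argsort(x, kind="stable")
    ranks = np.empty(len(x), dtype=np.float64)
    ranks[order] = np.arange(1, len(x) + 1)
    return ranks

# Invented values for four stocks.
long_factor = np.array([0.1, 0.9, 0.5, 0.3])   # high value = want to be long
short_factor = np.array([2.0, 0.5, 1.0, 3.0])  # high value = want to be short

# Positive rank for the long factor, negative rank for the short factor.
combined = rank(long_factor) - rank(short_factor)
```

This only covers one long factor and one short factor over the same universe; NaN handling and unequal universes are left out.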

Hi Grant,

In the unlikely event (though quite likely for fundamental data) that there is an outlier, say 70000 units when the true mean is 1 unit, the line "a = np.nan_to_num((a-np.nanmean(a)))" might give np.nanmean(a) = 25 units, say, instead of the true mean of 1 unit, so "a-25" pushes most of the values to about -24 units. This is fine so far (we scale things again at the end). However, np.nan_to_num then replaces NaN with 0, so the NaN names become good alpha names (on the right of the distribution). These NaN names will more likely than not stay good alpha names through the rest of the preprocessing, despite the subsequent winsorization and final scaling.

I guess the winsorization should ideally be done before the first normalization, so that the normalized numbers are more likely to be centered around zero; then it is fine to assign zero to the NaN names. Note that scipy's winsorize method doesn't work well with NaNs in the array, so your current method of replacing NaN with zero is right, in the sense that it ensures the next line, "winsorize", will work.

Actually, the best way to do the whole processing is to ignore the NaNs, then winsorize, then normalize, then put the NaNs back into the distribution as zero.
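The effect described in this post can be reproduced on a toy array. This is only an illustration with invented numbers: np.clip at the 20th/80th percentiles stands in for winsorizing, and those limits are chosen just so the clipping bites on seven values.

```python
import numpy as np

# Five ordinary values near 1, one extreme outlier, and one missing value.
a = np.array([0.8, 0.9, 1.0, 1.1, 1.2, 70000.0, np.nan])

# De-mean first, then fill NaN with zero (the order being questioned).
skewed = np.nan_to_num(a - np.nanmean(a))

# Clip first, then de-mean, then fill NaN with zero.
# np.clip leaves the NaN in place, so it is ignored until the final fill.
lo, hi = np.nanpercentile(a, [20, 80])
clipped = np.clip(a, lo, hi)
fixed = np.nan_to_num(clipped - np.nanmean(clipped))

print("skewed:", np.round(skewed, 2))
print("fixed: ", np.round(fixed, 2))
```

In the first result the NaN name, sitting at zero, ranks above all five ordinary names. In the second it lands in the middle of the distribution, which is where a "no opinion" value belongs.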

Thanks Zicai Feng -

You raise a good point! In the limit of large outliers that skew the distribution, the NaNs don't end up where one would want them: at zero, where zero means the factor predicts neither long nor short. As you have pointed out, with the code I shared above, the NaNs effectively inherit the skew, which is not desirable.

Should I get back into coding on Q, I'll have to fix this little problem.

Here's a potential fix for the problem pointed out by Zicai Feng above:

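import numpy as np
from scipy.stats.mstats import winsorize
from sklearn import preprocessing

WIN_LIMIT = 0.01  # fraction clipped from each tail

def preprocess(a):
    a = a.astype(np.float64)
    a[np.isinf(a)] = np.nan  # treat +/-inf as missing
    not_nan = ~np.isnan(a)
    if not not_nan.any():
        # no usable values: every name gets a neutral zero
        return np.zeros_like(a)
    # winsorize only the non-NaN values, so NaNs cannot distort the limits
    a[not_nan] = winsorize(a[not_nan], limits=[WIN_LIMIT, WIN_LIMIT])
    # de-mean the winsorized values, then set NaNs to zero (neutral)
    a = np.nan_to_num(a - np.nanmean(a))
    return preprocessing.scale(a)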

Probably could be improved, if anyone wants to give it a shot.