alpha combination via clustering

Here's a snapshot of an algo I've been working on that uses clustering to determine the alpha factor weights (if I've done things correctly). The basic weighting scheme is that if a cluster has N alpha factors in it, each factor in the cluster gets a weight of 1/N. The M factors are combined in a simple weighted sum:

alpha_combined = w_1*alpha_1 + w_2*alpha_2 + w_3*alpha_3 + ... + w_M*alpha_M

Feedback and improvements welcome. One issue I encountered is memory limitations, so if there is guidance in this area, it would be most appreciated. I'd also be interested in hearing from machine learning (ML) subject-matter experts on potential improvements.

Note that the number of clusters is presently set arbitrarily at 3 here:

clustering = SpectralClustering(n_clusters=3,assign_labels="discretize",random_state=0).fit(alphas_flattened)  

There should probably be a global "control" set up, N_CLUSTERS, versus having the parameter buried in the code.
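For anyone who wants to follow the intent without digging through the algo, here's a minimal sketch of the weighting step (not the exact algo code; the shapes assumed for alphas_flattened and alphas_current are based on how they're used, and N_CLUSTERS is the global suggested above):

# Hedged sketch of the clustering-based weights (illustrative, not the algo verbatim).
# Assumes alphas_flattened is (n_factors x n_features) of trailing normalized factor
# values, and alphas_current is an (n_stocks x n_factors) DataFrame of today's values.
import numpy as np
from collections import Counter
from sklearn.cluster import SpectralClustering

N_CLUSTERS = 3

clustering = SpectralClustering(n_clusters=N_CLUSTERS, assign_labels="discretize",
                                random_state=0).fit(alphas_flattened)
labels = clustering.labels_                                 # one cluster label per factor
cluster_sizes = Counter(labels)                             # number of factors per cluster
weights = np.array([1.0/cluster_sizes[l] for l in labels])  # 1/N for a cluster of N factors

combined_alpha = alphas_current.dot(weights)                # weighted sum, one value per stock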

Note that the algo meets all of the contest criteria, so feel free to submit it just for yucks (eventually I'll do the same, should I land on something worthy of the contest/fund).

109 responses

Very cool and generous of you Grant - thank you!

Here's an update. I added some smoothing of the final alpha vector after the factor weighting by clustering.

Hey Grant,

Very impressive!! Your clean and elegant code makes it pretty easy to follow along as well, which I'm grateful for. I'm learning a lot from it!

This second algo appears to get better absolute returns as well as a higher average Sharpe than the first one. However, how it got there appears to be a bit bumpier (just eye-balling it), despite the higher average Sharpe. I'd be curious to hear your thoughts on it.

I'm also curious if you've done any OOS testing yet, or if you're saving that till later when you're 'done' developing it (how do you determine this?)? It looks like you might have held back the data before 2010 possibly for OOS testing at the end?

Lastly, I think I have a general idea of what you're doing in the two sets of code below (the second set in before_trading_starts), but I'd be super grateful if you'd be willing to explain it a bit in simple terms, as if you're explaining it to a 5-year-old (my niece wants to know ;)). Especially the alpha smoothing, and why it appears you're normalizing twice? Are you filling NaNs with 0, and if so, how come you chose to fill with 0 and not the mean value, or drop them altogether?

TAU = 5  
ALPHA_SMOOTH = 1-np.exp(-1.0/TAU)  
    combined_alpha = normalize(combined_alpha)  
    context.combined_alpha = (1-ALPHA_SMOOTH)*context.combined_alpha  
    context.combined_alpha = context.combined_alpha.add(ALPHA_SMOOTH*combined_alpha,fill_value=0).dropna()  
    context.combined_alpha = normalize(context.combined_alpha)  

Thanks and keep up the great work!

Hi Joakim -

Thanks for the compliments. There's no guarantee I'm not doing something completely knuckle-headed or pointless, so use the code with suspicion.

This second algo appears to get better absolute returns as well as a higher average Sharpe than the first one. However, how it got there appears to be a bit bumpier (just eye-balling it), despite the higher average Sharpe. I'd be curious to hear your thoughts on it.

Not sure why there are subtle differences between the two algos. I'm mainly trying to work out the architecture (including understanding code execution time and memory limitations) and to develop some sensible "knobs" for the alpha combination step. I suspect that the two parameters TAU and MIN_TURN would be the ones to explore, along with seeing the effect of un-commenting these lines:

    # set_commission(commission.PerShare(cost=0, min_trade_cost=0))  
    # set_slippage(slippage.FixedSlippage(spread=0))  

I have not done any OOS testing, and the data for one of the factors starts in 2010 (I think), and my understanding is that Q is not so interested in how algos perform prior to 2010. There's a 1-year built-in Factset hold-out, and I haven't bothered to use the contest to cobble together backtests to see the overall performance since 2010. The main focus is on the architecture; I'll eventually get back to looking at individual factors and improving the performance.

TAU = 5  
ALPHA_SMOOTH = 1-np.exp(-1.0/TAU)  

    combined_alpha = normalize(combined_alpha)  
    context.combined_alpha = (1-ALPHA_SMOOTH)*context.combined_alpha  
    context.combined_alpha = context.combined_alpha.add(ALPHA_SMOOTH*combined_alpha,fill_value=0).dropna()  
    context.combined_alpha = normalize(context.combined_alpha)  

With the code above, I'm implementing exponential smoothing (assuming there isn't a bug). The basic idea is to combine the past (context.combined_alpha) with the present (combined_alpha) via a weighted sum. The result becomes the new past, which is then used in the future, in a recursive fashion. The ALPHA_SMOOTH parameter controls the relative weight of the past to the present.

To make sure the recursion works properly, I'm normalizing both the new alpha vector (combined_alpha) and the old one (context.combined_alpha). This may not be necessary, but I think it is good practice, since if there is a normalization problem, things will get mucked up.

Have a look at https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.add.html:

fill_value : Fill existing missing (NaN) values, and any new element needed for successful Series alignment, with this value before computation. If data in both corresponding Series locations is missing the result will be missing

I don't expect there will be NaNs (but I should check) to be filled with zeros. However, as the universe changes over time, I think one needs to set fill_value = 0 so that the sum can be done (otherwise the stocks in combined_alpha will be different from those in context.combined_alpha and one would be trying to add apples to oranges). The final dropna() is belt-and-suspenders as a final guard against NaNs causing the algo to crash (it may be unnecessary).
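Here's a tiny self-contained illustration of the alignment issue, with made-up tickers (not the algo's data):

import numpy as np
import pandas as pd

TAU = 5
ALPHA_SMOOTH = 1 - np.exp(-1.0/TAU)

old = pd.Series({'AAPL': 0.4, 'MSFT': -0.1, 'XYZ': 0.2})   # yesterday's universe
new = pd.Series({'AAPL': 0.5, 'MSFT': -0.2, 'ABC': 0.1})   # today's universe differs

smoothed = ((1 - ALPHA_SMOOTH)*old).add(ALPHA_SMOOTH*new, fill_value=0)
print(smoothed)  # union of both universes; the missing side is treated as 0, so no NaNs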

Hey Grant,

Thank you for taking the time to thoroughly explain and answer my questions! Really appreciate it, and like I said, I’m learning quite a bit from it!

Thank you and Happy Holidays!

Me too. Great code. Learning from Grant.

@ Joakim,

You're welcome. Let me know if you have further questions/comments/improvements. Happy Holidays!

@ Cc -

Have fun. Let me know if you have questions/comments/improvements.

Hi Grant,

Great work!

I have not done any OOS testing, and the data for one of the factors starts in 2010 (I think), and my understanding is that Q is not so interested in how algos perform prior to 2010.

How do you know this? Was this explicitly relayed to you by some Q staff? Just wondering why they would ignore the relevance of the 2008 GFC in the whole scheme of things!

Happy Holidays!

@ James -

At some point I recall Jess mentioning that starting in 2010 is the most relevant, but I'd encourage you to follow up with her directly.

The relevance of the 2008 GFC is only important if one expects it to happen again soon, and in the same fashion, right? I've been trying to learn from some of Ray Dalio's stuff lately, and the concept is that we had a long-term debt crisis in 2008, but one would not necessarily expect another one just like it anytime soon.

Happy Holidays to you, too!

@Grant,

The relevance of the 2008 GFC is only important if one expects it to happen again soon, and in the same fashion, right? I've been trying to learn from some of Ray Dalio's stuff lately, and the concept is that we had a long-term debt crisis in 2008, but one would not necessarily expect another one just like it anytime soon.

It is not so much what caused the 2008 GFC that really matters, but rather the statistical significance of fat-tail phenomena in the market's log-normal return distribution. These do happen and repeat themselves, which some refer to as regime changes. Unless you take a short/medium-term outlook on the market, it is pertinent to consider these regime changes. In particular, the inflection point of a regime change is usually characterized by high volatility (uncertainty), which is exploitable by mean-reverting strategies and can present a very profitable scenario. Just my two cents!

@ James -

Regarding high volatility being exploitable, you might be right. My hunch is that one might not be able to predict when all hell's gonna break loose, but once it does, there's a decent bet that it'll have a trade-able half-life. I'd read that volatility tends to persist; there would probably be a straightforward way to investigate this in the Q research platform.

@Grant,

Thanks for sharing your algorithm. I've spent some time studying it and have a couple questions.

  • What is the reasoning behind the WIN_LIMIT = 0 variable, as opposed to leaving the default param limits=None when you winsorize? My understanding based on the docs is that they should do the same thing, no? If the limits are 0, the limits are effectively none, is my reasoning.

  • In the fcf CustomFactor, it looks like the fcf_yield values are passed through np.nan_to_num twice - once when you run the preprocess method and again when you pass in the parameter. It seems redundant.

Regarding improvements, in my reading I saw that spectral clustering is most effective when the affinity matrix is sparse and when the pyamg module is used. I haven't looked deeply enough into your algorithm design yet to know whether your affinity matrix is sparse. I haven't used pyamg, but it looks like a promising tool based on the description.

https://github.com/pyamg/pyamg

You might find some things to make your memory usage more efficient with the library. Anyways, I will follow up soon with some more ideas and comments.

-Evan

The relevance of the 2008 GFC is only important if one expects it to happen again soon, and in the same fashion, right?

Theoretically, I imagine Quantopian assumes that if an algorithm is shown to be genuinely independent of the general market, then the direction of the market and its volatility are genuinely irrelevant, and hence no backtest needs to be run over 2007 to 2009.

This may be correct, but it requires "faith". I prefer to stick my hands in the wound: if the data is there, why not try it out? As it happens, this algo does comparatively well in that awful period.

@ Evan -

Thanks for the feedback. Some comments:

  1. The winsorize setting of winsorize(a, limits=[WIN_LIMIT,WIN_LIMIT]) with WIN_LIMIT = 0 should just pass through a copy of a. WIN_LIMIT is available as a global in case one wanted to try winsorizing (a quick check is sketched just after this list).
  2. There are likely redundancies in checking for nans and infs. I'm not too systematic in managing them. This should be on a TO DO list at the top of the algo.
  3. I more-or-less just threw in the spectral clustering without much research. Clustering, in general, would seem like a nice way to do diversification, but maybe it just leads to a lot of noise if the right clustering algo isn't used? It would probably be more effective to explore in the Q research platform, if you are so inclined.
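As a quick check of item 1 above, on synthetic data (not the algo's):

import numpy as np
from scipy.stats.mstats import winsorize

a = np.random.randn(1000)
assert np.allclose(winsorize(a, limits=[0, 0]), a)   # limits of 0 clip nothing
assert np.allclose(winsorize(a, limits=None), a)     # same as the default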

Thanks for this nice algo!

About memory usage:
pyamg will very probably not help with memory usage, as it is just a multigrid solver... It will only speed up the clustering computation (while also introducing some error... efficiency always comes at a cost!)
I am also facing memory limitations when I deal with a fairly large number of alphas over a window (storing the alphas of past days). A solution I found effective is to store the past alphas in a pandas DataFrame (in context), enforcing the use of float32. This adds a bit of time (type casting the pipeline outputs), but it is fine and reduces memory usage by a factor of 2 :-).

About optimization of your algo:
You already store the alphas of the previous days... so you are using that memory anyway... Creating a context variable to store the alphas of the previous days will therefore cost you the same memory. But whereas your pipeline recomputes the alphas for the entire window every day, with a context variable you only need to compute the new day's alphas, which will save quite some pipeline execution time :-).

Hi David,

Do you mind doing a simple mock-up of how this context variable can be used with pipeline to prevent redundant calculations?

Thanks!

@ David -

Thanks for the feedback.

I'd be interested in the details of how you enforce the usage of float32.

The problem I see with accumulating data in context is that there will be a transient ("warm-up") period until the window is full. With Pipeline, there is an accumulation prior to the start of the algo, so the window is full from the outset. I suppose one could try to figure out how to use Pipeline to output the trailing window at the start of the algo, but then just update it for subsequent days.

@Grant

Indeed, the accumulation issue is a bit annoying! With my current version I do have a "burn-in" phase at the beginning, a period where the algo does not trade at all and just accumulates data...
I think that is the right way to do it: having a pipeline like yours at initialization to fill the buffer. I will give it a try... (In fact, that's the reason I found your algo great lol, it gave me a way to remove this f^#)$%&# burn-in period ;-) )

Without using a buffer I don't see a way to enforce the type. I do the typecasting when I fill the buffer. Pandas has an "astype" function ;-). When I copy the new data into the buffer, I do it enforcing the new type (series.astype('float32')).
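Something like this is what I have in mind, as a rough sketch (the pipeline name 'factor_pipeline' and context.alpha_buffer are illustrative, not from the posted algo):

import pandas as pd
from quantopian.algorithm import pipeline_output

N_FACTOR_WINDOW = 6  # trailing days to keep

def before_trading_start(context, data):
    today = pipeline_output('factor_pipeline').astype('float32')   # halve memory per value
    if not hasattr(context, 'alpha_buffer'):
        context.alpha_buffer = []                                   # daily float32 frames
    context.alpha_buffer.append(today)
    context.alpha_buffer = context.alpha_buffer[-N_FACTOR_WINDOW:]  # keep trailing window
    # Stack the trailing days into one (day, stock) x factor frame when needed:
    alphas = pd.concat(context.alpha_buffer, keys=range(len(context.alpha_buffer)))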

@Grant, I am very impressed by your algorithm. While I wonder why you share your alpha publicly, I am glad I found it. I have always wanted to compute the Spearman rank correlations between next day returns and today's factors over a window and take the average. I want to use this average to combine factors based on their correlations. Would that be possible with the pipeline? I am new to this and would really appreciate your comments.

@Grant

Here is a version with rolling windows + your method for the first day. I did it with the help of two pipelines. (You can also see the typecasting of the pipeline output.)

@ Satya -

I have always wanted to compute the Spearman rank correlations between next day returns and today's factors over a window and take the average. I want to use this average to combine factors based on their correlations. Would that be possible with the pipeline? I am new to this and would really appreciate your comments.

Rather than doing it in Pipeline, you could use the code I posted above and do it in before_trading_start (instead of the clustering). You might be able to do it within Pipeline, but I think things are easier working outside of it.

Thanks @ Grant. I tried a different approach where I compute the IC between a factor and returns inline with custom factor definition. But it is not working. Any advice, please?

class MessageSum(CustomFactor):  
        inputs = [USEquityPricing.high, USEquityPricing.low, USEquityPricing.close, stocktwits.bull_scored_messages, stocktwits.bear_scored_messages, stocktwits.total_scanned_messages, Returns]  
        window_length = 21  
        window_safe = True  
        def compute(self, today, assets, out, high, low, close, bull, bear, total):  
            v = np.nansum((high-low)/close, axis=0)  
            v = preprocess(v*np.nansum(total*(bear-bull), axis=0))  
            v *= np.sign(sp.stats.spearmanr(v, Returns[-1])[0])  
            out[:] = v  

The above code does not work.

Secondly, since some factors affect different sectors/industries differently, I want to compute the IC by sector. Is that even possible with pipeline?

@ Satya -

I haven't done coding in Quantopian for a while, and will probably keep things on hold. That said, I'd try to sort out how the IC is computed in Alphalens, and then reproduce it in your code. I suspect what it is doing is finding the IC between the projected return of a long-short portfolio (weighted by the demeaned, normalized factor) and the actual return. In addition to the trailing factor values (which I export to before_trading_start), you'll need the returns (which could be exported to before_trading_start with another Pipeline, for example). You also need to decide the lag (and whether it will be the same for all factors, or different). It becomes a bit of a project, but doable, I think.
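As a hedged sketch of what I mean (the names factor_history and returns_history are assumptions: date-indexed DataFrames with one column per stock, exported to before_trading_start as described):

import numpy as np
from scipy import stats

LAG = 1  # factor on day t scored against returns realized on day t + LAG

def trailing_mean_ic(factor_history, returns_history, lag=LAG):
    ics = []
    for t in range(len(factor_history) - lag):
        f = factor_history.iloc[t]
        r = returns_history.iloc[t + lag]
        common = f.dropna().index.intersection(r.dropna().index)  # align the universes
        ic, _ = stats.spearmanr(f[common], r[common])             # rank correlation
        ics.append(ic)
    return np.nanmean(ics)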

@ Grant

Aren't your 8-year factors (e.g., GM_GROWTH_8YR) just looking at 8 days ago? That is, isn't gm[-8] just the gross margin 8 days before last? Shouldn't you be doing [-8*252] instead (similar to what you have done for ROA_GROWTH_2YR)?

@ Yaniv -

Maybe. I think I just lifted those from what someone else had posted. I was more interested in scaling with a relatively large number of factors, than the quality of the factors themselves.

@ Grant

Makes sense - nice work nonetheless though!

Thanks Yaniv -

If you end up learning anything, please post your insights here. I know some folks are reluctant to share their "alpha" but my experience has been that it is much more fun and productive to post almost everything for feedback.

Grant, pretty cool. Initially, I was suspicious of the clustering part, as the algo is clustering 20 data points, each having ~10,000 (5*nb_stocks) features, but I see the point of it now. It is trying to find which of the 20 factors are more alike and discount them accordingly.

I will certainly spend some time thinking of other ways to do this and post back. One way would be to have 5*20 data points with nb_stocks features, or any other approach that increases the number of data points and reduces the number of features.

Hi @Grant,

Looks good :-)
Re. your comment: " ...the relevance of the 2008 GFC is only important if one expects it to happen again..."

My comment below is somewhat similar to @Zeno's, but with a slightly different take on it, as follows:
Although IN THEORY EquityLS should be independent of market direction or perhaps (?) other attributes related to "market regime", such as volatility, etc., we do of course see variations in EqLS algo performance over time. A problem for us now is that, although we have had a broadly similar market regime in the US for a long time, we know that eventually something will change, market behavior will differ, and so will our algo performance. Sure, this is all stating the obvious but, at least IMHO, the wider the range of different market conditions over which any trading system/algo can be tested, the better. Not because we expect the past to repeat, but simply because the wider the range over which we test, the more we can hope to have at least some confidence that our system will remain robust under "different" conditions, whatever those unknown differences might turn out to be. I'm all for testing under as wide a range of different times & conditions as possible!

@ Luc -

Thanks for the feedback.

It is trying to find which of the 20 factors are more alike and discount them accordingly.

Yes, that is the basic idea.

Note that I did not make any effort to determine the optimum number of clusters (which is fixed):

clustering = SpectralClustering(n_clusters=3,assign_labels="discretize",random_state=0).fit(alphas_flattened)  

Presumably, the optimal number of clusters could be estimated offline. Alternatively, maybe there are clustering algorithms that dynamically and automatically determine the optimum number of clusters?
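For example, something along these lines could be run offline in research to pick a reasonable cluster count (assuming the same alphas_flattened array as in the algo):

from sklearn.cluster import SpectralClustering
from sklearn.metrics import silhouette_score

scores = {}
for k in range(2, 8):
    labels = SpectralClustering(n_clusters=k, assign_labels="discretize",
                                random_state=0).fit_predict(alphas_flattened)
    scores[k] = silhouette_score(alphas_flattened, labels)  # higher = better-separated clusters

best_k = max(scores, key=scores.get)

Alternatively, algorithms such as AffinityPropagation or DBSCAN in scikit-learn choose the number of clusters on their own, at the cost of other parameters to tune.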

One potential improvement in memory management would be to limit the number of stocks passed on to before_trading_start for the clustering. Presently, it is the entire QTradableStocksUS. I had started to look into cutting it down to the extremes of the alpha values, common across all factors, but haven't sorted it out.

There may be other memory management improvements, as well, since this seems to be a limiting factor (versus compute time, which is a quite generous 5 minutes in before_trading_start).

@Grant

generous 5 minutes in before_trading_start

I have timed my pipeline computation at around 100 seconds for loading 1 month of data in the notebook. This algo tends to hit pipeline timeouts in the contest. I read that Q computes backtests in chunks of 3 months of data, so I assume 300 seconds for loading 3 months of data, which is 5 minutes. The issue is that when I run the backtest I don't experience a pipeline timeout, but in the contest it does time out.

Do you know how to time the pipeline accurately?

Thanks
L

@ Leo c -

No, not sure how to time pipeline accurately. I also suspect that it is not entirely repeatable, since my understanding is that it pulls data periodically from one or more databases, and so there could be some congestion, depending on the load.

Have a look at this:

https://www.quantopian.com/posts/before-trading-start-timeout-fix

We have 10 minutes for Pipeline (run prior to before_trading_start), and a separate 5 minutes for before_trading_start.

I don't really understand the Q paradigm here, since the 10 minutes for Pipeline to run effectively isn't accessible in live trading. The 10-minute limit really applies to backtests, for which the Pipeline computations are chunked (126-day default Pipeline chunks, per Scott's comment here). It would seem that in live trading, even though the 10-minute Pipeline limit still applies, since a backtest must be run prior to live trading, effectively one has only (10 minutes/day)x(60 seconds/minute)/126 days = ~ 5 seconds/day. In other words, Q algos only support ~ 5 seconds of Pipeline computations per trading day, if I'm thinking about this correctly. On the other hand, since before_trading_start computations are not chunked, one can access 5 minutes per trading day. There is a 5 hour limit on contest entries per the information here. So, one can periodically use the full 5 minutes, but not every day (since a 2-year backtest would then take 42 hours). It would be nice if before_trading_start could be scheduled, but there are work-arounds to this limitation.

Thanks for sharing Grant!

I'm still trying to make sure I understand exactly what your algo does, but it seems like you have three separate elements of alpha:

  1. Alpha factors (i.e. what you'd get with an equal weight of all factors - most likely negative)
  2. Clustering of alpha factors (i.e. combining correlated factors, and taking the average of different implementations to reduce specification risk)

To me, it suggests that improving the quality of your alpha factors will still have a significant impact on your overall alpha, and the fact that "throw all factors in a pan and shake" worked well means that choosing high quality alpha factors would make results even better.

In addition, looking at your tearsheet, it appears that your algorithm adapts to underlying shifts in the market. For example, the value risk decreases around Jan/Feb 2018 as value shares became more attractive, compared to the growth rally of the previous years.

I am curious where this "alpha factor adaptation" comes from, and I wonder if this is actually the third alpha variable that generates your returns.

Hi Danny -

I'm currently inactive on Quantopian, but glad that you found my post useful. I can tell you that the algo was thinly researched--I just threw stuff together mainly to understand how to do the architecture and the platform limitations. So, I wouldn't get too wrapped up on the performance side of things, as there may be over-fitting, and other bad practices at play.

When I clone and run the algo I posted above (https://www.quantopian.com/posts/alpha-combination-via-clustering#5c1e5fa95c95c9003d03556f), I get the error:

Something went wrong. Sorry for the inconvenience. Try using the built-in debugger to analyze your code. If you would like help, send us an email.

Are others getting the same thing? Any ideas why the code won't run? I simply cloned it and clicked "Build Algorithm" and pfft!

Yes, I get the same error message. Would have been interesting to 'forward test' it a bit since there is now quite a bit of OOS data available since mid Dec to test on.

Hmm? Guess something changed in the Q API/platform?

Try sampling start dates for any other errors/warnings:

2017-12-01 05:45 WARN sklearn/preprocessing/data.py:169: UserWarning: Numerical issues were encountered when scaling the data and might not be solved. The standard deviation of the data is probably very close to 0.  
2017-12-01 05:45 WARN numpy/lib/arraysetops.py:200: FutureWarning: In the future, NAT != NAT will be True rather than False.  
2017-12-01 05:45 WARN sklearn/manifold/spectral_embedding_.py:215: UserWarning: Graph is not fully connected, spectral embedding may not work as expected.  
2017-12-11 05:45 WARN sklearn/preprocessing/data.py:153: UserWarning: Numerical issues were encountered when centering the data and might not be solved. Dataset may contain too large values. You may need to prescale your features.  

@ Blue Seahawk -

Thanks...seems to be a problem with the original start date of 2010-01-01. It runs if I shift the start date to 01/01/2017.

Here's a new backtest (no change to the code posted above).

Here's an update. I made a few changes:

  1. Switched to KMeans clustering.
  2. Moved the exponential alpha vector smoothing to the rebalance() function.
  3. Implemented TargetWeights as the final optimization API objective.
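Roughly, changes 1 and 3 look like this (a sketch, not the full algo; the scaling of the target weights is illustrative):

from sklearn.cluster import KMeans
import quantopian.optimize as opt
from quantopian.algorithm import order_optimal_portfolio

# 1. KMeans in place of SpectralClustering:
labels = KMeans(n_clusters=3, random_state=0).fit_predict(alphas_flattened)

# 3. TargetWeights as the Optimize API objective, inside rebalance():
def rebalance(context, data):
    weights = context.combined_alpha/context.combined_alpha.abs().sum()  # illustrative scaling
    order_optimal_portfolio(
        objective=opt.TargetWeights(weights),
        constraints=[],  # the posted algo adds leverage/turnover/risk constraints here
    )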

Here's the tear sheet for the algo immediately above.

Hi @Grant,

Thanks for sharing your updates! Very cool in my opinion.

Just a suggestion, but you may want to consider including an option to normalize some (fundamentals based) factors within each sector or industry, instead of comparing broadly across the board. For example, some companies will require more debt and/or more capital expenditures just by the nature of the business they're in, just to remain in business. Utility companies for example might need to rely on a higher level of debt and continuous incoming capital to maintain and upgrade their infrastructure (CapEx), whereas software companies will not need as much, and might have mostly operational expenses. Therefore, comparing these type of factors across the board may not be comparing 'apples to apples' and might explain why you see continuous long exposure to some sectors (e.g. health care it looks like in the above one) and continuous short exposure to other sectors (e.g. energy in the above).
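Something along these lines, perhaps (a hedged sketch; the Sector import path and the debt_to_assets_factor placeholder are assumptions, not from your algo):

from quantopian.pipeline.filters import QTradableStocksUS
from quantopian.pipeline.classifiers.fundamentals import Sector

universe = QTradableStocksUS()
sector = Sector(mask=universe)

raw = debt_to_assets_factor(mask=universe)                   # placeholder fundamentals factor
sector_relative = raw.zscore(groupby=sector, mask=universe)  # compare within sector only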

Thanks Joakim -

One thing I'm seeing is that due to memory constraints, I need to pare down the data passed to before_trading_start where I do the alpha combination step. If I try to pass more than ~ 5 trailing days of data, I run out of memory. Something like 20-30 days would be better, I think.

Hi Grant,

From a model architecture design standpoint, this hybrid ML (clustering) approach on various delayed quantamental factors appears sound, and I believe this is your intention: to test the design, without focusing too much on the factors. With this approach, personally, and in addition to Joakim's suggestions on sector/industry normalization, I would limit the number of factors to within 4-6, perhaps through a feature analysis and extraction phase. The added bonus to this is that you will have more memory to test longer lookback periods. Just my two cents.

@ James -

Yep. Just kinda kicking the tires with the large number of factors here. I'm not sure limiting the number of factors is necessarily the way to go; judiciously combining some of them within Pipeline might be better, and then exporting the set of "super factors" to before_trading_start for the final alpha combination extravaganza, where one can take advantage of the full 5 minute compute window.

@Grant, yeah whatever works for you.

Personally I don't understand how the clustering works. Clusters of factors? Clustering around what? Past returns? Are the factors within the clusters equally weighted, and are the clusters also equally weighted? What's the advantage over doing it manually, e.g. a manual Value factor cluster, Quality factor cluster, etc?

Regarding factor smoothing, some factors might have the highest mean IC at 5 days holding period, whereas others might have longer ideal holding periods? If one could do clusters based on peak mean IC holding period, and then smooth each cluster based on that, that might work well? Not sure if that’s possible in the IDE though.

Anyway, whatever you’re doing it appears to be working, at least in-sample, regardless if I understand it or not.

Regarding my thinking on using the clustering for weighting, basically it should be a way of avoiding putting too much weight on sets of factors that are similar. For example, if factors A, B, and C are all in the same cluster, and D and E are in another cluster, then A, B, and C would get a weight of 1/3 each, and D and E would get a weight of 1/2 each. I'm not sure if I did the coding correctly, but this is the concept I had in mind.
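In code, that toy example is just (illustrative only):

from collections import Counter

labels = {'A': 0, 'B': 0, 'C': 0, 'D': 1, 'E': 1}        # cluster assignments
sizes = Counter(labels.values())                          # {0: 3, 1: 2}
weights = {f: 1.0/sizes[c] for f, c in labels.items()}
# -> {'A': 0.33, 'B': 0.33, 'C': 0.33, 'D': 0.5, 'E': 0.5}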

Thanks Grant, conceptually that totally makes sense to me.

What are you using to 'cluster around' to determine if factors are similar or not? Looks like you're doing this in BTS but I can't figure out what's going on (I'm really bad at reading someone else's code, sorry).

Past returns of N_FACTOR_WINDOW might make sense, though future returns might be even better I reckon during a 'training period' (no idea if this is possible in the IDE, as then you could potentially do data snooping into the future...). Maybe by using an ML algorithm like KNN or K-Means Clustering?

Also, how would one determine the 'right' number of clusters? Maybe by setting a minimum density of the clusters? (I'm just thinking out-loud)

Grant,

The concept is pretty cool, and thank you so much for sharing. I've played a bit with your algo and the coding is correct: it does as described. It does make sense that you would want factors that behave similarly over a set of stocks to be penalized and factors that are more unique to be promoted, and your algo seems to be doing just that. However, in the end, we are not sure that the factors being promoted actually work better and that the ones demoted are not as good, but I guess the factors should be chosen carefully beforehand.

From an ML point of view, I wonder how effective the clustering is, as it is training on a set of size n_factors with nb_stocks*window_length features (about 1700 features for 10-20 training samples). Usually I would prefer to see many more training samples than features, but in the end it seems to work.

Finally, I have noticed that the algo seems to overfit on the initial selection of factors. How did you end up choosing the particular factors?

@ Joakim -

What are you using to 'cluster around' to determine if factors are similar or not?

A trailing window of the factor values (which are all normalized with preprocess).

Also, how would one determine the 'right' number of clusters?

I suppose one could run a series of backtests, varying the number of clusters, and find an optimum (max Sharpe ratio?).

@ Luc -

Thanks for checking the code. Glad you've found it helpful.

How did you end up choosing the particular factors?

Some factors I had created myself, and some I'd collected on Quantopian from posts and documentation/examples. The goal was simply to have a relatively large number of them, versus having "good" factors.

How about several alpha columns and .corrwith()
https://www.google.com/search?q=corrwith+site:stackoverflow.com

Each stock would correlate best (or worst for shorting) with a particular alpha factor on a given day and to some degree (weight) compared to others' correlations with their own particular strongest alpha match.
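A tiny synthetic example of .corrwith(), correlating each alpha column with forward returns (made-up tickers; the per-stock matching above would need more machinery on top of this):

import numpy as np
import pandas as pd

stocks = ['AAA', 'BBB', 'CCC', 'DDD', 'EEE']
alphas = pd.DataFrame(np.random.randn(5, 3), index=stocks,
                      columns=['alpha_0', 'alpha_1', 'alpha_2'])
fwd_returns = pd.Series(np.random.randn(5), index=stocks)

print(alphas.corrwith(fwd_returns))  # one correlation per alpha column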

Thanks @Grant, makes sense.

Does it use all the factors, even if a factor is clearly outside any of the N clusters?

@ Joakim -

Yes, it should be putting each factor into one of N clusters ("buckets"). There's nothing in the code that would reject one or more factors (by setting weights to zero).

One problem I need to understand is the rather severe memory limitation. For example, this appears to be about the limit:

n_factors = 40
n_stocks = 2120
N_FACTOR_WINDOW = 6

This is 508,800 values, which for an 8-byte 64-bit float is ~4 MB, which isn't much. However, if Pipeline 6-month chunking is included (6 months x ~20 trading days per month), it puts it at ~0.5 GB, which starts to be a meaningful fraction of the 8 GB of RAM allocated (I recall seeing this reported...not sure if this is the actual limit). In notebooks (research platform), the chunk size can be adjusted (via the chunksize parameter), but in the backtester it is fixed at 6 months. Presumably, when an algo is run live, there's no chunking at all, and so one gains back the 6x20 = 120 factor in memory utilization? If this is the case, it places a rather severe limitation on backtesting compared to live trading. I don't understand why Q would allow changing chunksize in notebooks, but not in the backtester. Does anyone understand this?
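Back-of-the-envelope, for anyone checking the arithmetic (numpy float64 is 8 bytes per value):

n_factors, n_stocks, N_FACTOR_WINDOW = 40, 2120, 6
values = n_factors*n_stocks*N_FACTOR_WINDOW   # 508,800 values in one trailing window
print(values*8/1e6)                            # ~4 MB for the window itself
print(values*8*6*20/1e9)                       # ~0.5 GB if a ~120-day chunk is held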

Hi Grant,

I know this was addressed somewhere in the comments above, but there's likely a memory limitation due to how the pipeline recalculates days. For example, with a factor window of 4, Jan 5 would calculate Jan 5, Jan 4, Jan 3, and Jan 2. Jan 6 would then calculate Jan 6, Jan 5, Jan 4, and Jan 3.

I implemented your N Days Ago factor capability with David's more efficient pipeline. See here if you're interested: https://www.quantopian.com/posts/efficient-n-days-ago-factor-research-and-algo-templates#5ca199d82b596f00422bdd18

Thanks Kyle -

So what would need to be changed in my code to make it more memory-efficient? Or are you saying that it's just how Pipeline works, and no changes to the code will help reduce the memory usage?

The notebook and algo are set up to automatically combine the pipeline data into the format yours already outputs. Basically, there's an initial pipeline that pulls the first N days of data, then the normal pipeline pulls individual-day data. Some pandas merging, etc. takes place, and the result is alphas in the form based on your work.

So in terms of changing your code, I'd recommend restructuring it to be like the templates I posted, or, copy your factors into the template and add in your alpha combination via clustering, etc. Hope that helps.

code to make it more memory-efficient?

At each step in pipe, if there is filtering for which stocks are returned (setting others aside), the mask progressively becomes a smaller and smaller collection for each of the other factors to work through. Include the latest, leaner mask when calling each factor.

Another tip for memory consumption/execution time: I found that in some cases it is better not to compute factors with a window length, but only the factor for a given day, then store those in a DataFrame in context and recompute the current factor window average from that DataFrame. In terms of execution time it is then better (probably due to locality) to have each factor as one series over days (instead of a series of factors for each day), but that is only really relevant when you are fighting for milliseconds :-).

David

Also slowdowns are proportional to the number of columns in pipeline output. The dataframe alphas has 200 columns. Grand prize for the most I've ever seen. By the way, in those, there are an average of about 4 nans present. (log_data())

Looks like the alphas are all used on this line in a loop:

a = alphas.iloc[:,f*N_FACTOR_WINDOW:(f+1)*N_FACTOR_WINDOW].values

Length of the resulting alphas_flattened is just 40.
Then alphas is also used at alphas_current = alphas.ix[:,::N_FACTOR_WINDOW] in the making of combined_alpha.

Incidentally GK, how on earth did you assemble something this mind-blowing lol. Don't answer that.

Returning the Pipeline directly avoids an unnecessary variable that is large:
return Pipeline(columns = pipeline_columns, screen = universe)
In a similar way, even the beta variable can disappear elsewhere.

Sometimes we tend to have a variable used only one other place as a way to remind ourselves what it is. Sometimes using a comment instead can work ok to avoid the extra variable's consumed memory, for speed.

Use something like (alphas_current/weights).sum(axis=1) in place of range loops; same result, faster, and it ignores nans. The axis=1 says to sum across each row's values in that case, I think.

Might be a good time for timing, tedious to set up but the output is great.

The most significant speed increase would be fewer columns in case there's some way to move combined_alpha inside pipeline, there's a factor for that? :)

@ Kyle M. -

In the code you posted on https://www.quantopian.com/posts/efficient-n-days-ago-factor-research-and-algo-templates, you have 3 factors, and:

WINDOW_LENGTH = 5  # trailing window of alpha factors exported to before_trading_start  

I have 40 factors, and an upper limit of N_FACTOR_WINDOW = 6, beyond which I hit a memory limit. Doing the scaling, your code should run with WINDOW_LENGTH = 80 (given that it only has 3 factors), which it does. However, if I try WINDOW_LENGTH = 160 I get a memory error. So, your code doesn't improve the memory management by more than a factor of 2.

I think there's only two things that would help:

  1. Increase the memory available for the relevant steps in the code.
  2. Reduce the amount of memory required (e.g. allowing users to reduce the chunk size, trading execution speed for memory, and/or reducing the size of the data stored by using lower-precision values...if you think about it, float64 is overkill for an alpha factor).

I'm bumping into a bit of a mystery. In the algo I posted above, if I change (in factor_pipeline() and beta_pipeline()) to:

universe = (  
        AnnualizedVolatility(mask=QTradableStocksUS())  
        .percentile_between(90,95))  

I should be able to increase N_FACTOR_WINDOW by ~20X before running out of memory. However, I get nothing close to 20X; the advantage is only about 2X.

What am I missing? Shouldn't reducing the size of the universe free up memory roughly in proportion to the universe size reduction? Or is there some overhead (either in the Pipeline plumbing and/or my code) that I'm not accounting for?

I'm now wondering what mask actually does in Pipeline? If it doesn't control what stocks are read from the database, then it won't help (much). If all stocks in the universe are read from the database, including a trailing window of their corresponding data, and a subset is used based on mask then there's no way to control the memory usage, right?

Try applying mask to percentile_between

universe = QTradableStocksUS()  
av = AnnualizedVolatility(mask=universe)  
universe = av.percentile_between(90, 95, mask=universe)  

Or should that last line be universe &= ? Adding to (further limiting) the mask.
As a rule, whenever mask can be added between parens, I add it, just to be sure, like zscore(), rank(), percentile_between(), factors et. al. since it is sometimes essential to avoid processing on all 8000 stocks.

Thanks Blue -

Just tried your suggestion - no difference. Kinda weird. I get the impression that if one scaled down to even a few stocks using mask, the trailing window one could export out of Pipeline to before_trading_start would still be quite limited.

It reduced your factor_pipeline from 2093 stocks to 102. Did you mean it was still mighty slow?

Blue Seahawk -

The issue is not slowness, but hitting a memory limit. In going from 2090 stocks to 102, I should be able to increase the trailing window of data by ~20X. I'm only seeing an increase of ~2X.

@Grant,
Not sure that this is the right place for this, yet here we are...

I modified your algo above to be contest ready...I'll use it as one of my entries...
Nothing major was done wrt strategy ... here is what was done:
-Cut out factset simply because I don't want a year's holdout...seems artificial to me and more like playing with one hand tied behind your back.
-Cut out other fundamental factors that don't seem to work anymore due to company balance sheet manipulations.
-Added in risk constraints to pass the contest criteria.
-Filtered the pipeline as was suggested above.
-Ran a 2-year backtest as that is what the contest uses.

Comes out with a good example of what we're terming a "Phantom Strategy"...Dollar Neutral, Beta Neutral, etc.
Thanks for publishing your original!
Tearsheet to follow.
alan

Tearsheet for above:

@ Alan -

Glad you found it useful! And thanks for sharing!

Thanks Alan for sharing.

I have coded the algo to run live on Interactive Brokers using the IBPY API and Quandl as data source. I'd be happy to share with anyone who contributed to this algo. Private message me for it.

/Luc

One key thing here is whether or not the clustering provides any benefit. I just chucked in in there without proving it out. So, if anyone has evidence, please share it.

Hi Grant,

Looks very cool. Where do the factors come from, did you code them all up by yourself?


Hi Thomas -

Thanks for checking it out. A few factors came from me, but a lot of them I grabbed from forum posts. I wanted lots of factors to see how it would scale with the number of factors.

The main point here is to do the alpha combination in before_trading_start, where one has a full 5 minutes, should it be needed. I also find trying to do computations in Pipeline to be overly complicated/cumbersome, so getting the data out of the Q-specific API and into the "real world" is an advantage for me. Another possibility would be to have some minutely factors in before_trading_start to throw into the mix (I'm not sure this is feasible within Pipeline).

The main problem I see is a memory limitation. One mystery is that if I use mask to reduce the universe, I don't get a commensurate increase in the available memory (see comments above, https://www.quantopian.com/posts/alpha-combination-via-clustering#5cdf3aa4d0a1260043bd97b2). Is there an explanation for this?

About your question on whether clustering helps:

To answer it, one would need to understand what clustering is and why it was developed ;-).

From a machine learning point of view, you start with input data and from it you want to make a prediction. One key element in improving the learning process (i.e. the fit you are doing, using your favourite algo, from linear regression to a NN) is feature selection. Basically, a "feature" is one of the inputs fed to your fit. Most of the time, if you just take the raw data you will end up with a poor quality of fit, and therefore poor-quality predictions. There are endless ways to perform feature selection; clustering is one of them. What your algo does could be understood as follows:

1) Start with N factors.
2) Find how they are clustered.
3) For each cluster, define a feature which is the equal-weighted linear combination of the elements of the cluster (with each weight being 1/N, N being the number of elements in the cluster).
4) The predictor is then the equal-weighted linear combination of those new features.

To push your algo a bit further, you could try to replace 4) with some fitting method (I would start with something as simple as a linear regression, using your features to fit the returns, but then you might need to use a fixed number of clusters, for example the x largest...).
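For example, a rough sketch (cluster_features and next_day_returns are assumed names for the equal-weighted cluster alphas, n_stocks x n_clusters, and the fitting target):

from sklearn.linear_model import LinearRegression

model = LinearRegression().fit(cluster_features, next_day_returns)
combined_alpha = model.predict(cluster_features)  # replaces the equal-weight combination
print(model.coef_)                                # learned weight for each cluster feature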

Oh, by the way, something to test:
It is not clear whether you really need to recompute the clusters every day. It would be interesting to see whether factors that are clustered together stay in the same cluster, and what the "typical correlation time" is. If this time is really long, that would allow you to perform the clustering in the research environment and then export it as a table into your algo. Or maybe recompute the clustering only every N days. That would leave you some computational time/memory for anything else you want to do with the new features...

@ Thomas W. -

If I understand Pipeline correctly, it chunks data reads from disk, and then computes factor values over the entire chunk. I believe it operates on 6 months of data. Those data are just sitting in memory, but can't be accessed in before_trading_start. I'm wondering if when Pipeline chunks, the entire 6-month chunk of computed Pipeline values could be made available to before_trading_start (ideally passing a reference, so that more memory isn't required in copying the data). What am I missing? It seems like one should be able to access the entire 6-month chunk once it is computed.

The name of each factor class should be available for the column names, something like this:

    for k, f in enumerate(make_factors()):  
        for days_ago in range(N_FACTOR_WINDOW):  
            pipeline_columns[f.__name__+'_'+str(days_ago)] = Factor_N_Days_Ago([f(mask=m)], window_length=days_ago+1, mask=m)  

The variable chunksize is an optional argument in zipline by the way.
Maybe caching, given our repeated backtests (so long as the changes each time are not to pipeline code), could be worth looking into.

The factors collected from various sources adopted several different ways of coping with nans; it might be worth the effort to standardize them.

np.absolute((high-low)/close),axis=0)  
v[np.isinf(v)] = np.nan  
values[np.isinf(values)] = np.nan  
np.isnan(results),0,results))  
np.nan_to_num(fcf_yield[-1,:]))  
np.nanmean(close, axis=0)  
np.nanstd(returns, axis=0)  
np.nansum(-p,axis=0))  
ta = np.where(np.isnan(ta), 0, ta)  
np.repeat([np.nan], 6 - len(quarterly_values)),  

I started modifying Allan Coppola's version of this program.

First, I wanted more time, but could not exceed 4 years without crashing, so I limited testing to those 4 years.

Then I eliminated some of the factors (which were not productive), as well as ignoring the stocktwits import. I then modified some of the numbers controlling this trading strategy in order to give it more leeway.

I find the program so restrictive in its constraints that it becomes a real impediment to better results. Nonetheless, the attached notebook does show one can do better, and still fit within the contest's constraints.

I still want to explore how far I can go. I will try to change the trading mechanics of the thing and see where it all leads.

I can already see another book coming out, Guy!!! Did you enter this in the contest already?

@James, not ready for that yet. I want more...

Profitability hit rate of 0.59 is impressive, I'll give it that (credit mostly going to Grant and Alan - thank you for your generosity!), though some of it could be just due to parameter over-fitting.

Regarding this, in case anyone's wondering:

I then modified some of the numbers controlling this trading strategy in order to give it more leeway.

Based on the information in the tear-sheet, this likely just includes holding more positions, e.g. MAX_POSITION_SIZE = 0.005, less restrictive turnover constraint, e.g. MIN_TURN = 0.12, and less restrictive Risk constraints, likely just using the default ones, e.g:

    # MAX_RISK_LMT=0.01  
    risk_model_exposure = opt.experimental.RiskModelExposure(  
           context.risk_loadings,  
           version=opt.Newest,  
           # min_momentum=-MAX_RISK_LMT,  
           # max_momentum=MAX_RISK_LMT,  
           # min_size=-MAX_RISK_LMT,  
           # max_size=MAX_RISK_LMT,  
           # min_value=-MAX_RISK_LMT,  
           # max_value=MAX_RISK_LMT,  
           # min_short_term_reversal=-MAX_RISK_LMT,  
           # max_short_term_reversal=MAX_RISK_LMT,  
           # min_volatility=-MAX_RISK_LMT,  
           # max_volatility=MAX_RISK_LMT,  
    )  
    constraints.append(risk_model_exposure)  

All good moves I think, but nothing so revolutionary that I feel the urge to write a book about it.

@Joakim, I too do not think there is enough stuff to write a book on this one. At least, not yet...

Your assumptions on how I might have increased the performance level while maintaining eligibility to the contest are mainly wrong, well let's say it: they are all wrong. The optimizer is very restrictive on its own.

What I am doing is playing on the trade mechanics of that strategy. If you can have an edge, on average, per trade, can you increase the number of trades? Since that would do the job.

If you look at the tearsheet, you will notice there were 113,999 trades over a 4-year period. That is about 4 times as much as in Allan's tearsheet. Even the shorts, on average, are making a profit.

@Guy,

Since Grant and Alan were generous enough to share code for this algo, and since you don't think ..."there is enough stuff to write a book on this one. At least, not yet...", it would only be appropriate on your part to share the code of your modifications. Otherwise, community members might get the impression that you are just feeding off other people's work and showing off what you can do while keeping it under wraps. I think the purpose of Grant and Alan sharing code is to try and educate other community members and also get feedback, constructive or otherwise, from more advanced users, perhaps in a collaborative effort to improve on it. All I see from you, Guy, are modifications of contributed open-source algo code or content from other community members, which you try to improve upon and show off but keep to yourself, and which later become the subject of your books. If you want to contribute to the community, share the code. If the code or idea is your original, then there is no problem keeping it to yourself; it's your IP. Now, if you keep your modifications of other people's algos a secret and later write a book about it or profit from it, don't you think you should share the bounty with the original author of the algo, at the very least? That is the ethical and right thing to do. Or is this just your style? Guy, I don't mean to offend you, but I need to call you out on this one, lest other members be discouraged from sharing their code.

@James

Very well said!

Indeed, very well said @James!

@James, trading strategies are posted to the Quantopian website up to a certain performance level, and not beyond. For instance, if the original version of this strategy was at the level where it could win the contest hands down, it would not have been published.

Moreover, I have not seen any of the high-scoring contest participants publishing their scripts either. Nor have I seen any published non-contest high-performing trading scripts. No one is wondering why? They know.

Once published, for whatever reason, a trading script becomes public domain. Technically, the same as a shareware of old. However, most of the published trading scripts are not worth much as you must also have noticed.

Nonetheless, we do not see published strategies the same way, what is discarded by one might be a gem in the making for someone else, like me, for instance.

Say I take a published trading script as a template, which might be similar to many others, then remove what I think is not needed, am I doing something wrong if I start adding code snippets from some other templates in order to make it more productive?

Note that doing so might change the very nature and purpose of the original program, to such an extent that whatever the original authors' intentions were (they also used other people's templates to design their own), they would not recognize what has been done to their script after all the modifications. And as such, from template to template to template, from modification to modification, you have what should be considered original work, original IP. Especially if the resulting program does much better than the original script.

I do not look at a trading strategy the same way as most. My interest is in the payoff matrix equations and their derivatives, not factors. In fact, I removed more than half of the factors from the original program.

Mostly, I look at two numbers: the average net profit per trade, and the number of trades. Then I try to find ways to increase both. The rest, to me, is program décor.

Just as a teaser, one can do better than what was presented thus far. After all, I might get interested in another book. We'll see...

It's not expected that folks would publish algos that perform better than the one I shared. I'm not really chasing performance, anyway. The point of the post was to share the structure and the technique of using clustering for assigning factor weights (which, I'm sure is nothing new...for example, the write-up on https://scikit-learn.org/stable/auto_examples/applications/plot_stock_market.html shows that clustering will find stocks that one would expect to be similar, so the same should be true of alpha factors).

@Grant,

While your algo is something that is not new and your purpose is not highlighting performance but rather "share the structure and the technique of using clustering for assigning factor weights", I am calling out Guy for proper decorum. Proper decorum is what Alan did, modified your algo, posted modified code, credited you and informed everyone that he entered it in the contest. Also Luc, informed everyone that he coded your algo "...to run live on Interactive Brokers using the IBPY API and Quandl as data source. I'd be happy to share with anyone who contributed to this algo." This is proper decorum. Guy, on the other hand, is just tooting his horn and perhaps another book in the making!

We are losing the essence of collaborative effort in learning and exploring ideas in search of alpha.

The following two charts result from adding some of my OWN code snippets from my program (the DX-08 version) to Allan's reduced version of the program.

Based on the tearsheet, the above is equivalent to a 25.6% CAGR.

The strategy did 123,931 trades over a 4-year period. Enough to say that, on average, the strategy did this or that.

This thing, on purpose, exceeds volatility, turnover, leverage, net dollar-neutral, and sector-neutral exposure constraints. Some, not by much, but still marginally crossing the limits for a few days, up to a few weeks.

However, the strategy does compensate for this by showing a higher performance level even with the Optimize API doing the trading. Notice that the max drawdown was not exceeded, but it came very close. And the 0.13 average volatility should not be considered that excessive.

The program still does not do what I want it to do, and therefore, I would still have other modifications I would like to make. I thought it would be nice to show an intermediary step into the transformation of this trading strategy.

There might not be a book on this one after all. If I do more than 4 years, the program crashes. If I increase the initial capital to $100M, the program crashes. If I increase the number of trades further, the program crashes. This may be due to timeout errors or something.

I wanted to know if the program was scalable. I know it is, but I cannot demonstrate it. I wanted to know how much leverage it could support without breaking down, but I cannot show that either. The program crashes. I find it frustrating. All the time trying to find workarounds. Regardless, that is part of the development cycle. So why complain?

I try to explain what I do. Show the possibilities. I do not look at the portfolio management problem the same way as most here. My interest is in the payoff matrix equations and their derivatives, not factors, not even clusters of factors. It is a major change in perspective.

If you find my books too expensive, you have the equivalent or something close to it freely available on my website. I left enough explanations and clues for anyone to duplicate, in their own way, the same things I do.

And also, as said before: I do not want to be responsible for anyone misusing any of my programs and, as a result, losing their money.

Reengineer your programs yourselves; then you will be responsible for your own code, whether it be good or bad. All I can do is show what I think is possible using the very same tools you do.

Hi Guy -

While I'm glad that you are using the code shared here as a springboard, without detailing the changes you made, we aren't moving the discussion forward per my original intent for the post. It is clear from your posts above that you were able to improve the algo, and that point is now understood. Without your code, or at least descriptions of the substantive changes, we can all congratulate you on your success, but we are left with no clue how what you've done relates to the code that was shared. I suggest starting a new discussion thread if you want to continue describing your impressive progress.

@Grant, unfortunately or otherwise, I am dropping this strategy as a no go. It does not pass my battery of acid tests.

I cannot increase the trading interval; the program crashes. How could I find out whether the strategy could survive for an extended period of time? Crashing in the future is not an option.

I cannot give it more trades; the program crashes. How could I scale it up? How can I keep reinvesting the ongoing profits if not by trading more, increasing the average net profit per trade, or doing both at the same time?

I cannot scale it up; the program crashes. If I double (or more than double) the initial stake, the program does not terminate. How could it aspire to a larger initial capital? Again, blowing up in the future is not an option.

I cannot increase leverage that much either; the program crashes. A little leverage allows one to progressively trade more and increase the average net profit per trade, on the condition of having an average positive edge that more than covers the cost of leveraging. If Quantopian intends to leverage contest strategies up to 4 or even 6 times, this thing will not make it.

My acid tests allow me to know beforehand how a strategy might have behaved under such conditions.

There is no new thread coming on this one, and no book either. As said before: I do not want to be responsible for other people's losses, misuse, misunderstanding, or misinterpretation of what I do.

Nonetheless, an interesting point: I was not using clustering as you intended it in this trading script. I was only interested in the daily rebalancing based on TargetWeights. The rest, to me, was technically just décor. I only wanted to see how TargetWeights behaved, since Q has reportedly corrected a long-standing bug in its Optimize API procedures. I chose Allan's script because it was shorter, and even then I made it shorter still. Mostly, and overall, it is still not enough.

Thanks Satya -

The idea of using the information coefficient (IC) for factor weighting seems sound, although my hunch is that it is noisy, and so lots of data are needed for it to work.

Overall, unless something is changed with the scalability of the Q backtester, I think this will be a Sisyphean task when dealing with a relatively large number of factors, since one also needs a large trailing window of data for each factor. One approach would be to look for ways of providing initial values (based on analysis in the research platform), and just do updates in the backtester/trading platform. An example is exponential smoothing, for which one needs only the most recent raw value to do an update of the smoothed value. There's no need for the algo to deal with a big-honkin' trailing window of data, if the initial smoothed value is known. The algo then becomes sensitive to start date (since the initial value would be hard-coded), but this isn't a show-stopper (unlike a memory limit, which is a killer...the song Killer Queen comes to mind...completely orthogonal, I know, but we gotta accessorize this dry discussion).

My guess is that there are lightweight iterative approximations for a lot of algorithms that typically would require the entire data set for each update. Might be an area to research...please share if anyone has insights.

@Grant, I think the problem starts with the first equation you presented.

alpha_combined = w_0*alpha_0 + w_1*alpha_1 + w_2*alpha_2 + ... + w_M*alpha_M  

and where you say: “...if a cluster has N alpha factors in it, each factor in the cluster gets a weight of 1/N.”

The assumptions made imply that each factor has an equal weight. I seriously doubt that.

Each factor in such an equation would turn out to be in decreasing order of importance, and if compared individually, we would see a decaying function of their importance as the number of factors increased. It is something like principal component analysis: the more factors you add, the less each additional one explains of the portfolio variance.

If your first 3 factors can explain some 75+% of the stock price movement, the other 17 you used in your algo would have to share what was left, in decreasing order of importance. Note that Fama and French first started with 3 factors and extended to 4 only after years of research. I have not seen a paper by them going any further.

If the alpha factors do not have equal weights, the function you propose still stands, but it does change its meaning. It gives the weights all the importance of what is analyzed.

This means that the weights should not be equal by design. They should be correlated with the importance of each factor: the highest weight goes to the most significant factor, and from there the weights decrease, so that the least influential factor gets the least weight of all.

With 5 or 6 factors, it should be possible to explain some 95% of the price variance, and with all the noise that appears in price movements, that should be considered sufficient to make any point. Trading at this level is not a precise science; the best we can do is make approximations of what might be. Adding 10 more components might bring you to 98% of explained variance, but is it worth it when most of that might be further analysis of market noise?
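As a rough, standalone illustration of this diminishing-explained-variance argument, one could run PCA on a (days x factors) matrix and look at the cumulative explained-variance ratio; the data below are synthetic and purely for demonstration:

import numpy as np
from sklearn.decomposition import PCA

# Synthetic factor matrix driven by three latent series, so the first few
# principal components should capture most of the variance.
rng = np.random.RandomState(0)
n_days, n_factors = 500, 20
latent = rng.randn(n_days, 3)
loadings = rng.randn(3, n_factors)
factor_matrix = latent.dot(loadings) + 0.1 * rng.randn(n_days, n_factors)

pca = PCA().fit(factor_matrix)
cumulative = np.cumsum(pca.explained_variance_ratio_)
print(np.round(cumulative[:6], 3))  # how much the first few components explain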

Thanks Guy -

The inspiration for the algo came from the example https://scikit-learn.org/stable/auto_examples/applications/plot_stock_market.html and from clustering examples on Quantopian. Given a large jumble of factors plucked willy-nilly out of thin air, how should one allocate capital to them? To your point, a large number of factors should be reduced down to 5 or so super-factors. One can then decide if each super-factor gets an equal share of the capital, or not; in the example I've provided, each super-factor gets the same weight. And one could tweak the weights within each super-factor, based on some criteria.
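Here is a minimal, standalone sketch of that 1/N-per-cluster weighting; the cluster labels would come from the clustering step, and all of the numbers below are synthetic:

import numpy as np
from collections import Counter

rng = np.random.RandomState(1)
n_factors, n_stocks = 20, 500
labels = rng.randint(0, 3, size=n_factors)     # hypothetical cluster label per factor
alphas_today = rng.randn(n_factors, n_stocks)  # hypothetical per-stock alpha per factor

# Each factor gets 1/N, where N is the size of its cluster, so every
# super-factor (cluster) carries the same total weight.
cluster_sizes = Counter(labels)
weights = np.array([1.0 / cluster_sizes[lbl] for lbl in labels])

# Simple weighted sum across factors gives the combined alpha per stock.
combined_alpha = weights.dot(alphas_today)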

Unfortunately, we don't get to learn how Quantopian is handling their ~40 fund algos (factors). Presumably, they have explored various alpha combination approaches, and found something workable. There should be a point of diminishing payback; just adding more algos to a fund that already has ~40 algos may just jack up operational expenses, without commensurate returns, but maybe my intuition is flawed. We do know, however, that money has flowed into the fund, and into the pockets of authors, per https://www.quantopian.com/posts/quantopian-business-update. So, something must be working.

@Grant, let's try to operate 40 different strategies with 40 factors each and see what would be implied in such a portfolio. At stake would be $400M in capital, and a single strategy could be considered a source of portfolio alpha, a profit factor, or a contributor to overall performance, and be weighted accordingly.

First, the 10%-of-available-volume rule would start to interfere greatly with the process. Second, the stock selection process itself would generate quite a few duplicates. Third, going long/short in all these portfolios would add to the duplication of intent: one strategy might go long on a stock while another is shorting it. Fourth, the strategies, just like the factors, are in decreasing order of importance, and they too should not be treated equally.

It goes like this:

\(\sum (H_1 \cdot \Delta P) > \dots > \sum (H_{spy} \cdot \Delta P) > \dots > \sum (H_{40} \cdot \Delta P)\)

Each strategy, from 1 to 40, can be ordered by how much it generated over the entire time horizon, with some doing better than the average (SPY) and some not, and with holding SPY somewhere in the lot.

At the start, you can give all strategies equal weights, since you will not know the order in which they will finish, but they will finish as an ordered set of CAGRs, no matter what happened or will happen.

Mostly, the contribution of each strategy to the total outcome will look something like the chart below where each color represents a strategy's percent contribution to the total.

As can be observed, the best performers take most of the action and are responsible for most of the outcome. It also stresses the point that the best strategies of the lot should be weighted more heavily than the others since that would tend to increase overall performance to even higher levels. It also makes the point for highly concentrated portfolios of strategies. The same as with the factors in each strategy. In fact, you could use the same chart structure to analyze factor contributions to a strategy, the chart would be similar to the above. The first few factors would rule the game.

Going forward, you can order the strategies every day by their respective performance levels. After a short while, you will be able to observe which are leading and which are lagging, and change their weighting accordingly. If the leading strategies change, you emphasize the new leaders by increasing their respective weights.
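A bare-bones sketch of that kind of daily re-ranking follows; the returns are synthetic, and the rank-proportional tilt is just one of many possible weighting rules:

import numpy as np
import pandas as pd

rng = np.random.RandomState(2)
daily_returns = pd.DataFrame(rng.normal(0.0003, 0.01, size=(252, 5)),
                             columns=['strat_%d' % i for i in range(5)])

# Trailing ~3-month performance per strategy, then tilt weights toward the leaders.
trailing = (1.0 + daily_returns.tail(63)).prod() - 1.0
ranks = trailing.rank()          # 1 = worst performer, 5 = best performer
weights = ranks / ranks.sum()    # higher trailing performance gets more weight
print(weights.round(3))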

As a side note, I read somewhere that Cohen made about a 1% return on his total portfolio last year. I understand that the $250M is just a tiny fraction of the $11B under management, but still, I could not associate that with high performance. It could also explain why those who received an allocation got about 0.11% of what was under management in compensation (so far some $270,000 in total for the group).

Your objective is to design a strategy like the blue one at the bottom in the above chart and not the green one at the top. But, you might not get there by doing the same thing as everybody else. Note that the green one at the top still had a positive return.

The most serious advantage you can have in this game is that you can change your stock allocation at will, especially going forward, and based on whatever criteria you might consider worthwhile. Technically, the game is in your hands: you can put the emphasis wherever you think it will most benefit the overall outcome of your portfolio, whether it holds a single strategy or a group of strategies.

Preferential weighting of trailing winners may not work (in fact, due to mean reversion, it could fail miserably). For example, in a long-only approach, one could pick the top-performing stocks in each industry sector (e.g. over the last year), or buy sector ETFs and combine them (e.g. by market cap), or try to pick the single best-performing sector ETF and put all capital into it. If you are saying to chase the winners, good luck. If it were that simple, we'd all have yachts, airplanes, and sports car collections.

I collected and published 40 factors above, and here's another one I found recently:

class AverageMonthlyArticleSentiment(CustomFactor):
    # Economic hypothesis: Article sentiment can reflect the
    # public's mood about a given security. In this case, use the past
    # 30 days' article sentiment to make decisions for the next 20 trading
    # days.
    inputs = [sentdex.sentiment_signal]
    window_length = 30
    window_safe = True

    def compute(self, today, assets, out, sentiment_signal):
        out[:] = preprocess(np.nanmean(sentiment_signal, axis=0))
You can find others in the public domain, or use factors you've devised, and show specifically how you would do the weighting, or at least provide an outline of a successful way of combining them while keeping the risk of over-fitting in check. I think you are saying to just pick 5 of the "best-performing" factors (presumably a returns-based ranking), discard the rest, and periodically re-rank based on performance and pick a new set of 5 factors. The problem I see is that one could just end up chasing returns (in analogy with picking the best-performing stocks or sector ETFs or whatever). And if the pool of factors has a lot of similarity in it, then the 5 factors could all be drawn from essentially the same underlying factor.

It goes without saying that "the best of your stock selection matters a lot" but without specifics on how the selection is to be done, there is no helpful guidance.

Conceptually, clustering factors by a similarity metric (in analogy to grouping stocks by industry sector) could have potential, in my mind. You seem to argue that it is a fundamentally bad idea, which is fine, but I'm still at a loss to understand your perspective.

@Grant, the real question is: What has sustainable value in our “short-term” trade decision making?

We could choose any combination of stocks from the humongous tradable universe based on whatever criteria. But we have this little mathematical problem: we want the stocks we pick to really outperform the market averages in some manner, whether by outperforming over the long term or by trading all their price swings for whatever reasons we might devise, using anything we find of significance.

The decision making is different when considering short-term price moves. Short term, there is a lot of randomness in the making of price movements; it needs to be accounted for and dealt with. I consider that the degree of randomness in price movements plays a major role in the mechanics of the short-term trade.

For instance, I accept a high degree of randomness in stock prices (even 95%+). Evidently, that has its set of implications. For one, I might not know what the price of any stock will be tomorrow (even if the price might be biased upward over the long term), whatever criteria I might want to devise or adopt from some other source.

I opted to just make the bet based on an “excuse” of a parameter that I identify as a possible medium- to long-term drift. I don't even ask it to be precise, just to have been there in the past. It is after taking the bet that I will manage and game the trade.

Say stock prices are “totally” random, or almost so. Then the wisdom of crowds, machine learning, deep learning, artificial intelligence, indicators, parameters, factors, residuals, principal component analysis, wavelets, multiple regressions, quadratic functions, and on and on, cannot help you decipher a game approaching a heads-or-tails type of game, where the upcoming odds and biases change all the time.

The pure randomness of it all precludes finding anything significant in [sentdex.sentiment_signal], for instance, or in [stocktwits] for that matter. Oh, you could always find something on some past, smoothed, interpreted, tailored, and doctored sentiment data. But going forward, that is not the data that will be presented. You will have a new ball game. What was there at one time might not be there the next, and there is no way to figure out in advance which is which.

No one on this planet, using whatever tools they had at their disposal, ever beat the game of heads or tails except by the pure luck of the draw. Sometimes you win, sometimes you lose, but it does not change your average expectation, which remains zero for every such long-term game you play. No expected gain, but also no expected loss.

However, the nature of the game changes if you add some long-term drift. It makes it an upward biased game where you can win all the time just by sitting around for a long time.

So, short-term, how should you trade? What kind of decision process will fit your temperament, your style and your understanding of the game?

Say, you can identify the best cluster of similarly behaving stocks. These would be the highest performers in risk-return space. Shouldn't most of the efforts be concentrated on trading methods that would enhance their respective performance even more?

Note that by changing the set of factors, your clusters will move around in risk-return space. Clusters also move around in time relative to one another; they disassemble, rearrange, and even morph as if into other sectors. These are all things I consider, and more, in the design of a trading strategy.

@ Thomas W. -

Any thoughts on the platform limitations I've discussed above? What sort of feedback are you getting from the Quantopian Enterprise side of the business?

One thought would be to support backtesting in the research platform in a more flexible way. The browser-based backtester is really geared toward Quantopian Community, and the contest/fund. I'd think something more broadly useful to both Q Community and Q Enterprise would be better, integrated into the research platform (e.g. as a module callable within a notebook). Yes?

All I can say is that we are well aware of these limitations with a strong desire to fix them. Thanks for your patience.

Thanks Thomas -

I recall that one used to be able to run zipline in a notebook. Was that functionality removed? If so, any plans to bring it back, in some form?

FYI -

https://www.quantopian.com/posts/zipline-dot-tradingalgorithm-removed-from-research-and-backtesting

It is not possible to run zipline in a notebook, and presumably there are no plans to re-enable this capability.

A few cents on the discussion between Grant and Guy. Before that, a big disclaimer that both G&G are way smarter than me, and hats off to G No. 1 for the initiative and the code...

Re: are, say, 40 factors in a single strategy too much? Probably not. The main benefit of having lots of (uncorrelated) factors is risk reduction in your strategy. Surely some factors will have an IR less than half that of the strong factors, but if you throw them into a mean-variance optimization, they will still get some weight due to their low correlation. Yes, "40 different strategies with (the same?) 40 factors" might be useless, but what Grant is doing is one single strategy with 40 factors. It (might) at least maximize the IR of this single strategy by having a diverse range of uncorrelated factors. Also, it can actually reduce turnover too. But point taken from Guy that if 5 factors explain 98% of the return, there is no point in adding more factors. To address that concern (and again, this is not what Grant's initiative was about), one can set a hard cut-off such that if a factor X gets less than Y% weight in the final mix, it is thrown away and the remaining useful factors are re-normalized so that they sum to 1. With the computing power available nowadays (no pun intended, given the computing limits we are facing in this backtest), there is no harm in throwing all your factors (that make sense) into the mix and letting the standard optimizer pick how many factors it likes.
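A minimal sketch of that cut-off and re-normalization (the weights and the 2% threshold below are made up):

import numpy as np

weights = np.array([0.30, 0.25, 0.20, 0.15, 0.06, 0.03, 0.01])  # hypothetical factor weights
MIN_WEIGHT = 0.02                                               # hypothetical Y% cut-off

# Zero out factors below the threshold, then re-normalize the survivors to sum to 1.
kept = np.where(weights >= MIN_WEIGHT, weights, 0.0)
kept = kept / kept.sum()
print(kept.round(3))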

Re: the best way to assign factor weights? (Again, let's remember that Grant is just doing this for academic purposes and for the purpose of letting us steal his code; he is not trying to optimize the weights.) The most traditional and still most accepted method is to maximize your total expected return/IC from the factors while minimizing the return volatility. So "clustering" is by no means the best method. In fact, let's remember that most machine learning approaches are just ways for lazy people (not Grant, he did the opposite of being lazy here) to come up with a black-box function instead of defining the function with pen and paper. In this very case, one probably doesn't need ML to define factor groupings; all you need to do is group them according to value style, momentum style, etc. Having said that, can this clustering approach beat a more static style grouping in certain periods? You never know! Say during a particular period momentum becomes highly correlated with value; a traditional static style grouping won't do anything to reduce the risk, but this clustering will automatically reduce the exposure to these factors.

Re: some confusion. I think what Guy was implying is a factor-momentum approach to up/down-weighting. That is a whole different topic of discussion, and depending on which school of quant you graduated from, some love it and some hate it. It is unlikely there will be any consensus on this...

Lastly, just on clustering as a method: I'm no ML expert, but with 10,000 features/dimensions for 20 data points, I am concerned that the clustering becomes a random pick. I think the IC approach (IC = the rank correlation of one factor's scores vs. another factor's) makes more sense. If one still wants to integrate ML into this, one can try the IC approach first (so that each factor is described by 20 features instead of 10,000) and then cluster on that.
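A rough sketch of that IC-first idea (the factor scores are synthetic, and the choice of clustering algorithm and cluster count is arbitrary):

import numpy as np
import pandas as pd
from sklearn.cluster import AgglomerativeClustering

rng = np.random.RandomState(3)
n_factors, n_stocks = 20, 2000
scores = pd.DataFrame(rng.randn(n_stocks, n_factors),
                      columns=['alpha_%d' % k for k in range(n_factors)])

# Describe each factor by its rank correlation with the other factors:
# 20 features per factor instead of thousands of per-stock scores.
ic_matrix = scores.corr(method='spearman')

# Cluster the factors on the rows of the correlation matrix.
labels = AgglomerativeClustering(n_clusters=3).fit_predict(ic_matrix.values)
print(labels)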

Hey Grant, thanks for sharing your code! I've learned a lot from it.

Do you think you could explain what is happening in the preprocess function? I noticed it's called in most of your factors, but I can't really figure out what exactly it does with my minimal knowledge of Python/pandas.

def preprocess(a):

    a = a.astype(np.float64)
    a[np.isinf(a)] = np.nan
    a = np.nan_to_num(a - np.nanmean(a))
    a = winsorize(a, limits=[WIN_LIMIT,WIN_LIMIT])

    return preprocessing.scale(a)

From what I can tell, you start by dealing with NaNs, then winsorize, then scale? What does scale do? Is it similar to z-scoring? Is there a reason you don't z-score? Why is the win limit zero? Shouldn't you have limits so that it gets rid of the outliers?

Also, could someone explain this snippet of code in your make_pipeline?

for k, f in enumerate(factors):
    for days_ago in range(N_FACTOR_WINDOW):
        pipeline_columns['alpha_'+str(k)+'_'+str(days_ago)] = Factor_N_Days_Ago([f(mask=universe)], window_length=days_ago+1, mask=universe)

@ Alderik -

def preprocess(a):

    # convert to np.float64
    a = a.astype(np.float64)
    # replace +/-inf with np.nan
    a[np.isinf(a)] = np.nan
    # subtract the mean (ignoring nans in the mean computation) and then convert remaining nans to zero
    a = np.nan_to_num(a - np.nanmean(a))
    # winsorize to eliminate extreme outliers (set limits to zero to allow extreme values)
    a = winsorize(a, limits=[WIN_LIMIT,WIN_LIMIT])

    # convert to standard z-score
    return preprocessing.scale(a)
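As a quick standalone check (arbitrary input) that the last step is indeed a z-score: preprocessing.scale from scikit-learn subtracts the mean and divides by the standard deviation, so the output has roughly zero mean and unit standard deviation.

import numpy as np
from sklearn import preprocessing

a = np.array([1.0, 2.0, 3.0, 4.0, 100.0])
z = preprocessing.scale(a)
print(z.mean(), z.std())  # approximately 0.0 and 1.0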

The code below iterates over the factors, f, assigning an integer, k, to each of them. Then, for every day in the look-back window, it creates a Pipeline column with the unique label alpha_k_days_ago (e.g. alpha_0_1 for factor 0, one day ago).

for k, f in enumerate(factors):
    for days_ago in range(N_FACTOR_WINDOW):
        pipeline_columns['alpha_'+str(k)+'_'+str(days_ago)] = Factor_N_Days_Ago([f(mask=universe)], window_length=days_ago+1, mask=universe)
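As a small standalone illustration of the labels this loop produces (not part of the algo itself): with two hypothetical factors and N_FACTOR_WINDOW = 3, the pipeline gets one column per factor per lag.

N_FACTOR_WINDOW = 3
factor_names = ['factor_a', 'factor_b']  # stand-ins for the factor classes

labels = ['alpha_%d_%d' % (k, days_ago)
          for k, _ in enumerate(factor_names)
          for days_ago in range(N_FACTOR_WINDOW)]
print(labels)
# ['alpha_0_0', 'alpha_0_1', 'alpha_0_2', 'alpha_1_0', 'alpha_1_1', 'alpha_1_2']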

@Grant -

Thanks for the quick response!

Could you provide more of a rationale for why you do those steps in preprocess?

Like, why does it need to be a float? Why subtract the mean?
Why winsorize, yet allow extreme values?

Why do you need a column with the alpha from k days ago?

@ Zicai -

My sense is that there are a lot more degrees of freedom than one might expect. For example, every company can have unique events that drive its stock price, each industry sector can be subject to its own influences, etc. The signals are there, but just buried under a lot of noise. Perhaps it is survivorship bias, but I suspect there is something to Ed Thorp ("In May 1998, Thorp reported that his personal investments yielded an annualized 20 percent rate of return averaged over 28.5 years."), Warren Buffett and Charlie Munger, and Ray Dalio, to name a few successful folks in this domain. There is such a thing as skill, I would bet.

@ Alderik -

def preprocess(a):

    # convert to np.float64 (I recall encountering a bug, due to mixed data types, and this fixed the bug)
    a = a.astype(np.float64)
    # replace +/-inf with np.nan (simplifies code, by treating infs and nans the same)
    a[np.isinf(a)] = np.nan
    # subtract the mean (ignoring nans in the mean computation) and then convert remaining nans to zero
    # demeans alpha vector and shifts all infs and nans to zero
    a = np.nan_to_num(a - np.nanmean(a))
    # winsorize to eliminate extreme outliers (set limits to zero to allow extreme values)
    # see https://en.wikipedia.org/wiki/Winsorizing
    a = winsorize(a, limits=[WIN_LIMIT,WIN_LIMIT])

    # convert to standard z-score
    # normalizes alpha vector so that all alpha vectors are on same scale and can be compared
    return preprocessing.scale(a)

Note that the initial subtraction of the mean is really there to manage the infs and nans: by de-meaning first, the infs and nans can then be set to an alpha value of zero. Basically, in the alpha combination, the infs and nans end up carrying no weight, which seems appropriate. It is kind of a hack, though--probably not best practice.

You can play around with different levels of the winsorize limits; it is easier to see the effect in Alphalens. In the code above, I think I ended up not applying winsorize (by setting the limits to zero) just because I didn't want to mess with it. I suspect a little bit of distribution clipping could help.
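As a toy, standalone illustration of what the limits do (the array is made up): with limits of zero nothing is clipped, while small non-zero limits pull the most extreme values in toward the rest of the distribution.

import numpy as np
from scipy.stats.mstats import winsorize

a = np.array([-50., -2., -1., 0., 1., 2., 50.])

print(winsorize(a, limits=[0.0, 0.0]))    # unchanged
print(winsorize(a, limits=[0.15, 0.15]))  # -50 is pulled up to -2 and 50 down to 2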

The trailing alpha values are needed for the clustering routine, which is part of the general alpha combination step (see https://www.quantopian.com/posts/a-professional-quant-equity-workflow). Frankly, I recommend focusing your effort on finding good alpha factors, and not getting too caught up in fancy alpha combination. In my opinion, Q should just take in individual alpha factors from authors anyway, versus incentivizing them to write full-up algos, but what do I know. This way, they can take full advantage of their new "signal combination" paradigm--it just makes sense to have access to all of the raw signals if they are to make the most of combining them. And presumably they have no shortage of computational horsepower to do a proper job in the alpha combination step of the workflow.