alphalens - what does it do?

Not sure what you mean by "under-the-hood". If you're asking about implementation details, I suggest you just read the code here. It's surprisingly short and very educational.

If you have other kinds of questions, ask away; I've used it a bit.

Thanks João -

I gather that Alphalens takes as an input a factor that takes a single numeric value per stock every day, and also prices per stock every day, and then attempts to say something about the predictive power of the factor.

Looking at https://github.com/quantopian/alphalens, the term "returns" is used, but how are the portfolio returns computed? How does alphalens determine the portfolio weights versus time from the factors? Or is it equal-weight? For alphalens to work, does the factor need to be formulated in a certain way (e.g. factor values predict returns with a linear model)? Does the factor need to be demeaned, and range between -1 and +1? And normalized, to gross leverage of 1.0? Or does alphalens take care of this (overriding the factor, which at any point in time may call for all long or all short or some mix)?

On https://blog.quantopian.com/a-professional-quant-equity-workflow/, it says:

An alpha is an expression, applied to the cross-section of your
universe of stocks, which returns a vector of real numbers where these
values are predictive of the relative magnitude of future returns.

Is alphalens intended to help sort out a model (possibly nonlinear) that relates the factor values to future returns? Or is there an assumption that the factor effectively linearizes the relationship, so that returns are proportional to factor values?

From what I understand, it works as follows. You pass alphalens two things: one is the time series of the factor, the other is "the portfolio". The way I've seen it used, this portfolio is passed to alphalens as a DataFrame whose columns are just the stock prices. The relevant function here is utils.get_clean_factor_and_forward_returns. There's nothing better than reading the documentation of the function here. Pay special attention to the description of the variables factor and prices. So, to answer one of your questions: yes, I guess it's just equal-weighted stocks.

factor_data = get_clean_factor_and_forward_returns(factor, prices)
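To give a feel for the "forward returns" part of that function name, here is a minimal plain-Python sketch for a single stock. It is not alphalens code (alphalens works on pandas objects and handles many stocks and several periods at once); the helper name forward_returns and the prices are made up for illustration. The idea is just that the forward return at date t over a given period is price[t + period] / price[t] - 1.

```python
def forward_returns(prices, period=1):
    """Return, for each date index t, price[t + period] / price[t] - 1.

    `prices` is a list of prices for one stock, ordered by date. The last
    `period` entries have no future price yet, so they are reported as None.
    """
    out = []
    for t, price in enumerate(prices):
        if t + period < len(prices):
            out.append(prices[t + period] / price - 1.0)
        else:
            out.append(None)
    return out


# Hypothetical prices for one stock on four consecutive days.
prices = [100.0, 110.0, 99.0, 108.9]
fwd_1d = forward_returns(prices, period=1)
print(fwd_1d)
```

The factor value observed on a given day is then paired with the forward return starting on that same day, which is what lets you ask whether the factor says anything about what happens next.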

get_clean_factor_and_forward_returns then returns a DataFrame in a shape that alphalens likes; let's call it factor_data. Your job as you study a factor is then to pass factor_data into different alphalens functions. For example, to compute your factor's alpha and beta you can take that factor_data and call performance.factor_alpha_beta(factor_data). See code here.

print(factor_alpha_beta(factor_data))

Regarding factor returns, there's a function for just that: performance.factor_returns, which you can see here. I've never used it myself, but from what I understand it computes returns for the stocks as if they were weighted by the factor. I imagine that in this case it's important to have the factor ordered so that higher values of the factor are the "good" ones, and lower (negative) values of the factor are the bad ones. According to the factor_returns docs, you don't need to demean the factor or anything; that function does it all.

In addition to this you can do other nice things on top, like segment your set of stocks by quantile, or by some grouper function which you define.

Is alphalens intended to help sort out a model (possibly nonlinear) that relates the factor values to future returns? Or is there an assumption that the factor effectively linearizes the relationship, so that returns are proportional to factor values?

At its core, the predictiveness of the factor is given by a linear regression: in the function factor_alpha_beta you can find this code:

        reg_fit = OLS(y, x).fit()  
        alpha, beta = reg_fit.params

where OLS stands for "Ordinary Least Squares".
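For anyone who prefers equations to library calls: with a single regressor plus a constant, OLS has a closed form, beta = cov(x, y) / var(x) and alpha = mean(y) - beta * mean(x). Below is a small dependency-free sketch of that. My reading of the alphalens code is that y is the factor-weighted portfolio return and x is the mean return of the whole universe, but treat that as an assumption; the helper name and the return series here are invented for illustration.

```python
def ols_alpha_beta(x, y):
    """Fit y = alpha + beta * x by ordinary least squares (one regressor)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    beta = cov_xy / var_x
    alpha = mean_y - beta * mean_x
    return alpha, beta


# Hypothetical daily returns, made up for illustration: the portfolio
# earns a constant 0.2% per day plus half of the market's move.
market_returns = [0.01, -0.02, 0.015, 0.0, -0.005]
portfolio_returns = [0.002 + 0.5 * r for r in market_returns]

alpha, beta = ols_alpha_beta(market_returns, portfolio_returns)
print(alpha, beta)
```

Read this way, alpha is the part of the portfolio return not explained by the market and beta is the exposure to it.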

Let me know if you have more questions.

Thanks João -

I'll have to spend some more time with this, when I get the chance.

Regarding this:

it computes returns for the stocks as if they were weighted by the factor

I think this implies that factors should be written as portfolio weights? Or is it simply good enough that the return be a monotonic function of the factor (which would be fixed by ranking)? I guess before I get into alphalens, I need to understand the requirements for an alpha factor, in the context of Quantopian...

I think this implies that factors should be written as portfolio weights?

I'm not sure what this means. What I meant by the phrase you quote is this bit of code in function factor_returns:

    weights = factor_data.groupby(grouper)['factor'] \
        .apply(to_weights, long_short)

That is, if you have a bunch of stocks and their daily prices on the one hand, and you have a "factor" (which is a number for each day and each stock) on the other hand, then you can define a "portfolio return" which is the return of a portfolio made of those stocks where you're long stocks with positive factor and short stocks with negative factor.
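Here is a rough sketch of what that weighting step could look like for a single day, written in plain Python rather than pandas. The function below borrows the name to_weights but is my own simplified version, not a copy of the alphalens one; the tickers, factor values and forward returns are hypothetical. With long_short=True the factor is demeaned first, so "positive" and "negative" really mean above or below that day's average factor value.

```python
def to_weights(factor_values, long_short=True):
    """Turn one day's factor values into portfolio weights.

    `factor_values` maps stock -> factor value. If `long_short` is True the
    values are demeaned first, so the weights sum to zero. In both cases the
    absolute weights sum to 1. This sketch assumes the values are not all
    identical; otherwise the normalisation would divide by zero.
    """
    vals = dict(factor_values)
    if long_short:
        mean = sum(vals.values()) / len(vals)
        vals = {stock: v - mean for stock, v in vals.items()}
    gross = sum(abs(v) for v in vals.values())
    return {stock: v / gross for stock, v in vals.items()}


# Hypothetical factor values and forward returns for one day.
factor_today = {"AAA": 3.0, "BBB": 1.0, "CCC": -1.0}
forward_return = {"AAA": 0.02, "BBB": 0.00, "CCC": -0.01}

weights = to_weights(factor_today)
portfolio_return = sum(weights[s] * forward_return[s] for s in weights)
print(weights, portfolio_return)
```

Note that BBB has a positive factor value but ends up with zero weight, because it sits exactly at the cross-sectional mean.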

EDIT: If I remember correctly the Sentdex tutorial talks about alphalens at some point. https://www.quantopian.com/tutorials/algorithmic-trading-sentdex

Presumably, alphalens takes the raw factor values, demeans and normalizes and then uses these as the portfolio weights, correct? It doesn't rank and then demean and normalize (which I recall seeing in some example algos).

The function get_clean_factor_and_forward_returns also ranks. There's a quantiles parameter (default 5, but you can change it) which requires ranking and then binning the stocks according to the ranking quantiles.
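To make "ranking and then binning" concrete, here is a small rank-based sketch for one day. It is only an approximation of the idea: alphalens relies on pandas for the actual binning, and the way pandas places bin edges and handles ties differs from this. The helper name assign_quantiles and the factor values are made up.

```python
def assign_quantiles(factor_values, quantiles=5):
    """Rank stocks by factor value and split them into equal-count bins.

    Returns {stock: bin}, where bin 1 holds the lowest factor values and
    bin `quantiles` the highest. Ties are broken by the sort order.
    """
    ranked = sorted(factor_values, key=factor_values.get)
    n = len(ranked)
    return {stock: (i * quantiles) // n + 1 for i, stock in enumerate(ranked)}


# Hypothetical factor values for ten stocks on one day.
raw = [0.3, -1.2, 2.5, 0.0, 1.1, -0.4, 0.9, -2.0, 1.8, 0.5]
factor_today = {"S%d" % i: v for i, v in enumerate(raw)}

bins = assign_quantiles(factor_today, quantiles=5)
print(bins)
```

Only the ordering of the factor values matters for the bin a stock lands in, which is why outliers in the raw factor have no extra influence on the quantile-based statistics.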

Hmm? I guess I need to understand that detail. If my algo uses raw factor values (not ranked and not demeaned, passed to the Optimize API), but alphalens is working with ranked, demeaned factors, then it might not do a good job of predicting algo performance.

At the other end of the process, I'm wondering if all of the output from alphalens can be rolled up into one or a few simple figures of merit. Say I had lots of factors to evaluate, and wanted to score them without looking at each one individually using alphalens. For example, how would I automatically analyze the 101 Alphas (see https://www.quantopian.com/posts/alpha-compiler)?

Luca

@Grant, probably alphalens.tears.create_summary_tear_sheet is enough to understand the quality of your factor, so you might use that with the 101 Alphas and then call alphalens.tears.create_full_tear_sheet only on those factors that show good performance.

If my algo uses raw factor values (not ranked and not demeaned, passed
to the Optimize API), but alphalens is working with ranked, demeaned
factors, then it might not do a good job of predicting algo
performance.

The forward returns demeaning is performed only if long_short=True, and it is useful for adjusting the factor performance for a dollar-neutral algo, but it doesn't mean you have to modify your factor values before passing them to the Optimize API. You also don't have to perform any ranking.

Briefly, this is what Alphalens does: if you know the factor value for each stock and also its future price, you actually know what the expected returns are for each factor value. So Alphalens computes the mean future (forward) return for the factor values, but it does so by dividing the factor into quantiles first, then averaging the future return of each quantile. Those are the values you see pretty much everywhere in the factor returns analysis, except for the plot called "Factor Weighted Long/Short Portfolio Cumulative Return", which is the plot you and João Aparício talked about. In that particular plot the simulated portfolio is not based on quantiles; instead, each stock's forward return is weighted by the stock's factor value, and the weight is like this:
factor_demeaned_vals / factor_demeaned_vals.abs().sum()
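The quantile-based averaging described above can be sketched in a few lines of plain Python. This is an illustration of the idea rather than the alphalens implementation: the helper name is hypothetical, the numbers are invented, and it assumes each stock has already been assigned a quantile and a forward return.

```python
from collections import defaultdict


def mean_return_by_quantile(records):
    """Average forward return per quantile.

    `records` is an iterable of (quantile, forward_return) pairs, one per
    stock (and per day, if several days are pooled together).
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for quantile, fwd_return in records:
        sums[quantile] += fwd_return
        counts[quantile] += 1
    return {q: sums[q] / counts[q] for q in sorted(sums)}


# Hypothetical (quantile, forward return) pairs for six stocks.
records = [(1, -0.02), (1, -0.01), (2, 0.00), (2, 0.01), (3, 0.02), (3, 0.04)]

means = mean_return_by_quantile(records)
spread = means[3] - means[1]  # top quantile minus bottom quantile
print(means, spread)
```

If the factor is predictive, the means should increase (or decrease) fairly steadily from the bottom quantile to the top one, and the top-minus-bottom spread gives a single number to compare across factors.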

In your algorithm you can actually use the same method to compute the weighting for your stocks (or zscore, which gives you a similar weighting and is already implemented as a pipeline factor method), but you can also use equal weighting. You can decide from the Alphalens output which is best: just compare the "Factor Weighted Long/Short Portfolio Cumulative Return" plot to "Cumulative Return by Quantile".

It is actually worth looking at the code as it is really linear and you can follow very easily what alphalens does.

This video is also very interesting (from minute 28)

Thanks Luca -

So I gather that aside from the Spearman rank correlation monotonicity test, the factor is raw, demeaned and normalized, but not ranked? Or does alphalens manage outliers by ranking?

I'll look at the code at some point, but generally I don't like attempting to read code (and having to unravel unfamiliar Python), when words and equations would be better. I was hoping for a write-up, but I guess none exists.

Luca