multi-factor Alphalens example

Here's a notebook I've been working on. Questions, comments, improvements welcome.

I'd like to add more factors, so please provide suggestions, if you are willing.

References and notes:

  1. Just watched and recommend Factor Modeling on YouTube (while exercising, which I also recommend and so does your doctor and mother). It's a decent, qualitative high-level introduction to factors in the context of Quantopian.
  2. Aside from Alphalens, Delaney posted a supplementary tool here: https://www.quantopian.com/posts/checking-correlation-and-risk-exposure-of-alpha-factors. I don't think it has been formally released the way Alphalens has; nevertheless, some folks might find it useful.
  3. I started an exploration of the means to do alpha combination in before_trading_start (see https://www.quantopian.com/posts/alpha-factor-combination-in-pipeline-how-to-fancify-it). Once I settle on N >= 10 factors, I think this is my next task, so that I have an architecture for doing more sophisticated alpha combination.
  4. Discussion of fundamental factors: https://www.quantopian.com/posts/relevant-fundamental-factors.
  5. Discussion of "strategic intent" requirement: https://www.quantopian.com/posts/update-to-requirements-to-get-funded-strategic-intent.
  6. The 101 Alphas Project: https://www.quantopian.com/posts/the-101-alphas-project.
  7. Discussion of the risk-free rate and its relevance to Q: https://www.quantopian.com/posts/risk-free-rate-on-quantopian. It would seem to have bearing on the minimum return required for an "all-weather" algo. Presumably a kind of forward-looking risk-free rate is implicit in evaluating algos for the Q fund.
  8. Q post on a fundamental factor and literature reference: https://www.quantopian.com/posts/capital-expenditure-volatility-capex-vol-template-fundamental-algo.
  9. https://www.quantopian.com/posts/free-cash-flow-to-enterprise-value-template-fundamental-algo
  10. https://www.quantopian.com/posts/debt-to-total-assets-template-fundamental-algo
  11. https://www.quantopian.com/posts/quantopians-fundamental-factor-library
  12. Peter Harrington blog, including some references: http://alphacompiler.com/blog/
  13. Piotroski's F-Score algo example: https://www.quantopian.com/posts/piotroskis-f-score-algorithm
  14. On-Balance Volume technical indicator: https://www.investopedia.com/terms/o/onbalancevolume.asp
  15. Alphalens tutorial: https://www.quantopian.com/tutorials/alphalens

Hey Grant, a very well put together notebook. I've been toying around with two separate algos using multiple factors (25+), trading between 500 and 1,000 names L/S daily. However, the roadblock for me right now is being able to utilize Alphalens to assess them and understand the charts. If you'd like to collaborate on either (or both), let's take a look.

Thanks Daniel -

I had been posting to a couple related threads that Delaney had been managing:

https://www.quantopian.com/posts/alphalens-questions-thread
https://www.quantopian.com/posts/checking-correlation-and-risk-exposure-of-alpha-factors

I had been participating in these threads (and others), but recently Q support asked me not to post for a while to threads I didn't initiate, since presumably I have been violating their terms of use in some way that isn't clear to me. So, I started this thread.

I've been holding off on backtesting (although it is very tempting), in the hope that I could get a more fundamental handle on things. I'm thinking there should be some definite rule-of-thumb go/no-go criteria one could establish for individual factors and their combination.

One thing I'm wondering about is the mean IC relative to its standard deviation. It would seem that a cut could be made at this level (e.g. the ratio of the mean IC to its standard deviation should be at least 1.0). No matter how large the mean IC is, if its standard deviation is even larger, that would seem to suggest a lot of volatility (or at least periods of decent returns interspersed with periods of flat returns).
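For reference, here's roughly how one could compute that cut from Alphalens output (a sketch; it assumes `factor_data` is the merged DataFrame returned by get_clean_factor_and_forward_returns):

import alphalens as al

# Assumes factor_data came from something like:
#   al.utils.get_clean_factor_and_forward_returns(factor, prices, periods=(1, 5, 10))
ic = al.performance.factor_information_coefficient(factor_data)  # daily rank IC, one column per holding period

ic_mean = ic.mean()
ic_std = ic.std()
ic_ratio = ic_mean / ic_std  # the "mean IC over IC standard deviation" cut discussed above
print(ic_ratio)
# e.g. keep the factor only if ic_ratio >= 1.0 for the holding period of interest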

Nice performance.

Do you have any theories on why there is a big spike on the "Misc" market sector?

No. I noticed that. You could post a question to Delaney's thread (I can't, at least until November 1). I let Delaney know about this thread, so perhaps he'll pick up questions here, too.

On https://www.quantopian.com/papers/risk there is no "Misc" market sector included in the risk model, so there would seem to be a disconnect between what (I think) is a standard Alphalens configuration, and the risk factors. Hmm? Seems like they should be 1:1.

Grant,
The Misc asset is PHYS, physical gold...and it's not very active: 281 rows out of ~78k rows of results...go figure!
I don't even know where that comes from.
alan

Here's a trial backtest, with the factors from Alphalens. Note that I am not using the risk model constraint in the optimize API, and slippage and trading costs are disabled.

Hi Grant,

I noticed you used peg_ratio as one of your factors. I discussed an issue with that factor here: Morningstar only started providing data on peg_ratio sometime in 2014, and you are backtesting starting in 2007. While your frictionless backtest looks great and may confirm what your Alphalens analysis is telling you, I worry that it may backfire going forward. I haven't checked the start/availability dates of growth_score or the psychsignal/stocktwits data, but I suspect the same issue.

Thanks James -

I'll have to look into it. If you have any other suggestions, just let me know.

Regarding your post on https://www.quantopian.com/posts/tearsheet-feedback-thread, I'm wondering if the 3% per year return is really enough. Under normal circumstances, this would be largely consumed by the risk-free rate, and the algo would be pointless. As soon as the risk-free rate goes up, I would think your algo would be worthless, unless the return goes up (e.g. to 6%).

Yes, Grant, for the most part I agree with you that as risk-free rates go up, this low-return, low-volatility strategy, which will be levered up 3-8 times, will fall apart. I have previously argued vigorously about the logic of this strategy and the non-inclusion of the risk-free rate in their computation of the Sharpe ratio. The strategy, though, is the trading style of Steve Cohen and Point72, the investor in the fund, and so far it has worked for them in low-interest and highly bullish market conditions.

In a portfolio of uncorrelated strategies, it may still have value.

@ Alan - I'm confused. I wouldn't think that Sprott Physical Gold Trust (PHYS) would be included in the QTradableStocksUS? Are you saying that it is showing up in my Alphalens analysis?

@ James - Thanks for the tip on the peg_ratio. I see now that if I try to run my Alphalens tear sheet or the algo on the peg_ratio factor only, starting in 2007, there is a problem. Looking at https://www.quantopian.com/help/fundamentals#valuation-ratios, there is no note stating that it starts later than other fundamentals, so I agree that this is a potential pitfall. There's probably some way to add a guard for this sort of thing in my code. The other consideration is whether I should always limit analysis to the "youngest" data set, or just let the factors fold in as they become available?
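As a first cut at such a guard, something along these lines might work (a sketch, untested; it just screens out names where peg_ratio is missing, so the data gap at least becomes visible rather than silently feeding NaNs into the combined alpha):

from quantopian.pipeline import Pipeline
from quantopian.pipeline.data import Fundamentals
from quantopian.pipeline.filters import QTradableStocksUS

def make_peg_pipeline():
    universe = QTradableStocksUS()
    peg = Fundamentals.peg_ratio.latest
    have_peg = peg.notnull()  # empty (or nearly so) before the field starts in ~2014
    return Pipeline(
        columns={'peg_ratio': peg},
        screen=universe & have_peg,
    )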

Regarding your comments on the potential risk to the Q fund of factors unknowingly kicking in (and potentially out) without anyone knowing, my approach has been that I'll just let Q review my code, if it turns out that their interest is piqued by its "exhaust." My hope would be that they'd catch problems like this (and the "strategic intent" requirement should be easier to meet, by referring to the code directly).

Also, I'm thinking that for a truly all-weather, long-term long-short strategy, there is some minimum return that gets one beyond the anomalously low risk-free-rate regime of recent times. Your posted algo tear sheet has something like SR ~ 1.0 (no risk-free rate included) and returns of ~3% per year. If I had money burning a hole in my pocket, I'm not sure I would fund you; I'd shop around for an alternative with the same or better return at the same level of risk. You might see if you can get an answer out of Jess, since there must be some minimum return needed. Years ago, when they kicked off the fund concept, I recall a number like ~7% being thrown out, which sounds more realistic.

@ Joakim - A portfolio of lots of uncorrelated strategies that each return ~0% will return ~0%. Maybe Q is selecting algos partly based on diversification of the Q fund, but there needs to be some level of return (above the anticipated risk-free rate going forward, not backward).

If I add the risk model constraint to my backtest above, I get an error:

Something went wrong. Sorry for the inconvenience. Try using the built-in debugger to analyze your code. If you would like help, send us an email.
ValueError: NaN or Inf values provided to FactorExposure for argument 'loadings'.
Rows/Columns with NaNs:
row=Equity(32620 [PFF]) col='momentum'
row=Equity(32620 [PFF]) col='short_term_reversal'
row=Equity(32620 [PFF]) col='size'
row=Equity(32620 [PFF]) col='value'
row=Equity(32620 [PFF]) col='volatility'
Rows/Columns with Infs:
None
There was a runtime error on line 163.

This would appear to be a data integrity issue, since the backtest starts fine, runs for ~6 months, and then crashes. Is there a standard bad-data guard I need to add to use the risk model constraint?

I also sent a request in to Q support.

Grant,

Try adding dropna() in risk model constraint like this:

context.risk_loading_pipeline.dropna(),

See if it will work

Thanks James -

This works:

risk_model_exposure = opt.experimental.RiskModelExposure(  
        context.risk_loading_pipeline.dropna(),  
        version=opt.Newest,  
    )  

However, there is great irony in needing to hack risk management code with a dropna(). Kinda like putting duct tape on a life jacket, hoping it holds. It is still imported like this:

from quantopian.pipeline.experimental import risk_loading_pipeline  

So I guess we're still experimenting.

Hey Grant, I've witnessed the same issue with one of my factors for the same equity.

@Grant -- Yes, PHYS is showing up in the pipeline output of your research notebook...somehow it's intermittently in the QTU...I agree it shouldn't be...

I think your algo looks good! Amazing what you can do with seemingly simple stuff and fundamentals!

I included some minor changes to your backtest code to get it "contest-ready":

  -Only used the last two years, as that is what the contest sees.
  -Commented out the slippage and commission...my understanding is that Q then applies their defaults.
  -Added a snippet from others to make sure that any positions not in the current QTU get liquidated.
  -Added risk constraints to meet the contest criteria...probably too many.

I played with removing factors from the combined factor list, yet nothing was better than you had.
One "problem" is the ranges of your component factors...the way you have it, they may vary widely, so I tried a zscore on each pipeline factor, yet that didn't work, so I left it commented out. I can only assume that you have adjusted the weights of the factors properly inside the factors, as anytime I change something, I get worse results.

All for now...keep up the good work!
alan

Thanks Alan -

I'm curious about the need for your "QTU adjustment" code. I suppose it won't do any harm, but is it necessary when using the MaximizeAlpha objective? I was aware of an issue with the TargetWeights objective (which presumably Q has on an internal list to eventually fix?), and a work-around was published, but I was unaware that we need to patch the optimize API in this case.

On a similar note, you explicitly set the risk model constraint limits in the code, but I thought that using version=opt.Newest would keep them in sync with the latest contest/fund limits? Also, you use a minimum limit of -0.2 for momentum, but a maximum of 0.3. Intentional?

Regarding PHYS in the QTU, I suppose I should add code to remove it? If anyone knows how to do this in Pipeline, please let me know. Also, I'm curious why it is slipping through the cracks, based on the QTU specification. I'll have to sort out how to spit out its Morningstar attributes.
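For the PHYS exclusion, I'm guessing something like this would do it (untested; StaticAssets builds a Filter from an explicit list of assets, and symbol()/symbols() resolves the ticker in the algo/research environments respectively):

from quantopian.pipeline.filters import QTradableStocksUS, StaticAssets

# In initialize(), or wherever the universe is built:
excluded = StaticAssets([symbol('PHYS')])  # symbol() in an algo; symbols('PHYS') in research
universe = QTradableStocksUS() & ~excluded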

My goal is to add three more factors, for a total of 10. So if you have suggestions you are willing to share, please provide them.

    ####------ Start: QTU adjustment -----------------
    # Prevents low-leverage, due to non-execution of orders, which are cancelled at the end-of-day
    # Sell any positions in assets that are no longer in QTU list.
    for security in context.portfolio.positions:
        if data.can_trade(security):  # Work around inability to close de-listed stocks for QTU.
            if security not in context.stocks:
                to_close = [security]  # securities to be changed
                log.info("ClosingOut:{}".format(to_close))
                ids = order_optimal_portfolio(
                    objective=opt.TargetWeights(
                        pd.Series(0, index=to_close)  # the 0 means close them
                    ),
                    constraints=[
                        opt.Frozen(
                            set(context.portfolio.positions.keys()) - set(to_close)
                        )
                    ],
                )
    ####------ End: QTU adjustment -----------------

    constraints.append(
        opt.experimental.RiskModelExposure(
            context.risk_loading_pipeline,
            version=opt.Newest,
            min_momentum=-0.2,
            max_momentum=0.3,
            min_size=-0.3,
            max_size=0.3,
            min_value=-0.3,
            max_value=0.3,
            min_short_term_reversal=-0.2,
            max_short_term_reversal=0.2,
            min_volatility=-0.2,
            max_volatility=0.2,
        ))

Added a few factors (up to 9 now!). Just kinda dinking around, but looks like this could be entered into the contest with a few clicks. Go for it!

Hi Grant, not sure if you're still looking for one more factor, but here are a few that may fit in with what you have already.

class Revenue(CustomFactor):
    inputs = [Fundamentals.total_revenue]
    window_length = 252
    def compute(self, today, assets, out, revenue):
        out[:] = revenue[-1] > revenue[0]

class GrossMarginChange(CustomFactor):
    window_length = 126
    inputs = [Fundamentals.gross_margin]
    def compute(self, today, assets, out, gross_margin):
        out[:] = (gross_margin[-1] > gross_margin[0]).astype(int)

class Gross_Income_Margin(CustomFactor):
    # Gross Income Margin:
    # Gross Profit divided by Net Sales
    # Notes:
    # High value suggests that the company is generating large profits
    inputs = [Fundamentals.gross_profit, Fundamentals.total_revenue]
    window_length = 1
    def compute(self, today, assets, out, gp, sales):
        out[:] = (gp[-1] * 4) / (sales[-1] * 4)

class MaxGap(CustomFactor):
    # The biggest absolute overnight gap in the previous 90 sessions.
    inputs = [USEquityPricing.close]
    window_length = 90
    def compute(self, today, assets, out, close):
        abs_log_rets = np.abs(np.diff(np.log(close), axis=0))
        max_gap = np.max(abs_log_rets, axis=0)
        out[:] = max_gap

Thanks Daniel -

I'll give them a try when I get the chance. Trying to get >= 10 factors that give some sort of decent SR from 2010 to present.

@Daniel,

For your first two CustomFactors (Revenue and GrossMarginChange) do they return a 'factor' or a 'filter'? As you're using a comparison operator, wouldn't they return a boolean (True or False), so a filter rather than a factor? Though for the second one, it looks like you're turning the boolean into an integer (0 or 1 I guess)?

I've attached a notebook with a simple 'Rate of Change' CustomFactor, which you can pass other financial statement / fundamental values through at whichever window_length you specify. I'm no expert at this though, so I can't really vouch for whether this is the right way of doing it... Perhaps one should use the 'as_of_date' when looking back?
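Roughly, the factor is along these lines (a simplified sketch, not necessarily identical to what's in the notebook):

import numpy as np
from quantopian.pipeline import CustomFactor
from quantopian.pipeline.data import Fundamentals

class RateOfChange(CustomFactor):
    # Fractional change of a single input over the window.
    # inputs and window_length are supplied at construction, e.g.:
    #   RateOfChange(inputs=[Fundamentals.total_revenue], window_length=252)
    def compute(self, today, assets, out, values):
        out[:] = (values[-1] - values[0]) / np.abs(values[0])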

@Anyone,

In order to 'test' and check the output of a CustomFactor, do I need to reference it from Pipeline, run Pipeline on one day, then print (or head/tail) the output of the pipeline? Or is there an easier/quicker way of viewing the output of a CustomFactor, or a Fundamental data field for that matter?
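I.e., something like this is what I do now (hoping there's a quicker way):

from quantopian.research import run_pipeline
from quantopian.pipeline import Pipeline
from quantopian.pipeline.data import Fundamentals
from quantopian.pipeline.filters import QTradableStocksUS

pipe = Pipeline(
    columns={
        'gross_margin': Fundamentals.gross_margin.latest,  # a raw field, or a CustomFactor under test
    },
    screen=QTradableStocksUS(),
)

# Run over a single day (or a short range) and eyeball the output.
result = run_pipeline(pipe, '2017-01-03', '2017-01-03')
result.head()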

Here's a quickie update, with most of Daniel's factors added (Gross_Income_Margin is commented out, since it causes a crash...still investigating). The algo now fails the beta-to-SPY constraint, so some work is needed there.

Here's just the 'Gross_Income_Margin' factor - sometimes it results in an alpha vector of zeros. Not sure what's going on. If anyone knows how to interrogate these things in a research notebook, I'd be interested. Or maybe there are certain guards required for fundamentals that I'm not familiar with?

The factor (including the global preprocessing function) is:

# Assumes the usual imports at the top of the algo, e.g.:
#   import numpy as np
#   from scipy.stats.mstats import winsorize
#   from sklearn import preprocessing
# WIN_LIMIT is a module-level winsorization fraction defined elsewhere in the algo.

class Gross_Income_Margin(CustomFactor):
    # Gross Income Margin:
    # Gross Profit divided by Net Sales
    # Notes:
    # High value suggests that the company is generating large profits
    inputs = [Fundamentals.gross_profit, Fundamentals.total_revenue]
    window_length = 1
    def compute(self, today, assets, out, gp, sales):
        out[:] = preprocess(gp[-1]/sales[-1])

def preprocess(a):
    a = np.nan_to_num(a - np.nanmean(a))             # demean; NaNs become 0
    a = winsorize(a, limits=[WIN_LIMIT, WIN_LIMIT])  # clip the tails
    return preprocessing.scale(a)                    # z-score

From https://www.quantopian.com/help/fundamentals:

gross_profit

Total revenue less cost of revenue. The number is as reported by the company on the income statement; however, the number will be calculated if it is not reported. This field is null if the cost of revenue is not given. Gross Profit = Total Revenue – Cost of Revenue.

total_revenue

All revenues, sales and income that the company deems as a total sum of all of their income as reported in the company’s income statement. Bank: Total Revenue = Net Interest Income + Non-Interest Income.

cost_of_revenue

The aggregate cost of goods produced and sold and services rendered during the reporting period. It excludes all operating expenses such as depreciation, depletion, amortization, and SG&A. For the must have cost industry, if the number is not reported by the company, it will be calculated based on accounting equation. Cost of Revenue = Revenue – Operating Expenses – Operating Profit.

@ Joakim, yes, you are correct; the first two custom factors would be filters since they return True or False / 0 or 1.

Like Grant, I am trying to narrow down to a list of fundamentals (10-20). However, it's been a real struggle testing individual factors and then combining them all without understanding how Alphalens works, especially how to interpret the results.

Once I have a solid foundation of fundamental custom factors, I would like to see if I can implement technical indicators as custom factors as well.

Note that I don't think they are Pipeline Filters, as they are used here:

class Revenue(CustomFactor):
    inputs = [Fundamentals.total_revenue]
    window_length = 252
    def compute(self, today, assets, out, revenue):
        out[:] = (revenue[-1] > revenue[0]).astype(int)

class GrossMarginChange(CustomFactor):
    window_length = 126
    inputs = [Fundamentals.gross_margin]
    def compute(self, today, assets, out, gross_margin):
        out[:] = (gross_margin[-1] > gross_margin[0]).astype(int)

We just have a comparison of vector values with a Python comparison operator. I don't think we are creating a Pipeline Filter, as described on the help page (although depending on how things are set up, it might be that any vector of 0s and 1s (or boolean True/False) would be recognized as a Filter, under the duck-typing paradigm).

See https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.greater.html#numpy.greater .

This formulation of the Gross_Income_Margin works:

class Gross_Income_Margin(CustomFactor):
    # Gross Income Margin:
    # Gross Profit divided by Net Sales
    # Notes:
    # High value suggests that the company is generating large profits
    inputs = [Fundamentals.cost_of_revenue, Fundamentals.total_revenue]
    window_length = 1
    def compute(self, today, assets, out, cost_of_revenue, sales):
        gross_income_margin = sales[-1]/sales[-1] - cost_of_revenue[-1]/sales[-1]
        out[:] = preprocess(gross_income_margin)

This way, one avoids infinite values when sales are zero, but the cost of revenue is not. Zero sales yields a NaN, which gets converted to a demeaned value of zero in the preprocess function.

Here's an updated backtest, with all of the candidate factors working (as best I can tell). It fails the beta-to-SPY constraint, but otherwise it seems to be headed in the right direction. Kinda surprising that, with such low position concentration and the dollar-neutral optimize API constraint, beta-to-SPY would be a problem.

I'm reluctant to apply the beta constraint in the optimize API, since prior experience suggested that it was not particularly effective, and might just be creating churn.

If anyone has insight on the beta-to-SPY issue, please share...

Note that I bumped up the turnover to 0.15, just for yucks. It is one of the "free parameters" and can be explored more systematically at some point.

Here's an updated Alphalens sheet, with all of the factors. If anyone knows how to read the tea leaves, please share your feedback.

Note that factors can be turned on/off by simply commenting out undesirable ones in this Python dict structure:

return {  
            'MessageSum':              MessageSum,  
            'FCF':                     fcf,  
            'Direction':               Direction,  
            'mean_rev':                mean_rev,  
            'volatility':              volatility,  
            'GrowthScore':             growthscore,  
            'PegRatio':                peg_ratio,  
            'MoneyFlow':               MoneyflowVolume5d,  
            'Trendline':               Trendline,  
            'Revenue':                 Revenue,  
            'GrossMarginChange':       GrossMarginChange,  
            'Gross_Income_Margin':     Gross_Income_Margin,  
            'MaxGap':                  MaxGap,  
        }  

@ Grant, thanks for taking a look at Gross_Income_Margin and providing an update. I'd love to hear feedback on the above notebook as well from those who are able to interpret all of the charts.

Here's a version that passes all of the constraints...should be good-to-go for the contest/fund, with a SR ~ 1.0 (risk-free rate not included in computation).

Fix was to add the beta constraint to the optimize API.

Still some work to do, but at least it is now conforming (although I'm not sure what the "strategic intent" statement would be).

Strategic Intent: Buy low, sell high; sell high, buy low.

;)

Kidding aside (and I'm guessing here), in a multi-factor strategy such as this one, I think one would need to have a rationale for each individual factor, as there may not be an overarching one for the 'combined factor'.

For example, the rationale for the mean_rev factor might be that 'the market over-reacts in the short term, so the factor buys stocks that are over-sold, and sells stocks that are over-bought.' This might be similar to 'value' factors (though 'value' may be over-sold/bought at longer timeframes), but quite different from 'trend' 'momentum' and 'growth' factors (i.e. 'what's trending/moving/growing will continue to trend/move/grow').

I think it would be great to hear from Q whether they would be OK with strategies that don't necessarily have a unified 'strategic intent' (other than the one at the top of this post ;)), but instead have a clear 'strategic intent' for each individual factor. Or would they prefer strategies whose individual factors all contribute to the same overall economic rationale? That way they may be able to better combine different strategies/factors in their portfolio of strategies/factors.

I'd heard from Q that although they have admitted some "idiosyncratic" strategies into the fund, the strong preference is for broad, diversified approaches that scale to large numbers of stocks (e.g. ~100-1000). While there may be a handful of golden idiosyncratic market inefficiencies, I'm thinking that the path is more like what is described in the 101 Formulaic Alphas paper, where combination of "faint and ephemeral" alphas is discussed. The concept of a "strategic intent" changes as one scales up to a large number of alpha factors; certainly, if one has ~100 of them, describing the strategic intent of each in detail may not shed much light on the essence of the strategy. One has to talk about how the factors are combined, and why the combination technique might work.

Presently, I just use the sum-of-z-scores alpha combination technique, but I'm hoping to set up to do more sophisticated stuff (although not necessarily computationally difficult).
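Schematically, the combination step is just this (inside make_pipeline(); each factor is z-scored cross-sectionally before summing, and if a factor already normalizes itself via preprocess(), the extra zscore() is redundant but harmless):

universe = QTradableStocksUS()
factors = make_factors()

combined_alpha = None
for name, f in factors.iteritems():
    z = f(mask=universe).zscore(mask=universe)
    combined_alpha = z if combined_alpha is None else combined_alpha + z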

Here's another update, if anybody wants to play around with it.

Hi Grant,

When are you entering the contest again? Will it be with a version of this strategy?

Joakim -

I’m in no rush, although it is tempting to enter the contest just for yucks. I’m still adding factors and probably need to drop some. I’d also like to try some other alpha combination approaches once the set of factors is decent.

One thing I’ll need to fix is any persistent exposure to the Q risk factors since this is a no-no (although I don’t quite understand if it matters if the return for a persistent risk factor is relatively low).

Sounds like a solid approach!

Which Q Risk factors does it have consistent over or under exposure to?

Haven't really looked into it in detail. I've attached so you can have a look for yourself, if you want. I'm not sure if I should be worried about it, or not. Seems like a qualitative nice-to-have from Jess that hasn't been incorporated into go/no-go requirements yet.

Thanks Grant!

I'm not trying to speak for them, but I don't think it's a go/no-go per se, as long as they are within the bounds. I think it's a good tool for us to look at and be aware of, so we don't (inadvertently perhaps) develop factors that are just proxies for already known stuff that can be had cheaper. That said, factors that have very little consistent exposure to any of the risk factors are probably more valuable to them.

For this one, it looks like you're somewhat long momentum and st_reversal (somewhat consistently) and short size and value perhaps? Not really sure what to do about the overly long exposures, maybe adjust or remove factors that contribute most to those exposures? For the short value and size exposure, maybe add a large-cap value factor, e.g. MarketCap() * some_yield_factor (should be predictive on its own first though), to see if that brings up the short size and value exposures a bit (and perhaps neutralizes the long risk exposures as well)? However, I personally try not to use MarketCap() in my factors, hoping that any consistent size exposure will take care of itself if I add other non-correlated and predictive factors, so maybe I'm contradicting myself there a bit. :)

Just some free and unsolicited advice, so please take it at face value... ;)

Hi Joakim -

From a developer standpoint, at some level, one needs go/no-go requirements. We have these written down on https://www.quantopian.com/get-funded#the-constraints, including the qualitative "strategic intent" requirement (which, in my opinion, needs work as a requirement). Recently, we've heard that algos really need to be trading >= 100 stocks (though I think I've since heard the guidance is that more is actually preferred), and that persistent exposure to Q risk factors is bad. There has also been some qualitative guidance on turnover, basically stating that it is better to meet the turnover requirement by trading a little bit each day, versus in bursts. And I heard that a return of ~3% per year is o.k....what about 2%...or maybe return doesn't matter? Etc.

In the end, as a developer, it should be straightforward to perform a set of go/no-go verification tests, to answer the question "Am I done?" and have some reasonable probability of success in the contest and getting funded (I'm thinking a 50/50 shot at "winning" something).

I guess we see things a bit differently. The way I see it they already have provided these “hard” go/no-go requirements with the 10 contest requirements. Within these requirements however, they also have “softer” preferences as well.

Take the beta to spy requirement for example. All else being equal, which strategy would you pick first if you had to choose between one with a beta of +0.2 or one with a beta close to 0 (again, all else being equal)? The strategy with a beta of +0.2 might be ok too eventually, if they can match it with a similar strategy with a beta of -0.2 to “hedge out” the market risk. However, with a 0 beta strategy it’s a lot easier for them to allocate capital sooner as there’s no market risk exposure to hedge out.

Regarding absolute returns, I don’t think they matter too much, as long as they are consistently positive. What does matter though is the risk-adjusted-returns. Also, as it’s still very early days for them, as they are just getting started building up their portfolio, I’d say they have very little appetite for risk at the moment, so “low risk” may be more important to them than “high return.” In other words (again, all else being equal) they might allocate capital sooner to strategies with Returns/Volatility of 4/2 rather than to strategies with 10/5 annualised Return/Volatility, even though the ratio value (2) itself is the same.

In their portfolio of strategies, I believe their aim is to maintain as close to 0 exposure to beta to spy, netdollar, individual sectors, styles, and volatility. Any returns they get is then effectively “risk free” and can be leveraged to the moon. It’s therefore a lot easier for them to pick strategies that individually already have very little exposure to these risk factors. It doesn’t necessarily disqualify strategies that take on specific exposure per se, it just means that it may take a bit longer until they want to allocate capital to them.

Just my 2 bitcoins worth... ;)

Here's an updated Alphalens snapshot with all of the factors I've accumulated so far. I figure it might be worthwhile for someone to noodle on and to play with.

Here's an update. Added a few more factors. Cumulative common returns are quite high. Not sure why.

Must have taken a lot of effort and time to create this algorithm. Thanks for sharing.

Satya -

It's mostly just hacked together. I'd encourage you to have a look at the individual factors and pick which ones would be best, and add your own. I'm keeping a running list of notes and links above, which you might also find useful.

Thanks Grant. Would it be possible to dynamically select the best performing alpha factors and filter them on a rolling basis?

@ Satya -

Yes, at some point, I'd like to figure out how to weight the factors dynamically, for comparison to the present baseline equal weighting. I started to consider this here:

https://www.quantopian.com/posts/alpha-factor-combination-in-pipeline-how-to-fancify-it

I'm still getting up the learning curve on how best to approach the problem.

For starters, I may just concoct a static, unequal weight scheme. For fundamental factors, one practically needs an entire Q-supported backtest anyway to get decent statistics, so I might as well compute the weights by hand and hard-code them into the algo.

I know some people compute IC (information coefficient) online (can be done using pipeline) to weight the factors. Just trying to figure out how to do it.
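Roughly the kind of thing I have in mind for the by-hand weighting (a plain pandas sketch, nothing Q-specific; the inputs and helper name are hypothetical):

import numpy as np
import pandas as pd
from scipy.stats import spearmanr

def ic_weights(factor_values, forward_returns):
    # factor_values: dict of {factor_name: DataFrame (date x asset) of factor values}
    # forward_returns: DataFrame (date x asset) of subsequent-period returns
    # Returns weights proportional to each factor's mean daily rank IC (negative ICs get zero weight).
    mean_ics = {}
    for name, values in factor_values.iteritems():
        daily_ic = []
        for dt in values.index:
            f = values.loc[dt].dropna()
            r = forward_returns.loc[dt].dropna()
            common = f.index.intersection(r.index)
            if len(common) > 2:
                daily_ic.append(spearmanr(f[common], r[common])[0])
        mean_ics[name] = np.nanmean(daily_ic)
    ics = pd.Series(mean_ics).clip(lower=0)
    return ics / ics.sum()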

Would it be possible to combine the factors used in your algo and mask the below make_filters() code over them? Basically I would want to use them to filter on the factors which can produce the alpha.

def make_filters():  
    class STA(CustomFactor):  
        inputs = [Fundamentals.operating_cash_flow,  
                  Fundamentals.net_income_continuous_operations,  
                  Fundamentals.total_assets]  
        window_length = 1  
        def compute(self, today, assets, out, ocf, ni, ta):  
            ta = np.where(np.isnan(ta), 0, ta)  
            ocf = np.where(np.isnan(ocf), 0, ocf)  
            ni = np.where(np.isnan(ni), 0, ni)  
            out[:] = preprocess(abs(ni[-1] - ocf[-1])/ ta[-1])  
    class SNOA(CustomFactor):  
        inputs = [Fundamentals.total_assets,  
                 Fundamentals.cash_and_cash_equivalents,  
                 Fundamentals.current_debt, # same as short-term debt?  
                 Fundamentals.minority_interest_balance_sheet,  
                 Fundamentals.long_term_debt, # check same?  
                 Fundamentals.preferred_stock] # check same?  
        window_length = 1  
        def compute(self, today, assets, out, ta, cace, cd, mi, ltd, ps):  
            ta = np.where(np.isnan(ta), 0, ta)  
            cace = np.where(np.isnan(cace), 0, cace)  
            cd = np.where(np.isnan(cd), 0, cd)  
            mi = np.where(np.isnan(mi), 0, mi)  
            ltd = np.where(np.isnan(ltd), 0, ltd)  
            ps = np.where(np.isnan(ps), 0, ps)  
            results = ((ta[-1]-cace[-1])-(ta[-1]-cace[-1]-ltd[-1]-cd[-1]-ps[-1]-mi[-1]))/ta[-1]  
            out[:] = preprocess(np.where(np.isnan(results),0,results))  
    class ROA(CustomFactor):  
        inputs = [Fundamentals.roa]  
        window_length = 1  
        def compute(self, today, assets, out, roa):  
            out[:] = preprocess(np.where(roa[-1]>0,1,0))  
    class FCFTA(CustomFactor):  
        inputs = [Fundamentals.free_cash_flow,  
                 Fundamentals.total_assets]  
        window_length = 1  
        def compute(self, today, assets, out, fcf, ta):  
            out[:] = preprocess(np.where(fcf[-1]/ta[-1]>0,1,0))  
    class ROA_GROWTH(CustomFactor):  
        inputs = [Fundamentals.roa]  
        window_length = 252  
        def compute(self, today, assets, out, roa):  
            out[:] = np.where(roa[-1]>roa[-252],1,0)  
    class FCFTA_ROA(CustomFactor):  
        inputs = [Fundamentals.free_cash_flow,  
                  Fundamentals.total_assets,  
                  Fundamentals.roa]  
        window_length = 1  
        def compute(self, today, assets, out, fcf, ta, roa):  
            out[:] = preprocess(np.where(fcf[-1]/ta[-1]>roa[-1],1,0))  
    class FCFTA_GROWTH(CustomFactor):  
        inputs = [Fundamentals.free_cash_flow,  
                  Fundamentals.total_assets]  
        window_length = 252  
        def compute(self, today, assets, out, fcf, ta):  
            out[:] = preprocess(np.where(fcf[-1]/ta[-1]>fcf[-252]/ta[-252],1,0))  
    class LTD_GROWTH(CustomFactor):  
        inputs = [Fundamentals.total_assets,  
                  Fundamentals.long_term_debt]  
        window_length = 252  
        def compute(self, today, assets, out, ta, ltd):  
            out[:] = preprocess(np.where(ltd[-1]/ta[-1]<ltd[-252]/ta[-252],1,0))  
    class CR_GROWTH(CustomFactor):  
        inputs = [Fundamentals.current_ratio]  
        window_length = 252  
        def compute(self, today, assets, out, cr):  
            out[:] = preprocess(np.where(cr[-1]>cr[-252],1,0))  
    class GM_GROWTH(CustomFactor):  
        inputs = [Fundamentals.gross_margin]  
        window_length = 252  
        def compute(self, today, assets, out, gm):  
            out[:] = preprocess(np.where(gm[-1]>gm[-252],1,0))  
    class ATR_GROWTH(CustomFactor):  
        inputs = [Fundamentals.assets_turnover]  
        window_length = 252  
        def compute(self, today, assets, out, atr):  
            out[:] = preprocess(np.where(atr[-1]>atr[-252],1,0))  
    class NEQISS(CustomFactor):  
        inputs = [Fundamentals.shares_outstanding]  
        window_length = 252  
        def compute(self, today, assets, out, so):  
            out[:] = preprocess(np.where(so[-1]-so[-252]<1,1,0))  
    class GM_GROWTH_2YR(CustomFactor):  
        inputs = [Fundamentals.gross_margin]  
        window_length = 504  
        def compute(self, today, assets, out, gm):  
            out[:] = preprocess(gmean([gm[-1]+1, gm[-252]+1,gm[-504]+1])-1)  
    class GM_STABILITY_2YR(CustomFactor):  
        inputs = [Fundamentals.gross_margin]  
        window_length = 504  
        def compute(self, today, assets, out, gm):  
            out[:] = preprocess(np.std([gm[-1]-gm[-252],gm[-252]-gm[-504]],axis=0))  
    class ROA_GROWTH_2YR(CustomFactor):  
        inputs = [Fundamentals.roa]  
        window_length = 504  
        def compute(self, today, assets, out, roa):  
            out[:] = preprocess(gmean([roa[-1]+1, roa[-252]+1,roa[-504]+1])-1)  
    class ROIC_GROWTH_2YR(CustomFactor):  
        inputs = [Fundamentals.roic]  
        window_length = 504  
        def compute(self, today, assets, out, roic):  
            out[:] = preprocess(gmean([roic[-1]+1, roic[-252]+1,roic[-504]+1])-1)  
    class GM_GROWTH_8YR(CustomFactor):  
        inputs = [Fundamentals.gross_margin]  
        window_length = 8  
        def compute(self, today, assets, out, gm):  
            out[:] = preprocess(gmean([gm[-1]+1, gm[-2]+1, gm[-3]+1, gm[-4]+1, gm[-5]+1, gm[-6]+1, gm[-7]+1, gm[-8]+1])-1)  
    class GM_STABILITY_8YR(CustomFactor):  
        inputs = [Fundamentals.gross_margin]  
        window_length = 9  
        def compute(self, today, assets, out, gm):  
            out[:] = preprocess(gm[-8])  
    class ROA_GROWTH_8YR(CustomFactor):  
        inputs = [Fundamentals.roa]  
        window_length = 9  
        def compute(self, today, assets, out, roa):  
            out[:] = preprocess(gmean([roa[-1]/100+1, roa[-2]/100+1,roa[-3]/100+1,roa[-4]/100+1,roa[-5]/100+1,roa[-6]/100+1,roa[-7]/100+1,roa[-8]/100+1])-1)  
    class ROIC_GROWTH_8YR(CustomFactor):  
        inputs = [Fundamentals.roic]  
        window_length = 9  
        def compute(self, today, assets, out, roic):  
            out[:] = preprocess(gmean([roic[-1]/100+1, roic[-2]/100+1,roic[-3]/100+1,roic[-4]/100+1,roic[-5]/100+1,roic[-6]/100+1,roic[-7]/100+1,roic[-8]/100+1])-1)  
    return {  
        'STA':                     STA,  
        'SNOA':                    SNOA,  
        'ROA':                     ROA,  
        'FCFTA':                   FCFTA,  
        'ROA_GROWTH':              ROA_GROWTH,  
        'FCFTA_ROA':               FCFTA_ROA,  
        'FCFTA_GROWTH':            FCFTA_GROWTH,  
        'LTD_GROWTH':              LTD_GROWTH,  
        'CR_GROWTH':               CR_GROWTH,  
        'GM_GROWTH':               GM_GROWTH,  
        'ATR_GROWTH':              ATR_GROWTH,  
        'NEQISS':                  NEQISS,  
        'GM_GROWTH_2YR':           GM_GROWTH_2YR,  
        'GM_STABILITY_2YR':        GM_STABILITY_2YR,  
        'ROA_GROWTH_2YR':          ROA_GROWTH_2YR,  
        'ROIC_GROWTH_2YR':         ROIC_GROWTH_2YR,  
        'GM_STABILITY_8YR':        GM_STABILITY_8YR,  
        'ROA_GROWTH_8YR':          ROA_GROWTH_8YR,  
        'ROIC_GROWTH_8YR':         ROIC_GROWTH_8YR,  
    }  

Hi Daniel -

I don't understand what you want to do. What do you mean "Basically I would want to use them to filter on the factors which can produce the alpha"?

To filter, you'd need to combine the factors you have above (e.g. sum-of-z-scores, or just sum assuming you're z-scoring in preprocess). Then, you would need to convert to binary. One way to do this would be to take the absolute value of the combined alpha, and then set a threshold. Anything above the threshold is set to True, and anything below, False.

I'm not sure the above prescription makes sense.

I'd consider a couple approaches:

  1. Simply add the new factors to the existing ones (using the same `preprocess' normalization).
  2. Combine the new factors to form alpha_new, and combine the old factors to form alpha_old. Then sum them like this:

c_new*alpha_new + c_old*alpha_old

c_new > 0, c_old > 0, c_new + c_old = 1

You could try various values for c_new (c_old = 1-c_new) to see if there is an optimum.
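In Pipeline terms, approach 2 might look roughly like this (combine_zscores is a hypothetical helper that sums the z-scored factors from each dict, as in the combination loop discussed earlier):

c_new = 0.5  # try a few values; c_old = 1 - c_new
alpha_old = combine_zscores(make_factors(), universe)
alpha_new = combine_zscores(make_filters(), universe)
combined_alpha = c_new * alpha_new + (1.0 - c_new) * alpha_old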

Well, I tried adding everything together, but I get an error, 'ValueError: too many inputs'. Anyone know what the problem is? Is there a limit on the number of custom factors that can be run?

Grant, I received the same message when I tried to add around 30 or so. Your second approach was more what I had in mind. At first I thought the below might work.

    universe = QTradableStocksUS()  
    factors = make_factors()  
    filters = make_filters()  
    combined_filters = None  
    for name, f in filters.iteritems():  
        if combined_filters == None:  
            combined_filters = f(mask=universe)  
        else:  
            combined_filters += f(mask=universe)  
    combined_alpha = None  
    for name, f in factors.iteritems():  
        if combined_alpha == None:  
            combined_alpha = f(mask=combined_filters)  
        else:  
            combined_alpha += f(mask=combined_filters)  

Just curious: how would one run this algorithm monthly, since some alphas are based on daily calculations?

@ Daniel -

There does seem to be a limit on the number of custom factors supported by Pipeline. However, I just realized that one can have multiple Pipelines. I haven't tried it yet, but maybe this is a work-around? I think this would mean doing the final combination outside of Pipeline.

I broke up the 40 factors into two separate pipelines to get the attached algo to run. Presumably the same trick works in a research notebook.

This would seem to be a silly limitation, but maybe there is a reason for it?
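The basic pattern is something like this (a sketch; the make_pipeline() signature and the 'combined_alpha' column name are just illustrative):

from quantopian.algorithm import attach_pipeline, pipeline_output

def initialize(context):
    attach_pipeline(make_pipeline(make_factors((0, 21))), 'alpha_a')
    attach_pipeline(make_pipeline(make_factors((21, None))), 'alpha_b')

def before_trading_start(context, data):
    out_a = pipeline_output('alpha_a')
    out_b = pipeline_output('alpha_b')
    # Final combination happens outside Pipeline:
    context.combined_alpha = out_a['combined_alpha'] + out_b['combined_alpha']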

Grant, thanks for taking a stab at it and bringing up the idea of isolating pipelines for each set. I'm surprised at the cut-off on the number of factors one can use in a single pipeline. Just two quick questions: does the 21 below mean the first 21 custom factors, and then the remaining factors? Also, what does the "rng" mean?

factors = make_factors((0,21))
factors = make_factors((21,None))

As always I appreciate hearing/reading your work.

I'm calling make_factors with the rng tuple so that sub-sets of factors can be returned, rather than the entire set (via return factors[rng[0]:rng[1]]).

If I did everything correctly, the code should follow the normal slicing rules for Python lists:

https://www.pythoncentral.io/how-to-slice-listsarrays-and-tuples-in-python/

So, factors = make_factors((0,21)) should return up through the element with index 20 (indexing starts at zero, so 21 factors are returned). Then, factors = make_factors((21,None)) returns the remaining factors, starting with the element with index 21. The None indicates to go to the end of the list.
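The slicing I mean is roughly this (a schematic, not the exact code in the backtest):

def make_factors(rng=(0, None)):
    factors = [
        MessageSum,
        Trendline,
        Revenue,
        # ... the rest of the factor classes, in a fixed order ...
    ]
    return factors[rng[0]:rng[1]]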

There's probably a more elegant way of doing this; I just provided a first-cut.