trying new alpha analyzer - hangs?

I'm trying to run the new alpha factor analyzer and it hangs here:

factor_data_specific = al.utils.get_clean_factor_and_forward_returns(  
    results['alpha'],  
    cr_specific,  
    periods=range(1, 22))  

It hangs at 11% memory utilization. Am I doing something wrong? Or is this a bug?

14 responses

What happens if you decrease the period range?

Yes, it runs with a shorter period (see attached), but I'd expect it to run with:

factor_data_specific = al.utils.get_clean_factor_and_forward_returns(  
    results['alpha'],  
    cr_specific,  
    periods=range(1, 22))  

However, it hangs.

Given that factors based on fundamentals will be much longer (e.g. years), it seems like there might be a problem with scaling.

We just shipped a speed enhancement for alphalens; you could retry and see if that helps.

Thanks Thomas -

I will re-try it when I get the chance, but as you know, you or Q support can also just clone the notebook I posted above and test it yourselves. This might be a more direct approach than waiting for me...but perhaps there are other priorities...

Now it runs.

Glad to hear it!

So do you have a design requirement in mind for the number of days that one should be able to analyze? Potentially, there are factors that are on widely different time scales, anywhere from days to years. From your initial example, it appears you are focused on the time scale of days to tens of days. But what about fundamental factors, that potentially could have time scales of years/decades...or maybe a different analysis would apply?

There is no design limitation on running longer windows. You might not want to do it for every single day, but maybe for 1 month, 2 months, etc.

I have been using Thomas's new alpha analyser and even in US prime time it is now only taking about 5 minutes to complete. It's clearly a great deal better than it used to be.

Hi Thomas -

"There is no design limitation on running longer windows. You might not want to do it for every single day, but maybe for 1 month, 2 months, etc."

Is there guidance on how to do this? For example, this seems to be the place to control the max time scale:

# Use alphalens to combine factor values with forward returns  
factor_data_total = al.utils.get_clean_factor_and_forward_returns(  
    results['alpha'],  
    pricing,  
    periods=range(1, 22))  

Is it just a matter of replacing range(1,22) with something else? Presumably, periods is always in days (there's no switch to change the units), and one would need to use range(start,stop,step) to do monthly/quarterly plots, with start,stop,step in days. Correct?

By the way, I think there is still an open question of how to deal with factors that trade on different time scales. There was the beginning of a discussion here:

https://www.quantopian.com/posts/daily-and-weekly-rebalancing-of-separate-alpha-factors-using-optimize-api

So using your new tool, say one comes up with 20 factors on 20 different trading time scales--what next? Since the contest/fund is geared toward full-up algos and not factors, then users need to know how best to combine all 20 factors into one algo, right?

By the way, since I'm restricted to post only to my own threads (so as not to corrupt the youth), feel free to share anything relevant I say here on https://www.quantopian.com/posts/an-updated-method-to-analyze-alpha-factors.

"Is it just a matter of replacing range(1,22) with something else?"

Exactly, and it doesn't need to be a range, just put in sensible values as a list.
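As a minimal sketch of that suggestion, the snippet below builds a short list of horizons at roughly monthly spacing instead of every day in a range. The helper `monthly_periods` and the figure of 21 trading days per month are assumptions for illustration, not part of alphalens, and it assumes daily pricing so that each period counts one trading day.

```python
TRADING_DAYS_PER_MONTH = 21  # assumption: rough average, not an alphalens constant


def monthly_periods(months):
    """Convert horizons given in months to approximate trading-day counts."""
    return [m * TRADING_DAYS_PER_MONTH for m in months]


# Two short horizons plus 1, 2, 3, 6 and 12 months.
periods = [1, 5] + monthly_periods([1, 2, 3, 6, 12])
print(periods)

# The list would then replace range(1, 22) in the call from the thread, e.g.:
# factor_data_total = al.utils.get_clean_factor_and_forward_returns(
#     results['alpha'], pricing, periods=periods)
```

Seven horizons instead of twenty-one should also cut the amount of forward-return data computed, which may matter given the hang reported at the top of the thread.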

Regarding trading factors on different time-scales: This is indeed difficult and something I'm currently working on, so not a solved problem. Use your own ingenuity to come up with something!

Hi Thomas -

"Regarding trading factors on different time-scales: This is indeed difficult and something I'm currently working on, so not a solved problem. Use your own ingenuity to come up with something!"

I'm curious why you are working in isolation (at least with respect to the broad Q user base)? And not contributing to the discussion initiated by Joakim? Or are you just trying to get to the point where you can put something out as a potential direction in solving the problem within the context of Quantopian? I say this constructively, since you have access to a global pool of clever users, and so if you are not tapping into that resource, it may not be optimal. This is why I tend to publish pretty much everything I do here on the forum.

I'd posted a suggestion that the problem is analogous to trading a set of ETFs, right? You have the folks running the ETFs doing whatever they do on whatever time scales they see fit, and another layer of asset allocation determining how much weight to apply to each ETF at any given point in time.
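One way to picture the two layers is the rough sketch below: each factor keeps its own portfolio of asset weights, updated on its own schedule, and a separate allocation layer decides how much of each factor portfolio to hold. The function name, factor names, tickers, and numbers are all hypothetical, and nothing here is Quantopian API.

```python
def combine_factor_portfolios(factor_weights, allocation):
    """Blend per-factor portfolios into one set of target asset weights.

    factor_weights: {factor_name: {asset: weight}}, each updated on its own schedule.
    allocation: {factor_name: weight given to that factor portfolio}.
    """
    combined = {}
    for name, weights in factor_weights.items():
        for asset, weight in weights.items():
            combined[asset] = combined.get(asset, 0.0) + allocation[name] * weight
    return combined


# Hypothetical factor portfolios, like two ETFs run on different time scales.
factor_weights = {
    "fast": {"AAA": 0.5, "BBB": -0.5},  # e.g. rebalanced daily
    "slow": {"AAA": -0.5, "CCC": 0.5},  # e.g. rebalanced quarterly
}
allocation = {"fast": 0.5, "slow": 0.5}  # the asset-allocation layer

target = combine_factor_portfolios(factor_weights, allocation)
print(target)
```

In this toy case the two factors hold opposite positions in AAA, so they net out in the combined portfolio. How to set `allocation` over time is the open question raised in the thread, and the sketch does not address it.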

My read is that you may need to kinda start with a blank sheet of paper and do some architecting so that the Q workflow/API supports trading portfolios (i.e. factors), in analogy with trading ETFs.

One thing to consider is that my strong sense is that the alpha combination step should be done in before_trading_start (assuming that you don't see a compelling need to support intra-day alpha combination, which would call for a compute window during the trading day). I've posted a hacked way of getting the data into before_trading_start on https://www.quantopian.com/posts/alpha-combination-via-clustering; this could definitely be improved with some changes to the API (e.g. the ability to output data from Pipeline into a Pandas MultiIndex directly).

The other approach, of course, would be to sort out a remuneration scheme for factors instead of full-up algos. Then all of this load would be off of users, and they could just focus on the alpha discovery part. This would have the added advantage that you could apply more scalable computing power to the alpha combination step, which you are unable to provide to the masses. Alternatively, you could consider ways for users to submit sets of algos, so that each algo could trade on its characteristic time scale, and then you would deal with the alpha combination problem, including how to optimize the trading with respect to disparate time scales (maybe this is what you are working on?). This might be the easiest in terms of changes to the API, but users would need an API to be able to submit the set of algos, and get back a result, the contests would need to support this approach, etc.--a paradigm shift.

You asked for my ingenuity...

Hi Thomas -

Regarding your comment "This is indeed difficult and something I'm currently working on, so not a solved problem", my intuition is that this is probably not the case in a global sense. The reason I say this is that the hedge fund industry is fairly mature, and historically has been very profitable (and thus would have the resources to hire lots of clever folks like yourself). Bridgewater, for example, has been in business for decades, and trades something like $125 billion with 1700 employees. I would be very surprised if they don't have at least some heuristic solution to this problem, if not a theoretically rigorous one. It may be, though, that there is no published solution. Is your statement based on a literature search? Or are you just speaking narrowly in the context of Quantopian, meaning that the problem is unsolved for the Q API?

Thomas -

One thought would be to cluster factors with similar mean IC vs. days characteristic curves. This would simplify the problem in some respects, since instead of dealing with 30 factors, one might be dealing with 5 or so clusters, each with a distinct trading time scale.
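As a very rough illustration of the idea, the sketch below groups factors by the horizon at which their mean IC peaks. This is a much cruder stand-in for a real clustering algorithm applied to the full curves, and the factor names and IC values are made up for the example.

```python
from collections import defaultdict


def group_by_peak_horizon(ic_curves, periods):
    """Group factor names by the period (in days) where their mean IC is highest."""
    groups = defaultdict(list)
    for name, curve in ic_curves.items():
        peak_index = max(range(len(curve)), key=lambda i: curve[i])
        groups[periods[peak_index]].append(name)
    return dict(groups)


periods = [1, 5, 21, 63]
ic_curves = {  # made-up mean IC per period, one curve per hypothetical factor
    "momentum": [0.01, 0.03, 0.02, 0.00],
    "reversal": [0.04, 0.02, 0.00, -0.01],
    "value": [0.00, 0.01, 0.02, 0.03],
    "quality": [0.00, 0.00, 0.01, 0.02],
}

groups = group_by_peak_horizon(ic_curves, periods)
print(groups)
```

Here four factors collapse into three groups, with "value" and "quality" sharing the 63-day horizon. A proper clustering would compare the whole shape of each curve rather than only its peak.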

The other thing to consider is that the mean IC vs. days plots change versus time. It would be interesting to see a heatmap (e.g. time in the vertical axis, days in the horizontal axis, and color for mean IC value).
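The data behind such a heatmap could be sketched as a grid with one row per time window and one column per period. The example below is an assumption-laden sketch: it takes per-date IC values for each period as plain lists (in practice these might come from alphalens's `factor_information_coefficient`), averages them over non-overlapping windows, and leaves the plotting out. The function name and numbers are hypothetical.

```python
def mean_ic_grid(daily_ic, window):
    """Average per-date IC over non-overlapping windows.

    daily_ic: {period_in_days: [IC value per date]}, all lists the same length.
    Returns (periods, grid) where grid[row][col] is the mean IC of
    periods[col] during the row-th window.
    """
    periods = sorted(daily_ic)
    n_dates = len(daily_ic[periods[0]])
    grid = []
    for start in range(0, n_dates - window + 1, window):
        row = [sum(daily_ic[p][start:start + window]) / window for p in periods]
        grid.append(row)
    return periods, grid


# Made-up daily IC values for two horizons over four dates.
daily_ic = {
    1: [0.02, 0.04, 0.00, 0.02],
    5: [0.01, 0.01, 0.03, 0.05],
}
periods, grid = mean_ic_grid(daily_ic, window=2)
for row in grid:
    print(row)
```

Each row of `grid` corresponds to one time window (the vertical axis) and each column to one period (the horizontal axis), so the grid could be handed to any heatmap plotting routine.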

It would be cool to revive the Alphas 101 project, since you'd have a large set of public alphas to play with to determine time scales, alpha combination techniques, etc.

By the way, are you constrained to use the same platform as we lowly users? Or do you have access to all of the data, and whatever platform you need to do the work (e.g. lots of memory, parallel processing, GPU, sklearn current version, tensorflow, etc.)? It'd be a shame for you to be software/hardware/platform constrained in your research (although I appreciate that for the broad user base, limits need to be imposed).