How can I use Alphalens with a boolean factor (True and False)?

Hi, I have created a simple stock scanner and I'm trying to use Alphalens to see the forward performance of the selected stocks, but it fails with the following error:

Dropped 100.0% entries from factor data: 0.0% in forward returns computation and 100.0% in binning phase (set max_loss=0 to see potentially suppressed Exceptions).

MaxLossExceededError: max_loss (35.0%) exceeded 100.0%, consider increasing it.

From what I have researched, I suspect the problem is that the factor is a boolean factor. I need help figuring out how to solve this.

Thank you a lot!

3 responses

The key to understanding the MaxLossExceededError is to look at the error description. In the above example it was:

Dropped 100.0% entries from factor data: 0.0% in forward returns computation and 100.0% in binning phase

There are two distinct issues that can cause this error. The first is not having enough forward returns to analyze the factor over the entire time period. The second is not being able to place the factor values into the desired bins. The error indicates that 100% of the entries were dropped in the binning phase. There's our culprit.

What does this mean? The get_clean_factor_and_forward_returns function tries to assign a 'quantile' or 'bin' to each factor value. The default is to place values into 5 quantiles so there are an equal number of values in each quantile. The function will try to select bins (and associated bin edges) so the quantity in each bin is the same. It does this each day, so the bins on one day may be different than the bins on the next. So, if the factor is a boolean (i.e. 0 or 1), the function will try to place the 0 and 1 values into 5 bins, each having an equal number of entries. Clearly this won't work. One can't place two discrete values into five discrete bins. One solution could be to set the quantiles parameter to 2, which tries to place the values into two discrete bins.

merged_data = get_clean_factor_and_forward_returns(factor, pricing, quantiles=2)

This may work, but typically it won't. The issue is that get_clean_factor_and_forward_returns tries to put an equal number of values into each quantile. Unless there just happens to be a 50-50 split in the factor data, this will also cause an error.
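
To see the failure outside of Alphalens, here is a minimal sketch using pandas directly. It assumes the quantile step behaves like pandas.qcut, which raises a ValueError when the computed quantile edges are not unique. The helper name can_quantize and the sample values are made up for illustration.

```python
import pandas as pd


def can_quantize(values, q):
    """Return True if pandas.qcut can split `values` into q equal-count buckets."""
    try:
        pd.qcut(pd.Series(values, dtype=float), q, labels=False)
        return True
    except ValueError:
        # Raised when the computed quantile edges are not unique.
        return False


one_day = [0, 0, 0, 1, 1]      # made-up boolean factor values for one day
balanced_day = [0, 0, 1, 1]    # an exact 50-50 split

print(can_quantize(one_day, 5))       # five quantiles
print(can_quantize(one_day, 2))       # two quantiles, uneven split
print(can_quantize(balanced_day, 2))  # two quantiles, even split
```

With these sample values only the last call succeeds, which matches the point above: two quantiles only help when the 0s and 1s happen to split evenly.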

So, what to do? Instead of creating quantiles with an equal number of values (and varying the bin sizes), one can create bins of equal size (and varying the number of values in each bin). This is done by setting the bins parameter. One requirement is to also set the quantiles parameter to None.

merged_data = get_clean_factor_and_forward_returns(factor, pricing, quantiles=None, bins=2)

This will try to make two equal-size bins between the min and max value each day. Unfortunately, for technical reasons, this doesn't work well with two discrete boolean values. So, the third approach is to explicitly define the bin edges. One needs three 'edges' to describe two bins, so something like this would work:

merged_data = get_clean_factor_and_forward_returns(factor, pricing, quantiles=None, bins=[-1,.5,2])

That will instruct get_clean_factor_and_forward_returns to put all the False values (i.e. 0) into bin 1 and all the True values (i.e. 1) into bin 2. One can choose different edges. Just make sure that the 0s are in one bin while the 1s are in the other.
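
The effect of these edges can be checked with pandas alone. This is only a sketch: it assumes the bins argument is applied the same way as pandas.cut with right-closed intervals, with 1 added so that bins are numbered from 1. The sample values are invented, and the conversion to float is a precaution on my part, not something Alphalens is known to require.

```python
import pandas as pd

edges = [-1, 0.5, 2]  # two bins: (-1, 0.5] and (0.5, 2]

# Made-up boolean factor values, converted to 0.0 / 1.0 before binning.
mixed_day = pd.Series([False, True, True, False, False]).astype(float)
all_true_day = pd.Series([True, True, True]).astype(float)

# labels=False returns integer bin positions starting at 0, so add 1.
mixed_bins = pd.cut(mixed_day, edges, labels=False) + 1
all_true_bins = pd.cut(all_true_day, edges, labels=False) + 1

print(mixed_bins.tolist())
print(all_true_bins.tolist())
```

Because the edges are fixed, a day with only True values still lands in bin 2, so the bin numbers mean the same thing on every day.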

One can now use the results as an input to Alphalens. The factor values will be placed into two bins (or quantiles) for analysis. One word of caution: the default 'quantiles=5' will produce roughly an equal number of values in each quantile. This will not be the case when setting 'bins'. Why is this a concern? One may see a large alpha associated with a particular bin and feel one has found a great alpha signal. However, take note of the number, and percent, of values in each bin. This can be found in the first few tables produced by Alphalens. Often a high alpha is associated with a small number of values. If only a small number of values are generating alpha, this may not be a robust factor in real life.
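
The bin sizes can also be checked by hand. The sketch below builds a small made-up frame that stands in for the output of get_clean_factor_and_forward_returns. It assumes the frame is indexed by (date, asset) and has a 'factor_quantile' column holding the bin number, so verify those names against your own merged_data before relying on it.

```python
import pandas as pd

# Hypothetical stand-in for merged_data: two days, five assets each.
index = pd.MultiIndex.from_product(
    [pd.to_datetime(["2020-01-02", "2020-01-03"]), ["A", "B", "C", "D", "E"]],
    names=["date", "asset"],
)
merged_data = pd.DataFrame(
    {
        "factor": [0, 0, 0, 0, 1, 0, 0, 0, 1, 1],
        "factor_quantile": [1, 1, 1, 1, 2, 1, 1, 1, 2, 2],
    },
    index=index,
)

# Number of observations in each bin, and each bin's share of the total.
counts = merged_data["factor_quantile"].value_counts().sort_index()
share = counts / counts.sum()

print(pd.DataFrame({"count": counts, "share": share}))
```

If one bin holds only a small share of the observations, treat any alpha reported for that bin with extra caution.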

Hope that helps. Attached is a notebook.

Dan, thank you a lot for your answer. It helps me understand a lot and gives me better ideas for building a good strategy!

This is a perfect answer...thank you.