The key to understanding the MaxLossExceededError is to look at the error description. In the above example it was:
Dropped 100.0% entries from factor data: 0.0% in forward returns computation and 100.0% in binning phase
Two distinct issues can cause this error. The first is not having enough forward returns to analyze the factor over the entire time period. The second is not being able to place the factor values into the desired bins. The message says 100% of the entries were dropped in the binning phase, so there's our culprit.
What does this mean? The get_clean_factor_and_forward_returns method tries to assign a 'quantile' or 'bin' to each factor value. The default is to place values into 5 quantiles so that there is an equal number of values in each quantile. The method will try to select bins (and the associated bin edges) so the quantity in each bin is the same. It does this each day, so the bins on one day may be different than the bins on the next. So, if the factor is a boolean (i.e. 0 or 1), this method will try to place the 0 and 1 values into 5 bins, each having an equal number of entries. Clearly this won't work. One can't split two discrete values across five equally populated bins. One solution could be to set the quantiles parameter to 2, which tries to place the values into two discrete bins:
merged_data = get_clean_factor_and_forward_returns(factor, pricing, quantiles=2)
This may work (but typically not). The issue is that the get_clean_factor_and_forward_returns method tries to put an equal number of values into each quantile. Unless there just happens to be a 50-50 split in the factor data, this will also cause an error.
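To see why equal-count binning breaks down on a boolean factor, here is a minimal pure-Python sketch. It is not the Alphalens code: the helper names quantile_edges and has_usable_bins, and the linear interpolation rule used to find the edges, are assumptions made for illustration only. The point is that the edges collapse onto each other unless the day's values are split exactly 50-50.

```python
def quantile_edges(values, q):
    """Return q + 1 edges that would split `values` into q equal-count bins.

    Edges are found by linear interpolation between sorted values. That
    rule is an assumption for this sketch, not taken from Alphalens.
    """
    ordered = sorted(values)
    last = len(ordered) - 1
    edges = []
    for i in range(q + 1):
        pos = last * i / q            # fractional index of the i-th edge
        lo = int(pos)
        hi = min(lo + 1, last)
        frac = pos - lo
        edges.append(ordered[lo] + (ordered[hi] - ordered[lo]) * frac)
    return edges


def has_usable_bins(values, q):
    """Equal-count binning is only possible if every edge is distinct."""
    edges = quantile_edges(values, q)
    return len(set(edges)) == len(edges)


# One day's boolean factor: 7 False (0) and 3 True (1).
skewed_day = [0] * 7 + [1] * 3
# A day that happens to be split exactly 50-50.
even_day = [0] * 5 + [1] * 5

print(quantile_edges(skewed_day, 5))   # most edges sit on the same value
print(has_usable_bins(skewed_day, 5))  # five quantiles on a boolean factor
print(has_usable_bins(skewed_day, 2))  # two quantiles, uneven split
print(has_usable_bins(even_day, 2))    # two quantiles, exact 50-50 split
```

Only the last case yields distinct edges, which matches the behavior described above: two quantiles work on the rare day with an exact 50-50 split and fail otherwise.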
So, what to do? Instead of creating quantiles with an equal number of values (and varying the bin sizes), one can create bins of equal size (and let the number of values in each bin vary). This is done by setting the bins parameter. One requirement is to also set the quantiles parameter to None:
merged_data = get_clean_factor_and_forward_returns(factor, pricing, quantiles=None, bins=2)
This will try to make two equal-size bins between the min and max value each day. Unfortunately, for technical reasons, this doesn't work well with two discrete boolean values. So, the third approach is to explicitly define the bins. One needs three 'edges' to describe two bins. So, something like this would work.
merged_data = get_clean_factor_and_forward_returns(factor, pricing, quantiles=None, bins=[-1,.5,2])
That will instruct the get_clean_factor_and_forward_returns method to put all the False values (i.e. 0) into bin 1 and all the True values (i.e. 1) into bin 2. One can choose different edges. Just make sure that the 0s are in one bin while the 1s are in the other.
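The mapping from edges to bins can be sketched in plain Python. This is only an illustration, not what Alphalens runs: the assign_bin helper and the tickers are hypothetical, and the sketch assumes right-closed intervals, (-1, 0.5] and (0.5, 2], which is the default convention of pandas cut.

```python
from bisect import bisect_left


def assign_bin(value, edges):
    """Return the 1-based bin for `value`, using right-closed intervals.

    With edges [-1, 0.5, 2] the bins are (-1, 0.5] and (0.5, 2].
    Values outside the outermost edges get None.
    """
    if not edges[0] < value <= edges[-1]:
        return None
    return bisect_left(edges, value)


edges = [-1, 0.5, 2]

# Hypothetical tickers with a boolean factor value for one day.
factor = {"AAA": False, "BBB": True, "CCC": True, "DDD": False}

bins = {asset: assign_bin(int(flag), edges) for asset, flag in factor.items()}
print(bins)
```

Any edges that keep 0 and 1 on opposite sides of the middle edge give the same assignment. For example, [-0.5, 0.5, 1.5] would work just as well.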
One can now use the results as an input to Alphalens. The factor values will be placed into two bins (or quantiles) for analysis. One word of caution, though. The default 'quantiles=5' will produce roughly an equal number of values in each quantile. This will not be the case when setting 'bins'. Why is this a concern? One may see a large alpha associated with a particular bin and feel one has found a great alpha signal. However, take note of the number, and percent, of values in each bin. This can be found in the first few tables produced by Alphalens. Often a high alpha is associated with a small number of values. If only a small number of values are generating alpha, this may not be a robust factor in real life.
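As a quick sanity check before reading too much into a bin's alpha, one can count how many values landed in each bin. The sketch below uses made-up bin labels and a hypothetical bin_shares helper. With real data the labels would come from the binned factor data rather than a hand-built list.

```python
from collections import Counter


def bin_shares(bin_labels):
    """Return {bin: (count, percent of all values)} for a list of bin labels."""
    counts = Counter(bin_labels)
    total = sum(counts.values())
    return {b: (n, 100.0 * n / total) for b, n in sorted(counts.items())}


# Made-up example: 95 values in bin 1 (False) and only 5 in bin 2 (True).
labels = [1] * 95 + [2] * 5

for b, (n, pct) in bin_shares(labels).items():
    print(f"bin {b}: {n} values ({pct:.1f}%)")
```

If one bin holds only a few percent of the values, as bin 2 does here, any alpha attributed to it rests on very few observations.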
Hope that helps. Attached is a notebook.