Everyone gets this error at some point in their coding career...
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
Basically, the error surfaces when one tries to do something like this:
# assume 'my_array' is an array of some sort - either numpy or pandas
if my_array > some_value:
    do something...
# or in this specific case
if close[lookback_index-1][:] < open[lookback_index-1][:]:
The variables 'my_array', 'close', and 'open' refer to arrays. The issue is that an array isn't just one value, it's a series of values. Comparing two arrays therefore yields an array of True/False values, one per element, while an 'if' statement needs a single True or False. Hence the error "The truth value of an array with more than one element is ambiguous." The error didn't surface with a single stock because the truth value of a one-element array is unambiguous, so numpy simply uses that one value. With more than one stock, however, numpy cannot tell whether you mean 'any' or 'all' of the elements.
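Here is a minimal sketch that reproduces the error outside of a pipeline. The prices are made up for illustration, and the name 'open_' is just a stand-in for the 'open' array above (chosen so it doesn't shadow Python's built-in open).

```python
import numpy as np

# Hypothetical prices for three stocks on one day
close = np.array([10.0, 20.0, 30.0])
open_ = np.array([11.0, 19.0, 31.0])

# Comparing arrays gives one boolean per element, not a single boolean
is_lower = close < open_

# Using a multi-element boolean array in an 'if' raises the ValueError
raised = False
try:
    if is_lower:
        pass
except ValueError as err:
    raised = True
    print(err)

# A one-element array has an unambiguous truth value, so this works
single = np.array([10.0]) < np.array([11.0])
if single:
    print("a single stock works")

# Say explicitly which question is being asked
print(is_lower.any())  # did at least one stock close lower?
print(is_lower.all())  # did every stock close lower?
```

Note that any() and all() collapse the whole array to one answer. That is rarely what is wanted in a factor, where each stock needs its own result.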
How to debug? Python allows one to easily create and work with single values (or scalars) and complex data structures in similar ways. Moreover, the same statement can refer to different types of objects and does not always return a consistent type. While this is powerful, it can also be confusing. My recommendation is to be VERY aware of data types everywhere in your code. Some people go as far as putting the type in the name, like this:
# Add 'df' to dataframes, 'list' to lists, 'dic' to dictionaries, etc
my_variable_df = pd.DataFrame()
my_variable_list = [1,2,3]
my_variable_dic = {}
my_variable = 99
While I'm not a fan of this naming, it does help to be aware of what type of data structure one is working with. Another debugging tool is to put a print statement in a notebook and check the type of your variables, like this:
# In a notebook put a line like this
print(type(my_variable))
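For arrays, the shape is often as informative as the type. Below is a small sketch of a helper that reports both; 'describe' is a hypothetical name of my own, not something the platform provides.

```python
import numpy as np

def describe(name, value):
    """Return a one-line summary of a variable's type, shape and dtype."""
    # Scalars, lists and dicts have no shape or dtype, so fall back to None
    shape = getattr(value, "shape", None)
    dtype = getattr(value, "dtype", None)
    return f"{name}: type={type(value).__name__}, shape={shape}, dtype={dtype}"

# A scalar versus a 20-day window of prices for 3 stocks
print(describe("my_variable", 99))
print(describe("close_price", np.zeros((20, 3))))
```

A shape of (20, 3) is a quick reminder that a comparison on that variable will produce 60 booleans, not one.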
So, that was the error. Now, how does one make a custom factor that returns the number of consecutive days a stock has followed a bear trend? In this case a 'bear trend' means the stock closed lower than it opened. When possible, use the numpy or pandas methods. Stay away from loops of any sort when working with numpy arrays (which is what the data passed to compute functions is). Also stay away from 'if' statements on arrays. That is what caused our error in the first place. Something like this will work for a custom factor.
# This is our factor using all numpy methods
# (CustomFactor and USEquityPricing come from the platform's pipeline API;
# import them as your environment requires)
import numpy as np

class Bear_Days(CustomFactor):
    inputs = [USEquityPricing.close, USEquityPricing.open]
    window_length = 20

    def compute(self, today, assets, out, close_price, open_price):
        """
        'bear_day' will be a ndarray with the same shape as close_price and open_price.
        It will have a value of True whenever the close_price < open_price.
        We want to find the qty of consecutive bear days from today.
        One way to do this is to find the first day where bear_day is False.
        Since bear_day will have 0 for False and 1 for True, we want to find the first 0.
        This will also be the minimum of the array.
        The method 'argmin' to the rescue!
        Argmin finds the first occurrence of the min value.
        One issue however, is the arrays are in ascending order. This means that the first date is the
        earliest date. We really don't want the first occurrence of the min, we want the last
        occurrence. Hmmm, there isn't a method for that. Ah, what if we flip the order of
        the array? Then the first will be the last. Let's do that.
        """
        # Create an array with True whenever a day is a bear day
        bear_day = close_price < open_price

        # Flip this array to now have the latest current day as row 0 (it's normally -1)
        bear_days_flipped = np.flipud(bear_day)

        # Now find the first occurrence of a non bear day along each stock
        # (ie column value of False or 0) which will also be the minimum
        bear_days_count = np.argmin(bear_days_flipped, axis=0)

        # There is a special case where all days in the window are bear days.
        # The argmin method will return 0.
        # We want it to return the max days or the window_length.
        # Use the numpy where method to replace just those values.
        all_bear_days = np.all(bear_days_flipped, axis=0)
        bear_days_count = np.where(all_bear_days, self.window_length, bear_days_count)

        out[:] = bear_days_count
This used just 5 lines of code to get our answer. Numpy methods can greatly reduce one's coding effort. The attached notebook explains what is going on here in more detail. There are cells at the bottom which also go through a step by step example of how this works.
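To check the logic without running a pipeline, the same steps can be tried on a tiny hand-made window. This is only a sketch: the function name 'consecutive_bear_days' and the prices are invented for illustration, with rows as days (oldest first) and columns as stocks, which is the layout the compute function receives.

```python
import numpy as np

def consecutive_bear_days(close_price, open_price):
    """Count consecutive bear days ending at the last row, one count per column."""
    bear_day = close_price < open_price
    # Put the most recent day in row 0
    flipped = np.flipud(bear_day)
    # Index of the first non bear day, counting back from the most recent day
    count = np.argmin(flipped, axis=0)
    # If every day is a bear day, argmin returns 0, so use the window length
    all_bear = np.all(flipped, axis=0)
    return np.where(all_bear, bear_day.shape[0], count)

# A 4-day window for 3 stocks; every stock opens at 10 each day
open_price = np.full((4, 3), 10.0)
close_price = np.array([
    [ 9.0, 11.0, 9.0],   # oldest day
    [11.0,  9.0, 9.0],
    [ 9.0,  9.0, 9.0],
    [ 9.0, 11.0, 9.0],   # most recent day
])

print(consecutive_bear_days(close_price, open_price))  # [2 0 4]
```

The first stock closed lower on the last two days, the second closed higher on the most recent day so its streak is 0, and the third closed lower on every day so it gets the full window length of 4.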
Good luck.