Cutting down on computation time: How to get multiple variables from a single custom factor

edited Jul 21, 2020

I have created a custom factor to detect the presence of a long-term trend. In the pipeline, the custom factor builds a DataFrame with historical data per asset and passes it to a function that tries to fit a trend and returns the start date of that trend. By doing so, I avoid having to make tons of database calls.

However, I now get only one variable per asset as output, whereas the same calculation could easily produce many more.

class weeklyConfirmations(CustomFactor):  
    inputs = [USEquityPricing.low, USEquityPricing.high, USEquityPricing.volume]  
    window_safe = True  
    # Resamples daily data to weekly bars
    def compute(self, today, assets, out, low, high, volume):
        n = 5  # number of trading days aggregated into one weekly bar
        low_df = pd.DataFrame(low, columns=assets).stack()  
        high_df = pd.DataFrame(high, columns=assets).stack()  
        volume_df = pd.DataFrame(volume, columns=assets).stack()  
        df = pd.concat([low_df, high_df, volume_df], axis=1)  
        df.columns=['low', 'high', 'volume']  
        df.index = df.index.rename(['day', 'sid'])

        # Since this historical data does not have a time-series index, we cannot use the resample
        # function to produce weekly data. We use the following instead (n is set above):
        df2 = df.reset_index()  
        df2.sort_values(by=['sid', 'day'], inplace=True)  
        df3 = df2.groupby([np.arange(len(df2))//n, 'sid']).agg({'day': max, 'low': min, 'high': max, 'volume': sum})  
        df4 = df3.reset_index()  
        df4.sort_values(by=['day', 'sid'], inplace=True)  
        df4.set_index(['day', 'sid'], inplace = True)  
        df4.drop('level_0', axis =1, inplace = True)  
#        print('with the new index it looks like \n', df4.tail())

        def my_df_function(sid_df):  
            my_result = longTermConfirmed(sid_df)  
            return my_result  

        # Rather than looping over each security it's much faster to group by security and apply a function  
        buy_signal = df4.groupby(level=1).apply(my_df_function)  
        out[:] = buy_signal

The actual logic resides in the function that is called as longTermConfirmed(sid_df) in the code above:

def longTermConfirmed(stock_hist):  
    trendStartDate = np.nan  
    my_df = pd.DataFrame(columns=['Start date', 'Confirmations','Squared distance', 'Porosity', 'Porosity at latest conf date'])  
    length = len(stock_hist)  
    n=2  
    launchpoints = list(argrelextrema(stock_hist['low'].values, np.less, axis =0, order = n))  
    newlist = launchpoints[0].tolist()

    newlist2 = [i for i in newlist if i < length - 8]  
    # We consider only trends that are at least 2 months old, hence subtracting 8 weeks  
    for i in newlist2:  
        dft = stock_hist.copy().iloc[i:]  
        trend = calcTrend(dft)  
        result = confirmTrend(trend, 'S', .002, .005, 1)

        if result.empty == False:  
            datelist = result.index  
            porosityAtLatestConfDate = result.iloc[-1]['porosity']  
            result['dist_squared'] = result['distance']**2  
            confCount = len(result.index)  
            latestConfirmationDate = datelist[-1][0]  
            latestRelOBV = result.iloc[-1]['OBV_rel']  
            squared = result['dist_squared'].sum()  
            new_data = {'Start date': i,  
                        'Confirmations': confCount,  
                         'Squared distance': squared,  
#                        'Porosity': porosity,  
                        'Porosity at latest conf date': porosityAtLatestConfDate,  
                        'Latest Confirmation Date': latestConfirmationDate,  
                       'Rel OBV': latestRelOBV}  
            my_df = my_df.append(new_data, ignore_index = True)

    if my_df.empty == False:  
        my_df = my_df.sort_values(["Confirmations", "Squared distance"], ascending = (False, True))  
        mask = (my_df['Confirmations'] >= 2) #require at least two confirmations to be considered  
        df2= my_df[mask]  
        emptydf2 = df2.empty  
        if emptydf2 == False:  
            # sort_values is not in-place by default, so keep the sorted result
            df2 = df2.sort_values(['Latest Confirmation Date', 'Rel OBV'], ascending=[False, False])
            if ((df2.iloc[0]['Latest Confirmation Date'] >= length * 5 -1 )) :

                 trendStartDate = length * 5  - df2.iloc[0]['Start date'] *5


    return trendStartDate

This code is now set up to return the date trendStartDate at which a particular uptrend started.

That same longTermConfirmed() function could output additional valuable information (e.g. the number of times the trend has been confirmed) that I intend to use in my algo. As there is some computationally heavy work going on, I obviously want to avoid running close variants of that same function if I can do it all in one pass.
However, I am going completely blank as to how to pass multiple variables per asset back from longTermConfirmed() to the CustomFactor.

Does anyone know how I could solve this?

4 responses

Dan Whitnable

Jul 22, 2020

Good callout to note that running a function multiple times to get multiple outputs often isn't the most efficient approach. Typically much faster to execute a function just once. But how does this work? There are maybe two questions here. First, how to return multiple values from the pandas apply method, and second, how to return multiple outputs from a factor.

The apply method can be used to iterate through the assets and pass a column of input data to a user-defined function. That function should typically return a single value. However, Python (and pandas) is pretty flexible about what a 'single value' is. While normally this would be a single scalar value, it could also be a tuple or a list of scalar values. In the latter case the 'single value' is simply the whole tuple or list.
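To make the "a single value can be a tuple" point concrete for the grouped layout used in the question, here is a minimal sketch. The toy data and the helper `summarize` are made up for illustration (it is only a stand-in for longTermConfirmed), and the sketch assumes a reasonably recent pandas, where `groupby(...).apply` collects tuple return values into a Series with one element per group.

```python
import pandas as pd

# Toy weekly bars indexed by (day, sid), mimicking the shape of df4 in the question.
index = pd.MultiIndex.from_product([[0, 1, 2], [101, 202]], names=["day", "sid"])
bars = pd.DataFrame(
    {
        "low": [9.0, 19.0, 8.0, 21.0, 10.0, 20.0],
        "high": [11.0, 22.0, 12.0, 23.0, 13.0, 24.0],
    },
    index=index,
)


def summarize(sid_df):
    """Hypothetical stand-in for longTermConfirmed: two values returned as one tuple."""
    return sid_df["low"].min(), sid_df["high"].max()


# One tuple per asset: the "single value" returned for each group is the whole tuple.
per_asset = bars.groupby(level="sid").apply(summarize)
print(per_asset)
```

Each element of `per_asset` is a tuple, so the heavy calculation runs once per asset while still handing back several values.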

So, for the user-defined function, simply return multiple values. Something like this:

        def my_function(column):  
            """  
            Return three separate values for each asset column  
            Here we return the mean, min, and max price for each asset  
            """  
            mean_price = column.mean()  
            min_price = column.min()  
            max_price = column.max()  
            return mean_price, min_price, max_price

That's pretty straightforward. However, what to do with the results, and how to reference them, requires a bit of Python magic. Two Python features help here: the * unpacking operator and the built-in zip function. The * first 'unpacks' the per-asset results into separate arguments, and zip then regroups them into one sequence per returned value. I must admit I don't use these a lot and find myself googling 'zip unpack python' to refresh my memory of how they work. I won't go into those details here, but this is how to get the three sequences of values when applying the function above.

        mean_prices, min_prices, max_prices = zip(*close_prices_df.apply(my_function))

As I said, Python magic. The result is three sequences (tuples, strictly speaking) with the mean, min, and max values for each asset. The applied function could return any values and any number of values. I just used these three as an example.
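For anyone who wants to see the unpack-and-zip step in isolation, here is a small sketch in plain Python with no pandas involved. The list `per_asset_results` and its numbers are made up for illustration; it stands in for whatever apply hands back, one tuple per asset.

```python
# One (mean, min, max) tuple per asset; the numbers are invented for illustration.
per_asset_results = [
    (10.0, 9.0, 11.0),   # asset A
    (20.5, 19.0, 22.0),  # asset B
    (5.0, 4.5, 5.5),     # asset C
]

# *per_asset_results passes each tuple to zip as a separate argument.
# zip then groups all first items together, all second items together, and so on.
mean_prices, min_prices, max_prices = zip(*per_asset_results)

print(mean_prices)  # (10.0, 20.5, 5.0)
print(min_prices)   # (9.0, 19.0, 4.5)
print(max_prices)   # (11.0, 22.0, 5.5)
```

In effect this transposes the data from "one tuple per asset" to "one tuple per returned value", which is the shape needed to fill each factor output.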

The final step is to assign these values to separate factor outputs. First, the outputs must be named, and then simply assign values to each just as one would do for a basic custom factor with a single output. Something like this:

    # define the factor's outputs (a class attribute, next to inputs)
    outputs = ['mean_value', 'min_value', 'max_value']

        # then, inside compute(), set the values
        out.mean_value[:] = mean_prices
        out.min_value[:] = min_prices
        out.max_value[:] = max_prices
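The assignment pattern can be tried outside the pipeline with a stand-in for `out`. This sketch assumes that `out` behaves like a NumPy record array with one field per name in `outputs`, which is how the attribute syntax above reads; the array built here is only an imitation, not the object the pipeline actually passes in, and the numbers are made up.

```python
import numpy as np

outputs = ["mean_value", "min_value", "max_value"]
n_assets = 3

# Imitation of the `out` argument of compute(): one float field per named output.
out = np.recarray((n_assets,), dtype=[(name, "f8") for name in outputs])

# Values as they would come out of the zip(*...) step, one entry per asset.
mean_prices = (10.0, 20.5, 5.0)
min_prices = (9.0, 19.0, 4.5)
max_prices = (11.0, 22.0, 5.5)

# Slice assignment writes into the existing buffer instead of rebinding the name.
out.mean_value[:] = mean_prices
out.min_value[:] = min_prices
out.max_value[:] = max_prices

print(out.mean_value)
print(out.min_value)
print(out.max_value)
```

The `[:]` matters: the values have to be written into the array that was passed in, just as with `out[:] = ...` in a single-output factor.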

Hope that's a start on adding multiple outputs. See the attached notebook for an example. Good luck.

Disclaimer

The material on this website is provided for informational purposes only and does not constitute an offer to sell, a solicitation to buy, or a recommendation or endorsement for any security or strategy, nor does it constitute an offer to provide investment advisory services by Quantopian. In addition, the material offers no opinion with respect to the suitability of any security or specific investment. No information contained herein should be regarded as a suggestion to engage in or refrain from any investment-related course of action as none of Quantopian nor any of its affiliates is undertaking to provide investment advice, act as an adviser to any plan or entity subject to the Employee Retirement Income Security Act of 1974, as amended, individual retirement account or individual retirement annuity, or give advice in a fiduciary capacity with respect to the materials presented herein. If you are an individual retirement or other investor, contact your financial advisor or other fiduciary unrelated to Quantopian about whether any given investment idea, strategy, product or service described herein may be appropriate for your circumstances. All investments involve risk, including loss of principal. Quantopian makes no guarantees as to the accuracy or completeness of the views expressed in the website. The views are subject to change, and may have become unreliable for various reasons, including changes in market conditions or economic circumstances.

Tristan Arkesteijn

Jul 23, 2020

Thank you Dan, for taking the time to write such a detailed answer!

Tristan Arkesteijn

Aug 10, 2020

Quick update for the benefit of others: Dan's solution as above works perfectly!

Dan Whitnable

Aug 10, 2020

@Tristan. Glad it worked for you and always glad to help (if I can).
