[Help Request] Understanding input variable in custom factor

posted Mar 23, 2018

Hi all,

I am looking for some help understanding how to create custom factors. I am confused by how the input variable works on the backend. Per the Q's API, it is an MxN matrix with M = securities in the universe and N = window_length. Thus I understand the following example code, which performs its calculation along axis=0 to get a value for each particular security.

class MedianValue(CustomFactor):  
    """  
    Computes the median value of an arbitrary single input over an  
    arbitrary window.

    Does not declare any defaults, so values for `window_length` and  
    `inputs` must be passed explicitly on every construction.  
    """

    def compute(self, today, assets, out, data):  
        from numpy import nanmedian  
        out[:] = nanmedian(data, axis=0)  # nanmedian is the numpy function imported above, not a method of data

However, to my understanding, that logic does not apply to the next code example.

class Momentum(CustomFactor):  
    # Default inputs  
    inputs = [USEquityPricing.close]

    # Compute momentum  
    def compute(self, today, assets, out, close):  
        out[:] = close[-1] / close[0]

In this example, the input appears to be a list? If it were an MxN matrix, that calculation would not make any sense?

I am asking because I am currently working on the below piece of code and not sure how to complete it:

class momentum(CustomFactor):  
    inputs = [USEquityPricing.close]  
    window_length = 90  
    def compute(self, today, assets, out, inp):  
        x = np.arange(len(inp))  
        slope, intercept, r_value, p_value, std_err = stats.linregress(x, inp)  
        out[:] = r_value

I've tried messing with this a bunch but keep getting an error that "ValueError: all the input array dimensions except for the concatenation axis must match exactly." I'm quite certain the code as written is trying to pass linregress a list and an array for a regression and breaking it.

But I'm not sure of the proper way to format the inputs. Any guidance explaining the input variable and this last piece of code is appreciated. All I am trying to achieve in the last bit is to get the r_value of the last x trading days.

Thanks,

2 responses

Dan Whitnable

Mar 23, 2018

The inputs to the compute method of a custom factor are indeed numpy arrays. Dimension 0 (i.e. the rows) is the dates and dimension 1 (i.e. the columns) is the assets or securities, so each input has shape (window_length, number of assets). When one indexes a multi-dimensional numpy array with fewer indices than it has dimensions, as in the following notation, numpy assumes any subsequent dimensions (i.e. axis 1) are indexed with a full slice (:) (see https://docs.scipy.org/doc/numpy-1.13.0/reference/arrays.indexing.html)

my_array[-1]

# is equivalent to

my_array[-1, :]

Therefore, the following simply takes the last row of the numpy array and divides by the first row.

class Momentum(CustomFactor):  
    # Default inputs  
    inputs = [USEquityPricing.close]

    # Compute momentum  
    def compute(self, today, assets, out, close):  
        out[:] = close[-1] / close[0]

You stated "In this example, input appears to be a list? If it were a MxN matrix, that calculation does not make any sense?" Actually input isn't a list but an array and, yes, the calculation does make sense. It's just shorthand for

        out[:] = close[-1, :] / close[0, :]
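
To make the row and column layout concrete, here is a small sketch using plain numpy outside of the pipeline. The `close` array below is made-up data standing in for what compute receives (an assumption for illustration): 4 trading days as rows and 3 assets as columns.

```python
import numpy as np

# Hypothetical stand-in for the `close` input:
# rows are trading days (oldest first), columns are assets.
close = np.array([
    [10.0, 20.0, 40.0],   # oldest day in the window
    [11.0, 19.0, 41.0],
    [12.0, 18.0, 42.0],
    [15.0, 10.0, 44.0],   # most recent day
])

window_length, n_assets = close.shape   # (4, 3)

# A single integer index selects a whole row; the trailing axis is implied.
assert np.array_equal(close[-1], close[-1, :])

# Dividing the last row by the first row is element-wise,
# so the result holds one value per asset.
momentum = close[-1] / close[0]
print(momentum.shape)
print(momentum)
```

The result has one entry per column, which is why it can be assigned straight into out[:].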

Now, since the input is a 2-dimensional array (dates by assets) rather than a single series, the 'linregress' method, as you have it written, won't work. It expects exactly 2 arrays of matching length (or one 2-dimensional array), which it interprets as the Xs and the Ys for the regression.

I'll leave it to another day, or another post, for how to generate the R values for each asset.
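
In the meantime, one possible approach is sketched below, assuming the input is shaped (window_length, number of assets) as described above: walk over the columns and compute the correlation of each asset's series against the day index. The r_value reported by linregress is the Pearson correlation coefficient of x and y, so np.corrcoef gives the same quantity using numpy alone. The helper name r_values_by_column and the sample prices are made up for illustration.

```python
import numpy as np

def r_values_by_column(prices):
    """Pearson r of each column of `prices` against the day index 0..T-1.

    `prices` is assumed to be shaped (window_length, n_assets).
    Columns containing NaNs or with no variation get NaN, since r is
    undefined for them.
    """
    n_days, n_assets = prices.shape
    x = np.arange(n_days, dtype=float)
    result = np.full(n_assets, np.nan)
    for j in range(n_assets):
        col = prices[:, j]                      # 1-D series for one asset
        if np.isnan(col).any() or col.max() == col.min():
            continue                            # leave NaN in place
        result[j] = np.corrcoef(x, col)[0, 1]   # off-diagonal entry is r
    return result

# Hypothetical 5-day window for 3 assets: rising, falling, flat.
prices = np.array([
    [1.0, 9.0, 5.0],
    [2.0, 7.0, 5.0],
    [3.0, 5.0, 5.0],
    [4.0, 3.0, 5.0],
    [5.0, 1.0, 5.0],
])
r = r_values_by_column(prices)
print(r)
```

Inside a custom factor, the body of compute could then be a single line such as out[:] = r_values_by_column(inp).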

Blue Seahawk

Mar 23, 2018

For me, it took a while to realize the numpy arrays only make sense because their columns are in the same order as assets. They are separated and then pieced back together. Transposing (.T) is sometimes useful to swap columns and rows for certain operations. Presumably you're using the debugger already; if not, click line numbers to set breakpoints. I've never actually used Slp() on Momentum(), but there should be some clues here to work with that might help too. If this runs, you can adapt it for r_value. I would like to find a way to drop this 'for' loop.

from quantopian.pipeline.factors import AverageDollarVolume

    m = QTradableStocksUS() & AverageDollarVolume().top(7)  # mask, few stocks

    mmntm = Slp(Momentum(window_length=8, mask=m))  # small wndw for testing

from scipy.stats import linregress  
class Slp(CustomFactor):  
    def compute(self, today, assets, out, z):  
        slopes = np.empty(z.shape[1], dtype=np.float64)  # one slot per asset
        x = np.arange(len(z))  # day index 0..window_length-1
        for i, col in enumerate(np.log(z).T):  # iterate over assets (columns of z)
            if np.allclose(col, col[0]) or np.all(np.isnan(col)):
                slopes[i] = 0  # flat or all-NaN series: no usable trend
                continue
            slope, intercept, r_value, p_value, std_err = linregress(x, col)
            slopes[i] = (np.power(np.exp(slope), self.window_length) - 1) * 100 * r_value**2
        out[:] = slopes

class Momentum(CustomFactor):  
    inputs = [USEquityPricing.close]  
    def compute(self, today, assets, out, close):  
        out[:] = close[-1] / close[0]
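
On dropping the 'for' loop: because every asset is regressed against the same x (the day index), the least-squares slope and the r value can be written with array operations over all columns at once. The following is a sketch in plain numpy, not something tested inside the pipeline; the function name slope_and_r and the random sample data are assumptions for illustration.

```python
import numpy as np

def slope_and_r(y):
    """Least-squares slope and Pearson r of every column of `y` against
    the day index 0..T-1, with no Python-level loop.

    `y` is assumed to be shaped (window_length, n_assets). Columns with
    NaNs give NaN; flat columns give slope 0 and r = NaN.
    """
    x = np.arange(y.shape[0], dtype=float)
    xc = x - x.mean()                  # centered day index, shape (T,)
    yc = y - y.mean(axis=0)            # each column centered, shape (T, N)
    sxy = xc @ yc                      # sum of xc * yc per column, shape (N,)
    sxx = xc @ xc                      # scalar
    syy = (yc * yc).sum(axis=0)        # shape (N,)
    slope = sxy / sxx
    with np.errstate(invalid="ignore", divide="ignore"):
        r = sxy / np.sqrt(sxx * syy)   # 0/0 for flat columns becomes NaN
    return slope, r

# Made-up random-walk prices: 90 days, 4 assets.
rng = np.random.default_rng(0)
prices = 100 + np.cumsum(rng.normal(size=(90, 4)), axis=0)
slopes, r = slope_and_r(prices)
print(slopes.shape, r.shape)
```

To mirror Slp above, the function would be called on np.log(z), and the existing formula applied to the returned slope and r arrays in one vectorized expression.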
