help w/ Pipeline custom factor when calling global function

I copied Max's code from:

https://www.quantopian.com/posts/capital-expenditure-volatility-capex-vol-with-factset-data-template-fundamental-algo

Then I added a trivial preprocess function that only prints its argument and returns it unchanged.

The print statement suggests that the custom factor is outputting a scalar, not a vector as one might expect.

What's going on?

def preprocess(a):  
    print a  
    return a

class TEM(CustomFactor):  
    """  
    TEM = standard deviation of past 6 quarters' reports  
    """  
    window_length = 390  
    def compute(self, today, assets, out, asof_date, capex, total_assets):  
        values = capex/total_assets  
        for column_ix in range(asof_date.shape[1]):  
            _, unique_indices = np.unique(asof_date[:, column_ix], return_index=True)  
            quarterly_values = values[unique_indices, column_ix]  
            if len(quarterly_values) < 6:  
                quarterly_values = np.hstack([  
                    np.repeat([np.nan], 6 - len(quarterly_values)),  
                    quarterly_values,  
                ])  
            out[column_ix] = preprocess(np.std(quarterly_values[-6:]))  
3 responses

@Grant. Notice that the statement

            out[column_ix] = preprocess(np.std(quarterly_values[-6:]))  

is inside the for loop. The loop iterates over the columns of the data (i.e. one column per asset) and assigns one value in 'out' for each asset. So yes, your print statement outputs a scalar value for each asset, but the loop as a whole builds the custom factor output, which is a 1D numpy array.

See the attached algo. I filtered the custom factor to just three stocks to make the logs easier to read. I also added a final 'print out' statement to print the entire factor output. Look at the logs: there are three separate scalar values printed by your 'preprocess' function, and then a numpy array with three values printed by the final print statement.
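
For anyone reading without the attached algo, the same behaviour can be reproduced outside Quantopian. This is only a rough sketch: the standalone compute function, the `seen` list and the 6 x 3 `values` array are made up for illustration and are not part of the original algo.

```python
import numpy as np

seen = []  # records every argument that preprocess receives

def preprocess(a):
    seen.append(a)
    print(a)
    return a

def compute(out, values):
    # values has shape (window_length, n_assets): one column per asset
    for column_ix in range(values.shape[1]):
        # np.std of a 1D slice is a single number, so preprocess sees a scalar
        out[column_ix] = preprocess(np.std(values[-6:, column_ix]))

values = np.arange(18, dtype=float).reshape(6, 3)  # hypothetical data, 3 assets
out = np.empty(values.shape[1])
compute(out, values)
print(out)  # the full factor output: a 1D array with one entry per asset
```

preprocess is called three times, once per asset, each time with a scalar. Only the final print shows the 1D array.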

Maybe that wasn't your question?

Thanks Dan -

I now understand what to do. Something like this:

def preprocess(a):  
    print a  
    return a

class TEM(CustomFactor):  
    """  
    TEM = standard deviation of past 6 quarters' reports  
    """  
    window_length = 390  
    def compute(self, today, assets, out, asof_date, capex, total_assets):  
        values = capex/total_assets  
        out_temp = np.zeros_like(values[-1,:])  
        for column_ix in range(asof_date.shape[1]):  
            _, unique_indices = np.unique(asof_date[:, column_ix], return_index=True)  
            quarterly_values = values[unique_indices, column_ix]  
            if len(quarterly_values) < 6:  
                quarterly_values = np.hstack([  
                    np.repeat([np.nan], 6 - len(quarterly_values)),  
                    quarterly_values,  
                ])  
            out_temp[column_ix] = np.std(quarterly_values[-6:])  
        out_temp = preprocess(out_temp)  
        out[:] = out_temp  
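
Calling preprocess once on the collected vector matters as soon as preprocess needs the whole cross-section. As a minimal sketch, assume preprocess is a z-score across assets. That choice, the standalone compute function and the sample numbers are hypothetical and only there to show the shape of the data.

```python
import numpy as np

def preprocess(a):
    # Hypothetical cross-sectional z-score (an assumption for illustration).
    # NaNs are ignored when computing the mean and std, and stay NaN in the result.
    return (a - np.nanmean(a)) / np.nanstd(a)

def compute(out, out_temp):
    # Slice assignment writes into the caller's buffer;
    # a plain "out = ..." would only rebind the local name.
    out[:] = preprocess(out_temp)

out_temp = np.array([1.0, 2.0, np.nan, 3.0])  # made-up per-asset values
out = np.empty_like(out_temp)
compute(out, out_temp)
print(out)
```

A per-asset call could not do this, because a single scalar has no mean or spread to normalize against.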

Hi Dan -

By the way, the code for extracting the quarterly values is kinda gnarly. Might there be some Pandas function(s) that encapsulates it?
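
One possible direction, offered as a sketch rather than a tested replacement: index one asset's values by its asof dates and let pandas drop the repeated dates. The helper name last_six_quarters_std and the sample data are invented here, and the handling of missing asof dates (NaT) is not addressed.

```python
import numpy as np
import pandas as pd

def last_six_quarters_std(asof_col, value_col):
    # Hypothetical helper for a single asset (one column of the window).
    s = pd.Series(value_col, index=asof_col)
    # Keep the first row seen for each distinct asof date, ordered by date,
    # which mirrors np.unique(..., return_index=True) in the loop above.
    quarterly = s[~s.index.duplicated(keep="first")].sort_index()
    if len(quarterly) < 6:
        return np.nan  # same result as padding with NaN before np.std
    # np.std (population std) propagates NaN, like the original code
    return np.std(quarterly.to_numpy()[-6:])

# Made-up window: 7 quarters, each asof date repeated on 2 rows
asof = np.repeat(pd.date_range("2019-01-01", periods=7, freq="QS").values, 2)
vals = np.arange(14, dtype=float)
result = last_six_quarters_std(asof, vals)
short_result = last_six_quarters_std(asof[:6], vals[:6])  # only 3 quarters
print(result, short_result)
```

This still has to be called once per column, so it tidies up the body of the loop rather than removing the loop.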