pipeline custom factor - output different from input?

From a pipeline custom factor, is it possible to output a subset of the input securities? Or is it always required that out[:] contain the same set of securities used for the input?

As a work-around, just to get the attached code to run, I'm using:

x_tilde = np.nan_to_num(np.mean(close,axis=0)/close[-1,:])

But I would prefer to drop any stocks that have missing data over the trailing window. However, when I do this, an error results, since it appears that out[:] is expecting all of the securities used as input to the custom factor.
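For illustration, the failure can be reproduced outside of pipeline with plain NumPy. This is only a sketch: the 4-element `out` array is a hypothetical stand-in for the buffer that pipeline passes to compute(), and the values are made up. The error raised inside the platform may be worded differently.

```python
import numpy as np

# Hypothetical stand-in for the `out` buffer passed to compute():
# one slot per input asset (4 assets here).
out = np.empty(4)

# Factor values for only 3 of the 4 assets (one dropped for missing data).
subset_values = np.array([0.2, 0.5, 0.3])

try:
    out[:] = subset_values  # shapes (3,) and (4,) cannot be broadcast
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```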

Is it correct that custom factors must always return a value for all input stocks? If so, for window_length > 1, how is one to deal with missing data (e.g. the stock IPO'd within the window)?

4 responses

The out array must have one entry for each input security, in the same order as assets; a custom factor cannot return a subset. The best way to exclude securities with missing data (NaNs) is to drop rows from the output DataFrame with its dropna() method, after calling pipeline_output('my_pipeline'). Here is the documentation for it: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.dropna.html
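A minimal sketch of the dropna() approach, using a hand-built DataFrame as a hypothetical stand-in for what pipeline_output('my_pipeline') returns (the column name opt_rev and the asset labels are made up for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the DataFrame returned by pipeline_output():
# one row per asset, one column per pipeline term.
results = pd.DataFrame(
    {"opt_rev": [0.4, np.nan, 0.6]},
    index=["AAA", "BBB", "CCC"],  # made-up asset labels
)

# Drop every row whose factor value is NaN.
clean = results.dropna(subset=["opt_rev"])
print(clean)
```

Passing subset restricts the NaN check to the factor column, so a NaN in some unrelated column would not remove the row.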

Thanks Ernesto -

That's kinda what I figured. For the factor I use in the example I attached above, I'll have to drop the securities that don't have full price history windows before running the factor on the data, and then add them back in at the output as NaNs. Presumably alphalens will then ignore the NaNs?

Also, note that I cannot view the entire notebook, when I click View Notebook. Or maybe the entire notebook did not get attached?

The basic operation within the custom factor is:

  1. Identify securities that have NaNs in their windows.
  2. Compute factor values for remaining securities (without any NaNs in their windows).
  3. Build output vector of all securities, with NaNs and values assigned appropriately.
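The three steps above can be sketched with a boolean mask in plain NumPy. This is an illustrative sketch, not pipeline code: masked_factor and compute_values are hypothetical names, and the price window is made up.

```python
import numpy as np

def masked_factor(close, compute_values):
    """Apply compute_values only to columns of close that contain no NaNs."""
    # 1. Securities (columns) whose whole window is finite.
    complete = np.isfinite(close).all(axis=0)
    # 2. Factor values for the complete securities only.
    values = compute_values(close[:, complete])
    # 3. Full-length output: NaN by default, values where data was complete.
    out = np.full(close.shape[1], np.nan)
    out[complete] = values
    return out

# Made-up 2-day window for 3 securities; the second one has a missing price.
close = np.array([[10.0, np.nan, 30.0],
                  [11.0, 21.0, 33.0]])
result = masked_factor(close, lambda c: c.mean(axis=0) / c[-1, :])
print(result)
```

Inside a custom factor, the last step would write into the out array supplied to compute() rather than allocating a new one.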

I think I found some numpy functions that will do the trick here (without writing nasty loops over securities). However, if there are examples out there, or I'm approaching this incorrectly, please let me know.

Here's the factor I'm working on:

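import numpy as np
import cvxpy as cvx
from quantopian.pipeline import CustomFactor
from quantopian.pipeline.data.builtin import USEquityPricing

class OptRev(CustomFactor):
    inputs = [USEquityPricing.close]
    window_length = 5
    def compute(self, today, assets, out, close):
        # Trailing mean price over latest price; NaNs are replaced with 0 as a work-around.
        x_tilde = np.nan_to_num(np.mean(close,axis=0)/close[-1,:])
        m = len(x_tilde)
        b_t = 1.0*np.ones(m)/m  # equal-weight reference portfolio
        x = cvx.Variable(m)
        objective = cvx.Minimize(cvx.sum_entries(cvx.square(x-b_t)))
        constraints = [cvx.sum_entries(x) == 1, cvx.sum_entries(x_tilde*x) >= 1, x >= 0]
        prob = cvx.Problem(objective, constraints)
        prob.solve()
        out[:] = np.squeeze(np.asarray(x.value))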

Here's what I ended up doing:

class OptRev(CustomFactor):
    inputs = [USEquityPricing.close]
    window_length = 5
    def compute(self, today, assets, out, close):
        # Trailing mean price over latest price; NaN where the window has missing data.
        x_tilde = np.mean(close,axis=0)/close[-1,:]
        ok = np.isfinite(x_tilde)  # securities with a usable value
        x_tilde = x_tilde[ok]
        m = len(x_tilde)
        b_t = 1.0*np.ones(m)/m  # equal-weight reference portfolio
        x = cvx.Variable(m)
        objective = cvx.Minimize(cvx.sum_entries(cvx.square(x-b_t)))
        constraints = [cvx.sum_entries(x) == 1, cvx.sum_entries(x_tilde*x) >= 1, x >= 0]
        prob = cvx.Problem(objective, constraints)
        prob.solve()
        # Full-length output: NaN by default, solved weights where data was complete.
        out[:] = np.nan
        out[ok] = np.squeeze(np.asarray(x.value))
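A note on np.put for anyone filling the output by index positions: it modifies its first argument in place and returns None, so its return value should not be assigned to out[:]. A minimal sketch with made-up positions and weights:

```python
import numpy as np

out_temp = np.full(4, np.nan)         # full-length buffer, NaN by default
n_ok = np.array([0, 2, 3])            # positions of securities with full data
weights = np.array([0.2, 0.5, 0.3])   # made-up factor values

returned = np.put(out_temp, n_ok, weights)  # fills out_temp in place

print(returned)   # None
print(out_temp)   # the buffer itself now holds the weights
```

So the fill and the assignment are two separate steps: call np.put on the buffer first, then copy the buffer into out[:].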