How to get a List of Assets into a CustomFactor to be used as Targets?

So I'm trying to get a list of Assets into a CustomFactor, in order to use as targets for things such as linear regression.

Basically I want to use a Filter (say Returns(window_length=10).top(10)) as a source for an Asset list that I can in turn use as targets in multiple linear regressions.

I then plan to average the results of those linear regression together.

But the problem is that I can't find a way to load a list of Assets into the CustomFactor: a Filter supplies only True or False values, not the asset list, and even the "assets" variable in CustomFactor isn't a set of target-able Asset objects but rather an Int64Index.

I even tried running a pipeline to feed into another pipeline (to get the asset list to feed into a new CustomFactor) but apparently you can't run pipelines to feed into pipelines (which I personally think would be a great feature).

Any help I can get would be greatly appreciated.

Let me know if I can explain my problem any better.

5 responses

Joakim Arvidsson (Cream Mongoose)

Can’t you use your filter as a mask in your CustomFactor? E.g.

my_filter = Returns(window_length=10).top(10)
factor = MyCustomFactor(mask = my_filter)

Paul Harstad

Sorry if this was unclear. I'm trying to get a separate list of assets in, to compare against the masked universe.

so for instance I'm trying to compare,

my_base_uni=QuantopianUniverse()

to the assets in my_filter

But even if I have MyCustomFactor(my_filter, mask=my_base_uni), neither the assets variable nor the input variable holds an actual asset list usable as targets for this (from the API):

"linear_regression(target, regression_length, mask=sentinel('NotSpecified')) Construct a new Factor that performs an ordinary least-squares regression predicting the columns of self from target.

This method can only be called on factors which are deemed safe for use as inputs to other factors. This includes Returns and any factors created from Factor.rank or Factor.zscore.

Parameters:
target (zipline.pipeline.Term with a numeric dtype) – The term to use as the predictor/independent variable in each regression. This may be a Factor, a BoundColumn or a Slice. If target is two-dimensional, regressions are computed asset-wise.
regression_length (int) – Length of the lookback window over which to compute each regression.
mask (zipline.pipeline.Filter, optional) – A Filter describing which assets should be regressed with the target slice each day.
Returns:
regressions (zipline.pipeline.factors.RollingLinearRegression) – A new Factor that will compute linear regressions of target against the columns of self."

So no, using the filter as a mask would actually work against what I want. I'm trying to compare a masked universe to the filter.

Dan Whitnable

@Paul Thank you for the clarification. I was going to post the same suggestion as @Joakim before he beat me to it. Knowing that you want an asset list passed to a custom factor (which may or may not be the same as the mask) helped.

You were correct in simply passing a filter to your custom factor. The filter will be passed to the compute function as an ndarray exactly like all other factor inputs. The values will be either True or False depending on whether the asset is included in the filter. It seems the question then is how to use this data?

As an example, assume the following filter we want passed to a custom factor called 'CompareAssets'.


    # Filter for specific assets to pass to the custom factor
    # Could be the result of some calculation like '.top(10)' but just using a static filter for testing
    asset_list = StaticAssets(symbols(['IBM', 'C']))

    # Some data to pass to our custom factor  
    factor_input = DailyReturns()

    # Optional mask for universe to pass to the custom factor.  
    # Could be something big like QTradableStocksUS but made it small here for testing  
    factor_universe = StaticAssets(symbols(['IBM', 'AAPL', 'C', 'ARNC']))

    # Create our custom factor. Could add more inputs if desired  
    # Don't necessarily need a mask but can include if desired  
    compare = CompareAssets(inputs=[asset_list, factor_input], mask=factor_universe)

Notice that we instantiate our custom factor normally and simply pass a filter as one of the inputs. More than one filter could be passed if desired. The filter shows up as a 2D numpy array exactly like any other input, with rows for each day and columns for each asset. However, we don't really want a 2D array of True/False values (for a static filter like this one, the values are the same every day anyway). We want a simple 1D array. There are several ways to accomplish this, but one option is the numpy 'all' method. Here is an example.

class CompareAssets(CustomFactor):  
    """  
    Custom factor to demonstrate using a filter passed as an input to a factor.  
    The results of the filter can be used to select and slice other inputs and used in calc.  
    """  
    # Window length can be anything as needed  
    window_length = 1

    def compute(self, today, assets, out, my_filter, input_a):  
        # Let's get a series of True/False values indicating which assets are in my_filter.  
        # Remember that my_filter is a 2D array with assets for columns and days for rows.  
        # We probably want to know assets having True values for all days so use the 'all' method.  
        my_assets = my_filter.all(axis=0)
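
As an aside, the choice of 'all' matters once the filter can change from day to day (as a dynamic filter such as '.top(10)' might when the window is longer than one day). Below is a minimal standalone NumPy sketch of three ways to collapse the 2D array to 1D. The window contents are made up for illustration and nothing here calls pipeline itself.

```python
import numpy as np

# Hypothetical 3-day window for 4 assets (rows = days, columns = assets).
# Made-up values: asset 1 leaves the filter on the last day, asset 2 enters on it.
my_filter = np.array([
    [True,  True,  False, False],
    [True,  True,  False, False],
    [True,  False, True,  False],
])

in_filter_every_day = my_filter.all(axis=0)  # True only if the asset passed on all days
in_filter_any_day = my_filter.any(axis=0)    # True if the asset passed on at least one day
in_filter_latest_day = my_filter[-1]         # membership on the most recent day only

print(in_filter_every_day)   # [ True False False False]
print(in_filter_any_day)     # [ True  True  True False]
print(in_filter_latest_day)  # [ True False  True False]
```

With window_length = 1 there is only one row, so all three give the same answer.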

So far pretty straightforward. Now, how to use this info inside the custom factor? Numpy boolean indexing to the rescue! One can easily slice and select from numpy arrays using a True/False (i.e. boolean) array. In general this is called 'boolean array indexing' (see https://numpy.org/devdocs/reference/arrays.indexing.html#boolean-array-indexing). Below are just a few of the ways to select certain assets from the inputs to use in calculations inside a custom factor. Continuing the compute method from above...


         # Select SIDs for just assets in my_filter  
         my_assets_sids = assets[my_assets]

         # Get input values just for these assets  
         inputs_for_my_assets = input_a[:, my_assets]

         # Get input values just for assets NOT in my_assets  
         inputs_not_in_my_assets = input_a[:, ~my_assets]

         # Sum all the input values for all assets  
         sum_input_all = input_a.sum()

         # Sum input values for just my_assets  
         sum_input_my_assets = inputs_for_my_assets.sum()

         # Sum input values for all assets NOT in my_assets  
         sum_input_not_my_assets = inputs_not_in_my_assets.sum()
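
To see what these slices produce, here is a small standalone NumPy sketch that can be run outside of pipeline. The sids and input values are made up for illustration; only the variable names mirror the compute method above.

```python
import numpy as np

# Made-up data: 2 days (rows) by 4 assets (columns).
assets = np.array([2, 24, 1335, 3766])            # hypothetical sids, one per column
input_a = np.array([
    [0.01, 0.02, 0.03, 0.04],
    [0.05, 0.06, 0.07, 0.08],
])
my_assets = np.array([False, True, True, False])  # 1D boolean selector over columns

my_assets_sids = assets[my_assets]                # sids where the selector is True
inputs_for_my_assets = input_a[:, my_assets]      # all rows, selected columns only
inputs_not_in_my_assets = input_a[:, ~my_assets]  # all rows, the remaining columns

print(my_assets_sids)                  # [  24 1335]
print(inputs_for_my_assets.shape)      # (2, 2)
print(inputs_not_in_my_assets.shape)   # (2, 2)
```

Note that the boolean array must have one entry per column of the input, which is why the 2D filter is collapsed to 1D first.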

One typically doesn't need to access the actual SIDs but it's sometimes helpful for debugging. Most important are the input values. Knowing how to select specific values using boolean indexing is key. One can then craft calculations based upon these values.
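
As one example of such a calculation, the sketch below regresses every asset's returns on each asset selected by the filter and averages the slopes, which is roughly the goal described in the original question. It is plain NumPy, not pipeline's built-in linear_regression method. The function name mean_beta_to_targets is hypothetical, the test data is random, and the sketch assumes the returns window contains no NaNs and at least one target.

```python
import numpy as np

def mean_beta_to_targets(returns, is_target):
    """Average OLS slope of each asset's returns on each target's returns.

    returns:   2D float array, rows = days, columns = assets (assumed NaN-free)
    is_target: 1D boolean array, True for the columns used as regression targets
    """
    targets = returns[:, is_target]        # (days, n_targets)
    y = returns - returns.mean(axis=0)     # demeaned dependent variables
    x = targets - targets.mean(axis=0)     # demeaned predictors
    cross = x.T @ y                        # (n_targets, n_assets) sums of cross-products
    var = (x ** 2).sum(axis=0)             # (n_targets,) sums of squares
    betas = cross / var[:, None]           # slope of each asset on each target
    return betas.mean(axis=0)              # average over targets: one value per asset

# Made-up returns: column 2 is exactly 2x column 0, column 3 is -1x column 1.
rng = np.random.default_rng(0)
base = rng.normal(size=(30, 2))
returns = np.column_stack([base[:, 0], base[:, 1], 2 * base[:, 0], -base[:, 1]])
is_target = np.array([True, False, False, False])

result = mean_beta_to_targets(returns, is_target)
print(result)
```

Inside compute, something like out[:] = mean_beta_to_targets(input_a, my_filter[-1]) would write one value per asset, assuming input_a holds returns over a window longer than one day.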

Attached is a demo notebook. Specifically look at the output of the pipeline. There are print statements inside the custom factor which print out when the pipeline is run. These print the results of the various slicing operations to visually show what's being selected.

Hope this helps?

Paul Harstad

Hey Dan, thanks a lot. This did help me, but I still can't seem to use the ints (sids) as a target.

For instance how do I do this?

aapl_regressions = RollingLinearRegressionOfReturns(
target=sid(24), returns_length=10, regression_length=30,
)

With a variable as the target? For some odd reason, the sid function doesn't allow variables and there seems to be no way I can find to go from sid to asset.

Thanks for your assistance, I will probably post a new topic for this specific part of the question as well.

Paul Harstad

I continued the problem here, https://www.quantopian.com/posts/how-to-use-a-variable-to-retrieve-an-asset-for-use-as-a-target, if anyone would rather answer there instead.
