Self serve data without access to symbols

When attempting to upload a custom dataset using the self-serve API, where should the asset name data be retrieved from? If attempting to use a dataset in a factor which uses QTradableStocksUS, how should we construct a dataset locally that aligns with QTradableStocksUS assets?

I'm currently trying to use the 3-Month LIBOR rate in a factor, but since Quandl no longer maintains this data, I'm trying to upload it via the self-serve API. I'm struggling to understand how/where we should access the asset name data and ensure it aligns with QTradableStocksUS.

Thanks in advance.

Joseph.

11 responses

Costantino

It's a general problem with macrodata.

Joseph Moorhouse

Thanks for your reply!

Are you aware of where/how we are meant to source asset name data? Or more specifically, a map of imported asset names to Quantopian asset names (with the correct equity identification code)?

In this case I guess a hack would be to get all asset names from NYSE, NASDAQ, AMEX, since the asset name alignment isn't a concern.

Dan Whitnable

There isn't a need to map 'macro data' to all assets before importing. The data can be mapped or 'broadcasted' to all assets once the data has been imported. Therefore no need to have a list of all asset symbols. Really only need to map the data to a single arbitrary asset when importing one's data.

As a bit of (perhaps redundant and boring) background... pipeline requires data to be associated with assets. There isn't a 'place' to put arbitrary pieces of data in the current structure. However, there is a lot of data which isn't associated with assets but still desirable to use in calculations within pipeline. This might be the current 10 yr treasury yield, unemployment rate, or rainfall in Iowa. This is termed 'macro data'.

How to upload this type of macro data and then how to use it in factors?

The first step of uploading the data using self-serve is exactly like any other self-serve dataset. The one 'trick' is to associate the data with a single arbitrary symbol. Make sure the symbol traded during all periods for which data is being uploaded. A good choice is to use SPY. This has traded under the same ticker continuously for the entire time which Quantopian has history (ie 1-1-2002). This is explained a bit in the docs https://www.quantopian.com/docs/user-guide/tools/self-serve#uploading-macroeconomic-data
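As a rough illustration of that 'trick', a macro series could be written to a CSV in which every row carries the same placeholder ticker. The column names (date, symbol, libor_3m) and the values below are made up for this sketch; the self-serve docs linked above define what the uploader actually expects.

```python
import csv
import io

# Hypothetical macro observations: (date, value). The values are made up.
observations = [
    ("2003-05-01", 1.29),
    ("2003-05-02", 1.29),
    ("2003-05-05", 1.30),
]

PLACEHOLDER_SYMBOL = "SPY"  # any ticker that traded over the whole date range


def to_self_serve_csv(rows, symbol=PLACEHOLDER_SYMBOL):
    """Return CSV text with one row per date, all tagged with one symbol."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["date", "symbol", "libor_3m"])  # assumed column names
    for date, value in rows:
        writer.writerow([date, symbol, value])
    return buffer.getvalue()


csv_text = to_self_serve_csv(observations)
print(csv_text)
```

Every row is tagged with SPY only so that pipeline has an asset to hang the value on; the ticker itself carries no meaning for the macro series.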

So, once macro data has been imported (associated to the single placeholder symbol), how to use it?

The key concept here is a factor 'slice'. Factors can be thought of as representing a dataframe with dates as rows and columns for each security. One can extract, or slice, the values for just a single security using bracket notation like this

my_slice = my_factor[symbols('SPY')]

Once we have a slice it can be manipulated in much the same way as entire factors. To use macro data imported and associated with a single asset, one needs to simply slice the factor by that asset. Once that's done, that slice can be used in calculations and passed as inputs to other factors much the same as factors can. There's more info on slices in the docs https://www.quantopian.com/docs/user-guide/tools/pipeline#slicing-factors

A simple custom factor to broadcast macro data to all assets could be something like this

# Custom factor to broadcast 'macro' self-serve data to all assets
# The input should be a slice
class Broadcast_Slice(CustomFactor):
    window_length = 1

    def compute(self, today, assets, out, slice_value):
        # slice_value holds the placeholder asset's value; copy it to every asset
        out[:] = slice_value

However, there often isn't a need to broadcast the macro data to all assets. Slices can be used in regular math expressions just like factors. The value is broadcast automatically. Here's an example which would add our macro data to each asset's close price.

    price = USEquityPricing.close.latest  
    my_slice = my_factor[symbols('SPY')]  
    price_plus_my_data = price + my_slice
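Outside of Quantopian the same idea can be mimicked with plain NumPy, which may make the automatic broadcast easier to picture. This is only an analogy, not pipeline code: the factor is a small dates-by-assets array, the slice is the column of the placeholder asset, and all tickers and numbers are invented.

```python
import numpy as np

assets = ["AAPL", "MSFT", "SPY"]          # column labels (illustrative)
prices = np.array([[10.0, 20.0, 30.0],    # one row per date
                   [11.0, 21.0, 31.0]])

# Macro data lives only in the SPY column; the other columns are NaN.
macro = np.full(prices.shape, np.nan)
macro[:, assets.index("SPY")] = [1.5, 1.25]

# "Slice" the macro factor: keep the SPY column as a (dates, 1) array.
macro_slice = macro[:, [assets.index("SPY")]]

# NumPy stretches the single column across every asset column.
price_plus_macro = prices + macro_slice
print(price_plus_macro)
```

Each row of the result has the same macro value added to all three prices, which is the behaviour described above for slices in pipeline expressions.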

Slices can also be combined with factors as inputs to custom factors. This is probably the most typical use case since the macro data logic may be quite involved. Here we do the same as above but with a custom factor

class Slice_Factor(CustomFactor):  
    window_length = 1  
    def compute(self, today, assets, out, slice_value, close):  
        out[:] = close + slice_value 


my_slice = my_factor[symbols('SPY')]  
price_plus_my_data = Slice_Factor(inputs=[my_slice, USEquityPricing.close])

Hope this helps. The attached notebook shows this all in action.

Disclaimer

The material on this website is provided for informational purposes only and does not constitute an offer to sell, a solicitation to buy, or a recommendation or endorsement for any security or strategy, nor does it constitute an offer to provide investment advisory services by Quantopian. In addition, the material offers no opinion with respect to the suitability of any security or specific investment. No information contained herein should be regarded as a suggestion to engage in or refrain from any investment-related course of action as none of Quantopian nor any of its affiliates is undertaking to provide investment advice, act as an adviser to any plan or entity subject to the Employee Retirement Income Security Act of 1974, as amended, individual retirement account or individual retirement annuity, or give advice in a fiduciary capacity with respect to the materials presented herein. If you are an individual retirement or other investor, contact your financial advisor or other fiduciary unrelated to Quantopian about whether any given investment idea, strategy, product or service described herein may be appropriate for your circumstances. All investments involve risk, including loss of principal. Quantopian makes no guarantees as to the accuracy or completeness of the views expressed in the website. The views are subject to change, and may have become unreliable for various reasons, including changes in market conditions or economic circumstances.

Joseph Moorhouse

Wow, thanks so much Dan!! That makes sense, thank you for going into so much detail!! I will give this a go tomorrow.

Thanks again,

Joseph

Joseph Moorhouse

Hi Dan,

Thanks again for your help, I managed to get the code working. I'm now trying to use your recommendations with window_length greater than 1. The dummy code below roughly shows what I'm trying to achieve. Where should the factor slicing occur when using a window_length greater than 1?

def make_pipeline():  
    class Slice_Factor(CustomFactor):

        window_length = 4

        def compute(self, today, assets, out, fred, close):  
            out[:] = close[-1] - fred[symbols('SPY')][-1]  


    my_slice_factor = Slice_Factor([fred_3mt.usd3mtd156n, USEquityPricing.close])

    pipe = Pipeline(
        columns = {'my_slice_factor' : my_slice_factor},  
        screen = QTradableStocksUS()  
     )  
    return pipe

run_pipeline(make_pipeline(), '2003-05-05', '2003-06-05')

This returns the following error

IndexError: index 8554 is out of bounds for axis 0 with size 4

I also tried fred[-1][symbols('SPY')] but this returns a DataFrame populated with np.nan.

Dan Whitnable

Looking at the dummy code from above, there are a couple of issues. First, the inputs to factors should be slices or factors. So this would be more correct

    fred = fred_3mt.usd3mtd156n.latest[symbols('SPY')]
    my_slice_factor = Slice_Factor([fred, USEquityPricing.close])

Notice we now have a slice and a factor as inputs.
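For what it's worth, the IndexError from the original version can be reproduced with plain NumPy. Inside compute each input is an ordinary array with window_length rows, so indexing it selects a row by position. The sketch below assumes that the asset object is converted to its integer id when used as an index, and that 8554 (the number in the error message) is SPY's id; both are assumptions rather than something confirmed in this thread.

```python
import numpy as np

window_length = 4
n_assets = 3

# Stand-in for what compute() receives: one row per day in the window.
fred_window = np.zeros((window_length, n_assets))

SPY_SID = 8554  # assumed integer id of SPY, taken from the error message

message = None
try:
    fred_window[SPY_SID]  # asks for row 8554 of a 4-row array
except IndexError as err:
    message = str(err)

print(message)
```

Slicing before the factor is constructed avoids this, because compute then never has to look an asset up by id.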
The second issue is the factor definition.

def make_pipeline():  
    class Slice_Factor(CustomFactor):

        window_length = 4

        def compute(self, today, assets, out, fred, close):  
            out[:] = close[-1] - fred[-1]

The calculations and output only depend upon the latest values of the inputs (ie [-1]) so the window length doesn't matter. Perhaps a better example is to use the first value of the inputs (ie [0]).

def make_pipeline():  
    class Slice_Factor(CustomFactor):

        window_length = 4

        def compute(self, today, assets, out, fred, close):  
            out[:] = close[0] - fred[0]

That will return the close 4 days ago minus the value of fred 4 days ago. Is that closer to the desired outcome (I know this may not be the exact calculation)? If so, read on...
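A small NumPy stand-in may help picture the difference between [0] and [-1]. It assumes the slice arrives in compute as a single column with window_length rows, while a regular factor arrives with one column per asset; the prices and rates are invented.

```python
import numpy as np

# Four-day window, oldest row first, three assets (made-up prices).
close = np.array([[10.0, 20.0, 30.0],
                  [11.0, 21.0, 31.0],
                  [12.0, 22.0, 32.0],
                  [13.0, 23.0, 33.0]])

# The macro slice as a single column (assumed shape: window_length x 1).
fred = np.array([[1.0], [1.25], [1.5], [1.75]])

out = np.empty(close.shape[1])

out[:] = close[-1] - fred[-1]   # latest row of each input
latest = out.copy()

out[:] = close[0] - fred[0]     # oldest row of each input
oldest = out.copy()

print(latest)
print(oldest)
```

In both cases the single macro value in the selected row is subtracted from every asset's price in that same row.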

One needs to add a custom factor as a 'helper' or 'workaround'. When using factors or slices as inputs to other factors, the inputted factors must be window_safe=True. This is simply a flag that means the values of the factor will be the same when calculated over various 'windows' or timeframes. Really this means its value won't be impacted by stock splits so it's 'safe' to use whether a split is applied or not.

As an example, a '10_day_moving_average_price' factor is not window_safe. If a 2:1 split occurs all the values will be cut in half. However, a '10_day_return' factor is. It's just a ratio which will remain the same even if the prices are halved.
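The arithmetic behind that example is easy to check by hand. The sketch below uses invented prices and a hypothetical 2:1 split adjustment that halves every historical price; the average changes while the return does not.

```python
# Ten invented daily closes, oldest first.
prices = [100.0, 102.0, 101.0, 104.0, 106.0, 105.0, 108.0, 110.0, 109.0, 112.0]

# After a 2:1 split, the adjusted history shows every price halved.
adjusted = [p / 2 for p in prices]


def moving_average(series):
    """Plain average over the whole series."""
    return sum(series) / len(series)


def total_return(series):
    """Return from the first to the last value of the series."""
    return series[-1] / series[0] - 1


# The averages differ by a factor of two.
print(moving_average(prices), moving_average(adjusted))
# The returns are the same, since the split factor cancels in the ratio.
print(total_return(prices), total_return(adjusted))
```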

It's up to the author of a factor to determine if a factor is 'window_safe'. The window_safe flag is just used in pipeline to throw an error if it finds it's using a factor in an 'un-safe' way and therefore may be giving incorrect results. Setting this to True simply instructs pipeline to not throw an error. It should be noted that maybe using a factor with unadjusted prices is ok. It just depends upon the situation.

Macro data by definition isn't impacted by stock splits so we are safe to set window_safe=True. This can be done with a simple custom factor which copies the input to the output but then, importantly, sets window_safe=True. Like this

class Window_Safe(CustomFactor):  
    # A factor to make a factor window_safe  
    window_length = 1

    # Tell pipeline that this is window_safe even though it may not really be?  
    window_safe = True

    def compute(self, today, assets, out, value):  
        out[:] = value

# This can be used like this  
my_slice = my_data[symbols('SPY')]  
my_window_safe_slice = Window_Safe(inputs=[my_slice])

new_factor = Some_Factor(inputs=[my_window_safe_slice], window_length=4)

That is all a bit general. See the attached notebook for specifically how to use factors and slices as inputs to another factor with window_length greater than 1. Maybe also take a look at this post https://www.quantopian.com/posts/get-lagged-output-of-pipeline-custom-factor.

Hope that helps.

Joseph Moorhouse

Perfect, that all works now! For reference, this is the solution.

def make_pipeline():  
    class Window_Safe(CustomFactor):  
        # A factor to make a factor window_safe  
        window_length = 1

        # Tell pipeline that this is window_safe even though it may not really be?  
        window_safe = True

        def compute(self, today, assets, out, value):  
            out[:] = value  

    # One can also use slices as inputs to factors including custom factors  
    # Here we pass our slice and the price as inputs to a custom factor  
    my_slice = fred_3mt.usd3mtd156n.latest[symbols('SPY')]  
    my_window_safe_slice = Window_Safe(inputs=[my_slice])  
    # Create an example of a factor which uses both a slice and a factor as inputs  
    # The first input should be a slice
    class Slice_Factor(CustomFactor):

        window_length = 4

        def compute(self, today, assets, out, fred, close):  
            # arbitrary calculation  
            out[:] = close[-1] - fred[-1]  

    my_slice_factor = Slice_Factor([my_window_safe_slice, USEquityPricing.close])  
    pipe = Pipeline(
        columns = {'my_slice_factor' : my_slice_factor,  
                   'close': USEquityPricing.close.latest},  
        screen = QTradableStocksUS()  
     )  
    return pipe

run_pipeline(make_pipeline(), '2003-05-05', '2003-06-05')

I included the close column to manually check that the daily constant (fred_3mt.usd3mtd156n) was being subtracted.

Thanks Dan, really appreciate your support!

Dan Whitnable

That looks great! Glad to help. I can certainly attest that some of these concepts (like slicing and window_safe and passing factors as inputs to other factors) aren't obvious at first. It really takes a concrete application, and a bit of perseverance, to put it all together. Kudos for sticking in there. From my experience next time will be easier.

Joseph Moorhouse

Yeah exactly! I will make sure to keep practising with different datasets. Thanks again and have a good weekend!

Joseph

Shuang Liang

I used code similar to that mentioned above. It works in a notebook, but in IDE backtesting I got an error:
zipline.pipeline.term.getitem() expected a value of type zipline.assets._assets.Asset for argument 'key', but got list instead

The dataset is imported with:
from quantopian.pipeline.data.user_5c879f945024c000453208d8 import payems
This is the line that raises the error:
employ = payems.payems.latest[symbols('SPY')]

After changing symbols to symbol, it works in the IDE.

Terence Sia

Hi all,

I have uploaded my custom dataset which has 2 macro time series and used the following code to call the data...

 model = assetallocationmodel.signal[symbols('SPY')]

but got the error:

NonSliceableTerm: Taking slices of 5f3b65112521950001bea770.signal::float64 is not currently supported.

As it is a macro overlay, I would like to use the 2 time series outside of pipeline and am just struggling to call the data. Does anyone know a workaround?
