Executing a pipeline on a fetched universe expands the universe

Hi, total newbie here, so I'm probably doing something obviously wrong.

I am importing a predefined universe via Fetcher, then attempting to filter that universe down based on some signals. When I apply the pipeline, the size of the universe grows instead. Is there a way to apply the filter to just the existing universe? I'm having some issues with fetch_csv as well, so this is just a working sample with dummy data.

from quantopian.algorithm import attach_pipeline, pipeline_output
from quantopian.pipeline import Pipeline
from quantopian.pipeline.data.builtin import USEquityPricing
from quantopian.pipeline.factors import SimpleMovingAverage, AvgDailyDollarVolumeTraded

def initialize(context):
    # set universe to the contents of our file
    # note: the date_format directives must be lowercase ('%m/%d/%Y');
    # '%M' means minutes and '%D' is not a valid strptime directive
    fetch_csv(
        'https://dl.dropboxusercontent.com/u/169032081/SP500.csv',
        date_column='date', universe_func=my_universe, date_format='%m/%d/%Y')
    # set up the filter
    pipe = Pipeline()
    pipe = attach_pipeline(pipe, name='my_pipeline')
    sma_1 = SimpleMovingAverage(inputs=[USEquityPricing.close], window_length=1)
    sma_8 = SimpleMovingAverage(inputs=[USEquityPricing.close], window_length=8)
    sma_20 = SimpleMovingAverage(inputs=[USEquityPricing.close], window_length=20)
    dollar_volume = AvgDailyDollarVolumeTraded(window_length=7)
    pipe.add(sma_1, 'sma_1')
    pipe.add(sma_8, 'sma_8')
    pipe.add(sma_20, 'sma_20')
    pipe.add(dollar_volume, 'dollar_volume')
    # keep securities in an uptrend with reasonable dollar volume
    pipe.set_screen((sma_8 > sma_20) & (sma_1 > sma_20) & (dollar_volume > 400000.0))

# my_universe returns the set of securities that defines the universe;
# it is passed to fetch_csv as universe_func
def my_universe(context, fetcher_data):
    # set my_stocks to be every security in the fetcher_data
    my_stocks = set(fetcher_data['sid'])
    return my_stocks

# handle_data is run every bar
def handle_data(context, data):
    pass

def before_trading_start(context, data):
    if len(data) > 0:
        # limit universe to securities that meet our filter
        log.debug('pre-filter universe size {c}'.format(c=len(data)))
        sids = pipeline_output('my_pipeline')
        log.debug('post-filter universe size {c}'.format(c=len(sids)))
        update_universe(sids)
    record(universe_size=len(data))

The logs show:

2013-10-02 before_trading_start:56 DEBUG pre-filter universe size 494
2013-10-02 before_trading_start:58 DEBUG post-filter universe size 3223

Any ideas? The call to update_universe(sids) is also failing, which I suspect is related.


Hi Ethan,

These two lines of code,

sids = pipeline_output('my_pipeline')
update_universe(sids)

are updating your universe with everything that is returned by the pipeline.

Everything returned by the pipeline means every security (out of the 8,000+ in the database on any given day) that passes your screen:

pipe.set_screen( (sma_8>sma_20) & (sma_1>sma_20) & (dollar_volume>400000.0) )

It also looks like fetch_csv is updating your universe with additional securities (which may or may not be in the pipeline list).

Can I ask why you are pre-defining your universe as opposed to using the pipeline to get the values for all securities in the universe and then filtering down from there? A couple of people have asked for this since we launched the API, and I am trying to understand why you would want to manually set the universe as opposed to using the pipeline functionality to do this.


Hi Karen,

I'm testing an external news-event signal that I've been working on. Being news-based, the signal is per ticker, not a general long/short signal. Basic research has shown that this signal correlates with price moves anywhere from hours to months after the trigger event. Buying the entire universe yields positive results, but not alpha. I want to try to narrow down my event-trigger universe to stocks that are trending up.

From your docs it looked to me like I could use fetch_csv to define my universe on a daily basis for backtesting, and the pipeline seemed like a good way to further filter that down. It appears, though, that the data set used by the pipeline also defines the universe; it would be quite valuable to allow those constructs to be separated.

I think for now I will try to figure out some way to determine the intersection between my universe and the filter results, and then use that for subsequent processing.

Thanks for the explanation! This is great detail.

I think I would import your list using fetch_csv, but not set it as your algo universe.

Then calculate the factors you want using the pipeline. Don't worry about narrowing down the universe; the pipeline can do the calculations for 8,000+ securities.

Then, in before_trading_start, get the output of the pipeline and use pandas to reduce the output dataframe to the list of sids you imported with fetch_csv.
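In outline, something like this might do it (a minimal, untested sketch: it assumes the universe function stashes the fetched sids on context, and that the pipeline was attached as 'my_pipeline' as in your code above):

def my_universe(context, fetcher_data):
    # remember the fetched sids so before_trading_start can
    # intersect the pipeline output with them later
    context.fetched_sids = set(fetcher_data['sid'])
    return context.fetched_sids

def before_trading_start(context, data):
    # one row per security that passed the pipeline screen
    results = pipeline_output('my_pipeline')
    # keep only the rows whose sid came in through fetch_csv
    context.filtered = results[results.index.isin(context.fetched_sids)]
    record(universe_size=len(context.filtered))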

Hi Karen,

Thanks again for following up. Just a few more points:

  1. Since my signaled tickers change every day (new ones are added, old ones fall off), I need a way to process the specific tickers that were valid on a given day in the backtest. Since fetch_csv is called once during initialize, I don't see how I could get just the subset of tickers that are valid on a given day without using the universe_func parameter. If fetch_csv could be called in before_trading_start and returned a dataframe, I could make sense of that, but not with the current implementation.

  2. fetch_csv timeout: given my scenario (a different set of tickers for each day), my input file is very large and runs into timeout issues. I understand the need to have timeouts and keep this optimized, but there have to be better ways to do this. Maybe support a "valid for x number of days" column for each start date, or an end date. Happy to work with someone on this offline if that would help.

  3. Do you have any examples of matching the dataframe returned from pipeline_output against a universe? I tried so many variants of this and nothing seems to work:

    longs = {}
    if len(data) > 0:
        filter_results = pipeline_output('my_pipeline')

        for f_sid in filter_results.index.values:
            if f_sid in data:
                longs[f_sid] = data[f_sid]

    context.long_list = longs
    record(Longs=len(context.long_list))

I realize this is probably more of a dataframe syntax question, but matching up keys should not be this hard. A pointer to an existing working example would be fine.

thanks for your help

Hi Ethan,

To use Fetcher in conjunction with Pipeline, the Fetcher securities need to be handled in handle_data or in a scheduled function. The reason is that Fetcher data is not injected into the universe until after before_trading_start. The tricky part is that all sids in your universe, whether they came from pipeline or Fetcher, are added to data. The way to distinguish which securities came from Fetcher is to add a 'tag' column to the fetcher file: if you add such a column and give it some arbitrary value, securities that come from Fetcher will have it as an added property in the data variable.
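In outline, the check might look something like this (a rough sketch, not the exact code from the attached backtest; it assumes the pipeline output was saved to context.results in before_trading_start, and that the fetched .csv has a 'tag' column):

def handle_data(context, data):
    fetched_and_screened = []
    for stock in data:
        # securities that came in through Fetcher carry the extra
        # 'tag' column from the .csv as a property on data[stock]
        if 'tag' in data[stock] and stock in context.results.index:
            fetched_and_screened.append(stock)
    record(num_matched=len(fetched_and_screened))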

I've attached a backtest which is a copy of the pipeline mean reversion strategy with the addition of a fetched .csv file. If you take a look at lines 126-129, that is where I look for an instance of my fetcher data in the pipeline output. I created a column called 'meta' in the .csv file and check each element in data to see whether it has the 'meta' property. If so, it was in my fetcher file.

I believe this should help you get what you want!

Regarding the fetch_csv timeout, the fetcher file is read in once at the start of your algorithm, so the timeout is related to how long it takes to read in the file. Unless I'm misunderstanding, I don't think that a "valid for x number of days" column would actually help!

Let me know if you have any questions.


Can one use fetcher data in a pipeline? If so, how would one approach it?

My thinking is I would use fetcher to pull in a dataset, create a multi-index dataframe (the data has multiple dimensions), use that dataframe as an input to a custom factor, and then add the factor to the pipeline. Would that work?

I think I could use the proposed notebook attached, but I'm running into an issue with data.current in Research, which says 'data' is not defined. How would one reference fetched data from within a custom factor's inputs? What would be the proper syntax?

I think I am getting closer, but now get the following error in Research: TypeError: zipline.pipeline.pipeline.add() expected a value of type zipline.pipeline.term.Term for argument 'term', but got abc.ABCMeta instead.
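That error usually means a factor class, rather than an instance of it, was passed to pipe.add(). A minimal sketch of the distinction (this is a guess at the cause, since the failing code isn't shown):

from quantopian.pipeline import Pipeline
from quantopian.pipeline.data.builtin import USEquityPricing
from quantopian.pipeline.factors import SimpleMovingAverage

pipe = Pipeline()
# wrong: passes the class object itself (whose type is abc.ABCMeta)
# pipe.add(SimpleMovingAverage, 'sma')
# right: instantiate the factor so pipe.add receives a Term
sma = SimpleMovingAverage(inputs=[USEquityPricing.close], window_length=20)
pipe.add(sma, 'sma')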

How do I adjust a local .csv data file so that the pipeline can add to its index? What elements are needed?