Screen vs filter

Sorry, complete newbie here, so apologies for the simple question. I would like to create a simple backtest of a long-term buy-and-hold strategy in which a screen is applied for some key ratios (e.g. current ratio), price movements (e.g. down 50% from a 52-week high), and some factor restrictions (e.g. minimum price and/or volume). The top 20-50 stocks are then purchased and held until a particular % gain is achieved or they are stopped out. New stocks are purchased when one newly makes the top-20 list and there is enough cash (although I'd be happy to ignore this last part and just complete one round!).

Just for starters, I don't understand whether you use pipeline for this or get_fundamentals. And why do you use a filter() rather than a screen? It seems like they do the same thing.

Sorry for my confusion and thanks for your help


Welcome!

I'd suggest always using pipeline to fetch all data (fundamentals, price, volume, and any of the other available datasets: https://www.quantopian.com/data). The 'get_fundamentals' function was the original method for retrieving data, but it has been superseded by the more general (and better optimized) pipeline methods. The limitation is that pipeline only accesses data as of the previous day. For intraday pricing and volume, use the 'data.current' or 'data.history' methods.

As for the difference between 'screen' and 'filter'... It may help to get a solid mental picture of what a pipeline returns. It returns a two-dimensional pandas dataframe. The rows are securities. The columns are the values of any factors, filters, or classifiers you create and add to the pipeline. It's really just a big spreadsheet whose columns you specify when defining the pipeline.

The columns must be factors, filters, or classifiers. The difference is really just the type of data they represent. Factors are real numbers. Filters are boolean True or False. Classifiers are categorical labels (integers or strings). So, a filter is really just a column of data that has a value of True or False. However, filters are objects which can also be used in other places. Namely, they can be used as a 'mask' when creating factors and classifiers, and they can be used as a 'screen' when creating a pipeline. In each case they have the same effect: they limit the securities considered to a smaller subset (those for which the filter returns True). A filter is a real object you create. The terms 'mask' and 'screen' are just names for places where one would want to use the filter.

Why use filters as a mask? Primarily to reduce computation time and memory usage. A factor or classifier will only perform its calculations on securities passing the filter (i.e. True). Why use filters as a pipeline screen? Primarily as a convenience, so that the returned dataframe (the big spreadsheet of data returned by the pipeline) doesn't have so many rows and only contains a subset of securities. It's just a convenience because one can always sort and filter the dataframe AFTER it's returned. Just use any of the pandas sort and filter methods on the output.
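To make the "filter it afterwards" point concrete, here is a rough sketch in plain pandas. The dataframe is a made-up stand-in for pipeline output; the tickers, column names, and thresholds are all hypothetical and nothing here calls the Quantopian API.

```python
import pandas as pd

# Hypothetical stand-in for the dataframe a pipeline returns:
# one row per security, one column per factor or filter.
output = pd.DataFrame(
    {
        "close": [12.5, 3.2, 48.0, 7.9],
        "current_ratio": [2.1, 0.8, 1.6, 3.0],
        "liquid": [True, False, True, True],  # a boolean filter column
    },
    index=["AAA", "BBB", "CCC", "DDD"],
)

# Same effect as using the filter as the pipeline screen: keep only True rows.
screened = output[output["liquid"]]

# Further narrowing and sorting with ordinary pandas calls.
picks = (
    screened[screened["close"] >= 5.0]
    .sort_values("current_ratio", ascending=False)
    .head(2)
)

print(list(picks.index))
```

Dropping rows this way only trims what you look at; it does not change any value that was already computed.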

Hope that helps.

Why use filters as a mask? Primarily to reduce computation time and memory usage.

A mask is also needed when the computation itself must be performed only on a limited set of securities (e.g. when an optimization or normalization within the factor should use only the universe of interest).
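A toy illustration of that point, assuming the simplest possible normalization (subtracting the cross-sectional mean). The names, values, and the demean helper are hypothetical; this is plain Python, not the pipeline API.

```python
from statistics import mean

# Hypothetical raw factor values for five securities.
raw = {"AAA": 10.0, "BBB": 20.0, "CCC": 30.0, "DDD": 40.0, "EEE": 100.0}
# Hypothetical universe filter: True for securities of interest.
in_universe = {"AAA": True, "BBB": True, "CCC": True, "DDD": True, "EEE": False}


def demean(values):
    """Subtract the cross-sectional mean from every value."""
    centre = mean(values.values())
    return {name: value - centre for name, value in values.items()}


# Mask: restrict the inputs first, then compute.
masked = demean({n: v for n, v in raw.items() if in_universe[n]})

# Screen: compute over everything, then drop rows from the result.
screened = {n: v for n, v in demean(raw).items() if in_universe[n]}

print(masked["AAA"], screened["AAA"])
```

Both results contain the same four securities, but the numbers differ because the outlier outside the universe shifts the mean in the screened version.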

Great - thank you for the guidance. So just to be clear, when you say the pipeline only accesses previous day data, how do you backtest? In other words, I would like to run my screen say, once per month and backtest that starting a few years ago.
Thanks!

Pipeline works in both the research platform and the backtester. You're gonna have to get up the learning curve. Here's the tutorial:

https://www.quantopian.com/tutorials/pipeline

I attached the backtest from the tutorial. As you can see, pipeline is integrated with the backtester.

I suggest taking some time to study up, and then maybe tweak the example, adding some functionality of interest. You'll be better off if you post questions with a working example that folks can copy/clone and modify, so if you focus on general functionality versus your "secret sauce" then you can share. In the end, you should be able to implement your strategy.

Grant is correct. You can do what you want. There is a bit of a learning curve though.

When I said 'the pipeline only accesses the previous day's data', this is always relative to the current day of the backtest, and not the day the backtest starts or anything like that.

The pipeline should be defined in the initialize function. This function is automatically called exactly once before the backtest starts. Create all your factors, filters, etc. in this function (or in a separate function which is called from there). Then instantiate the pipeline along the following lines:

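    from quantopian.algorithm import attach_pipeline
    from quantopian.pipeline import Pipeline
    from quantopian.pipeline.data.builtin import USEquityPricing
    from quantopian.pipeline.filters import Q1500US

    def initialize(context):
        universe = Q1500US()                   # filter, used below as the screen
        close = USEquityPricing.close.latest   # factor: most recent close
        my_pipe = Pipeline(
            columns={'close': close},
            screen=universe,
        )
        # Register the pipeline under the name later passed to pipeline_output.
        attach_pipeline(my_pipe, 'my_pipe')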

All this does, however, is create the definition of what data you want as columns in the dataframe which the pipeline returns. To actually run the pipeline and get that data, use the 'pipeline_output' method, passing the name under which the pipeline was registered with 'attach_pipeline'. Place this in the before_trading_start function and it will automatically get called each day before the markets open. It will return all the data as of the previous day (i.e. everything that a trader would know before the market opens).

context.output = pipeline_output('my_pipe')

context.output will be a dataframe with all the columns of data you defined, for each security which passed the 'screen' filter (or all securities if no screen is specified). The backtest engine executes the before_trading_start function for you. The data returned from the pipeline will be refreshed each time it's run (i.e. every day).

Typically, if you wanted to run a screen every month, you would schedule a function to run every month (see https://www.quantopian.com/help#sample-schedule-function). Within that function you can access the context.output dataframe, which has all your data. It will be current as of that day of the backtest and have the data as it would have looked on that day back in history (the backtester is designed to eliminate look-ahead bias). Technically the pipeline will be running every day and fetching data which you won't be using. You could code some logic to only run the pipeline on certain days, but I wouldn't worry about it unless your backtest runs very slowly.
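If the timing is still fuzzy, this toy event loop may help. It is not Quantopian's engine, just a sketch of the idea: data is refreshed before every trading day, and a monthly function reads whatever was refreshed that morning. The toy_backtest and close_on names are hypothetical, weekdays stand in for trading days, and holidays are ignored.

```python
from datetime import date, timedelta


def toy_backtest(start, end, close_on):
    """Toy event loop; close_on(day) returns that day's closing value."""
    context = {"output": None}
    previous_close = None  # what is known before today's open
    last_month = None
    monthly_views = []
    day = start
    while day <= end:
        if day.weekday() < 5:  # weekdays stand in for trading days
            # Stand-in for before_trading_start: refreshed every trading day.
            context["output"] = previous_close
            # Stand-in for a function scheduled on the month's first trading day.
            if day.month != last_month:
                monthly_views.append((day, context["output"]))
                last_month = day.month
            # The close only becomes known after the session ends.
            previous_close = close_on(day)
        day += timedelta(days=1)
    return monthly_views


# Hypothetical data: the "close" is simply the day of the month.
views = toy_backtest(date(2020, 1, 1), date(2020, 3, 31), lambda d: d.day)
for day, seen in views:
    print(day, seen)
```

On each monthly run the function sees the prior trading day's value (for example, the run on Monday 2020-02-03 sees the close from Friday 2020-01-31), which is the sense in which the data is "as of the previous day".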

To actually run the backtest just enter the start and end date in the backtest screen (see https://www.quantopian.com/help#backtests). The backtest engine works in conjunction with the pipeline definition to get the correct data as of the backtest date.

Hope that makes sense.

OK, thanks, I think I got it. I have gone through many of the tutorials and videos (here, on YouTube, and on Python in general), but some of these timing and function questions have still been unclear to me.

I will post some sample code for feedback when I have a reasonable cut at it. Thanks for all the help!

You could code some logic to only run the pipeline on certain days but I wouldn't worry about it unless your backtest runs very slow.

I recall looking into this at one point, and finding out that you can't actually force the pipeline to run less frequently than daily (unless my memory is fuzzy on this minor issue).

Grant, you're correct (again). It won't improve speed to simply call 'pipeline_output' less often. My mistake.

Why would you ever want to use a Screen instead of adding the filter at the pre-computation Mask end?

Hi Caleb,

Masks and screens are both composed of filters but are used differently. Screens are only used within a pipeline to cut the number of output rows by omitting securities that don't meet the criteria of the screen's filters. Masks limit a computation to the securities that meet the criteria of the mask's filters, and they are applied in the computation phase of the pipeline. Masks can therefore be used to specify a subset of the universe to use in a computation, whereas screens are just a convenience feature applied after the pipeline has been computed.

Best,
Robert
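One case where the difference shows up clearly is a "top N" selection. The sketch below is plain Python with hypothetical values; the top helper is a stand-in that loosely mirrors ranking with and without a mask, and is not the pipeline API itself.

```python
# Hypothetical factor values and universe membership.
momentum = {"AAA": 0.9, "BBB": 0.8, "CCC": 0.3, "DDD": 0.2, "EEE": 0.1}
universe = {"CCC", "DDD", "EEE"}


def top(values, n, mask=None):
    """Return the n highest-valued names, ranking only names in mask if given."""
    if mask is None:
        candidates = values
    else:
        candidates = {k: v for k, v in values.items() if k in mask}
    return set(sorted(candidates, key=candidates.get, reverse=True)[:n])


# Ranking computed inside the universe (mask applied before the computation).
top2_masked = top(momentum, 2, mask=universe)

# Ranking computed over everything, universe applied afterwards (like a screen).
top2_screened = top(momentum, 2) & universe

print(top2_masked, top2_screened)
```

With the mask, the two best names inside the universe are selected. Without it, the overall top two fall outside the universe, so screening afterwards leaves nothing.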
