How to create a Pipeline that removes delisted companies, both in notebooks and in algorithms?

I am in the process of creating an algorithm that, based on your pipeline, finds and creates pairs, along with some additional features. I am currently doing this in Notebooks. However, now that I am running it, I see partway through that some companies don't have any prices because they are delisted, or that they are delisted and a new company has taken over the previous ticker. How would you take that into consideration in your Pipeline?

6 responses

Jamie McCorriston

Hi Cemal,

Could you share a notebook in this thread that demonstrates the issue you are seeing? The expected behavior of the Pipeline API is that it should only be outputting assets that are currently listed on a supported exchange (e.g. NYSE/Nasdaq). If an asset goes delisted, I would not expect to see it in a Pipeline output. If you have a notebook that demonstrates the issue, we can investigate and see if there's a bug.

Thanks,
Jamie

Disclaimer

The material on this website is provided for informational purposes only and does not constitute an offer to sell, a solicitation to buy, or a recommendation or endorsement for any security or strategy, nor does it constitute an offer to provide investment advisory services by Quantopian. In addition, the material offers no opinion with respect to the suitability of any security or specific investment. No information contained herein should be regarded as a suggestion to engage in or refrain from any investment-related course of action as none of Quantopian nor any of its affiliates is undertaking to provide investment advice, act as an adviser to any plan or entity subject to the Employee Retirement Income Security Act of 1974, as amended, individual retirement account or individual retirement annuity, or give advice in a fiduciary capacity with respect to the materials presented herein. If you are an individual retirement or other investor, contact your financial advisor or other fiduciary unrelated to Quantopian about whether any given investment idea, strategy, product or service described herein may be appropriate for your circumstances. All investments involve risk, including loss of principal. Quantopian makes no guarantees as to the accuracy or completeness of the views expressed in the website. The views are subject to change, and may have become unreliable for various reasons, including changes in market conditions or economic circumstances.

cemal arican

Hi Jamie,

Yeah, sure. It could be that I overlooked something. I can share my notebook with you:

# 1 - Build the pipeline; running it returns a multi-index dataframe:
from quantopian.pipeline import Pipeline
from quantopian.pipeline.data.builtin import USEquityPricing
from quantopian.pipeline.factors import AverageDollarVolume, SimpleMovingAverage
from quantopian.research import run_pipeline

def make_pipeline():
    avd_10 = AverageDollarVolume(window_length=10)
    avd_30 = AverageDollarVolume(window_length=30)
    # Filter: top 10% of stocks by 30-day average dollar volume, used as a mask below
    high_dollar_volume = avd_30.percentile_between(90, 100)
    sma_10 = SimpleMovingAverage(inputs=[USEquityPricing.close], window_length=10, mask=high_dollar_volume)  # factor
    sma_30 = SimpleMovingAverage(inputs=[USEquityPricing.close], window_length=30, mask=high_dollar_volume)  # factor
    percent_diff = (sma_10 - sma_30) / sma_30  # factor
    latest_close = USEquityPricing.close.latest
    volume = USEquityPricing.volume.latest
    # mean_crossover_filter = sma_10 < sma_30
    return Pipeline(
        columns={
            'sma_10': sma_10,
            'sma_30': sma_30,
            'percent_diff': percent_diff,
            'avd_10': avd_10,
            'avd_30': avd_30,
            'latest_close': latest_close,
            'volume': volume,
        },
        screen=high_dollar_volume,
    )

# 2 - Running the pipeline, get 1 year data

pipe_line = run_pipeline(make_pipeline(),'2018-01-01', '2019-01-01')  
pipe_line

# 3 - Unstack to get the latest close prices. Note that this already gives NaN for certain assets.
# The dataframe has 252 rows (# of trading days) and 1299 columns (# of stocks).
pipe_line_latest_close = pipe_line.latest_close.unstack(level=-1)  
pipe_line_latest_close

My idea was to add .dropna(axis=1) to get rid of the stocks that have NaN values. What do you think?

cemal arican

It's strange, because I get a lot of stocks with NaN, but when I use the get_pricing() function it prints everything.

Jamie McCorriston

Hi Cemal,

Can you share your notebook by 'attaching' it to a comment in the thread? I'd like to make sure we're looking at the exact same code!

cemal arican

Here it is. If you remove the .dropna(axis=1), then you see in the correlation matrix that it has some NaN values.

Jamie McCorriston

Hi Cemal,

The reason you're seeing NaNs is that not all of the stocks that appear in your pipeline output traded on a major exchange for the entire 2018 calendar year. When you call unstack(), you're forcing the dataframe to provide a value for every asset on every trading day. If an equity wasn't trading on a given day, that cell is filled with a NaN. A good example of this is Equity(52211 [TLRY]), which started trading on a major exchange in July 2018. Dropping NaNs sounds like a reasonable tactic to manage the issue. That said, you should consider using daily returns, Returns(window_length=2), instead of daily prices. If you get 'latest' pricing data from pipeline, the prices in the pipeline output will be adjusted as of the simulation date each day, which I don't think is what you want when computing correlations. Using returns instead of prices will solve the adjustment problem.
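
To make the unstack() behavior concrete, here is a minimal sketch in plain pandas, outside of the Quantopian environment. The tickers OLDCO and NEWCO and all of the prices are made up for illustration; NEWCO only appears from the third day on, mimicking a stock that lists partway through the period.

```python
import pandas as pd

# Hypothetical pipeline-style output: one row per (date, asset) pair.
# NEWCO has no rows for the first two days, like a mid-period listing.
dates = pd.date_range("2018-07-16", periods=4, freq="B")
rows = [(d, "OLDCO", 10.0 + i) for i, d in enumerate(dates)]
rows += [(d, "NEWCO", 20.0 + i) for i, d in enumerate(dates[2:])]

index = pd.MultiIndex.from_tuples(
    [(d, asset) for d, asset, _ in rows], names=["date", "asset"]
)
latest_close = pd.Series(
    [price for _, _, price in rows], index=index, name="latest_close"
)

# Unstacking builds a full date x asset grid, so the (date, asset)
# pairs that were absent from the output become NaN.
wide = latest_close.unstack(level=-1)

# dropna(axis=1) keeps only the assets that have a value on every date.
complete = wide.dropna(axis=1)

print(wide)
print(complete.columns.tolist())
```

Note that dropna(axis=1) removes an asset entirely if it is missing on even a single day, so the surviving universe depends on the date range you run the pipeline over.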

To learn more about how pipeline adjusts for corporate actions, see this section of the Data Reference.
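
As a rough illustration of the adjustment problem, here is a sketch in plain pandas with made-up numbers. It assumes a hypothetical stock with a 2-for-1 split taking effect on the third day, and it only approximates what a 2-day returns window would see; it is not the pipeline implementation itself.

```python
import pandas as pd

# Hypothetical daily closes as they would appear day by day: each value
# reflects only the corporate actions known as of that day.
# A 2-for-1 split takes effect on the third day.
dates = pd.date_range("2018-07-16", periods=4, freq="B")
as_of_prices = pd.Series([100.0, 102.0, 51.5, 52.0], index=dates)

# Naive returns computed across the collected prices treat the split
# as a price crash of roughly -49.5%.
naive_returns = as_of_prices.pct_change()

# A 2-day window evaluated on the split day sees the prior close
# re-expressed in post-split terms (assumed ratio of 0.5 on that day).
split_ratio = pd.Series([1.0, 1.0, 0.5, 1.0], index=dates)
prior_close_adjusted = as_of_prices.shift(1) * split_ratio
window_returns = as_of_prices / prior_close_adjusted - 1

print(pd.DataFrame({"naive": naive_returns, "windowed": window_returns}))
```

On every day except the split day the two series agree, which is why the problem is easy to miss until a correlation matrix is distorted by one spurious move.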
