When to use data.history vs pipeline?

Are there any guidelines around this? For example, if I were running a simple moving average crossover strategy over the whole universe of stocks, should I call data.history in my scheduled function and compute the moving averages there, or should I compute them in the pipeline?

Also, if I screen data in the pipeline and go long a stock on day 1, but that stock gets screened out over the following days, how will the algo know to sell it if there is no data for it (because it has been screened out)?

For example, I screen for stocks over $5 and buy one at $6; the stock then drops under $5 and gets screened out.

2 responses

The main raison d'être of pipeline is to speed up data fetches. Using the data.history method to fetch the past 20 days' prices every day for 100 days results in 100 separate database calls. However, most of the historical data is the same from one day to the next, so the same data gets fetched over and over again. Pipeline improves on this by 'chunking' the total timeframe (e.g. 100 days) into smaller chunks, so it may call the database only a couple of times: once for the first 50 days and once for the next 50. So the guideline is: if speed is important, use pipeline.
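To make the difference in fetch counts concrete, here is a toy sketch in plain Python. It is not Quantopian or zipline code: FakePriceDB, the 20-day window, and the 50-day chunk size are all assumptions chosen to match the numbers above, and the real pipeline engine decides its own chunk sizes.

```python
# Toy model of the two access patterns. FakePriceDB is hypothetical,
# not a Quantopian or zipline API.
class FakePriceDB:
    def __init__(self, prices):
        self.prices = prices
        self.calls = 0

    def fetch(self, start, end):
        """Return prices[start:end] and count one database round trip."""
        self.calls += 1
        return self.prices[start:end]


WINDOW = 20     # lookback for the moving average (assumed)
SIM_DAYS = 100  # number of simulated trading days (assumed)
CHUNK = 50      # days covered by one chunked fetch (assumed)

# Made-up price series: WINDOW days of warm-up plus the simulated days.
prices = [100.0 + 0.1 * i for i in range(WINDOW + SIM_DAYS)]


def sma_per_day(db):
    """data.history style: fetch the trailing window once per simulated day."""
    out = []
    for day in range(WINDOW, WINDOW + SIM_DAYS):
        window = db.fetch(day - WINDOW, day)
        out.append(sum(window) / WINDOW)
    return out


def sma_chunked(db):
    """Pipeline style: one fetch per chunk, including the lookback it needs."""
    out = []
    for chunk_start in range(WINDOW, WINDOW + SIM_DAYS, CHUNK):
        chunk_end = min(chunk_start + CHUNK, WINDOW + SIM_DAYS)
        block_start = chunk_start - WINDOW
        block = db.fetch(block_start, chunk_end)
        for day in range(chunk_start, chunk_end):
            offset = day - block_start  # position of `day` inside the block
            window = block[offset - WINDOW:offset]
            out.append(sum(window) / WINDOW)
    return out


per_day_db = FakePriceDB(prices)
chunked_db = FakePriceDB(prices)
per_day = sma_per_day(per_day_db)
chunked = sma_chunked(chunked_db)

print(per_day_db.calls, chunked_db.calls)
print(per_day == chunked)
```

Both functions compute the same moving averages; the only thing that changes is how many times the data store is hit.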

Stocks falling out of the pipeline is indeed a problem. I'd suggest NOT filtering any results in the pipeline. Do all the filtering and selection once the pipeline data is returned. That way, data for any stocks currently held will be available even if they no longer meet the current criteria.
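As a rough illustration of filtering after the pipeline returns, here is a plain-Python sketch. The dict stands in for one day's unscreened pipeline output (on Quantopian this would be the DataFrame returned by pipeline_output); the symbols, column names, prices, and the plan_orders helper are all made up for the example.

```python
# Hypothetical stand-in for one day's UNSCREENED pipeline output:
# symbol -> column values. All symbols and numbers are invented.
pipeline_rows = {
    "AAA": {"close": 6.20, "sma_fast": 6.0, "sma_slow": 5.5},
    "BBB": {"close": 4.40, "sma_fast": 4.6, "sma_slow": 5.1},   # held, now under $5
    "CCC": {"close": 12.00, "sma_fast": 12.0, "sma_slow": 11.5},
    "DDD": {"close": 3.00, "sma_fast": 3.2, "sma_slow": 3.0},
}
held = {"BBB", "CCC"}  # symbols currently in the portfolio (assumed)
MIN_PRICE = 5.0        # entry-only price filter from the question


def plan_orders(rows, held, min_price):
    """Apply the price filter to new entries only; exits use the full data."""
    buys, sells = [], []
    for symbol, row in sorted(rows.items()):
        strong = row["sma_fast"] > row["sma_slow"]
        if symbol in held:
            # Held names are always evaluated, whatever their price.
            if not strong:
                sells.append(symbol)
        elif strong and row["close"] > min_price:
            buys.append(symbol)
    return buys, sells


buys, sells = plan_orders(pipeline_rows, held, MIN_PRICE)
print(buys, sells)
```

Because BBB's row is still present after it drops under $5, the exit rule can see it and sell; had the $5 screen been applied inside the pipeline, that row would be missing.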

I can give some examples if that's not entirely clear. Good luck.

Thanks Dan, that is very clear. I understand the performance aspect of the pipeline and I think it's a great idea. I like the way zipline runs all symbols in parallel each day rather than each symbol sequentially (like quantstrat in R).

Soon I will be implementing some factors (real strategies) that I can rank on, using alphalens to find alpha, and this is where the pipeline will come in really handy. For now I'm just playing around with some naive momentum strategies that select stocks from the universe, but I also want to put some constraints on the percentage of each position and on leverage.

Here is the algo I've implemented so far; feel free to comment on anything silly I've done.
The strategy just tries to hold a basket of 200 stocks with each position 5%.
It looks for relative strength against SPY (not properly implemented yet), absolute strength using SMAs, VIX under 25, and a close price above $1.
It sells on absolute weakness (SMAs).