PLEASE HELP!

How do I create a universe of every stock? I would like to initialize all stocks then narrow the universe to 25% gainers.

I am having trouble figuring out how the function definitions work. I am decent with Python, but the order in which things get passed and updated doesn't flow logically for me here. I think there are just too many things happening "behind the scenes" for me to follow well.

I appreciate any help thank you!

4 responses

First off, welcome!

Here is a short overview of the flow of a Quantopian algorithm which may help. The three big 'behind the scenes' things to be aware of are:

  • The 'framework'. When you 'build algorithm' or 'run full backtest' or launch it for paper trading or live trading, you are really handing it over to an overarching program which calls your functions at set times. Those calls are explained below.
  • The 'pipeline' object. This is really just an object which is defined in an open source library (https://github.com/quantopian/zipline/tree/master/zipline/pipeline) that takes care of executing the actual daily (not minutely) data queries for you. It is also optimized for backtesting so it 'pre-processes' the data for speed. Instead of doing any direct database or file queries you simply define the data you want in a pipeline definition, then run the pipeline. It will output a nice Pandas dataframe with all your data. Maybe check this post out https://www.quantopian.com/posts/custom-factor-calculation-over-iterating-help.
  • All the 'built in' objects for factors, filters, order functions, etc. These again are all open sourced and can easily be imported into your algorithm. Read through the documentation at https://www.quantopian.com/help.

Here's the general 'flow' of an algorithm when you run the program for either a backtest or live...

  1. Anything not in a function is run once. This should really be only any imports your program needs and possibly the setting of any 'constants' your program may use. All of your logic should be inside of any functions you define.

  2. Your initialize function is called exactly once. This is typically where the pipeline is defined and where any functions that need to run periodically are scheduled (using 'schedule_function'). Don't generally put any trading 'logic' here. It must be called 'initialize'.

  3. Your before_trading_start function is called every trading day before markets open (and after all the Quantopian data feeds are updated). This is typically where the pipeline is run and the output is stored so the pipeline dataframe can be used throughout your algorithm. It must be called 'before_trading_start'.

  4. Your handle_data function is called every minute. Put anything you need to update every minute here. Many programs, however, do not need to check things that often and therefore do not define a 'handle_data' function at all. If you do define it, it must be called 'handle_data'.

  5. Your functions that were scheduled using schedule_function are run at their pre-defined times. This is where the bulk of your logic resides. These can in turn call other functions if needed, or simply to make your logic more readable.
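
The call order in steps 2 to 5 can be sketched with a toy stand-in for the framework. This is not Quantopian's actual scheduler: ToyContext, run_toy_backtest, and the (function, minute) schedule format are made-up names for illustration, and the order of handle_data versus scheduled functions within a single minute is an assumption of the sketch.

```python
# Toy stand-in for the Quantopian framework, to show call order only.
# ToyContext, run_toy_backtest and the (function, minute) schedule format
# are hypothetical names, not part of the real API.

calls = []  # records the order in which the "framework" calls our functions


class ToyContext(object):
    """Empty object for storing state between calls, like 'context'."""


def initialize(context):
    calls.append('initialize')
    # Stand-in for schedule_function: run rebalance at minute 1 of each day.
    context.scheduled = [(rebalance, 1)]


def before_trading_start(context, data):
    calls.append('before_trading_start')


def handle_data(context, data):
    calls.append('handle_data')


def rebalance(context, data):
    calls.append('rebalance')


def run_toy_backtest(n_days, minutes_per_day):
    context, data = ToyContext(), None
    initialize(context)                      # step 2: exactly once
    for _day in range(n_days):
        before_trading_start(context, data)  # step 3: once per trading day
        for minute in range(minutes_per_day):
            handle_data(context, data)       # step 4: every minute
            for func, at_minute in context.scheduled:
                if minute == at_minute:      # step 5: scheduled functions
                    func(context, data)
    return calls


run_toy_backtest(n_days=2, minutes_per_day=3)
print(calls)
```

The printed list shows 'initialize' once at the start, then one 'before_trading_start' per day followed by that day's 'handle_data' and 'rebalance' calls.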

So... to answer your specific questions:

How do I create a universe of every stock? This is easy. Use pipeline. The output (specifically the index) will contain ALL securities that Quantopian tracks. Note that these include common stocks, preferred stocks, ETNs, ETFs, etc. You should really filter this down to some initial subset. One of the pre-defined universe filters such as Q1500US would get you the most tradable stocks, for instance.

I would like to initialize all stocks then narrow the universe to 25% gainers. Again, use pipeline. Create an initial filter to get only stocks, create a factor for 'gainers' (i.e. returns), then use the built-in method '.percentile_between' to get the top 25%.

import quantopian.pipeline.filters as Filters
import quantopian.pipeline.factors as Factors
from quantopian.pipeline.data.builtin import USEquityPricing

# Built-in filter to exclude ETFs etc.
is_stock = Filters.IsPrimaryShare()

# Create a factor for gains (window_length=2 gives the one-day return)
gains = Factors.Returns(inputs=[USEquityPricing.close], window_length=2, mask=is_stock)

# Filter to get only the top 25% of stocks with the highest gains
top_25_percent_gainers = gains.percentile_between(75, 100, mask=is_stock)
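
To make the arithmetic concrete, here is a plain-Python sketch of what the two steps above compute, run on made-up prices. The tickers, prices, and the percentile helper are illustrative assumptions; the helper uses linear interpolation between ranks, which is how I understand percentile cutoffs are usually computed, but the pipeline's exact behavior may differ in edge cases such as ties or missing values.

```python
# Made-up (yesterday_close, today_close) pairs for eight hypothetical tickers.
closes = {
    'AAA': (100.0, 110.0),
    'BBB': (50.0, 49.0),
    'CCC': (20.0, 21.0),
    'DDD': (10.0, 10.1),
    'EEE': (200.0, 190.0),
    'FFF': (80.0, 82.4),
    'GGG': (40.0, 43.2),
    'HHH': (30.0, 29.7),
}


def percentile(values, pct):
    """Percentile with linear interpolation between neighboring ranks."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * pct / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


# One-day return: the percent change between the two closes.
gains = {t: today / yesterday - 1.0 for t, (yesterday, today) in closes.items()}

# Keep tickers whose gain lies between the 75th and 100th percentile cutoffs.
lower = percentile(gains.values(), 75)
upper = percentile(gains.values(), 100)
top_25_percent_gainers = sorted(t for t, g in gains.items() if lower <= g <= upper)

print(top_25_percent_gainers)
```

With eight tickers, the top 25% works out to the two with the largest one-day gains.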

Attached is an algorithm which does just this (though it uses Q1500US for the universe of stocks). It may help getting started. Do look at the tutorials and the help docs. You may also want to look at these other posts:

more overview on what pipeline is all about
https://www.quantopian.com/posts/screen-vs-filter

a bit more about how pipeline works and is optimized
https://www.quantopian.com/posts/custom-factor-calculation-over-iterating-help

links to some good tutorials
https://www.quantopian.com/posts/quantopian-2-dot-0-tutorial-series

How to use top, bottom, percentile_between methods on pipeline_output?

from quantopian.pipeline  import Pipeline  
from quantopian.algorithm import attach_pipeline, pipeline_output  
import quantopian.pipeline.factors as Factors  
import quantopian.pipeline.filters as Filters

def initialize(context):  
    attach_pipeline(pipeline(context), 'pipeline')  
    schedule_function(rebalance, date_rules.month_start(), time_rules.market_open(minutes = 65))

def pipeline(context):  
    universe  = Filters.QTradableStocksUS()  
    mkt_cap   = Factors.MarketCap(mask = universe)  
    my_screen = mkt_cap.top(100)  
    pipe      = Pipeline(columns = {'mkt_cap':mkt_cap, }, screen = my_screen)  
    return pipe

def rebalance(context, data):  
    mkt_cap_sorted = pipeline_output('pipeline').sort_values('mkt_cap', ascending = True)  
    longs = mkt_cap_sorted.tail(10)  
    # longs = mkt_cap_sorted.top(10)
    # longs = mkt_cap_sorted.bottom(10)
    # longs = mkt_cap_sorted.percentile_between(90, 100)
    print(longs)

Note that 'top', 'bottom', and 'percentile_between' are pipeline factor methods. They need to be applied to a factor object and then will return a pipeline filter object. Factor and filter definitions are done exactly once in an algorithm (typically in the 'initialize' method or a method called inside 'initialize').

So, the code below doesn't work:

def rebalance(context, data):  
    mkt_cap_sorted = pipeline_output('pipeline').sort_values('mkt_cap', ascending = True)  
    longs = mkt_cap_sorted.tail(10)  
    longs = mkt_cap_sorted.top(10)  
    longs = mkt_cap_sorted.bottom(10)  
    longs = mkt_cap_sorted.percentile_between(90, 100) 

First, 'mkt_cap_sorted' is a pandas dataframe (not a factor object), so it doesn't have the 'top', 'bottom', and 'percentile_between' methods. Second, these filters need to be defined before 'pipeline_output' is called (typically in 'initialize').

Put these methods in the pipeline definition. Something like this.

def pipeline(context):  
    universe = Filters.QTradableStocksUS()  
    mkt_cap = Factors.MarketCap(mask = universe)  
    top_10 = mkt_cap.top(10)  
    bottom_10 = mkt_cap.bottom(10)  
    between_90_100 = mkt_cap.percentile_between(90, 100) 


    pipe = Pipeline(columns = {  
        'mkt_cap' : mkt_cap,  
        'top_10' : top_10,  
        'bottom_10' : bottom_10,  
        'between_90_100' : between_90_100,  
          })  


    return pipe

To then use these when getting the pipeline output (which is usually done in the 'before_trading_start' method), do something like this:

def before_trading_start(context, data):  
    output = pipeline_output('pipeline')  
    top_10_list = output.query('top_10').index.tolist()  
    bottom_10_list = output.query('bottom_10').index.tolist()  
    between_90_100_list = output.query('between_90_100').index.tolist()

Now, one can also use plain old pandas dataframe methods (instead of defining pipeline filters) to get the top and bottom by using the 'nlargest' and 'nsmallest' methods (pandas doesn't have a single-call equivalent of 'percentile_between'):

def before_trading_start(context, data):  
    output = pipeline_output('pipeline')  
    top_10_list = output.nlargest(10, 'mkt_cap').index.tolist()  
    bottom_10_list = output.nsmallest(10, 'mkt_cap').index.tolist()
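
If a percentile band is needed on the pandas side, it can be built in two steps with 'quantile' and 'between'. The sketch below uses a made-up dataframe in place of the real pipeline output (the tickers and values are assumptions for illustration), and it is only meant to roughly mirror 'percentile_between(90, 100)', not to reproduce it exactly.

```python
import pandas as pd

# Toy stand-in for pipeline_output('pipeline'): 20 made-up tickers.
tickers = ['S%02d' % i for i in range(1, 21)]
output = pd.DataFrame({'mkt_cap': [float(i) for i in range(1, 21)]},
                      index=tickers)

# Percentile cutoffs, then a boolean mask for rows inside the band.
lower = output['mkt_cap'].quantile(0.90)
upper = output['mkt_cap'].quantile(1.00)
in_band = output['mkt_cap'].between(lower, upper)  # inclusive on both ends

between_90_100_list = output[in_band].index.tolist()
top_3_list = output.nlargest(3, 'mkt_cap').index.tolist()

print(between_90_100_list)
print(top_3_list)
```

Doing this in pandas runs on the already-computed output each day, whereas a pipeline filter is defined once up front, so the pipeline version is usually the tidier choice when the band is known in advance.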

Hope that helps...

Dan,

Thank you very much.