Implementing cap weighting within pipeline?

Hi,

I'm just beginning to familiarize myself with the Quantopian API and am trying to implement a simple cap-weighted index (i.e. a quasi S&P 500 index). At some point in my algorithm I need to sum the constituent market caps and divide each constituent's market cap by that total to get my weightings. I'm trying to figure out where and how to do that: should I calculate the weights in initialize() via a factor, or in a scheduled rebalance()? I apologize if this has already been answered, but the pipeline API seems fairly new, and all of the examples I've found so far use equal weightings.

If I've set up my pipeline in initialize() like so, screening for the top 500, is the pipeline data actually "screened" at this point?

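from quantopian.algorithm import attach_pipeline
from quantopian.pipeline import Pipeline

# MarketCap is my own custom factor, defined elsewhere in the algorithm

def initialize(context):
    pipe = Pipeline()
    attach_pipeline(pipe, name='pipeMktCap')
    # add market cap custom factor
    mktCap = MarketCap()
    pipe.add(mktCap, 'mktCap')
    # filter for the 500 largest by market cap
    pipe.set_screen(mktCap.top(500))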

Could/should I just do something like this to get the sum and then add a factor with the weights?

totalMktCap = pipe.sum('mktSum')  
mktWgt = pipe['mktCap'] / totalMktCap  
pipe.add(mktWgt, 'mktWgt')  

Or would I need to apply a "mask" to the pipe.sum method? I don't really understand what the pipe is, i.e. what it contains, when still in initialize(). Once you've accessed the pipeline in some other function like before_trading_start() via pipeline_output() it's a Pandas data frame.

Alternatively, I could calculate the weights in a scheduled rebalance function, which seems like the more appropriate place to do this. If so, where do I need to access the pipeline via pipeline_output()? For example, do I need to do it in before_trading_start(), or could I just put it in my rebalance()? As an aside, could I output from the pipeline in both places, and should I always get the same output?

I've obviously got a lot of questions regarding details of the pipeline, but to begin, I'm just trying to figure out the appropriate method of calculating these weights.

Thanks.

3 responses

Hi James,

Great questions. The initialize() function is where you design your pipeline, and before_trading_start() is where you get the output. This is because your pipeline is updated daily, while handle_data() can run every minute. I think what you will want to do here is make the sum calculation in before_trading_start() and store it, along with whatever else you want from your pipeline, in the context variable. This is a good example where the results of the pipeline that I'm interested in are passed to context each day in before_trading_start(). This should help you get started.

Let me know if this helps!
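
The flow described above can be sketched roughly as follows. This is only an illustration that runs outside the platform: the pipeline_output() defined here is a stand-in returning a tiny hand-made dataframe (on Quantopian you would use the real one), the tickers and market caps are made up, and SimpleNamespace stands in for the context object.

```python
from types import SimpleNamespace

import pandas as pd


def pipeline_output(name):
    """Stand-in for Quantopian's pipeline_output(); returns made-up data."""
    return pd.DataFrame(
        {'mktCap': [300.0, 100.0, 100.0]},
        index=['AAA', 'BBB', 'CCC'],  # hypothetical tickers
    )


def before_trading_start(context, data):
    # Runs once per day: fetch the pipeline result and store it on context.
    context.output = pipeline_output('pipeMktCap')
    context.mkt_cap_sum = context.output['mktCap'].sum()


def rebalance(context, data):
    # Runs later in the day: reuse what before_trading_start() stored.
    return context.output['mktCap'] / context.mkt_cap_sum


context = SimpleNamespace()  # stand-in for the algorithm's context object
before_trading_start(context, data=None)
weights = rebalance(context, data=None)
print(weights)
```

Because the output is fetched once and kept on context, rebalance() works from the same day's data without having to call pipeline_output() again.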

I couldn't find pipeline.sum. How did you sum the market caps to calculate the weights?

The calculation is done on the dataframe returned from the pipeline. 'sum' is a pandas method to sum either rows or columns of a dataframe (see http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.sum.html).

Something like this will sum the market caps (assuming 'mktCap' is a column in the dataframe returned by the pipeline, and 'my_pipe' is the name you passed to attach_pipeline):

context.output = pipeline_output('my_pipe')  
mkt_cap_sum = context.output['mktCap'].sum()
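
Going one step further, the weights themselves can be added as a column on the same dataframe. The helper below is just a sketch with plain pandas: add_cap_weights is a hypothetical name, the sample data is made up, and dropping rows with a missing market cap is my own assumption about how you would want to handle them.

```python
import pandas as pd


def add_cap_weights(output, cap_column='mktCap', weight_column='mktWgt'):
    """Return a copy of `output` with a cap-weight column that sums to 1."""
    # Assumption: securities with no market cap are left out of the index.
    result = output.dropna(subset=[cap_column]).copy()
    total = result[cap_column].sum()
    if total <= 0:
        raise ValueError('total market cap must be positive')
    result[weight_column] = result[cap_column] / total
    return result


# Made-up data standing in for the dataframe from pipeline_output().
sample = pd.DataFrame(
    {'mktCap': [600.0, 300.0, None, 100.0]},
    index=['AAA', 'BBB', 'CCC', 'DDD'],
)
weighted = add_cap_weights(sample)
print(weighted)
```

Inside an algorithm, each row's weight could then be used as the target when ordering, for example with order_target_percent in the rebalance function.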