Issue rebalancing with TargetWeights and percentile_between with mask

My goal is to have an equal dollar amount in each short and in each long while remaining dollar neutral.

I am going long percentile_between(0, 20) and short percentile_between(80, 100).

However, I have a mask on percentile_between(0, 20) - line 159.

Because of the mask I have different numbers of shorts and longs, and with that imbalance I haven't been able to accomplish the goal above: my long positions end up at a slightly higher dollar amount than my short positions, because percentile_between(80, 100) has no mask and therefore contains more stocks.

I would also like the long positions to all be the same dollar amount and the short positions to all be the same dollar amount, even though a long position and a short position will not be the same dollar amount.
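In other words, the target looks something like this (a minimal pandas sketch with made-up tickers and counts, not my actual pipeline output):

```python
import pandas as pd

# Hypothetical result of the two percentile bands: the masked long side
# ends up with more names than... or fewer than... the unmasked short side.
longs  = ['AAA', 'BBB', 'CCC', 'DDD', 'EEE']   # e.g. percentile_between(0, 20), sign flipped
shorts = ['VVV', 'WWW', 'XXX']                  # e.g. percentile_between(80, 100)

weights = pd.Series(dtype=float)
for sym in longs:
    weights[sym] = 0.5 / len(longs)    # each long gets an equal slice of the +50%
for sym in shorts:
    weights[sym] = -0.5 / len(shorts)  # each short gets an equal slice of the -50%

print(weights)
print('net exposure:  ', weights.sum())        # effectively zero -> dollar neutral
print('gross exposure:', weights.abs().sum())  # effectively 1.0
```

Each side sums to 0.50 of the portfolio regardless of how many names it holds, so the book stays dollar neutral even with unequal counts.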

Thanks all.

3 responses

Is there a way I could run this with my own user id in place of user_53bc0c4831bb052ad000002f?
Meanwhile, I couldn't test this, but maybe some of it can help.
If there are errors, try to work around them, then maybe paste the output from log_data.

import numpy as np  
from quantopian.pipeline  import Pipeline  
from quantopian.algorithm import attach_pipeline, pipeline_output  
from quantopian.pipeline  import CustomFactor  
from quantopian.pipeline.data import Fundamentals  
from quantopian.pipeline.data.psychsignal import stocktwits  
from quantopian.pipeline.data.user_53bc0c4831bb052ad000002f import it24up11619  
from quantopian.pipeline.experimental import risk_loading_pipeline  
from quantopian.pipeline.factors import Returns, SimpleMovingAverage  
from quantopian.pipeline.filters import QTradableStocksUS, StaticAssets, Q500US  
import quantopian.algorithm as algo  
import quantopian.optimize  as opt

def initialize(context):  
    algo.attach_pipeline(make_pipeline(), 'fundamentals_pipeline')  
    schedule_function(trade, date_rules.week_start(), time_rules.market_open(minutes=30))

def make_pipeline():  
    class DebtChange(CustomFactor):  
        inputs = [Fundamentals.total_debt]  
        window_length = 252  
        def compute(self, today, asset_ids, out, debt_change):  
            out[:] = debt_change[-1] - debt_change[0]

    class DiffInEroc(CustomFactor):  
        inputs = [it24up11619.eroc]  
        window_length = 126  
        def compute(self, today, asset_ids, out, eroc):  
            out[:] = (eroc[0] - eroc[-1]) * (-100)

    m = QTradableStocksUS()  # initial mask, then iterative, progressively adding to it

    eez  = it24up11619.eez .latest  
    eroc = it24up11619.eroc.latest  
    mrkt = Fundamentals.market_cap.latest  
    m   &= eroc.notnull()           # adding to mask  
    m   &= eez .notnull()  
    m   &= mrkt.notnull()  
    m   &= mrkt > 0

    debtchange = DebtChange(mask=m)  
    dc_over_mc = debtchange / mrkt  
    #m &= dc_over_mc < .03

    #boteez = eez.percentile_between(0,   20, mask=m)  
    #topeez = eez.percentile_between(80, 100, mask=m)  
    #m &= (boteez | topeez)

    # ~ (tilde) means not, so this is like 0-20 and also 80-100  
    limit = 20  
    m &= ~eez.percentile_between(limit, 100-limit, mask=m)

    return Pipeline(  
        screen = m,  
        columns = {  
            #'boteez': boteez,  
            #'topeez': topeez,  
            'alpha' : eez * -1,   # -1 flips long, short  
            #'eez'   : eez,  
            'eroc'  : eroc,  
            'mrkt'  : mrkt,  
            'dc_over_mc':dc_over_mc,  
            'debtchange':debtchange,  
        }  
    )

def before_trading_start(context, data):  
    context.out = algo.pipeline_output('fundamentals_pipeline')  
    results = context.out  
    log.info(results.tail())  
    log.info(results.head())  
    context.alpha = context.out.alpha  
    context.alpha = norm(context, context.alpha.copy())  # try with and without this line

    # https://www.quantopian.com/posts/pipeline-preview-overview-of-pipeline-content-easy-to-add-to-your-backtest  
    if 'log_data_done' not in context:    # show values once  
        log_data(context, data, context.out, 4)  # all fields (columns) if unspecified  
        context.log_data_done = 1

def trade(context, data):  
    conc = 0.5 / len(context.out)               # .01

    algo.order_optimal_portfolio(  
        objective   = opt.TargetWeights(context.alpha),  
        constraints = [  
            opt.MaxGrossExposure(1),  
            opt.DollarNeutral(),  
            opt.PositionConcentration.with_equal_bounds(min = -conc, max = conc),  
            #opt.PositionConcentration.with_equal_bounds(min=-.01, max=.1),  
        ]  
    )

def norm(c, d):    # c context (unused here), d a Series; normalize pos, neg separately  
    # https://www.quantopian.com/posts/normalizing-positive-and-negative-values-separately  
    # Normalizing positive and negative values separately, recombining for input to optimize.  
    # Debated whether to include this part. If all pos or neg, shift for pos & neg.  
    if d.min() >= 0 or d.max() <= 0:  
        d -= d.mean()  
    pos  = d[ d > 0 ]  
    neg  = d[ d < 0 ]

    # same number of stocks for positive & negative  
    num  = min(len(pos), len(neg))  
    pos  = pos.sort_values(ascending=False).head(num)  
    neg  = neg.sort_values(ascending=False).tail(num)

    pos /=   pos.sum()  
    neg  = -(neg / neg.sum())  
    return pos.append(neg)

def log_data(context, data, z, num, fields=None):  
    ''' Log info about pipeline output or, z can be any DataFrame or Series  
    https://www.quantopian.com/posts/overview-of-pipeline-content-easy-to-add-to-your-backtest  
    '''  
    if 'log_init_done' not in context:  
        log.info('${:,}    {} to {}'.format(int(context.portfolio.starting_cash),  
                get_environment('start').date(), get_environment('end').date()))  
        context.log_init_done = 1

    if not len(z):  
        log.info('Empty')  
        return

    # Options  
    log_nan_only = 0          # Only log if nans are present  
    show_sectors = 0          # If sectors, do you want to see them or not  
    show_sorted_details = 1   # [num] high & low securities sorted, each column  
    padmax = 6                # num characters for each field, starting point

    # Series ......  
    if 'Series' in str(type(z)):    # is Series, not DataFrame  
        nan_count = len(z[z != z])  
        nan_count = 'NaNs {}/{}'.format(nan_count, len(z)) if nan_count else ''  
        if (log_nan_only and nan_count) or not log_nan_only:  
            pad = max( padmax, len('%.5f' % z.max()) )  
            log.info('{}{}{}   Series  len {}'.format('min'.rjust(pad+5),  
                'mean'.rjust(pad+5), 'max'.rjust(pad+5), len(z)))  
            log.info('{}{}{} {}'.format(  
                ('%.5f' % z.min()) .rjust(pad+5),  
                ('%.5f' % z.mean()).rjust(pad+5),  
                ('%.5f' % z.max()) .rjust(pad+5),  
                nan_count  
            ))  
            log.info('High\n{}'.format(z.sort_values(ascending=False).head(num)))  
            log.info('Low\n{}' .format(z.sort_values(ascending=False).tail(num)))  
        return

    # DataFrame ......  
    content_min_max = [ ['','min','mean','max',''] ] ; content = ''  
    for col in z.columns:  
        try: z[col].max()  
        except: continue   # skip non-numeric  
        if col == 'sector' and not show_sectors: continue  
        nan_count = len(z[col][z[col] != z[col]])  
        nan_count = 'NaNs {}/{}'.format(nan_count, len(z)) if nan_count else ''  
        padmax    = max( padmax, len(str(z[col].max())) )  
        content_min_max.append([col, str(z[col] .min()), str(z[col].mean()), str(z[col] .max()), nan_count])  
    if log_nan_only and nan_count or not log_nan_only:  
        content = 'Rows: {}  Columns: {}'.format(z.shape[0], z.shape[1])  
        if len(z.columns) == 1: content = 'Rows: {}'.format(z.shape[0])  
        paddings = [6 for i in range(4)]  
        for lst in content_min_max:    # set max lengths  
            i = 0  
            for val in lst[:4]:    # value in each sub-list  
                paddings[i] = max(paddings[i], len(str(val)))  
                i += 1  
        headr = content_min_max[0]  
        content += ('\n{}{}{}{}{}'.format(  
             headr[0] .rjust(paddings[0]),  
            (headr[1]).rjust(paddings[1]+5),  
            (headr[2]).rjust(paddings[2]+5),  
            (headr[3]).rjust(paddings[3]+5),  
            ''  
        ))  
        for lst in content_min_max[1:]:    # populate content using max lengths  
            content += ('\n{}{}{}{}     {}'.format(  
                lst[0].rjust(paddings[0]),  
                lst[1].rjust(paddings[1]+5),  
                lst[2].rjust(paddings[2]+5),  
                lst[3].rjust(paddings[3]+5),  
                lst[4],  
            ))  
        log.info(content)  
    if not show_sorted_details: return  
    if len(z.columns) == 1:     return     # skip detail if only 1 column  
    details = z.columns if fields == None else fields  
    for detail in details:  
        if detail == 'sector' and not show_sectors: continue  
        hi = z[details].sort_values(by=detail, ascending=False).head(num)  
        lo = z[details].sort_values(by=detail, ascending=False).tail(num)  
        content  = ''  
        content += ('_ _ _   {}   _ _ _'  .format(detail))  
        content += ('\n\t... {} highs\n{}'.format(detail, str(hi)))  
        content += ('\n\t... {} lows \n{}'.format(detail, str(lo)))  
        if log_nan_only and not len(lo[lo[detail] != lo[detail]]):  
            continue  # skip if no nans  
        log.info(content)  
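
In case it helps while you go through it, here is norm() as a standalone sketch you can run outside the IDE (pd.concat in place of the older Series.append, and illustrative alpha values only):

```python
import pandas as pd

def norm(d):
    # Normalize positive and negative values separately so each side
    # sums to +1 / -1, after trimming both sides to the same count.
    if d.min() >= 0 or d.max() <= 0:   # all one sign: shift so both signs exist
        d = d - d.mean()
    pos = d[d > 0]
    neg = d[d < 0]
    num = min(len(pos), len(neg))      # same number of longs and shorts
    pos = pos.sort_values(ascending=False).head(num)   # strongest positives
    neg = neg.sort_values(ascending=False).tail(num)   # most negative values
    pos = pos / pos.sum()              # longs sum to +1.0
    neg = -(neg / neg.sum())           # shorts sum to -1.0
    return pd.concat([pos, neg])

alpha = pd.Series({'a': 3.0, 'b': 1.0, 'c': 0.5, 'd': -2.0, 'e': -1.0})
w = norm(alpha)
print(w)                 # two longs, two shorts; the weakest positive is trimmed
print('net:', w.sum())   # effectively zero, a dollar neutral input to TargetWeights
```

Note the output has gross exposure 2.0 (each side at 1.0); MaxGrossExposure(1) in order_optimal_portfolio scales that back down.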

This is amazing, thank you for the detailed help. This is beyond helpful. I'm still going through all your code and will be back.

I'd love for you to be able to run backtests with the data. I don't know how the new collaborate function works in the IDE, but I'd be happy to add you so you can tinker, if that would do the trick. It says I need an email address to collaborate, if you're ok with giving me one. If collaborate doesn't work, I can send the file for you to upload as well.

FYI this data set is only for the S&P500 Tech Sector. I plan on moving to the entire 500 once I get the bugs worked out. The numbers in the Alphalens Notebook were quite good, IC Decay, quantile returns, etc. So looking forward to moving ahead.

click name -> send message