New to Pipeline, Need a little help managing number of positions

Hi guys, I am trying to build an algorithm. I use percentiles to find the stocks with the highest retained earnings and the highest cash flow, require MA50 > MA200, and calculate a z-score from a custom factor. I am having trouble controlling the number of stocks in the portfolio, i.e. sometimes the portfolio has more than 50 and other times fewer than 10.

My main questions are:

1) How can I control the number of stocks coming out of my pipeline?
2) How can I beat the benchmark? Should I be using research?
3) Why does my portfolio show position counts like 8.7 or 7.5? Am I getting partial orders?

8 responses

1) By controlling the number of stocks, do you mean keeping only a set number after sorting? If so, you can sort with the dataframe sort function and then use:

context.output=pipeline_output('my_pipeline').head(10)  

2) Unfortunately, this is a very extensive question. In short, to beat the benchmark you need to create a good algorithm with the goal of beating the benchmark.

3) I ran the algo and did not see any partial orders. Are you going to the transaction details and looking at the quantity field?

Yes, my idea for now is to filter the stocks down to a certain number after I have used set_screen. Is this the best way to control the number of positions? I am trying to use sort_values. I don't have much experience with pandas.

Thank you. I am looking at my position count counter, and I don't know why it is giving me partial numbers. Should I use a long-only algorithm, or should I include shorts? My goal would be to invest this in a small trading account, but my algorithm loses a lot of money when I lower the starting capital to $10,000.

You are on the right track in controlling the number of positions. Maybe something like this for your 'before_trading_start' method.

def before_trading_start(context,data):  

    context.output=pipeline_output('my_pipeline')

    # It's typical to sort the returned dataframe by some column or columns to order the stocks "best to worst"  
    # Simply use the 'sort_values' method  
    # Something like this (arbitrarily chose 'eps_rank')  
    context.longs_df = context.output.sort_values('eps_rank', ascending = False)  
    # Next it's also typical to hold only the 'best' stocks and sell everything else  
    # This strategy makes it easy to determine how many to buy  
    # In a more complex strategy, perhaps where you want a minimum hold time, then  
    # the calculation for qty_to_buy would be more involved (eg target - qty_holding_min_time)  
    # Simply use the 'head' method  
    qty_to_buy = context.target_qty_of_stocks  
    context.longs_df = context.longs_df.head(qty_to_buy)

    # Creating a list of equity objects (rather than keeping the dataframe)  
    # isn't required but some do it as preference  
    context.longs = context.longs_df.index.tolist()  
    context.long_weight = weight(context)

Make sure you initialize 'context.target_qty_of_stocks' in the 'initialize' method.

Note that this will try to hold the target qty of stocks but if fewer than that pass your pipeline rules, then you may hold less. Occasionally you may also hold more if, for some reason, you aren't able to sell a position.
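
As a rough illustration of that note, here is a minimal sketch on a toy dataframe. A plain pandas DataFrame stands in for the pipeline output, and the tickers, the 'eps_rank' values, and the target of 3 are all made up for the example. It shows that 'head' returns at most the target number of rows, so if fewer stocks pass the screen you simply get fewer.

```python
import pandas as pd

# Hypothetical stand-in for pipeline_output('my_pipeline'): rows indexed by ticker
output = pd.DataFrame(
    {"eps_rank": [0.91, 0.42, 0.77, 0.65, 0.88]},
    index=["AAA", "BBB", "CCC", "DDD", "EEE"],
)

target_qty_of_stocks = 3  # assumed target; a real algo would set this in 'initialize'

# Sort best to worst, then keep at most the target number of rows
longs_df = output.sort_values("eps_rank", ascending=False).head(target_qty_of_stocks)
longs = longs_df.index.tolist()

# Simulate a day when only two stocks pass a (made-up) stricter screen:
# head(3) then returns just those two rows, not three
few = output[output["eps_rank"] > 0.85]
few_longs = (
    few.sort_values("eps_rank", ascending=False)
    .head(target_qty_of_stocks)
    .index.tolist()
)

print(longs)
print(few_longs)
```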

You asked about 'partial orders' in your portfolio. I believe you meant partial or fractional 'positions'. You really don't have them. You are probably seeing the values in the 'record' graph. Those display averages as the scale gets large. For example, if you have 20 positions one day and 19 the next, it may display 19.5. Zoom in on the graph and that phenomenon will go away.

Ah I see, you are looking at the number of positions on your manually recorded field. For a longer time frame, the chart displays weekly data so the decimal values you are seeing on the number of longs is a result of the weekly average.
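
To see how whole-number position counts can turn into decimals, here is a small sketch that averages a week of made-up daily counts with pandas. The dates and counts are hypothetical, and the exact aggregation the chart uses is an assumption based on the description above.

```python
import pandas as pd

# Hypothetical daily position counts for one week (Monday to Friday)
days = pd.date_range("2020-01-06", periods=5, freq="D")
num_positions = pd.Series([20, 20, 19, 19, 19], index=days)

# Collapse the daily values into one weekly average,
# similar in spirit to what a zoomed-out chart would display
weekly = num_positions.resample("W").mean()

print(weekly)
```

Every daily value is a whole number, yet the weekly point is fractional, which is why zooming in makes the decimals disappear.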

Whether you trade long only or mixed is up to you and your research. You may be losing with lower amounts of money due to the commissions and slippage model.
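
As a back-of-the-envelope sketch of why a small account can suffer more, consider a per-share commission with a minimum charge per trade. The rates, share price, and position count below are hypothetical numbers chosen only for illustration; they are not Quantopian's defaults, and slippage is not modeled at all.

```python
def commission(shares, cost_per_share=0.005, min_trade_cost=1.0):
    """Per-share commission with a per-trade minimum (illustrative rates)."""
    return max(shares * cost_per_share, min_trade_cost)


def cost_fraction(capital, num_positions, price):
    """Commission as a fraction of the money put into one equal-weight position."""
    position_value = capital / num_positions
    shares = int(position_value // price)  # whole shares only
    if shares == 0:
        return float("inf")  # position too small to buy even one share
    return commission(shares) / (shares * price)


# Same 50-position strategy on a $50 stock, two different account sizes
small = cost_fraction(10_000, 50, 50.0)      # 4 shares per position, minimum fee applies
large = cost_fraction(1_000_000, 50, 50.0)   # 400 shares per position

print(small, large)
```

With these assumed numbers, the minimum fee makes each trade proportionally far more expensive in the small account, and that cost is paid again on every rebalance.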

Sofyan Saputra- That makes sense. Back in the "old days" we put the commission and slippage into the algorithm ourselves. Is this automatically included in my model, or do I need to put it in?

Dan Whitnable- Thanks Dan, I am trying to improve with pandas and dictionaries, and your example helped a lot. I like the idea of a minimum hold time too. I am going to try to implement that.

There are default commissions and slippage, based primarily on IB's pricing model. You can also set them manually using the following:

    set_slippage(slippage.FixedSlippage(spread=0.03))  
    set_commission(commission.PerShare(cost=0.00075, min_trade_cost=.01))  

Dan- I am going through your code. I am confused by the way you make it a list and not a dataframe. Is this creating a column of stocks? For example, if SPY and AAPL were in a basket, am I just indexing AAPL and SPY, with no other data attached, so that 'for stock in context.longs' is simpler and easier to iterate through? Should I look through the pandas documentation to make it clearer? What would be the best way to learn this?

Not exactly sure what you mean by "it" when you say "I am confused by the way you make it a list and not a dataframe". Yes, the only reason to use a list is that it's maybe easier to iterate. It's really personal preference though. I personally don't do this. I just keep it in a dataframe. The only reason I did it this way was because this is what YOU had done in your original code:

    context.longs = context.output[context.output['fc']].index.tolist() 

I'll step through the code if that helps:

    # context.output is a dataframe. rows are indexed by security  
    context.output=pipeline_output('my_pipeline')

    # create another dataframe 'context.longs_df' which is just a sorted copy of 'context.output'  
    # this is to not disturb the original data. this way you can create other subsets of the data  
    # perhaps like 'context.shorts_df' or 'context.dont_buy_df'  
    context.longs_df = context.output.sort_values('eps_rank', ascending = False)  
    qty_to_buy = context.target_qty_of_stocks 

    # replace 'context.longs_df' with just the top securities  
    context.longs_df = context.longs_df.head(qty_to_buy) 

    # 'context.longs_df' is a dataframe and has all the data columns too  
    # if all one really needs is a list of the securities to go long, then why not discard the data?  
    # 'context.longs' will just be a list of the security objects and NO data  
    context.longs = context.longs_df.index.tolist()  

Note that 'context.longs_df' is really a temporary dataframe. 'context.output' has all the data and 'context.longs' is a list of the securities to go long. Maybe do something like this to make it more clear?

    context.output=pipeline_output('my_pipeline')

    qty_to_buy = context.target_qty_of_stocks  
    longs_df_temp = context.output.sort_values('eps_rank', ascending = False).head(qty_to_buy) 

    context.longs = longs_df_temp.index.tolist()  
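
If it helps to see the list-versus-dataframe difference outside the algorithm, here is a small standalone sketch. It assumes plain ticker strings as the index (in a real algo the index holds security objects), and the column values are made up.

```python
import pandas as pd

# Toy stand-in for context.output: one row per security, data in the columns
output = pd.DataFrame(
    {"eps_rank": [0.4, 0.9], "fc": [False, True]},
    index=["SPY", "AAPL"],
)

# Sorted dataframe: still carries every data column
longs_df = output.sort_values("eps_rank", ascending=False)

# Plain Python list of the index labels only: no data columns attached
longs = longs_df.index.tolist()

# Iterating the list gives one security at a time
visited = [stock for stock in longs]

# The data is not lost; it can still be looked up in the original dataframe
aapl_rank = output.loc["AAPL", "eps_rank"]

print(longs, aapl_rank)
```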

Hope that helps.