Possible to do a percentage rank (by time, not by stock) as a custom factor?

Hi All,

I'm trying to learn how to create custom factors, so I thought I would code a DV2 custom factor. However, that is proving more difficult than I expected. The problem is that I need to create a rolling-window ranking of the values over a certain lookback period.

For example, assume I'm just trying to write a custom factor that tells me where today's close price for stock XXX ranks within a 500-day lookback window. I've coded this kind of thing in Python before using the rolling method from pandas, but in a custom factor the data arrives as a numpy array. Without converting to a pandas DataFrame, is there a way to rank over the lookback period along the time dimension (each stock against its own history) rather than across different stocks?

I've tried looking at Stack Overflow for non-Quantopian examples, and have tried using rank = close.argsort(axis=0).argsort(axis=0) + 1, but this doesn't seem to be working.

Apologies if I haven't explained this properly, I'm new to the community, but any feedback would be appreciated. If there's a way I can better explain this, or my post etiquette is flawed, please just advise the changes I should be making.

Cheers.

4 responses

Not sure why your method doesn't seem to be working. (Maybe you forgot to index the last row with [-1]?) This numpy approach seems to work for me:

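from quantopian.pipeline import CustomFactor
from quantopian.pipeline.data.builtin import USEquityPricing

class RankOfLastPrice_np(CustomFactor):
    inputs = [USEquityPricing.close]
    window_length = 2
    def compute(self, today, assets, out, prices):
        # prices has one row per day and one column per asset.
        # Rank down each column (axis=0), then keep the latest day's rank.
        out[:] = prices.argsort(axis=0).argsort(axis=0)[-1] + 1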
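To check the double-argsort idea outside Quantopian, here is a minimal sketch on a small made-up array. The function names and the sample prices are mine, not from the notebook. It assumes the window has no ties and no NaNs: argsort breaks ties by position rather than averaging them, and it sorts NaNs to the end, so real price data may need extra handling. The second function shows one possible way to turn the rank into a percentage rank.

```python
import numpy as np

def rank_of_last_row(window):
    """Rank (1 = smallest) of each column's last value within that column."""
    # The first argsort gives the sort order; the second turns it into ranks.
    ranks = window.argsort(axis=0).argsort(axis=0) + 1
    return ranks[-1]

def percent_rank_of_last_row(window):
    """Fraction of values in each column that are <= the column's last value."""
    return (window <= window[-1]).mean(axis=0)

# Rows are days (oldest first), columns are assets. Made-up prices.
prices = np.array([
    [10.0, 50.0],
    [12.0, 49.0],
    [11.0, 51.0],
    [13.0, 48.0],
])

last_rank = rank_of_last_row(prices)         # asset 0 is at its high, asset 1 at its low
last_pct = percent_rank_of_last_row(prices)
```

Inside a CustomFactor, the same expressions would be applied to the prices argument of compute, with window_length set to the lookback you want.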

However, I'd go with converting the numpy array to a pandas DataFrame and using the pandas 'rank' method. It has a lot more features and checks. See the docs for the options: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.rank.html

import pandas as pd

class RankOfLastPrice_pd(CustomFactor):
    inputs = [USEquityPricing.close]
    window_length = 2
    def compute(self, today, assets, out, prices):
        prices_df = pd.DataFrame(prices)
        # Rank down each column, then take the last row as a 1-D array.
        out[:] = prices_df.rank(axis=0).iloc[-1].values
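Since the original question asks for a percentage rank, one option worth trying is the pct=True argument of DataFrame.rank, which divides each rank by the number of valid values in the column. The sketch below runs on made-up data outside Quantopian; the function name is hypothetical, and ties get the default 'average' treatment.

```python
import numpy as np
import pandas as pd

def percent_rank_last(window):
    """Percentage rank of each column's last value within its own history."""
    df = pd.DataFrame(window)
    # axis=0 ranks down each column (through time); pct=True scales to (0, 1].
    return df.rank(axis=0, pct=True).iloc[-1].values

# Rows are days (oldest first), columns are assets. Made-up prices.
window = np.array([
    [10.0, 50.0],
    [12.0, 49.0],
    [11.0, 51.0],
    [13.0, 48.0],
])

pct_rank = percent_rank_last(window)
```

With this data the first asset closes at its window high and the second at its window low, so the two results sit at the top and bottom of the range.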

I've attached a notebook showing both factors in action.

Thanks so much for your help, Dan. That was really detailed and gave me a heap of information.

The problem seems to be coming in when I do the DV2 calculation itself, and I'm not sure why. Essentially if I take your notebook and modify it slightly, to rank based on dv2 instead of close, everything goes crazy and I continually get a rank of 2 for everything, when using a 2 period window.

Essentially the PRDV (percentage rank DV2) is calculated by performing an arithmetic calculation on the close, high and low to give dv. When I use dv in the sorting, it sorts fine. Then to get dv2, you simply take a 2-day moving average of dv. I've done this by using np.roll to get another numpy array, then adding that to the original dv and dividing by 2. There is probably a better way to do this. But as soon as I sort by dv2, I get an answer of 2.0 for every single day, which isn't correct. Somehow the problem comes in while going from dv to its 2-day moving average (dv2), but I can't for the life of me figure out where.
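The failing code isn't shown in the thread, so the following is only a guess at one thing that can go wrong here. np.roll wraps around: the last row is moved to the first row, and without an axis argument the array is flattened first, so values can also cross from one stock to another. With a 2-row window, averaging dv with its rolled copy makes both rows identical, and a tie like that could plausibly produce the same rank every day. The sketch below builds the 2-day average by slicing instead. The DV formula, close / ((high + low) / 2) - 1, is an assumption (the thread does not state it), and the function names and data are made up.

```python
import numpy as np

def dv_values(close, high, low):
    """Assumed DV definition: close relative to the day's midpoint, minus 1."""
    return close / ((high + low) / 2.0) - 1.0

def dv2_window(dv):
    """2-day average of dv; the result has one row fewer than dv."""
    # Each row is averaged with the row before it, so nothing wraps around.
    return (dv[1:] + dv[:-1]) / 2.0

def percent_rank_of_last_row(values):
    """Fraction of values in each column that are <= the column's last value."""
    return (values <= values[-1]).mean(axis=0)

# Four made-up days (oldest first) for two assets, midpoint fixed at 10.
high = np.full((4, 2), 11.0)
low = np.full((4, 2), 9.0)
close = np.array([
    [9.0, 11.0],
    [10.0, 10.0],
    [11.0, 9.0],
    [10.5, 9.5],
])

dv = dv_values(close, high, low)
dv2 = dv2_window(dv)
prdv = percent_rank_of_last_row(dv2)

# The wrap-around effect on a 2-row window: both rows come out identical.
two_rows = dv[-2:]
wrapped = (two_rows + np.roll(two_rows, 1, axis=0)) / 2.0
rows_identical = np.array_equal(wrapped[0], wrapped[1])
```

Because slicing drops one row, a factor built this way would need window_length to be one more than the number of dv2 values you want to rank.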

Thanks again for all the advice, it is really appreciated. You are right, the pandas way is better and I'm better with pandas than np anyway, so that is the way to go. But after hours of working on the numpy method, I would like to be able to understand where the problem is coming in.

Cheers.

OK, I think I figured it out. The problem was arising because my indexing was atrocious. I believe I've got the problem fixed now, and have attached the solution in my notebook. Just a warning: it is pretty messy, but if the same problem applies to you, you'll be able to figure it out from my indexing. Thanks again to Dan for providing the notebook and guidance that helped me realize that the issue wasn't with the argsort command, and eventually allowed me to pinpoint my problem.

Cheers again Dan.

Wei, glad I could help.

Good luck!