Comparing Returns and pct_change

As we know, Returns can be passed as an input to a CustomFactor. I was curious to see whether it gives the same result as pct_change, so I created a CustomFactor like this:

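    import pandas as pd
    from quantopian.pipeline import CustomFactor
    from quantopian.pipeline.data.builtin import USEquityPricing
    from quantopian.pipeline.factors import Returns

    class test(CustomFactor):
        # Inputs: raw closes plus the built-in 2-day return (window_length=3)
        inputs = [USEquityPricing.close,
                  Returns(inputs=[USEquityPricing.close], window_length=3),
                 ]
        window_length = 10
        def compute(self, today, assets, out, close, ret):
            prices = pd.DataFrame(close, columns=assets)
            # 2-period percent change; the first two rows come out as NaN
            prices_ret = prices.pct_change(2)
            out[:] = prices_ret.iloc[-1, :]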

Here is the printout:

    prices_ret
    DataFrame:
              24        700       1335       5061      20088
    0        NaN        NaN        NaN        NaN        NaN
    1        NaN        NaN        NaN        NaN        NaN
    2  -0.022979  -0.001833  -0.014212  -0.020139  -0.019471
    3  -0.058495  -0.041379  -0.024194  -0.052760  -0.010981
    4  -0.024083  -0.022644  -0.017038  -0.048193   0.004000
    5   0.038641   0.007848   0.038567  -0.028772   0.003824
    6   0.011843  -0.028178   0.010667  -0.029412   0.014229
    7  -0.010699   0.002596   0.026525  -0.050133   0.026590
    8   0.033809   0.043170   0.060686  -0.004603   0.019943
    9   0.052763   0.018123   0.023256   0.031587   0.033047

    pd.DataFrame(ret, columns=assets)
    DataFrame:
              24        700       1335       5061      20088
    0  -0.015948  -0.029063  -0.052632  -0.010602  -0.012518
    1  -0.006575  -0.025061  -0.067669  -0.006593  -0.041141
    2  -0.022979  -0.001833  -0.014212  -0.020139  -0.019471
    3  -0.058495  -0.041379  -0.024194  -0.052760  -0.010981
    4  -0.024083  -0.022644  -0.017038  -0.048193   0.004000
    5   0.038641   0.007848   0.038567  -0.028772   0.003824
    6   0.011843  -0.028178   0.010667  -0.029412   0.014229
    7  -0.010699   0.002596   0.026525  -0.050133   0.026590
    8   0.033809   0.043170   0.060686  -0.004603   0.019943
    9   0.052763   0.018123   0.023256   0.031587   0.033047

Notice that the first two rows are different. This could matter if I do column-wise operations (sum, mean, etc.).

So where do the numbers in the first two rows of Returns come from? Are they junk? And what is the best way to compute and use a percent change?

4 responses

Good observation. One often-overlooked feature of pipeline is that, whenever it can, it fetches the extra data needed to perform a calculation. In the case of the built-in Returns factor, pipeline knows it needs to fetch earlier prices to calculate the returns for the very first day of the window.

Let's see how this affects the calculations above. When the built-in Returns factor is used with a window length of three, pipeline is smart enough to fetch the extra days of prices it needs from before the start of the window (two more days here, since a window length of three spans a two-day change) to calculate the first few return values. Those values are not 'junk'; they are the correct returns for those first few days. A custom factor, however, only receives the prices inside its own window, and pipeline doesn't know that the calculation in compute needs those extra days. The pandas pct_change method therefore returns NaNs for the first few values, since it doesn't have enough previous data to calculate them.
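To make the difference concrete, here is a small sketch in plain pandas, outside of pipeline. The prices are made up, and slicing a longer series only imitates what pipeline does when it fetches earlier data; it is not how Returns is actually implemented.

```python
import numpy as np
import pandas as pd

# Hypothetical closing prices for one asset: 12 rows, i.e. the 10-row window
# the custom factor sees plus the 2 earlier rows it never receives.
close = pd.Series([100.0, 101.0, 99.0, 102.0, 103.0, 101.0,
                   104.0, 105.0, 103.0, 106.0, 107.0, 108.0])

window_length = 10  # rows handed to compute()
periods = 2         # Returns(window_length=3) spans a 2-period change

# Imitates the Returns input: computed on the full history, then cut to the
# last 10 rows, so every row has a value.
returns_like = close.pct_change(periods).iloc[-window_length:].to_numpy()

# Imitates pct_change inside compute(): only the last 10 prices are available,
# so the first `periods` rows have nothing to compare against.
inside_compute = close.iloc[-window_length:].pct_change(periods).to_numpy()

print(np.isnan(returns_like).sum())    # count of NaN rows in the Returns-like column
print(np.isnan(inside_compute).sum())  # count of NaN rows from pct_change on 10 rows
```

From the third row onward the two columns agree, which matches the printout in the question.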

To get a full window's worth of return values, it is best to use the built-in Returns factor. Calculating the values inside a custom factor produces the same results for the rows it can compute, but you may need to increase the window length if those first few values are important.
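If you do want to compute the percent change yourself, one way to sketch the "longer window" idea is below. The helper name `trailing_pct_change` is hypothetical, and the assumption is that the custom factor's window_length is set to rows + periods (12 instead of 10 in the example above) so that compute receives the extra prices.

```python
import numpy as np

def trailing_pct_change(close, periods, rows):
    """Percent change over `periods` rows, keeping only the last `rows` rows.

    `close` is a 2-D array (days x assets) with at least rows + periods days,
    which is what compute() would receive if window_length = rows + periods.
    """
    close = np.asarray(close, dtype=float)
    if periods < 1:
        raise ValueError("periods must be at least 1")
    if close.shape[0] < rows + periods:
        raise ValueError("need at least rows + periods days of prices")
    # Each row is compared with the row `periods` days earlier.
    pct = close[periods:] / close[:-periods] - 1.0
    return pct[-rows:]

# Hypothetical prices: 12 days x 2 assets.
close = np.column_stack([
    np.linspace(100.0, 111.0, 12),
    np.linspace(50.0, 39.0, 12),
])
ret = trailing_pct_change(close, periods=2, rows=10)
print(ret.shape)            # (10, 2)
print(np.isnan(ret).any())  # False
```

Inside compute, this would be called on the `close` array that pipeline passes in. Missing prices in the input would still come through as NaN.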

Ok, I see. Thanks for the explanation!

What happens when the historical data does not go back far enough for Returns? Will I get something similar to pct_change then?

If historical pricing data is not available, NaN values are returned to pipeline, and any factor that relies on that data also returns NaNs. So yes, the result would be similar to what pct_change returns. Note that historical data is only missing when a stock has just started trading (i.e. there is no data before that time) or when a stock is so thinly traded that no trades occurred (which happens, but not very often).
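On the column-wise operations mentioned in the original question, here is a rough illustration of how those NaNs behave. The return values are made up, and the second asset is assumed to have no price history behind its first two rows.

```python
import numpy as np

# Hypothetical 5-day window of 2-day returns for two assets. The second asset
# has no earlier prices for its first two rows, so those returns are NaN.
ret = np.array([
    [0.01,  np.nan],
    [-0.02, np.nan],
    [0.03,  0.02],
    [0.00,  -0.01],
    [0.02,  0.04],
])

plain_mean = ret.mean(axis=0)        # NaN propagates into the second column
nan_aware = np.nanmean(ret, axis=0)  # averages only the rows that have data

print(plain_mean)
print(nan_aware)
```

Note that pandas DataFrame methods such as mean and sum skip NaNs by default, while the plain NumPy array methods do not, so the result depends on which one you call in compute.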