Alphalens Error

Hi Ernesto,

I have encountered an error in Alphalens - see attached Notebook - as reported:

ValueError: Wrong number of items passed 370919, placement implies 37114  

I used the Factor analysis template in Lecture 38 - not sure what's amiss.

Thanks for your help,

Karl

4 responses

There is an error when sorting the values:

myFactor = fcfQuant['fcf'].sort_values(ascending=False)  

I believe what you really want to do is to sort the values within each day:

myFactor = fcfQuant['fcf'].groupby(level=0, group_keys=False).apply(lambda x: x.sort_values(ascending=False))
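To see the difference between the two sorts, here is a minimal sketch on toy data. The tickers, dates and values are made up for illustration; only the two sorting expressions come from the lines above.

```python
import pandas as pd

# Hypothetical toy factor indexed by (date, asset), similar in shape to pipeline output.
index = pd.MultiIndex.from_product(
    [pd.to_datetime(["2015-01-02", "2015-01-05"]), ["AAA", "BBB", "CCC"]],
    names=["date", "asset"],
)
fcf = pd.Series([1.0, 5.0, 2.0, 6.0, 3.0, 4.0], index=index, name="fcf")

# Global sort: rows from different days end up interleaved.
global_sorted = fcf.sort_values(ascending=False)

# Per-day sort: each day's rows stay together, sorted within the day.
per_day_sorted = fcf.groupby(level=0, group_keys=False).apply(
    lambda x: x.sort_values(ascending=False)
)

print(global_sorted)
print(per_day_sorted)
```

After the global sort the date level is no longer in chronological order, while the per-day sort keeps the dates in order.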

By the way, I am not sure why you are sorting at all. You can pass the unsorted factor to Alphalens:

myFactor = fcfQuant['fcf']

Also, there is a mistake in Lecture 38: when fetching the prices you should ask for fields='open_price', while the Lecture suggests fields='price':

prices = get_pricing(stockList, start_date='2015-01-01', end_date='2016-02-01', fields='open_price')

Oh, also you want to compute the sector data like this:

sectors = fcfQuant['Sector']

This way, sectors and myFactor have the same index.
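A small sketch of why this matters, assuming fcfQuant is a DataFrame indexed by (date, asset) with 'fcf' and 'Sector' columns. The data below is invented for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the pipeline output used in the notebook.
index = pd.MultiIndex.from_product(
    [pd.to_datetime(["2015-01-02", "2015-01-05"]), ["AAA", "BBB", "CCC"]],
    names=["date", "asset"],
)
fcfQuant = pd.DataFrame(
    {
        "fcf": [1.0, 5.0, 2.0, 6.0, 3.0, 4.0],
        "Sector": [101, 102, 101, 101, 102, 101],
    },
    index=index,
)

# Both series are taken from the same frame, so their indexes match row for row.
myFactor = fcfQuant["fcf"]
sectors = fcfQuant["Sector"]
same_index = myFactor.index.equals(sectors.index)

# A globally sorted factor no longer lines up with the sector index.
sorted_factor = fcfQuant["fcf"].sort_values(ascending=False)
sorted_matches = sorted_factor.index.equals(sectors.index)

print(same_index, sorted_matches)
```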

Thanks, Luca for your help - working fine now.

Regret the .sort_values() was meant to be left out - my oversight.
Out of interest, why do you think it is a mistake to specify fields='price' in get_pricing()?

Cheers

I believe fields='price' gives you the close price for that day, and that's probably not what you want.

The pricing data passed to Alphalens should contain the entry price for the assets, and it must reflect the next available price after a factor value was observed at a given timestamp. The pipeline values are computed before market open every day, and I imagine you would like to enter the new positions at market open. In this scenario the open price represents the next available price. If instead you intend to trade the factor just before market close, you can pass the close price.
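To make the timing concrete, here is a sketch in plain pandas of how a 1-period forward return follows from open prices: the price at the factor timestamp is the buy price, and the price one row later is the sell price. This is an illustration of the idea with made-up numbers, not Alphalens's own code.

```python
import pandas as pd

# Hypothetical open prices in wide form: timestamps as index, assets as columns.
open_prices = pd.DataFrame(
    {"AAA": [10.0, 11.0, 12.1], "BBB": [20.0, 19.0, 19.0]},
    index=pd.to_datetime(["2015-01-02", "2015-01-05", "2015-01-06"]),
)

periods = 1

# Return from the open at each timestamp to the open `periods` rows later,
# aligned on the timestamp at which the position is entered.
forward_returns = open_prices.pct_change(periods).shift(-periods)

print(forward_returns)
```

The last row is NaN because there is no later price to sell at, which is why the price data needs to extend past the end of the factor data.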

From Alphalens docstrings:

    prices : pd.DataFrame  
        A wide form Pandas DataFrame indexed by timestamp with assets  
        in the columns. It is important to pass the  
        correct pricing data in depending on what time of period your  
        signal was generated so to avoid lookahead bias, or  
        delayed calculations. Pricing data must span the factor  
        analysis time period plus an additional buffer window  
        that is greater than the maximum number of expected periods  
        in the forward returns calculations.  
        'Prices' must contain at least an entry for each timestamp/asset  
        combination in 'factor'. This entry must be the asset price  
        at the time the asset factor value is computed and it will be  
        considered the buy price for that asset at that timestamp.  
        'Prices' must also contain entries for timestamps following each  
        timestamp/asset combination in 'factor', as many more timestamps  
        as the maximum value in 'periods'. The asset price after 'period'  
        timestamps will be considered the sell price for that asset when  
        computing 'period' forward returns.  
        ::  
            ----------------------------------------------------  
                        | AAPL |  BA  |  CMG  |  DAL  |  LULU  |  
            ----------------------------------------------------  
               Date     |      |      |       |       |        |  
            ----------------------------------------------------  
            2014-01-01  |605.12| 24.58|  11.72| 54.43 |  37.14 |  
            ----------------------------------------------------  
            2014-01-02  |604.35| 22.23|  12.21| 52.78 |  33.63 |  
            ----------------------------------------------------  
            2014-01-03  |607.94| 21.68|  14.36| 53.94 |  29.37 |  
            ----------------------------------------------------
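The docstring's coverage requirement can be checked before calling Alphalens. The helper below is hypothetical (missing_price_coverage is not an Alphalens function) and assumes the factor has dates as its first index level and that the price index is unique and sorted.

```python
import pandas as pd


def missing_price_coverage(factor, prices, max_period):
    """Return factor dates that lack a price row or `max_period` later rows."""
    factor_dates = factor.index.get_level_values(0).unique()
    price_dates = prices.index
    problems = []
    for date in factor_dates:
        if date not in price_dates:
            # No buy price for this factor timestamp.
            problems.append(date)
            continue
        position = price_dates.get_loc(date)
        if position + max_period >= len(price_dates):
            # Not enough later rows to provide a sell price.
            problems.append(date)
    return problems


# Toy data: three price rows and a factor observed on all three dates.
dates = pd.to_datetime(["2014-01-01", "2014-01-02", "2014-01-03"])
prices = pd.DataFrame({"AAPL": [605.12, 604.35, 607.94]}, index=dates)
factor = pd.Series(
    [0.5, 0.1, 0.3],
    index=pd.MultiIndex.from_product([dates, ["AAPL"]], names=["date", "asset"]),
)

uncovered = missing_price_coverage(factor, prices, max_period=1)
print(uncovered)
```

Here the last factor date is reported, since no later price row exists for it.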