Understanding Fetcher; Runtime exception: Key Error

edited Jan 17, 2015

Hi,

I'm having trouble using Fetcher to import signals. I seem to get this error every time I run this. Can someone please explain what is going wrong for this Python newbie?

def initialize(context):  
    fetch_csv('https://www.dropbox.com/s/01dn5cmltljav7t/VixData.csv?dl=0', symbol='VIX', date_column = 'Date',  
               date_format = '%m/%d/%y')

def handle_data(context, data):  
    current_VIX = data['VIX']['Value']

    # plot it  
    record(VIX=current_VIX)

Thank you!

14 responses

Param Sidhu

Jan 17, 2015

I am having the exact same issue with Fetcher code that was working yesterday, so I wonder if it's a problem with the backtester right now?

sameer iqbal

Jan 17, 2015

Maybe? I made this yesterday and was never able to get it to work successfully...

Seong Lee

Jan 18, 2015

Hi Sameer,

I think the problems that you and Param are experiencing may be unrelated. Param, feel free to post here with your problems and I can help you with that as well.

Sameer, there were three things going on in your algorithm:

1) The Dropbox link you specified, unfortunately, didn't work with the backtester. Lately, Copy.com has been working well, so I moved your CSV onto there.
2) In this case, you don't want to call data['VIX'], because your CSV already contains a ticker symbol, NFLX. The backtester will automatically map that ticker into data, so all you have to do here is call data[symbol('NFLX')]['Value'].
3) Because Fetcher maps the values in your CSV to data based on date, you have to check whether data actually contains 'Value' on the current bar. You can do that with a quick check: if 'Value' in data[symbol('NFLX')]

Please see the attached code:

def initialize(context):  
    #: Remove the symbol parameters and use copy.com  
    fetch_csv('https://copy.com/vgXLQLcQsLvUAj2q',  
              date_column = 'Date',  
              date_format = '%m/%d/%y')  
    #: Get NFLX  
    context.stock = symbol('NFLX')

def handle_data(context, data):  
    #: Check if we have data  
    if 'Value' in data[context.stock]:  
        #: Use NFLX to query for the data  
        current_VIX = data[context.stock]['Value']

        # plot it  
        record(VIX=current_VIX)

sameer iqbal

Jan 18, 2015

Aha! thank you for the explanation!

Saravanan Shanmugham

Jan 18, 2015

Can someone help with this Fetcher problem I am having?
I am doing two fetches that are very similar in nature, except for the column names and the date format in the second table.
Yet the first one works, while the second doesn't.
Any ideas?


def fix_datecol(df):  
    # reformat date column into string format  
    df['TRADE_DATE'] = df['TRADE_DATE'].apply(lambda x: str(x))  
    log.info('\nfix_datecol %s ' % df.head())  
    return df

def fix_datecol1(df):  
    # reformat date column into string format  
    log.info('\nfix_datecol1 %s ' % df.head())  
    return df

def rename_col(df):  
    #rename cols to what handle_data expects  
    df = df.rename(columns={'United States': 'price'})  
#    df = df.rename(columns={'CPI':'price'})  
    df = df.fillna(method='ffill')  
    df = df[['price', 'sid']]  
    log.info('\nrename_col %s ' % df.head())  
    return df

def rename_col1(df):  
    #rename cols to what handle_data expects  
    df = df.rename(columns={'Value': 'price'})  
    df = df.fillna(method='ffill')  
    df = df[['price', 'sid']]  
    log.info('\nrename_col1 %s ' % df.head())  
    return df


# Put any initialization logic here.  The context object will be passed to  
# the other methods in your algorithm.  
def initialize(context):  
    fetch_csv(     'https://dl.dropboxusercontent.com/u/169032081/US_monthly_series_inflation_idx.csv',  
               symbol='Inflation_Index',  
               date_column = 'TRADE_DATE',  
               date_format = '%Y%m%d',  
               pre_func=fix_datecol,  
               post_func=rename_col)  

    fetch_csv('https://www.quandl.com/api/v1/datasets/FRED/CPIAUCSL.csv',  
        symbol='Consumer_Price_Index',  
        date_column='Date',  
        date_format='%Y-%m-%d',  
        pre_func=fix_datecol1,  
        post_func=rename_col1)  
    context.stock = sid(8554)  
#    pass

# Will be called on every trade event for the securities you specify.  
def handle_data(context, data):  
    # Implement your algorithm logic here.

    # data[sid(X)] holds the trade event data for that security.  
    # context.portfolio holds the current portfolio state.

    # Place orders with the order(SID, amount) method.

    # TODO: implement your own logic here.  
#    print "Inflation_Index=%s, USCPI=%s"%(data['Inflation_Index']['price'],data['USCPI']['price'])  
    log.info('\n %s ' % data['Inflation_Index'])  
    log.info('\n %s ' % data['Consumer_Price_Index'])  
    log.info('\n %s ' % data['Inflation_Index']['price'])  
    log.info('\n %s ' % data['Consumer_Price_Index']['price'])

    record(Inflation_Index=data['Inflation_Index']['price'],  
           Infaltion_Index2=data['Consumer_Price_Index']['price'])  
    order(context.stock,10)

Any ideas on this?

The code below works for me. Sometimes there are errors at the beginning of the test because the dates do not line up correctly. If the data is monthly, there might be a gap at the beginning, so you have to check for this sort of thing. Hope this helps!

David

Saravanan Shanmugham

Jan 24, 2015

A couple of things.
1. The actual download has data from 1947, yet when I copied the code over, the data in the graph starts from 2008.
2. The code I had should have worked, judging from the Quantopian examples I have seen, yet it doesn't. Not sure why.
3. I am not sure how to debug this, or how you got it working, even partially. I presume the df.Date.apply() code applies pd.Timestamp() to read the date format (%Y-%m-%d) in the data more reliably. What's the purpose of the sort and index operations, though? And more importantly, how do you debug fetch_csv and the pandas DataFrame that you get from it?

Sarvi

Saravanan Shanmugham

Jan 25, 2015

Is this a bug in fetch_csv()? If so, is there a consistent workaround?

I am still not able to get the CPI data from Quandl consistently.
The data, when pulled manually, seems very simple, and I wouldn't have expected anything more than renaming columns and setting the date format to %Y-%m-%d to be needed.

Yet it doesn't work.

I have raised this question in the community as well as pinged Quantopian support through the chat link.
No one seems to have a clear answer.

Sarvi

David Edwards

Jan 25, 2015

Hey Sarvi, sorry you're still having issues. I left out the date format and used df.Date.apply(lambda dt: pd.Timestamp(dt, tz='utc')) because pandas date parsing works really well, and I wanted actual timestamps with a 'utc' time zone. The sort and indexing were just to make the DataFrame format more compatible with other pandas time series, i.e. ascending order by timestamp.
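As a rough illustration of the reformatting described above, here is a minimal pandas sketch. It is not the code attached to the original post: the function name reformat_dates, the sample rows and the 'price' column name are assumptions made for the example, and it uses current pandas (sort_values) rather than whatever version the backtester ran at the time.

```python
import io
import pandas as pd

# Made-up sample rows, deliberately newest-first, in a Date/Value layout.
# The values are placeholders, not real CPI figures.
SAMPLE_CSV = """Date,Value
2015-01-01,3.0
2014-12-01,2.0
2014-11-01,1.0
"""

def reformat_dates(df):
    """Hypothetical post-processing step: parse dates, sort, index."""
    df = df.rename(columns={'Value': 'price'})
    # Let pandas parse each date string into a UTC-aware Timestamp.
    df['Date'] = df['Date'].apply(lambda dt: pd.Timestamp(dt, tz='utc'))
    # Ascending order by timestamp, with the timestamps as the index.
    return df.sort_values('Date').set_index('Date')

frame = reformat_dates(pd.read_csv(io.StringIO(SAMPLE_CSV)))
print(frame)
```

After this step the oldest row comes first, which is the ordering most pandas time series operations expect.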

I looked at the files you are downloading. The CPI does go back prior to Quantopian's earliest data, but the inflation index series starts in 2008, which would be why it wasn't working until then. The code I shared is working for me. You have to make sure to check that the values exist; sometimes an error is thrown at the beginning of the test if you don't do this.

Try the code below. Using try/except with Fetcher might alleviate some stress.
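The try/except idea can be sketched in plain Python, outside the backtester. This is only an illustration, not the attached code: the helper name safe_lookup is hypothetical, plain dicts stand in for the backtester's data object, and the numbers are made up.

```python
def safe_lookup(data, name, field):
    """Return data[name][field], or None when either key is missing."""
    try:
        return data[name][field]
    except KeyError:
        # The fetched field has not been mapped onto this bar yet.
        return None

# Plain dicts standing in for the backtester's data object (an assumption).
early_bar = {'NFLX': {'price': 50.0}}
later_bar = {'NFLX': {'price': 51.0, 'Value': 17.5}}

for bar in (early_bar, later_bar):
    value = safe_lookup(bar, 'NFLX', 'Value')
    if value is not None:
        print('would record', value)
```

Inside handle_data the same pattern would wrap the lookup of the fetched column, skipping the record() call on bars where the lookup fails.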

Saravanan Shanmugham

Jan 26, 2015

Thanks David,
I guess your earlier version worked as well. I kept wondering why CPI only started at 2008,
because it was not being shown on the plot.

I just now realized that the code in handle_data was not recording it if inflation was not available.
And since it wasn't shown, I presumed there was something wrong with the data being pulled itself.
Should have looked harder.

This should get me going.

Question: I think I understand what you are trying to do here.
But what is it about the CPI data that requires the Date.apply, the UTC timestamps, the sorting and so on?
The documentation suggests something simpler, and that simpler approach worked for the inflation data, it looks like,
but it didn't work for CPI.

I'm just trying to learn what to look for and how to troubleshoot this sort of problem in the future.

On another note, is there a list, collection or thread where people are collecting such data sources, along with code segments that are known to work with them?
That would be very useful to people in the future.
If there isn't, I will start one to document this for future users.

Thanks,
Sarvi

Saravanan Shanmugham

Jan 26, 2015

I added a print statement to the except code, and I noticed that it is getting exercised for the CPI data, which is available for decades before the backtest start.
I also notice the data is missing from the plot for a month at the backtest start.

So I am seeing the exception for a whole month at the beginning of the test.

Why is that? Does the fetch_csv() call I am using need further tuning?

Sarvi

David Edwards

Jan 27, 2015

The NaN values at the beginning are because the CPI time series is monthly and it gets chopped off at the backtest start date, so some values get lost. The only thing I can think of would be to turn it into a daily-frequency time series and forward fill the values. You have to be careful reindexing like this, but I took a stab at it and believe this works. There's probably a cleaner way to get it done, though.
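The daily reindex-and-forward-fill idea can be sketched with pandas as follows. This is an illustrative sketch, not the attached code: the series values and dates are made up, and the names monthly, daily_index and daily are assumptions for the example.

```python
import pandas as pd

# Made-up monthly observations; placeholders, not real CPI data.
monthly = pd.Series(
    [100.0, 101.0, 102.0],
    index=pd.to_datetime(['2014-12-01', '2015-01-01', '2015-02-01'], utc=True),
)

# Forward filling alone only fills NaNs in rows that already exist,
# so it does not create any new daily rows.
assert len(monthly.ffill()) == 3

# Reindex onto a daily calendar first, then forward fill the new empty rows.
daily_index = pd.date_range(monthly.index[0], monthly.index[-1], freq='D')
daily = monthly.reindex(daily_index).ffill()

mid_january = pd.Timestamp('2015-01-15', tz='utc')
print(mid_january in monthly.index)  # is there a monthly row for this day?
print(daily.loc[mid_january])        # value carried forward from 2015-01-01
```

With a daily series, a backtest that starts mid-month still finds a value on its first day instead of waiting for the next monthly observation.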

Saravanan Shanmugham

Jan 27, 2015

Thanks for looking into this and it works great.

Correct me if I am wrong, but I am thinking the following:
1. The basic documented API should have worked for this simple dataset including data. I am beginning to conclude this is a Quantopian bug that needs fixing. The API shown in the Quantopian documentation seems more than sufficient to handle this data, and my original code should have worked.
2. The Quantopian documentation suggests df = df.fillna(method='ffill') to deal with turning monthly/yearly data into daily/minute bars. If I understand this API correctly, that means the data from the month before the backtest start (or the start of Quantopian's data) should have forward filled correctly. I did try fillna() in a couple of different places in reformat_quandl() and none of them worked. That leads me to believe that this is another bug that needs fixing.

Did I understand the API intent correctly, or am I wrong in assuming these are bugs?
This has certainly fixed the problem for me, but understanding this better will help me figure things out by myself in the future.

Thanks,
Sarvi
