Problem about fetch_csv

Hi guys,

I have a problem fetching data online. I have an online CSV that I am trying to use in my strategy. Here is the code:

def initialize(context):
    url = 'https://copy.com/oJWlFssZuibJcObZ'
    fetch_csv(url,
              date_column='tradingDay',
              date_format='%m/%d/%y',
              symbol='CLZ15')

def handle_data(context, data):
    # Implement your algorithm logic here.
    print(data['CLZ15']['close'])

I am trying to get the closing price from the CSV, which has a symbol column named 'CLZ15'. Whenever I run this code, it says: "Runtime exception: KeyError: 'close'". Could someone tell me what is wrong with this? It is driving me crazy.

Thank you

9 responses

Maybe you are backtesting prior to the start of this CSV? You could try wrapping the print in a check such as: if 'CLZ15' in data and 'close' in data['CLZ15']
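A minimal sketch of that guard follows. It uses plain Python dicts as stand-ins for the data object, which is an assumption made only so the snippet runs outside the platform; the helper name safe_close is hypothetical.

```python
def safe_close(data, symbol='CLZ15', field='close'):
    """Return data[symbol][field] if both keys are present, else None."""
    if symbol in data and field in data[symbol]:
        return data[symbol][field]
    return None

# Stand-in dicts for illustration only; on the platform, `data` is
# whatever object is passed to handle_data, and the values are made up.
before_csv_starts = {'SPY': {'price': 200.0}}
after_csv_starts = {'SPY': {'price': 200.0}, 'CLZ15': {'close': 45.2}}

print(safe_close(before_csv_starts))  # None
print(safe_close(after_csv_starts))   # 45.2
```

Inside handle_data the same check would simply wrap the print statement, so bars that arrive before the CSV has any rows are skipped instead of raising KeyError.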

At the very beginning, I am trying to load some data first and then backtest. I searched the community and found one way to load the data, quoted here: https://www.quantopian.com/posts/quandl-csv-import-simple-question. It seems that if I write it this way, it works.

def initialize(context):
    url = 'https://copy.com/oJWlFssZuibJcObZ'
    fetch_csv(url,
              date_column='tradingDay',
              date_format='%m/%d/%y',
              symbol='CLZ15')
    # Reference a security so that the universe is not empty.
    context.stock = symbol('SPY')

def handle_data(context, data):
    # Implement your algorithm logic here.
    print(data['CLZ15']['close'])

According to the link, the problem is that there are no referenced securities in this algorithm (which means there is no ticker symbol for 'CLZ15' in the Quantopian database), so there is no universe for data to pass in. Could somebody explain this a little? I am quite confused about it.

Thank you very much

Hello,

The "universe" referred to in our documentation and in the community post that you linked to is a dictionary in the Quantopian API that stores all equities that you have introduced in your program. Whether the equities are imported using symbol(), sid(), fundamentals data, or fetcher, they are all stored in your universe, which is represented by the data variable in handle_data. If your universe is empty at any given bar, the handle_data method is skipped because you are not storing references to any stocks!

In your case, since CLZ15 is not in the Quantopian database, fetch_csv doesn't actually add anything to the universe. When this is the only statement in your initialize method, then handle_data is never called. However, when you add context.stock = symbol('SPY'), you are adding the SPY equity to your universe and handle_data is called. Now when you try to print data['CLZ15']['close'], there is no such entry in data, which is where you get your KeyError.

Does that help clarify things?

Disclaimer

The material on this website is provided for informational purposes only and does not constitute an offer to sell, a solicitation to buy, or a recommendation or endorsement for any security or strategy, nor does it constitute an offer to provide investment advisory services by Quantopian. In addition, the material offers no opinion with respect to the suitability of any security or specific investment. No information contained herein should be regarded as a suggestion to engage in or refrain from any investment-related course of action as none of Quantopian nor any of its affiliates is undertaking to provide investment advice, act as an adviser to any plan or entity subject to the Employee Retirement Income Security Act of 1974, as amended, individual retirement account or individual retirement annuity, or give advice in a fiduciary capacity with respect to the materials presented herein. If you are an individual retirement or other investor, contact your financial advisor or other fiduciary unrelated to Quantopian about whether any given investment idea, strategy, product or service described herein may be appropriate for your circumstances. All investments involve risk, including loss of principal. Quantopian makes no guarantees as to the accuracy or completeness of the views expressed in the website. The views are subject to change, and may have become unreliable for various reasons, including changes in market conditions or economic circumstances.

Hi Jamie.

I think I get your idea. But even if I add context.stock = symbol('SPY') to my initialize method, I still cannot get the data. Here is the code:

def initialize(context):
    url = 'https://copy.com/oJWlFssZuibJcObZ'
    fetch_csv(url,
              date_column='tradingDay',
              date_format='%m/%d/%y',
              symbol='CLZ15')
    context.stock = symbol('SPY')

def handle_data(context, data):
    if 'Closing_Data' in data['CLZ15']:
        # record(close=data['CLZ15']['close'])
        print('data detected')
    else:
        print('No data detected')

All the logs show is "PRINT No data detected", which means that fetch_csv is not working. I searched the forum and found someone with a similar question. Here is the link: https://www.quantopian.com/posts/csv-fetch-not-working. In that post, Alisa said that it needs to be a "pure" CSV, so I created the file on http://www.copy.com. But it still doesn't work. Now the weirdest thing happens:

import pandas as pd  
from pytz import timezone  
urla ='http://www.quandl.com/api/v1/datasets/ISE/EQU_SI.csv?trim_start=2011-01-01'  

def initialize(context):  
    fetch_csv(urla, symbol='advn', date_column="Date")  
    context.sec = symbol('SPY')

def handle_data(context, data):  
    advn = data['advn']  
    # There was a KeyError on the first bar if this check isn't here.  
    if 'Close' in advn:  
        print ('data detected')  
    else:  
        print ('no data detected')  

This is the code I got from the link I mentioned, modified a little (no rename_col function). When I run this code, it gets the data and prints to the log. But when I change the URL, for example to download the 'INDEX_GSPC' data from Quandl, it stops working again.

import pandas as pd  
urla ='https://www.quandl.com/api/v3/datasets/YAHOO/INDEX_GSPC.csv?start_date=2014-01-01&end_date=2015-08-27'


def initialize(context):  
    fetch_csv(urla, symbol='advn', date_column="Date")  
    context.sec = symbol('SPY')

def handle_data(context, data):  
    advn = data['advn']  
    if 'Close' in advn:  
        print ('data detected')  
        print (advn['Close'])  
    else:  
        print ('no data detected')  

I simply changed the URL, and now it says "no data detected". I am confused because I don't know what is going on: the same code, a different URL for different data on Quandl, and different results. This is driving me crazy. Could anybody explain this to me?

Thank you very much.

I think that what was happening was that "close" is already used as a name.

I did a column rename in the post_func, and the data loads properly now. See attached.

Just to be clear, you still can't buy and sell CLZ15 yet. We only support buying and selling securities that are in our universe. The good news: Futures Are Coming. You'll be able to buy and sell, and get pricing natively within Quantopian.
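The attachment is not reproduced here, so the following is only a sketch of the kind of column rename described above, not Dan's actual code. It assumes the CSV has a column named 'close'; the new name 'clz_close', the function name rename_col, and the sample rows are all hypothetical.

```python
import pandas as pd

def rename_col(df):
    """Rename the CSV's 'close' column so it no longer clashes with a name already in use."""
    return df.rename(columns={'close': 'clz_close'})

# Hypothetical sample rows standing in for the fetched CSV.
sample = pd.DataFrame({
    'tradingDay': ['8/25/15', '8/26/15'],
    'close': [39.31, 38.60],
})

renamed = rename_col(sample)
print(list(renamed.columns))  # ['tradingDay', 'clz_close']

# On the platform, the function would be passed to fetch_csv roughly like this:
# fetch_csv(url, date_column='tradingDay', date_format='%m/%d/%y',
#           symbol='CLZ15', post_func=rename_col)
```

With a rename like this in place, handle_data would look up data['CLZ15']['clz_close'] rather than data['CLZ15']['close'].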

I really appreciate your help, Dan. I've figured it out and have now loaded the data. Since fetch_csv returns a dataframe, I am trying to use some of the built-in dataframe functions. For example, in this case I am trying to set the index to a column named "timestamp":

def handle_data(context, data):  
    data['CLZ15'].set_index('timestamp')  

It says: "Runtime exception: AttributeError: 'SIDData' object has no attribute 'set_index'". So I am wondering whether I can use any built-in dataframe functions after I use fetch_csv to load the data, or whether I have to index the data in the post_func part of fetch_csv.

Thank you very much.

data is not a pandas structure, it's just a dict of dicts (or something like that). When you fetch_csv, you do not get any history of the data, just each row singly in handle_data.

To correctly set the timestamp, you can set a pre_func which can rename columns and set index and stuff like that. There's also a post_func; I forget what the difference is.
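To illustrate why set_index is unavailable inside handle_data, here is a rough sketch that mimics rows being handed over one at a time. It is a toy model rather than the platform's implementation: the generator stream_rows, the column names, and the sample values are hypothetical.

```python
import pandas as pd

# Hypothetical rows standing in for a fetched CSV; values are illustrative only.
frame = pd.DataFrame({
    'timestamp': ['2015-08-25', '2015-08-26'],
    'clz_close': [39.31, 38.60],
})

def stream_rows(df):
    """Yield one plain dict per row, mimicking one bar at a time."""
    for _, row in df.iterrows():
        yield row.to_dict()

bars = list(stream_rows(frame))
first_bar = bars[0]

# The full frame has DataFrame methods; a single streamed record does not.
print(hasattr(frame, 'set_index'))      # True
print(hasattr(first_bar, 'set_index'))  # False
print(first_bar['clz_close'])           # 39.31
```

Any reshaping that needs the whole table therefore has to happen while the data is still a DataFrame, which is what pre_func and post_func are for.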

Hi Simon, this is what I found online: https://gist.github.com/fawce/7154053. "The csv is parsed into a pandas dataframe using pandas.io.parsers.read_csv". That is why I thought the returned data was a pandas dataframe.

Then it says:

During simulation, the rows of the csv/dataframe are streamed to your algorithm's handle_data method as additional properties of the data parameter.

In any case, if you want to rename columns and so on, it has to be done in one of the pre_func or post_func functions passed to fetch_csv. During simulation, you do not have access to anything but the current date's value.