Help with fetch_csv versus read_csv

posted Nov 24, 2013

I am trying to access a file in my Dropbox using the fetch_csv function. Here is the call:

fetch_csv('https://www.dropbox.com/s/skmcidbnkjtwysx/finpythtrend.csv',
          date_column='date',
          date_format='%Y-%m-%d',
          symbol='x')

When I have this file on my local machine I am able to use read_csv with no issue. As soon as I move it onto Dropbox, however, I get this error:

CParserError: Error tokenizing data. C error: Expected 1 fields in line 4, saw 2

This error also shows up when I use fetch_csv. Anyone have any ideas?

6 responses

John Fawcett

Nov 25, 2013

Hi Christian,

The URL you are using points to an HTML page that offers a download button, rather than the raw file.
If you use https://www.dropbox.com/s/skmcidbnkjtwysx/finpythtrend.csv?dl=1 the raw file is returned.
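
As a rough illustration of the URL change described above, here is a small Python 3 sketch that sets dl=1 on a Dropbox share link. The helper name force_direct_download is hypothetical, and whether Dropbox serves the raw file for a given link is an assumption worth confirming in a browser first.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def force_direct_download(url):
    """Return the URL with its dl query parameter set to 1.

    Hypothetical helper: it only rewrites the query string and does not
    contact Dropbox or verify that the link exists.
    """
    parts = urlsplit(url)
    query = dict(parse_qsl(parts.query))
    query['dl'] = '1'  # overwrite any existing dl=0
    return urlunsplit(parts._replace(query=urlencode(query)))


share_link = 'https://www.dropbox.com/s/skmcidbnkjtwysx/finpythtrend.csv'
direct_link = force_direct_download(share_link)
print(direct_link)
```

The rewritten link can then be passed to read_csv locally to see whether it parses before trying it in fetch_csv.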

I took the liberty of looking at your file, and I noticed you have symbols as columns (observation format). To use it with fetcher, you need to convert it to record format, where 'symbol' is a column, and each row specifies the symbol. If you have N symbols you will have N rows for each date in your dataset. You can either reformat the file before using it with fetcher, or you can use a pre_func to do the transformation when you pull the data into your algorithm.

Here is a dummy data sample that is also in observation style:

date,GPS,HON,NFLX  
2002-02-15,128.00,128.18,127.41  
2002-02-18,129.39,129.65,128.95  
2002-02-19,128.73,129.37,128.52  
2002-02-20,129.57,129.70,128.54  
2002-02-21,128.64,129.05,127.72

and here is an algorithm that uses a pre_func to reformat the data into record format:

import numpy as np  
from pandas import DataFrame


def unpivot(frame):
    """
    Function to convert observation data into record data format.
    Adapted from the unpivot function from:
    http://pandas.pydata.org/pandas-docs/dev/reshaping.html
    """

    frame = frame.set_index('date')
    N, K = frame.shape
    # The key is 'symbol' (the pandas docs use 'variable') so that it matches
    # the columns list below; otherwise the symbol column would be all NaN.
    data = {'value' : frame.values.ravel('F'),
            'symbol' : np.asarray(frame.columns).repeat(N),
            'date' : np.tile(np.asarray(frame.index), K)}
    return DataFrame(data, columns=['date', 'symbol', 'value'])

def initialize(context):  
    fetch_csv('http://yourserver.com/sample.csv', pre_func=unpivot)

def handle_data(context, data):  
    print data[sid(25090)]['value']
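
For the other option mentioned above, reformatting the file before it ever reaches Fetcher, a minimal sketch using only the Python 3 standard library might look like the following. The function name observation_to_record is hypothetical, and the sketch assumes the first column is named date and every other column is a symbol, as in the dummy sample.

```python
import csv
import io


def observation_to_record(text):
    """Convert observation-format CSV text (one column per symbol)
    into record-format CSV text with date, symbol, value columns."""
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    # Assumption: every column other than 'date' is a ticker symbol.
    symbols = [name for name in (reader.fieldnames or []) if name != 'date']

    out = io.StringIO()
    writer = csv.writer(out, lineterminator='\n')
    writer.writerow(['date', 'symbol', 'value'])
    # Emit all dates for one symbol, then move on to the next symbol.
    for symbol in symbols:
        for row in rows:
            writer.writerow([row['date'], symbol, row[symbol]])
    return out.getvalue()


sample = """date,GPS,HON,NFLX
2002-02-15,128.00,128.18,127.41
2002-02-18,129.39,129.65,128.95
"""
record_csv = observation_to_record(sample)
print(record_csv)
```

With N symbols, each date appears N times in the output, which is the record layout described earlier in this reply.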

The observation format is very common, and we're considering adding it to the next iteration of fetch_csv. There's a thread and a specification shared that you can review if you'd like to weigh in on the direction - https://www.quantopian.com/posts/proposed-changes-to-fetcher-and-universe-selection

Thanks for your question, and I hope you have smooth sailing from here.

thanks,
fawce

Disclaimer

The material on this website is provided for informational purposes only and does not constitute an offer to sell, a solicitation to buy, or a recommendation or endorsement for any security or strategy, nor does it constitute an offer to provide investment advisory services by Quantopian. In addition, the material offers no opinion with respect to the suitability of any security or specific investment. No information contained herein should be regarded as a suggestion to engage in or refrain from any investment-related course of action as none of Quantopian nor any of its affiliates is undertaking to provide investment advice, act as an adviser to any plan or entity subject to the Employee Retirement Income Security Act of 1974, as amended, individual retirement account or individual retirement annuity, or give advice in a fiduciary capacity with respect to the materials presented herein. If you are an individual retirement or other investor, contact your financial advisor or other fiduciary unrelated to Quantopian about whether any given investment idea, strategy, product or service described herein may be appropriate for your circumstances. All investments involve risk, including loss of principal. Quantopian makes no guarantees as to the accuracy or completeness of the views expressed in the website. The views are subject to change, and may have become unreliable for various reasons, including changes in market conditions or economic circumstances.

Christian D P

Nov 26, 2013

Hi John:

Thanks for getting back to me. I was under the impression that I would be able to reference the headings like pandas DataFrame columns, but I guess this is a bit different.

That said, I am now doing the suggested pre-processing to get a record-formatted file. When I open that file on my machine using:

df = pd.read_csv('https://www.dropbox.com/s/zosd608ql2il9bd/finpytrend.csv?dl=1')

It opens just fine and I can clearly see the date column. When I use:

fetch_csv('https://www.dropbox.com/s/zosd608ql2il9bd/finpytrend.csv?dl=1')

I get an error that says "KeyError: u'no item named date'", which doesn't quite jibe with what I am seeing in read_csv on my computer, and the csv file itself doesn't seem to have an issue. Here are the first few rows:

date,symbol,value
2004-01-10,XLF,0
2004-01-17,XLF,0
2004-01-24,XLF,0
2004-01-31,XLF,0
2004-02-07,XLF,0

Any suggestions on what I am missing?

Dan Dunn

Nov 26, 2013

Christian, that page you're hitting is actually executing some code in the browser and then feeding you a file. Look at it without the parameter and you'll see it's a pretty HTML page, and that's something Fetcher can't parse. https://www.dropbox.com/s/zosd608ql2il9bd/finpytrend.csv (I think Fawce steered you a little wrong!)

What you need is a URL that looks more like this: https://dl.dropboxusercontent.com/u/155695/Top%20Secret.txt

You can get a public link for any file in your Dropbox's Public folder.


Simply right click (or control click) on a file, click the Dropbox submenu,  
and then click 'Copy public link.'
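
One way to tell whether a link returns raw CSV or an HTML page is to look at the start of the response body. The Python 3 sketch below assumes you have already downloaded the body as text with whatever HTTP client you use locally; the helper name looks_like_html is hypothetical and the check is only a heuristic.

```python
def looks_like_html(body):
    """Heuristic: True if the response body starts like an HTML document."""
    head = body.lstrip()[:200].lower()
    return head.startswith('<!doctype html') or head.startswith('<html')


# Two stand-in bodies: an HTML landing page and a raw record-format CSV.
html_page = "<!DOCTYPE html>\n<html><head><title>Dropbox</title></head></html>"
raw_csv = "date,symbol,value\n2004-01-10,XLF,0\n"

print(looks_like_html(html_page))
print(looks_like_html(raw_csv))
```

If the check comes back true for your link, a CSV parser will see markup instead of a header row, which would be consistent with errors about a missing date column.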

Christian D P

Dec 17, 2013

I'm having a bit of a hard time with this. I have a csv file in record format that I am using fetch_csv to bring into an algorithm. Once a month there is a signal for each security in the file. When I run an algorithm that tries to match on the security and the date, it ends up executing on only one of the securities (and I can't determine how it is selecting that one). My code is attached. Does anyone in the community have a suggestion on what is going on here?

Dan Dunn

Dec 17, 2013

I don't have time to get this running all the way, but I can clear up a few errors for you to get you to the next step.

First, every stock you want to work with in Fetcher also needs to be named in the algorithm (generally in initialize(), as you do). So, change your initialization to include the stocks in your CSV.

    context.stocks = [sid(24), sid(24819)]

Second, change your column name to be 'symbol'.

Third, change your symbol column to be literally the symbol, i.e. AAPL not sid(24).

Second and third demonstrated:

date,symbol,indicator  
2012-01-01,AAPL,0.132087103  
2012-01-01,EBAY,0.085310641
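
A quick local check of the second and third changes could look like the Python 3 sketch below. The function find_format_problems and its ticker pattern are hypothetical, not part of Fetcher; the sketch only assumes that a record-format file needs date and symbol columns and that symbols should be plain tickers such as AAPL.

```python
import csv
import io
import re

# Assumed pattern for a plain ticker: starts with a capital letter,
# followed by capitals, digits, dots, underscores or hyphens.
TICKER = re.compile(r'[A-Z][A-Z0-9._-]*')


def find_format_problems(text):
    """Return a list of human-readable problems found in CSV text."""
    reader = csv.DictReader(io.StringIO(text))
    fieldnames = reader.fieldnames or []
    problems = ['missing column: ' + name
                for name in ('date', 'symbol') if name not in fieldnames]
    if problems:
        return problems
    # Data rows start on line 2 because line 1 is the header.
    for line_no, row in enumerate(reader, start=2):
        symbol = (row.get('symbol') or '').strip()
        if not TICKER.fullmatch(symbol):
            problems.append('line %d: %r is not a plain ticker'
                            % (line_no, symbol))
    return problems


good = "date,symbol,indicator\n2012-01-01,AAPL,0.132087103\n2012-01-01,EBAY,0.085310641\n"
bad = "date,symbol,indicator\n2012-01-01,sid(24),0.132087103\n"
print(find_format_problems(good))
print(find_format_problems(bad))
```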

I'm a little worried that the if/in constructs you have there won't do what we expect, but let's start with these changes.

FYI, I updated the Fetcher help section since you first asked this question. I've been trying to make it easier to understand and implement on the first try.

Dan Dunn

Dec 17, 2013

Whoops, I forgot change number four.

Remove line 18, "symbol = 'black'". That should only be used if you're trying to bring in non-security information, such as a standalone signal. Since you are trying to bring in information for specific securities, the symbol should not be specified in the fetch_csv() call and should instead come from the CSV.
