How to properly import a CSV file with my own time-series dataset?

Hello there,

I'm very new to both Quantopian and Python, but I'd like to use the platform for my own time-series analysis.
Could anybody help me with the first step: bringing in the data by importing a CSV?

It has three columns: date (%m.%d.%y %h:%m), symbol, and value. I wanted to plot the values with this code:

def initialize(context):  
    fetch_csv('https://dl.dropboxusercontent.com/u/12299882/user5.csv',  
               date_column = 'date',  
               date_format = '%m.%d.%y %h:%m',  
               symbol = 'user5',  
               usecols = ['value'])  

# Will be called on every trade event for the securities you specify.  
def handle_data(context, data):  
    # Implement your algorithm logic here.  
    my_data = data['user5']['value']  
    record(my_data)  

and I'm getting an error.
I'd appreciate your help, guys.

Sergei

Hey Sergei,

You were 90% of the way there. Only a few things were off:

  • The date_format you used didn't match the format of the dates in the CSV
  • It's a good idea to check that 'value' exists in data['user5'] before trying to record it, because the data may simply not exist for that bar (e.g. the times don't line up)

I've attached some code that should fix your problems; let me know if you have any questions.

def initialize(context):  
    context.sid = sid(24)  
    fetch_csv('https://dl.dropboxusercontent.com/u/12299882/user5.csv',  
               date_column = 'date',  
               #: Different date formats apply  
               #: No leading zero in front of month: %-m  
               #: Full year syntax (2014 not 14): %Y  
               #: Hour and minute = %H:%M  
               date_format = '%-m.%d.%Y %H:%M',  
               symbol = 'user5',  
               )  

# Will be called on every trade event for the securities you specify.  
def handle_data(context, data):  
    # Implement your algorithm logic here.  
    if 'value' in data['user5']:  
        my_data = data['user5']['value']  
        record(my_data=my_data)  
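A quick way to check a date_format before running a backtest is to try it on one line of the file with Python's datetime.strptime. The sample timestamp below is hypothetical (the CSV itself isn't shown); it follows the layout described in the comments above: month without a leading zero, four-digit year, 24-hour time. In standard Python, plain %m already accepts "3" as well as "03", while the %-m flag is rejected with a ValueError, so whether %-m works depends on how the fetcher parses dates. This is a sketch for offline checking, not part of the Quantopian API.

```python
from datetime import datetime

# Hypothetical sample timestamp: month without a leading zero,
# two-digit day, four-digit year, 24-hour time.
SAMPLE = "3.05.2014 14:30"


def matches(text, fmt):
    """Return True if strptime can parse text with fmt, False otherwise."""
    try:
        datetime.strptime(text, fmt)
    except ValueError:
        return False
    return True


for fmt in ("%m.%d.%Y %H:%M",    # plain %m accepts "3" as well as "03"
            "%m.%d.%y %H:%M",    # %y expects a two-digit year
            "%-m.%d.%Y %H:%M"):  # standard strptime rejects the %-m flag
    print(fmt, "->", matches(SAMPLE, fmt))
```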

Seong

Disclaimer

The material on this website is provided for informational purposes only and does not constitute an offer to sell, a solicitation to buy, or a recommendation or endorsement for any security or strategy, nor does it constitute an offer to provide investment advisory services by Quantopian. In addition, the material offers no opinion with respect to the suitability of any security or specific investment. No information contained herein should be regarded as a suggestion to engage in or refrain from any investment-related course of action as none of Quantopian nor any of its affiliates is undertaking to provide investment advice, act as an adviser to any plan or entity subject to the Employee Retirement Income Security Act of 1974, as amended, individual retirement account or individual retirement annuity, or give advice in a fiduciary capacity with respect to the materials presented herein. If you are an individual retirement or other investor, contact your financial advisor or other fiduciary unrelated to Quantopian about whether any given investment idea, strategy, product or service described herein may be appropriate for your circumstances. All investments involve risk, including loss of principal. Quantopian makes no guarantees as to the accuracy or completeness of the views expressed in the website. The views are subject to change, and may have become unreliable for various reasons, including changes in market conditions or economic circumstances.

Thank you, Seong!

As you've noticed, I'm working with minute data, but Quantopian draws daily graphs even when I run a minute backtest.
Any idea how to make it plot the data minute by minute?

Sergei

Hey Sergei,

Great observation! That's something we're working on : )

Seong

Well, that's sad.
I'm now thinking of converting my minute data into daily data. That won't hurt my analysis, but it will stretch the backtest period to something like 50 years.
Do I understand correctly that your engine is limited to the period from 2002 to the present? If so, I believe the workaround would be to chop the data into pieces?

Sergei
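Converting minute rows into daily rows is short work with pandas resample. The sketch below assumes the file layout from this thread (a 'date' string column in %m.%d.%Y %H:%M form and a numeric 'value' column) and assumes the last reading of each day is the one to keep; mean() or max() would be just as valid depending on the analysis. The function name minute_to_daily is hypothetical, and the 2002 cutoff is the start date mentioned above.

```python
import pandas as pd


def minute_to_daily(df, start="2002-01-01"):
    """Collapse minute rows into one row per calendar day.

    Assumes df has a 'date' column of strings like '3.05.2014 14:30'
    and a numeric 'value' column (hypothetical layout based on this thread).
    """
    out = df.copy()
    out["date"] = pd.to_datetime(out["date"], format="%m.%d.%Y %H:%M")
    daily = (
        out.set_index("date")["value"]
        .resample("D")
        .last()     # keep the last reading of each day
        .dropna()   # drop days with no readings (weekends, gaps)
    )
    # Keep only the span a backtest can cover
    return daily[daily.index >= pd.Timestamp(start)]
```

To chop the result into pieces, one option is `daily.groupby(daily.index.year)`, which yields one Series per calendar year.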

That's right, our backtesting data is from January 2002 until today.

We are working on a new research environment that will be an IPython notebook for your ad-hoc analysis and plotting. There you'll be able to graph according to your custom parameters and analyze your backtest results.

For more information, take a look at our sneak-peek announcement and then sign up to request access for the beta!

Alisa, thanks for your input; well noted.

I keep running into other related questions. For example, can I transform my time-series data with built-in methods like mavg(days) and the TA-Lib functions?

Ultimately, I want to work with my data as if it were a stock sid from your database, but I see there are complications in treating imported data exactly the same way as yours.

Absolutely! You can use any of the transformations on your fetched data, including simple transforms like mavg, stdev, and vwap, as well as the TA-Lib methods. The one caveat: if you use history() on the stocks, it pulls the history from our database, not from your file. This is a feature that we're still working to improve.

You can use your fetched data to create your universe of stocks or as a trading signal. For more info, take a look at: https://www.quantopian.com/help#overview-fetcher

Alisa,

Could you tell me what I'm doing wrong when applying this fairly simple transformation? It throws an error on the mavg() line.

def initialize(context):  
    context.sid = sid(24)  
    fetch_csv('https://dl.dropboxusercontent.com/u/12299882/user51.csv',  
               date_column = 'date',  
               #: Different date formats apply  
               #: No leading zero in front of month: %-m  
               #: Full year syntax (2014 not 14): %Y  
               #: Hour and minute = %H:%M  
               date_format = '%-m.%d.%Y %H:%M',  
               symbol = 'user5',  
               )  
# Will be called on every trade event for the securities you specify.  
def handle_data(context, data):  
    # Implement your algorithm logic here.  
    if 'value' in data['user5']:  
        # Show my data  
        my_data = data['user5']['value']  
        record(my_data=my_data)  
        # Show my data 10 days MA value  
        my_avg = data['user5']['value'].mavg(10)  
        record(my_avg=my_avg)  

Hi Sergei,

What's happening here is that data['user5']['value'] retrieves only the current 'value', a single number, because fetch_csv turns your file into a time series that handle_data sees one bar at a time. So when you call mavg(10) on data['user5']['value'], you're asking for the 10-day moving average of a single value.

There are a couple ways around this.

One way is to append each data['user5']['value'] to a Python list and average the most recent entries, something like:

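import numpy as np

#: In initialize(context), start with an empty list
context.past_values = []

#: In handle_data(context, data), inside the 'value' check:
#: append the current value to a Python list
context.past_values.append(my_data)

#: Once at least 10 values have accumulated, average the last 10
if len(context.past_values) >= 10:
    my_avg = np.average(context.past_values[-10:])
    record(my_avg=my_avg)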
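The list approach keeps growing for the whole backtest. A bounded alternative stores only the last N values in a collections.deque with maxlen, which discards the oldest entry automatically. RollingMean is a hypothetical helper name, not part of the Quantopian API; treat it as a sketch rather than a drop-in replacement.

```python
from collections import deque


class RollingMean:
    """Moving average over the last `window` values seen so far.

    Hypothetical helper: handle_data only hands you one 'value' per bar,
    so the history has to be kept by the algorithm itself.
    """

    def __init__(self, window):
        self.window = window
        # deque with maxlen drops the oldest value once it is full
        self.values = deque(maxlen=window)

    def update(self, value):
        """Add a value; return the mean once the window is full, else None."""
        self.values.append(value)
        if len(self.values) < self.window:
            return None
        return sum(self.values) / self.window
```

In initialize you would create `context.avg10 = RollingMean(10)`, then in handle_data call `my_avg = context.avg10.update(my_data)` and record it only when it isn't None.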

Or you could compute the average up front in a pre_func, similar to what's described here: https://www.quantopian.com/posts/method-to-get-historic-values-from-fetcher-data

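def pre_func(df):
    #: pre_func receives the raw DataFrame read from the CSV,
    #: so the rolling mean is computed over the whole file at once
    value = df['value']
    #: 10-row rolling mean (10 days only if the file has one row per day);
    #: older pandas versions spelled this pd.rolling_mean(value, 10)
    df['mean'] = value.rolling(window=10).mean()
    return df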

def initialize(context):  
    context.sid = sid(24)  
    fetch_csv('https://dl.dropboxusercontent.com/u/12299882/user51.csv',  
               date_column = 'date',  
               #: Different date formats apply  
               #: No leading zero in front of month: %-m  
               #: Full year syntax (2014 not 14): %Y  
               #: Hour and minute = %H:%M  
               date_format = '%-m.%d.%Y %H:%M',  
               symbol = 'user5',  
               pre_func=pre_func  
               )  

# Will be called on every trade event for the securities you specify.  
def handle_data(context, data):  
    # Implement your algorithm logic here.  
    if 'value' in data['user5']:  
        # Show my data  
        my_data = data['user5']['value']  
        record(my_data=my_data)  
        # Show the 10-row moving average computed in pre_func
        my_avg = data['user5']['mean']  
        record(my_avg=my_avg)  
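To see what pre_func adds before running a backtest, you can call it on a small DataFrame offline. This sketch uses a made-up 12-row sample in the thread's column layout (date, symbol, value) and the current pandas rolling API (older pandas used pd.rolling_mean instead). It shows that the first nine rows have no complete 10-row window, so their 'mean' is NaN.

```python
import pandas as pd


def pre_func(df):
    """Add a 10-row rolling mean of 'value' (same idea as the pre_func above)."""
    df["mean"] = df["value"].rolling(window=10).mean()
    return df


# Hypothetical 12-row sample in the thread's column layout (date, symbol, value)
sample = pd.DataFrame({
    "date": [f"1.{day}.2014 09:30" for day in range(1, 13)],
    "symbol": ["user5"] * 12,
    "value": [float(v) for v in range(1, 13)],
})

result = pre_func(sample)
# Rows without a full 10-row window get NaN in 'mean'
print(result[["value", "mean"]])
```

In handle_data this likely means my_avg is NaN for the earliest rows of the file; checking the 'mean' value for NaN before recording is one way to skip them.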

Try both out and let me know what you think!

Seong

Seong, that was very helpful! Thanks!