Quantopian's community platform is shutting down. Please read this post for more information and download your code.
Back to Community
Problem accessing my created pandas DataFrame

Hi,

Thank you for your excellent work.

I want to take advantage of the new Fetcher and upload my own signals. In testing the framework for that, I create a simple random DataFrame and try to access it in handle_data(). But in trying to access a member of the DataFrame in handle_data(), I get "Error Runtime exception: KeyError:".

Here is the relevant code (sorry, I did not figure out how to link directly to my algo). I have a few ugly diagnostics to try to figure out the problem. 3 lines are indicated as causing an error (one of which is a bonus for this post: a simple assert statement does not work).


# Cloned from "Dual Moving Averages version 0.2"  
# https://www.quantopian.com/posts/dual-moving-averages-version-0-dot-2

import pandas as pd  
import numpy as npy

START = '2012-01-01'  
END = '2012-12-31'  
index = pd.date_range(START, END, freq='B')

THRESH = 0.1

def initialize(context):  
    print pd.version.version

    # define portfolio  
    context.sids = [sid(24), sid(3766), sid(5061), sid(22802),  
                    sid(26578)]  
    context.N = len(context.sids)  
    # init dict to track whether we invested in a particular sid  
    context.invested = {}  
    context.sig = {}  
    for cur_sid in context.sids:  
        context.invested[cur_sid] = False  
    context.sig = pd.DataFrame(  
        npy.random.rand(len(index), len(context.sids)),  
        index=index, columns=context.sids)

    # This confirms that my pandas syntax is correct.  
    print context.sig[context.sids[0]][index[0]]

def handle_data(context, data):  
    sig = context.sig  
    for cur_sid in context.sids:  
        # skip non-existant sids  
        if cur_sid not in data:  
            continue  
        t = pd.Timestamp(data[context.sids[0]].datetime.date())  
        log.critical('*, %s' % str(t))  
        print type(t)  
        cash_per_sid = context.portfolio.starting_cash / context.N  
        #print sig[cur_sid]  
        i = sig[cur_sid].index == t  
        dti = sig[cur_sid].index[i]  
        print dti  
        print len(dti)                    # output is 1  
        #assert len(dti) == 1        # <----------- UNCOMMENT LINE will crash  
        print type(dti)  
        #print sig[cur_sid][t]       # <-------- UNCOMMENT LINE will crash  
        # buy if sig exceeds threshold.  
        if (sig[cur_sid].ix[t] > THRESH): # and not context.invested[cur_sid]:  #  raises error  
            order_in_cash(data, cur_sid, cash_per_sid)  
            context.invested[cur_sid] = True  

Thank you for any help.

Ryan

4 responses

Hi Ryan,

Thanks for the questions and example code. I ran through and patched up a few things, and stripped out a lot of the print statement to give you a simple example. After I finished, I realized all this alignment by date work isn't necessary at all if you use fetcher. We actually put the data into handle_data's parameter for you - that way your lookahead bias is capped by the quality of the underlying data. Since I had the code working, I figured I would include it anyway! The most important change is a guard on the current algo time being in the index. I'm using a new, not quite documented yet, function to get the algorithm's current time get_datetime(). I also set the timezone of the date range - pandas is actually good about comparing naive and non-naive datetimes/timestamps, but I did it for clarity.

thanks,
fawce

Disclaimer

The material on this website is provided for informational purposes only and does not constitute an offer to sell, a solicitation to buy, or a recommendation or endorsement for any security or strategy, nor does it constitute an offer to provide investment advisory services by Quantopian. In addition, the material offers no opinion with respect to the suitability of any security or specific investment. No information contained herein should be regarded as a suggestion to engage in or refrain from any investment-related course of action as none of Quantopian nor any of its affiliates is undertaking to provide investment advice, act as an adviser to any plan or entity subject to the Employee Retirement Income Security Act of 1974, as amended, individual retirement account or individual retirement annuity, or give advice in a fiduciary capacity with respect to the materials presented herein. If you are an individual retirement or other investor, contact your financial advisor or other fiduciary unrelated to Quantopian about whether any given investment idea, strategy, product or service described herein may be appropriate for your circumstances. All investments involve risk, including loss of principal. Quantopian makes no guarantees as to the accuracy or completeness of the views expressed in the website. The views are subject to change, and may have become unreliable for various reasons, including changes in market conditions or economic circumstances.

Hi Fawce,

Thanks a lot for that feedback and solution. I should have seen what was happening--I kept staring at the top of my log output, where the date was in my index, not at the bottom of it, where the problem was, as you found. Thank you for that.

I tweaked things a bit so that others can see how to play with such things and put in the skeleton of a simple exit strategy. With Fetcher, I will be able to upload things like my stop loss/profit limits, signal threshold, etc., along with my signal itself.

Within this same idea: is there an accessible lookup dictionary, such as

context.ticker_to_sid = {'AAPL': 24, 'GOOG': 26578, 'IBM': 3766, ...}  

and/or

context.sid_to_ticker = {24: 'AAPL', 26578: 'GOOG', 3766: 'IBM', ...}  

so that our code can live in both worlds (symbol and sid)? I think the answer is 'no' so here is a clarification of my question and planned workaround.

I will Fetch a signal which will be a DataFrame whose columns I would like to be tickers (AAPL, GOOG, etc.). Then I would like to take those tickers and programmatically create my portfolio (context.sids) from these tickers. From reading the other posts and docs, it seems as if this is not possible. Is that the case?

It looks like I need to be in the IDE, type in the symbols (tickers) that I have signals for so that their SIDs appear in the IDE. Then I need to use those SIDs as the names of my columns of my signal DataFrame here on my local machine before Fetch gets it. Essentially, I need to do some manual labor of typing 'sid(' over and over in the IDE, typing in my desired ticker, seeing the SID come up on the screen, and making my own local lookup table dictionary that I can use to create my DataFrames that Fetcher will eventually get. That seems to be the only way to coordinate my outside signal with Quantopian's.

Do I understand that correctly? Sorry if I've missed something obvious.

Regardless, I can deal with this and I very much appreciate the fantastic machine you have built here.

Ryan

Hi Ryan,

I'm glad you used the code!

Fetcher does symbol2sid mapping automatically. You just need to have a column named "symbol" in your source csv. You also need a column that represents the date as a string, that way each row has a symbol and a date. Then Fetcher will find the sid for the symbol "as of" the date of the same row. The help doc has more details about Fetcher's symbol2sid functionality: https://www.quantopian.com/help#overview-fetcher

have fun,
fawce

Hi Fawce--thanks again. Sorry to miss the symbol2sid functionality. Exploring now.