Quantopian's community platform is shutting down. Please read this post for more information and download your code.
Back to Community
Help with fetcher and filling NAN values

EDIT: So.... in trying to sort out the issue posted below I ended up confusing myself more and posted a number of things below this first post that I have since resolved. The issue mentioned in this first post, i.e. the fill of NAN values seeming to be forward filled instead of 0, is something I'm still trying to fix.

Hi All-

I have a log of trades that I want to execute through Quantopian's backtester - https://dl.dropboxusercontent.com/s/xff9qws4ciidsuo/20130821_TradeLog_v2.csv

So the trade log provides the ticker symbol, the date of the trade and the number of shares. It would appear, based on the backtest below, that the algo buys (for example) the 272 shares specific on 04/05/2002 but then it continues to buy 272 share blocks! I want it to do nothing until the next trade is supposed to be executed (note that the trades in this trade log are not sorted by date but I can do that if that would help).

Any ideas?

10 responses

EDIT: This problem is that the default testing period for the back test does not correspond to the period I had been previously using. So the only issue I have now is the fillna(0) seeming to forward fill.

I tried to simplify the code by stripping out the majority of the trades and all but one of the securities and now it doesn't seem to be reading the .csv at all! Anyone have any ideas?

EDIT: This problem is that the default testing period for the back test does not correspond to the period I had been previously using. So the only issue I have now is the fillna(0) seeming to forward fill.

Heres the simpler backtest.

EDIT: The issue below was already handled here: https://quantopian.com/posts/fetcher-problems

Hmm.... So it looked like fetcher was working fine on the algo below
but now I'm seeing the following in the logs:

  • 2002-04-04handle_data:113DEBUGWARNING: rank not found for ticker Security(19657 [XLI]) 2002-07-02handle_data:113DEBUGWARNING: rank not
    found for ticker Security(19657 [XLI])
    2002-10-01handle_data:113DEBUGWARNING: rank not found for ticker
    Security(19657 [XLI]) 2002-12-31handle_data:113DEBUGWARNING: rank not
    found for ticker Security(19657 [XLI])
    2003-04-02handle_data:113DEBUGWARNING: rank not found for ticker
    Security(19657 [XLI]) 2003-07-02handle_data:113DEBUGWARNING: rank not
    found for ticker Security(19657 [XLI]) This backtest didn't generate
    any logs.

So 1) for ticker XLI it appears that the fetcher did not gather data,
but it gathered data for the other symbols?? and 2) I get the "This
backtest didn't generate any logs." error.

Am I doing something wrong with these algos above or is it possible
fetcher is a little buggy?

So the first three posts were one issue I was having with fetch_csv and the last two posts were another issue that I thought was related. The issue in the last two posts was a dumb mistake on my part, realized by John Fawcett (https://quantopian.com/posts/fetcher-problems). The first issue is outstanding and I'm not sure what is going on there.

Hi Daniel,

tl;dr -- What you are seeing is intended behavior for fetcher, which wasn't intended to ingest raw orders. The core issue is your source data has indexing collisions. However, you can adapt your source data to produce the behavior you want.

Some background - the idea for fetcher is to pull in prices, signals, and other values. The data pulled in is translated to a series of events, such that each row in a csv will be a distinct event. The data parameter sent to handle_data always has the most recent values received -- effectively forward filling.

The confusing bit is that the data you are fetching is used to create the index on the dataframe fed into the simulation. Because the rows have sparse dates, the dataframe itself will have the same sparse dates as its index, and there will be no NaNs for "missing" days.

My initial thought to fix the problem was to massage the dataframe to have a complete index of trading days, letting pandas fill in missing values with NaNs.

This requires some advanced pandas programming. To create the "missing" days, you need the dataframe produced by fetcher to have all the trading days, not just the days where you generated orders. My strategy was to re-index the dataframe with the datetimeindex produced from the zipline trading calendar.

This is complicated firstly because the index in the dataframe is not unique. Since you have orders for multiple securities on each day, there are multiple entries in the index for each day. Look for 7/8/2002 in your source data, for example, there are four orders. The format of the data in the csv is known as "record" or "stacked". Pandas includes a method called pivot, which can reshape the data (more info here: http://pandas.pydata.org/pandas-docs/stable/reshaping.html). This code should make a column for each symbol, create a unique index for the date, and make NShares the value of each cell:

pivot = df.pivot(columns="symbol", index="Date", values="Nshares")  

However, this call fails with your source csv, because there are some days that have multiple orders for the same symbol (e.g. XLK has a buy and sell on 7/8/2002), and pandas needs to be able to create a unique index of date,symbol to do the pivot. If you plan to place multiple orders per day, your date field needs more resolution, to avoid index collisions.

Given the issues with reshaping the data, it occurred to me that feeding orders to your algorithm this way conflicts with "expected" use cases for Fetcher, and the best resolution might be to just reformat your input data. My advice is to change your source csv to have "target_position_size" in number of shares for a target position, instead of order amounts. That way you can compare your current position in the sid to the target position from the csv, and place orders accordingly.

thanks,
fawce

Disclaimer

The material on this website is provided for informational purposes only and does not constitute an offer to sell, a solicitation to buy, or a recommendation or endorsement for any security or strategy, nor does it constitute an offer to provide investment advisory services by Quantopian. In addition, the material offers no opinion with respect to the suitability of any security or specific investment. No information contained herein should be regarded as a suggestion to engage in or refrain from any investment-related course of action as none of Quantopian nor any of its affiliates is undertaking to provide investment advice, act as an adviser to any plan or entity subject to the Employee Retirement Income Security Act of 1974, as amended, individual retirement account or individual retirement annuity, or give advice in a fiduciary capacity with respect to the materials presented herein. If you are an individual retirement or other investor, contact your financial advisor or other fiduciary unrelated to Quantopian about whether any given investment idea, strategy, product or service described herein may be appropriate for your circumstances. All investments involve risk, including loss of principal. Quantopian makes no guarantees as to the accuracy or completeness of the views expressed in the website. The views are subject to change, and may have become unreliable for various reasons, including changes in market conditions or economic circumstances.

Hi John -

A little background on what I'm attempting here.... I been developing a backtester, similar to Quantopian, since November of last year. Of course, being just one person, there are a number of features and things that I have yet to include - not the least of which is the potential to interface with a trading platform. So I was excited to shift from working on creating the infrastructure of a backtester to using Quantopian and actually working on algorithmic models. However, I've been observing discrepancies between the two backtesters and been trying to resolve them. Some issues have involved the benchmark (https://quantopian.com/posts/question-regarding-the-benchmark), some involve price differences in the daily closing prices (https://quantopian.com/posts/problems-with-data-feeds-prices), and some are just specific to assumptions made within the backtester ( https://www.quantopian.com/posts/sector-rotation-strategy-dot-dot-dot ).

So far I have not found anything "wrong" in either backtester and, as I resolve the differences between the two, the answers seem to converge. The final step is to essentially have one backtester read the other backtester's log of trades and vice versa, to verify the results are nearly exact.

As far as the issue with the fetcher, I think I understand... My assumption was that "fetched" data was somehow appended to the data block with all the security information and that if dates were missing then they would become NAN. I guess that assumption was wrong....

As a hypothetical, if I called a string of data that had "signal" information, let's say it is not specific to a single security, and all dates within the backtest were present except for a few, would the missing few be automatically forward filled? Is there any way to NOT forward fill those missing dates when backtesting?

As far as the "unique" combination of date and symbol I can adjust my in-house backtester log to handle this. No problem. Thanks for looking into this.

Done. Thanks John.

Hello fawce,

I'm trying to use df.pivot() as described above on a CSV imported via fetcher. The pivot is:

MDLZ    15/02/2008  23.50064225  
SNY     15/02/2008  1.1108326  

to

                 MDLZ                   SNY  
15/02/2008       23.50064225            1.1108326  

and I can get as far as a DataFrame that seems to be correct. What I can't do is do anything with it. I'm creating the DataFrame using 'fetcher' pre-func but if I try to return it I get:
KeyError: no item named date

This is the code so far:

from datetime import datetime

def get_data(df):  
    myPivot = df  
    # Convert mm/dd/yyy to yyyy-mm-dd  
    for i in range (0, len(df['date'])):  
       myPivot['date'].ix[i] = str(datetime.strptime(str(df['date'].ix[i]), '%m/%d/%Y').date())  
    myPivot = myPivot.pivot(index="date", columns="symbol", values="change")  
    print myPivot  
    return myPivot

def initialize(context):  
    context.sid = sid(2)  
    fetch_csv('https://raw.github.com/pcawthron/StockData/master/berkshire_new_indexed.csv',  
               symbol='sec',  
               date_column='date',  
               date_format='%m/%d/%Y',  
               pre_func=get_data)

def handle_data(context, data):  
    return  

Any ideas, please?

P.

It wanted a 'date' column so I gave it a 'date' column. I now get the pivot rows in handle_data but how do I access the prices? This is the log output:

2002-02-15 PRINT SIDData({'COP': nan, 'STRZA': nan, 'WLP': nan, 'NEE': 46.78793461, 'DVA': nan, ...  

P.