Reinventing fetch_csv() for Zipline: Pandas code to add to Yahoo data?

When it comes to backtesting more exhaustively, I thought it would be nice to automate running several backtests on an algo (i.e., 1-month runs starting on the 1st of each month over the last 3 years, with 3 different starting cash balances for each run, generating 12*3*3=108 backtests). To do this, I thought zipline would be the way to go! It doesn't have fetch_csv() support, so I set out to write it myself. I'm new to pandas, so I'm stuck on how to take the Panel returned by load_bars_from_yahoo() (the data object created in our Quantopian algos) and add some columns to it from the DataFrame returned by fetch_csv().
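For what it's worth, the grid of runs described above can be generated before any backtest is launched. This is only a sketch: the function name `backtest_grid`, the `capital_base` key, the example cash levels, and the first start date are assumptions for illustration, and the dates are calendar days rather than trading days.

```python
from itertools import product

import pandas as pd


def backtest_grid(first_start, years=3, cash_levels=(10_000, 100_000, 1_000_000)):
    """Return one dict of backtest parameters per (month, cash) combination."""
    # First calendar day of each month, 12 per year.
    starts = pd.date_range(start=first_start, periods=12 * years, freq="MS")
    grid = []
    for start, cash in product(starts, cash_levels):
        grid.append({
            "start": start,
            "end": start + pd.offsets.MonthEnd(0),  # last calendar day of that month
            "capital_base": cash,  # hypothetical key name for the starting cash
        })
    return grid


runs = backtest_grid("2012-10-01")
print(len(runs))  # 36 monthly windows x 3 cash levels
```

Each dict could then be handed to whatever function actually launches a backtest, so the loop over runs stays separate from the algo itself.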

Here's a code skeleton for what I'm trying to do:

import pandas as pd

def fetch_csv(csv_url, pre_func, post_func, date_column, date_format, **kwargs):
    """ Mimics Quantopian's Fetcher
    """
    global data  # this will pull in the data provided by load_bars_from_yahoo() from the outer scope
    # pd.DataFrame.from_csv() was removed in pandas 1.0; read_csv() with index_col gives the same date index.
    # pre_func, post_func and date_format are not applied yet.
    df = pd.read_csv(csv_url, parse_dates=[date_column], index_col=date_column)

    # This next line is bogus, but I'm looking for the correct syntax to join `df` with `data`  
    data.add(df)  
    return df  

For each symbol in data, there is a corresponding DataFrame indexed by date. I would like to match the date_column values in df against those dates and insert the other columns from the CSV data (now in df) into data for the corresponding dates and symbols.
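One possible shape for that join, sketched under some assumptions: Panel has since been removed from pandas, so a plain dict of per-symbol DataFrames stands in for `data` here. The CSV column names (`date`, `symbol`, `score`), the helper name `add_csv_columns`, and all the numbers are made up for illustration.

```python
import io

import pandas as pd

# Stand-in for the price data: one date-indexed DataFrame per symbol (made-up prices).
dates = pd.to_datetime(["2015-10-01", "2015-10-02", "2015-10-05"])
data = {
    "AAPL": pd.DataFrame({"price": [100.0, 101.0, 102.0]}, index=dates),
    "MSFT": pd.DataFrame({"price": [50.0, 51.0, 52.0]}, index=dates),
}

# Hypothetical CSV contents; a URL or file path can be passed to read_csv the same way.
csv_text = """date,symbol,score
2015-10-01,AAPL,0.5
2015-10-05,AAPL,0.7
2015-10-02,MSFT,-0.2
"""


def add_csv_columns(data, csv_source, date_column="date", symbol_column="symbol"):
    """Left-join the CSV's extra columns onto each symbol's frame by date."""
    df = pd.read_csv(csv_source, parse_dates=[date_column])
    merged = {}
    for symbol, frame in data.items():
        # Rows for this symbol only, indexed by date, without the symbol column.
        extra = (
            df[df[symbol_column] == symbol]
            .drop(columns=[symbol_column])
            .set_index(date_column)
        )
        # how="left" keeps every price row; dates missing from the CSV get NaN.
        merged[symbol] = frame.join(extra, how="left")
    return merged


merged = add_csv_columns(data, io.StringIO(csv_text))
print(merged["AAPL"])
```

The left join is a deliberate choice in this sketch: the price rows define the calendar, and the CSV only decorates them, so a sparse CSV never drops trading days.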

I've also posted this question on StackOverflow, but echoing it here as quantopian is not a valid tag there (yet).
http://stackoverflow.com/questions/32996121/creating-fetch-csv-for-zipline-how-to-add-columns-to-load-bars-from-yahoo

Thoughts? Is this some code the Quantopian team might release?

7 responses

Wild guess:

    data = data.append(csv_data)             # creates dupes  
    data = data.groupby(data.index).sum()    # dedupe  

You can use the research environment to adjust the backtest parameters and launch them on the fly. Here are the docs: https://www.quantopian.com/research/notebooks/Tutorials%20and%20Documentation/Tutorial%20(Advanced)%20-%20Backtesting%20with%20Zipline.ipynb

Cheers,
Alisa

Thanks for the tip, @Alisa. I think we're still left with the same problem of merging two pandas objects:

  1. The resulting panel from get_pricing(), and
  2. The resulting DataFrame from local_csv()

I saw the pandas tutorial (thanks for including it with the research environment!), but it seems mostly to be a paste from the pandas documentation. Any chance of adding an example of how to merge data in the way (I think) most common to Q coders? Specifically, taking the data object of all historical stock data for the universe, and merging in data from fetch_csv() (or rather local_csv()). Or maybe sharing that code here? Or perhaps there's a preferred method for creating a data object with both local_csv() and get_pricing() in the same algo?

Furthermore, it appears one can't just copy/paste an algo into an IPython cell, but rather needs to refactor it with at least these steps:

  1. Replacing fetch_csv() with local_csv()
  2. Creating data manually using get_pricing() instead of letting it happen automatically within initialize() or creating a universe with fetch_csv().

If there are other items in the refactor checklist to port a Q algo to a notebook, I'd like to know.

There are some differences between the function calls in research and the IDE. To make it easier, we created a "cheat sheet" showing common usages: https://www.quantopian.com/posts/research-cheat-sheet-easily-move-between-the-ide-and-research

We're also working on streamlining the API and making it easier to jump between the two environments.

Thanks for the additional tips. The cheatsheet is very helpful. However, the problem still persists of how to create what is normally called "data" in a normal Q algo. To clarify, the problem is when mimicking what fetch_csv() does when universe_func is used. It appears Research allows using fetcher in either Security Info or Signal mode so long as one isn't trying to set their universe by it. This was a critical piece of info I left out in my original question. I'm trying to set my universe with local_csv() and use local_csv() in Security Info mode, which implies data should have the appropriate info from the CSV merged into it.

Does anyone know how to take the Panel returned by load_bars_from_yahoo() (the data object created in our Quantopian algos) and add some columns to it from the DataFrame returned by fetch_csv()?
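A rough sketch of the two pieces described above, deriving the universe from the CSV and aligning its values to the price dates, might look like the following. The column names (`date`, `symbol`, `score`), the dates, and the values are hypothetical, and carrying the last known value forward is an assumption about how the CSV data should behave between updates, not a statement of what Fetcher itself does.

```python
import io

import pandas as pd

# Hypothetical CSV contents standing in for what local_csv() would load.
csv_text = """date,symbol,score
2015-10-01,AAPL,0.5
2015-10-05,AAPL,0.7
2015-10-02,MSFT,-0.2
"""

df = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])

# The universe is whatever symbols the CSV mentions.
universe = sorted(df["symbol"].unique())

# Dates the price data would cover (hypothetical trading days).
trading_days = pd.to_datetime(
    ["2015-10-01", "2015-10-02", "2015-10-05", "2015-10-06"]
)

# One column per symbol, one row per date that appears in the CSV.
wide = df.pivot(index="date", columns="symbol", values="score")

# Align to the trading days and carry the last known value forward.
signal = wide.reindex(trading_days).ffill()
print(universe)
print(signal)
```

The `universe` list could then be passed to whichever pricing call is in use, and `signal` shares its date index with the price data, so the two can be looked up side by side for any symbol and day.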

Hi Jason,

I have been facing a similar issue and have been stuck on it for the last 10 days.
Does anybody have a solution for this?

Anyone found a solution to this question?