Help needed - Self-service data - how to use it?

So, I've uploaded the data, all checks out, clicked on the name, placed the example in a new algorithm and whoa:
Runtime exception: ImportError: No module named 'quantopian.research'

from quantopian.pipeline import Pipeline  
from quantopian.research import run_pipeline  
from quantopian.pipeline.data.user_5f38f7672471750011004615 import spy_test2

# Date,Symbol,Open,High,Low,Close,AdjClose,Volume

pipe = Pipeline(  
    columns={  
        'volume': spy_test2.volume.latest,  
        'close': spy_test2.close.latest,  
        'low': spy_test2.low.latest,  
        'high': spy_test2.high.latest,  
        'open': spy_test2.open_.latest,  
        'date': spy_test2.asof_date.latest  
    },  
    screen=spy_test2.open_.latest.notnull()  
)

df = run_pipeline(pipe, '2002-01-02', '2020-08-06')  
df.head()

I managed to get the above example running in a notebook, and noticed that the values for each date appear to be a day ahead. For example:
                                                   close        date        high         low        open      volume
2002-01-03 00:00:00+00:00 Equity(8554 [SPY])  115.529999  2002-01-02  115.750000  113.809998  115.110001  18651900.0
The index date is 2002-01-03 00:00:00+00:00, but the values are those for 2002-01-02 (when comparing against the csv data).

So, how do we use/load our own data in an algorithm in place of:
context.ticker = sid(xxxx) ?

I tried a few examples but none worked. The rebalance function expects a context.spy, but I can't find how to "add" it from the custom dataset.

import datetime  
import pandas as pd  
from quantopian.pipeline import Pipeline  
from quantopian.algorithm import attach_pipeline, pipeline_output  
from quantopian.pipeline.data.user_5f38f7672471750011004615 import spy_test2

def my_pipeline(context):
    # Build a pipeline whose columns come from the uploaded self-serve dataset
    return Pipeline(
        columns={
            'volume': spy_test2.volume.latest,
            'close': spy_test2.close.latest,
            'low': spy_test2.low.latest,
            'high': spy_test2.high.latest,
            'open': spy_test2.open_.latest,
        },
    )


def before_trading_start(context, data):  
    context.output = pipeline_output('my_pipeline').dropna()  
    df = pd.DataFrame(context.output)  
    df = df.reset_index()  
    #print(df)  


def initialize(context):  
    context.FirstDateOfBacktest = get_environment('start').date()  
    attach_pipeline(my_pipeline(context), 'my_pipeline')

    #context.spy = sid(8554)    # Gets quantopian SPY values  
    #context.spy = pipeline_output(my_pipeline)    # Doesnt work  
    #set_benchmark(sid(8554))  
    schedule_function(my_rebalance, date_rules.every_day(), time_rules.market_open())


def my_rebalance(context, data):  
    #context.spy = pipeline_output('my_pipeline').dropna()    # Doesnt work

    # Print the current values from the custom dataset  
    p_open = data.history(context.spy, 'open', 1, '1d')  
    high = data.history(context.spy, 'high', 1, '1d')  
    low = data.history(context.spy, 'low', 1, '1d')  
    close = data.history(context.spy, 'close', 1, '1d')  
    volume = data.history(context.spy, 'volume', 1, '1d')  
    log.info('%.6f,%.6f,%.6f,%.6f,%d' % (p_open, high, low, close, volume))

Also tried using the fetcher, no luck.
The data in the csv is in the following format:
date,symbol,open,high,low,close,adjclose,volume
1993-01-29,SPY,43.968750,43.968750,43.750000,43.937500,26.184059,1003200

Supposedly fetcher would "internally map" the SPY to symbol('SPY'); however, it doesn't seem to. The code below does not get the data from my CSV, but rather the data from the default Quantopian SPY, which is something I don't want :)

    fetch_csv('https://www.mydomain.com/spy-to-q.csv',  
               date_column = 'date',  
               date_format = '%Y-%m-%d', timezone='UTC')  
    context.spy = symbol('SPY')

So... again.. how do I access my "custom SPY" data within the algo ?

8 responses

Dobri Dobrev

Anyone ? Is what I need even possible or ?

Dan Whitnable

Self serve data will only add fields (ie columns in a pipeline defined dataframe) to assets which already exist in the Quantopian database. So, if you think of a pipeline dataframe as rows of assets, and columns of data associated with those assets, then one can add columns of data to associate with existing assets but one can never add rows (ie new assets). Additionally, self serve data will never overwrite any existing fields. It simply appends new data. In the case above, self serve will not overwrite the system values for pricing. Moreover, all the order and pricing methods ONLY look at the system values for price and volume. One cannot re-direct zipline to look at your custom self serve values for these.

So, if one is trying to overwrite or remap price and volume data to self serve values, it's not possible. One can of course read any self serve values and act on them, just like any pipeline data. Once a self serve dataset is loaded, it is accessed exactly like any other built in pipeline dataset. One doesn't use data.history to fetch these values, one uses pipeline (as if it were fundamental data). The docs have some info on how this is done (see https://www.quantopian.com/docs/user-guide/tools/self-serve#accessing-your-custom-data).

The notebook code in the original post defines a pipeline which has columns for volume, close, low, high, and open based on values inputted via self serve. If SPY is the only asset uploaded in the CSV file then all other assets will have nan for these values. The date isn't really a 'day ahead'. The way to look at all pipeline output is that it's what would have been available before market open. The data one enters for today would be available before market open tomorrow. This is to prevent any lookahead bias and better reflect the actual information a trader could have had.
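To make the date offset concrete, here is a minimal sketch in plain Python (not Quantopian code). It assumes a simplified model in which a row with a given asof_date first shows up in pipeline output on the next weekday; market holidays and any additional processing lag are ignored, and the helper name next_weekday is made up for this illustration.

```python
from datetime import date, timedelta


def next_weekday(d):
    """Return the first Monday-to-Friday date strictly after d (holidays ignored)."""
    d += timedelta(days=1)
    while d.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        d += timedelta(days=1)
    return d


# asof_date of an uploaded row -> date on which pipeline would first show it
# (simplified assumption: the next weekday, before market open)
uploaded_asof_dates = [date(2002, 1, 2), date(2002, 1, 4)]
pipeline_dates = {d: next_weekday(d) for d in uploaded_asof_dates}

for asof, visible in pipeline_dates.items():
    print(asof, "->", visible)
```

Under this model the 2002-01-02 row from the CSV is labelled 2002-01-03 in the pipeline output, which matches what was observed in the notebook.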

Disclaimer

The material on this website is provided for informational purposes only and does not constitute an offer to sell, a solicitation to buy, or a recommendation or endorsement for any security or strategy, nor does it constitute an offer to provide investment advisory services by Quantopian. In addition, the material offers no opinion with respect to the suitability of any security or specific investment. No information contained herein should be regarded as a suggestion to engage in or refrain from any investment-related course of action as none of Quantopian nor any of its affiliates is undertaking to provide investment advice, act as an adviser to any plan or entity subject to the Employee Retirement Income Security Act of 1974, as amended, individual retirement account or individual retirement annuity, or give advice in a fiduciary capacity with respect to the materials presented herein. If you are an individual retirement or other investor, contact your financial advisor or other fiduciary unrelated to Quantopian about whether any given investment idea, strategy, product or service described herein may be appropriate for your circumstances. All investments involve risk, including loss of principal. Quantopian makes no guarantees as to the accuracy or completeness of the views expressed in the website. The views are subject to change, and may have become unreliable for various reasons, including changes in market conditions or economic circumstances.

Dobri Dobrev

So, essentially I cannot use the custom data to do a "real" test.
Good to know.

Dobri Dobrev

Also, why does the Yahoo SPY dataset differ from your own?

Dobri Dobrev

And how am I supposed to compare a backtest done on any other platform with the one in quantopian, when I cannot use your dataset and cannot upload my own ?

Dan Whitnable

Could I ask what the goal is? Are you trying to compare the results from a Quantopian backtest to one done on a different platform to gain a comfort level with the results? There may be several approaches.

My very first go-to spot to verify an algo is doing what I expect is to look at the transactions generated by the backtest. This is a list of buys and sells, with the price and minute they were executed, for every order filled by the backtest. The 'unit price' in this list includes commissions and slippage so it's often helpful, to better see what's going on, to set these to zero in the algo at first. The prices will then be the close price as of the minute in which the order was executed. This can be verified by fetching the minute prices for a specific day in a separate notebook (run get_pricing for the one day in question to ensure the prices are not being adjusted). The minute prices will match the transaction prices.

The transactions can be found by going to a backtest, clicking on the 'Activity' tab, then clicking on the 'Transactions' tab. You will need to then click 'load transactions'. A tip... there is a sort of bug in this display and the transactions aren't always sorted by time and cannot be filtered. A 'better' transaction view can be found by using the old backtest display. Simply append '/old' to the URL for a backtest and that old display shows up. Click 'transactions' in that display to get a filterable, sortable list of transactions. It's the same transactions just displayed differently. One can also fetch the output of any backtest, including all the transactions, from a notebook. See this post and then the documentation for info on how that's done.

In a similar fashion, one can simply verify the positions held by the algo at the end of each day by viewing the 'positions'. Go to a backtest, click on the 'Activity' tab, then click on the 'Positions' tab. You will need to then click 'load positions'. The same list can also be fetched in a notebook.

A few comments on prices. Yahoo prices are typically 'adjusted' as of the current day and do not reflect what a security actually traded for on a given day. Also, most of their prices do not take dividends into account. The Quantopian backtester uses the prices which a stock actually traded at for each day. It also adds dividends to the portfolio as they are paid. Secondly, Yahoo, and many other sources, list open and close prices. These are the prices from the open and close auctions and not 'market order' prices. In order to have actually gotten these prices, one would have needed to place a special Market-On-Open or Market-On-Close order (ie MOO MOC). The prices on Quantopian only include 'market order' prices. These are the prices of all trades executed as market orders during trading hours. This is why the daily open and close prices on Quantopian do not always match exactly with other sources including Yahoo. There is a bit of explanation of that in these posts here and here and here.

A simple way to get external data into a Quantopian notebook is to use the local_csv function (check out the documentation here). This loads an arbitrary csv file into a dataframe and could be used to import backtest results from another platform. These can then be compared to the results from a Quantopian backtest within a notebook. This can be explained in more detail if this is an option.
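As a rough illustration of that kind of comparison, here is a sketch in plain Python. It does not use local_csv or pandas; the standard csv module stands in for the loading step. The column names (date, portfolio_value) and every number in the two CSV strings are placeholders made up for this example, not real backtest output.

```python
import csv
import io

# Placeholder rows for illustration only; not real backtest output.
QUANTOPIAN_CSV = """date,portfolio_value
2020-01-02,10000.00
2020-01-03,10050.00
"""
OTHER_PLATFORM_CSV = """date,portfolio_value
2020-01-02,10000.00
2020-01-03,10020.00
"""


def load_values(text):
    """Parse CSV text into {date string: portfolio value}."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["date"]: float(row["portfolio_value"]) for row in reader}


def compare(a, b):
    """Return {date: a - b} for every date present in both results."""
    return {d: round(a[d] - b[d], 2) for d in sorted(a.keys() & b.keys())}


differences = compare(load_values(QUANTOPIAN_CSV), load_values(OTHER_PLATFORM_CSV))
for day, diff in differences.items():
    print(day, diff)
```

The first date on which the difference becomes non-zero is usually the place to start looking at individual transactions.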

Finally, there is a tool called pyfolio. This can be used to generate basic backtests outside of the backtest IDE. The results do not include commissions, slippage, or dividends, and only reflect daily, not intra-day, pricing (a lot of caveats there) but it will quickly generate high level backtest results. The benefit is that external pricing can be used. One can use either self serve data or data loaded via local_csv for prices. There is documentation on pyfolio here. It's not explicitly stated how to use one's own pricing data but, if that's a direction to pursue, it can be explained in more detail. If external prices are being used, it's very important to understand whether those prices are adjusted, whether they include dividends and, if so, the as-of date for any adjustments.

So, a lot of options. It may be best to first define what the goal is.

Dobri Dobrev

Dan, thank you for explaining these things, I'll look into the provided resources.

The idea here was to understand why with the exact same algorithm, I get vastly different results when testing in Backtrader.

Initially I thought the provided dataset was the issue. However, I later traced the difference to the way commissions are calculated: when I disable commissions in both tests, I get very similar results. In relation to this, I've opened https://www.quantopian.com/posts/trying-to-track-commissions-expenses

Dan Whitnable

Good catch. I am not familiar with Backtrader, but often backtest engines rely on adjusted price data so they don't need to apply corporate actions (eg stock splits and dividends) each day during a simulation. Quantopian uses actual, un-adjusted price data for trades and then applies any stock splits and dividends after that as they would have happened. This requires two data feeds. One for prices and one for corporate actions. The added complexity of this second corporate action data feed is why many backtesters don't do this.

Let's look at several ways that inaccuracies are created if a backtester uses adjusted, and not actual, prices. Assume we have a stock called AAPL which just so happened to have a 7:1 stock split on June 9, 2014 and sold for $645 per share before the split. Now, if one wanted to invest $10,000 in AAPL on June 6, 2014 (the Friday before the split) they could buy 15 shares for a total cost of $9,675. Assuming $0.01 per share commission they would have paid $0.15 in commissions. There would be $324.85 left as cash in the account.

That is how the Quantopian backtester would simulate the transaction. Now, what if adjusted prices were used? Let's assume we have prices which are just adjusted for that one 7:1 split. The 'adjusted price' on June 6, 2014 would be $92.14. If one wanted to invest $10,000 they could buy 108 shares for a total cost of $9,951.12. Assuming $0.01 per share commission they would have paid $1.08 in commissions. There would be $47.80 left as cash in the account. The commission and cash are both incorrect and not really a reflection of what would have happened in real life. Moreover, adjusted prices typically also include dividends which would drive the 'adjusted price' down and make commissions look even higher.
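The arithmetic in the two scenarios can be reproduced with a few lines of plain Python. This is only a sketch of the numbers quoted here, not how any backtest engine is implemented; the function name simulate_purchase is made up, and Decimal is used so the cents come out exactly.

```python
from decimal import Decimal


def simulate_purchase(budget, price, commission_per_share):
    """Buy as many whole shares as the budget allows at the given price."""
    shares = int(budget // price)
    cost = shares * price
    commission = shares * commission_per_share
    cash_left = budget - cost - commission
    return shares, cost, commission, cash_left


budget = Decimal("10000")
per_share_fee = Decimal("0.01")

# Actual pre-split price vs. the split-adjusted price for the same day
actual = simulate_purchase(budget, Decimal("645"), per_share_fee)
adjusted = simulate_purchase(budget, Decimal("92.14"), per_share_fee)

for label, (shares, cost, fee, cash) in (("actual", actual), ("adjusted", adjusted)):
    print(f"{label}: {shares} shares, cost ${cost}, commission ${fee}, cash ${cash}")
```

With a per-share commission model, the adjusted-price run pays roughly seven times the commission for the same dollar exposure, simply because the share count is about seven times larger.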

Another issue with using adjusted prices is with price or volume rules in one's algo. Take a simple example rule price < 100. In real life, on June 6, 2014 one would not have purchased AAPL because it was trading around $645. The Quantopian backtest engine uses actual prices and therefore would also not purchase AAPL in a backtest. However, if adjusted prices were used, then AAPL would have been purchased.
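The same point as a tiny plain-Python sketch, assuming prices adjusted only for the 7:1 split; the function name passes_price_rule is made up for the illustration.

```python
SPLIT_RATIO = 7

actual_close = 645.00                                  # what AAPL traded around on June 6, 2014
adjusted_close = round(actual_close / SPLIT_RATIO, 2)  # split-adjusted price for the same day


def passes_price_rule(price, limit=100):
    """Example screening rule from the text: price < 100."""
    return price < limit


signal_actual = passes_price_rule(actual_close)      # rule evaluated on the actual price
signal_adjusted = passes_price_rule(adjusted_close)  # rule evaluated on the adjusted price

print("actual price passes rule:", signal_actual)
print("adjusted price passes rule:", signal_adjusted)
```

The two runs disagree on whether to trade at all, so the resulting backtests can diverge well before commissions come into play.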

Not sure this is the issue with Backtrader but it is something to consider.

Good luck.
