Prices in Algo

Hi,

I'm having some trouble getting prices in the backtester. I believe it may have to do with split-adjusted pricing, but I thought that in the backtester you get the prices as you would have seen them on that day.

In this example, take XIV during September 2014. It was trading at around $110. But in the algo (attached), I ask for the prices for the past five days and print them, and they are all around $27. I guess I still don't entirely understand. The backtester is running for a period in 2014, and I'm asking for the last 5 days of data each day. Shouldn't I see what I would have seen in the market on those dates?


The prices you are seeing in the backtest (all around $27) are indeed what you would have paid cash money to buy 1 share of VXX during September 2014. Those are prices you would have seen at that time. Not sure (but I have a hunch) why you say "It was trading at around $110" because it was not.

The culprit here was a 1-for-4 reverse split for VXX on August 9, 2016. If you look up historical VXX prices today (on a site like Yahoo or Google), they will generally INCLUDE that split adjustment, so every pre-split price is shown multiplied by 4. The actual (non-adjusted) prices before the split date were therefore 1/4 of the adjusted values: $110 / 4 is about $27.50. Voila!
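The split arithmetic above can be written out as a tiny sketch. The helper names are hypothetical; the only input taken from the post is the 1-for-4 ratio (4 old shares become 1 new share, so pre-split prices are multiplied by 4 on an adjusted chart).

```python
# Illustrative arithmetic for a 1-for-4 reverse split (4 old shares -> 1 new share).
# The ratio comes from the post; the helper names are hypothetical.
REVERSE_SPLIT_RATIO = 4


def to_split_adjusted(as_traded_price, ratio=REVERSE_SPLIT_RATIO):
    """Pre-split price as it would appear on a split-adjusted chart today."""
    return as_traded_price * ratio


def to_as_traded(adjusted_price, ratio=REVERSE_SPLIT_RATIO):
    """Pre-split price you would actually have paid at the time."""
    return adjusted_price / ratio


print(to_as_traded(110.0))      # 27.5
print(to_split_adjusted(27.5))  # 110.0
```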

Hi Dan,

Thank you very much for your explanation. You are correct about the adjusted prices. I'm also having some trouble getting data in different timeframes, and I was hoping you might help me with that. I've found on the forums that you can get different timeframes (like 30 min, 60 min, 120 min, etc.) by using the resample function, but I'm not sure I understand this function correctly. I'm printing the resampled prices to the log in the algo, comparing them to charts from IB, and comparing the resampled data at different timeframes against each other to try to understand how it works, but something is not right.

I'm comparing 30 min and 120 min bars. Firstly, when I print these prices in the log and compare to IB data, it doesn't seem to match. The first and last bars match the data from IB (both 30 min and 120 min), but all the bars in between for that day do not match at all. I know the data sources might be different, but even the way the price moves is not similar. Secondly, there is always an extra bar at the end of the day (or at the beginning of the day) in the data from Q.

Also, when I compare the resampled prices from Q to each other (30 min vs 120 min), I'm having the same problem: the first and last bars of the day match, but the bars in between don't. In the example below, for instance, I expect the 11:30 price to be the same whether it is a 30 min bar or a 120 min bar, but they are different (while the first and last bar prices match).

My code:

prices = data.history(stock, "close", 600, "1m").resample('120T',  closed='left')  
prices = data.history(stock, "close", 600, "1m").resample('30T',  closed='left')  

Sample results (notice that the first and last bars match between 30 min and 120 min, but none of the others do; also note that both cases have one more bar than there should be):

30 minute:
2017-06-02 09:30:00+00:00 80.883483
2017-06-02 10:00:00+00:00 81.298267
2017-06-02 10:30:00+00:00 81.657100
2017-06-02 11:00:00+00:00 81.392533
2017-06-02 11:30:00+00:00 81.571833
2017-06-02 12:00:00+00:00 81.540767
2017-06-02 12:30:00+00:00 81.584667
2017-06-02 13:00:00+00:00 81.653700
2017-06-02 13:30:00+00:00 81.606267
2017-06-02 14:00:00+00:00 81.481733
2017-06-02 14:30:00+00:00 81.344633
2017-06-02 15:00:00+00:00 81.434533
2017-06-02 15:30:00+00:00 81.298000
2017-06-02 16:00:00+00:00 81.080000

120 minute:
2017-06-02 09:30:00+00:00 80.883483
2017-06-02 11:30:00+00:00 81.479933
2017-06-02 13:30:00+00:00 81.596350
2017-06-02 15:30:00+00:00 81.389725
2017-06-02 16:00:00+00:00 81.080000

Greatly appreciate your help.

Thank you.

Maybe take a look at this post https://www.quantopian.com/posts/how-to-use-the-resample-correctly. Play around with the attached notebook to see how the 'resample' method works. I believe you need to change your 'resample' statements to something like this.

prices = data.history(stock, "close", 600, "1m").resample('120T', closed='right', label='right').last()  
prices = data.history(stock, "close", 600, "1m").resample('30T', closed='right', label='right').last()

Setting closed='right' makes each bucket include its right (end) edge, so the last minute of each period is included. For the 30T sample the buckets cover minutes :01-:30 and :31-:00; with closed='left' they cover minutes :00-:29 and :30-:59 instead. Setting label='right' stamps each bucket with its end time. You should also explicitly add the .last() method if you do indeed simply want the last close from each bucket.
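To see the closed='left' vs closed='right' difference concretely, here is a minimal sketch on made-up minute data (not real prices), assuming minute bars are stamped at the END of each minute, so the first bar of the session is 9:31. It uses the frequency string "30min", which is the same frequency as '30T' (newer pandas versions prefer the 'min' spelling). With closed='left', the single 11:30 minute spills into a bucket of its own, which is one way an "extra bar" can appear.

```python
import numpy as np
import pandas as pd

# Synthetic (made-up) 1-minute closes for 9:31-11:30, each stamped at the END
# of its minute. Values are just 0, 1, 2, ... so bucket membership is easy to read.
idx = pd.date_range("2017-06-02 09:31", "2017-06-02 11:30", freq="1min")
closes = pd.Series(np.arange(len(idx), dtype=float), index=idx)

# Buckets (9:30, 10:00], (10:00, 10:30], ... labeled by their end time.
right = closes.resample("30min", closed="right", label="right").last()

# Buckets [9:30, 10:00), [10:00, 10:30), ... labeled by their start time.
# The 11:30 minute lands alone in [11:30, 12:00): an extra, one-minute bar.
left = closes.resample("30min", closed="left", label="left").last()

print(right)  # 4 bars labeled 10:00, 10:30, 11:00, 11:30
print(left)   # 5 bars labeled 9:30, 10:00, 10:30, 11:00, 11:30
```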

A word of caution: using the 'close' price means there can be NaNs in your data if a security didn't trade during a given minute (i.e. the data isn't forward-filled the way 'price' is). This may not be a problem for you, but you do want to handle any NaN values appropriately.
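A minimal sketch of how missing closes interact with resampling, using made-up values. Two things worth knowing: .last() returns the last non-missing value in each bucket, and a bucket with no trades at all still comes back as NaN. Forward-filling the previous bar's close is only one possible way to handle that; whether it suits your strategy is an assumption you should check.

```python
import numpy as np
import pandas as pd

# Hypothetical minute closes with gaps (NaN = no trade during that minute).
idx = pd.date_range("2017-06-02 09:31", periods=6, freq="1min")
closes = pd.Series([10.0, 10.2, np.nan, np.nan, np.nan, np.nan], index=idx)

# .last() skips NaN: the 9:33 bucket ends with a NaN, yet returns 10.2 ...
bars = closes.resample("3min", closed="right", label="right").last()
# ... but the 9:36 bucket had no trades at all, so it is NaN.
print(bars)

# One common choice (an assumption, not a rule): carry the previous close forward.
filled = bars.ffill()
print(filled)
```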

Thanks Dan,

The code you provided works as I expected, and I checked out the post. I don't completely understand what .last() does. I can see that it changes the price I see for each bucket (if I print prices), but the length of prices stays the same. So if I don't use last(), what am I getting? The first minute in that bucket? (For example, am I getting the price at the start of the 2-hour bar?)

Thank you.

@MA K
You stated "I don't completely understand what .last() does." Here's a little bit of an explanation...

In pandas 0.18.0 the approach to resampling changed (see http://pandas.pydata.org/pandas-docs/version/0.18.0/whatsnew.html). resample doesn't actually return a value or a series of values; it returns a 'resampler' object, similar to what the groupby method returns. To get actual values you need to ask that resampler object for data. It's a two-step process: first make the resampler object, then ask that object to perform an operation. If you want the last value of each 'bucket', simply append the last() method...

# Get some 1 minute data  
prices_1m = get_pricing('AAPL', start_date="2017-03-01", end_date="2017-03-01", fields='price', frequency='minute')  
# Get the last value of each resampled bucket  
resample_5m = prices_1m.resample('5T', closed='right', label='right').last()
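Outside the Quantopian research environment (where get_pricing isn't available), the same two-step pattern can be tried on synthetic data. This sketch substitutes a made-up 1-minute series for the get_pricing output and shows that resample alone gives back a resampler rather than a Series; the aggregation call produces the values.

```python
import numpy as np
import pandas as pd

# Synthetic 1-minute prices standing in for get_pricing(...) output.
idx = pd.date_range("2017-03-01 09:31", periods=10, freq="1min")
prices_1m = pd.Series(np.arange(1.0, 11.0), index=idx)

# Step 1: resample() only builds a deferred resampler object (like groupby()).
resampler = prices_1m.resample("5min", closed="right", label="right")
print(type(resampler).__name__)  # a Resampler subclass, not a Series

# Step 2: ask the resampler to aggregate each bucket.
resample_5m = resampler.last()
print(resample_5m)  # 9:35 -> 5.0, 9:40 -> 10.0
```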

You can also ask the resampler object to do other things than simply getting the last value of each bucket. You can get the sum, the mean, the min, the max, and a whole lot of other things. You can even create your own function to apply to each bucket.
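A short sketch of asking one resampler for several different things, on made-up prices. first, max, min, last, mean and ohlc are standard resampler methods; the "range" function passed to apply is a hypothetical custom aggregation chosen just for illustration.

```python
import pandas as pd

# Synthetic 1-minute prices (made up for illustration).
idx = pd.date_range("2017-03-01 09:31", periods=10, freq="1min")
prices_1m = pd.Series([5.0, 7.0, 6.0, 4.0, 6.5, 6.0, 8.0, 9.0, 7.5, 8.5], index=idx)
resampler = prices_1m.resample("5min", closed="right", label="right")

# Built-in aggregations, one column each.
summary = pd.DataFrame({
    "first": resampler.first(),
    "max": resampler.max(),
    "min": resampler.min(),
    "last": resampler.last(),
    "mean": resampler.mean(),
})

# ohlc() builds open/high/low/close bars in one call.
bars = resampler.ohlc()

# A custom function applied to each bucket (hypothetical "range" = high - low).
price_range = resampler.apply(lambda bucket: bucket.max() - bucket.min())

print(summary)
print(bars)
print(price_range)
```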

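Hope that makes sense. Always specify a method like 'last()', 'mean()' or something similar. If you don't, you can't be sure what the resampler object is giving you (in the pandas versions of that time, using the bare resampler as if it were a Series fell back to mean() with a FutureWarning, not last(); that is why adding .last() changes the printed prices).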
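The choice of aggregation also matters for the 30-minute vs 120-minute comparison in the question. Here is a sketch on a made-up, steadily rising price series: with last(), the bar labeled 11:30 is the 11:30 close at both timeframes; with mean(), each bar averages a different window, so the values disagree. It assumes pandas 1.1 or newer for the offset argument, used here (as an illustrative choice) so that the 120-minute edges fall on 9:30, 11:30 and 13:30 rather than on even hours from midnight. The helper name session_bars is hypothetical.

```python
import numpy as np
import pandas as pd

# Synthetic minute closes, 9:31-13:30, stamped at minute end (made-up data).
idx = pd.date_range("2017-06-02 09:31", "2017-06-02 13:30", freq="1min")
closes = pd.Series(80.0 + 0.01 * np.arange(len(idx)), index=idx)


def session_bars(series, rule, how):
    # offset="90min" moves the bin origin from midnight to 1:30, so 120-minute
    # edges land on 9:30, 11:30, 13:30 (30-minute edges stay on :00 and :30).
    r = series.resample(rule, closed="right", label="right", offset="90min")
    return getattr(r, how)()


t = pd.Timestamp("2017-06-02 11:30")
for how in ("last", "mean"):
    b30 = session_bars(closes, "30min", how)
    b120 = session_bars(closes, "120min", how)
    print(how, b30[t], b120[t], "match" if b30[t] == b120[t] else "differ")
```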

If you are curious, you can get a list of all the current methods for a resampler. An easy way is to use the 'help' function in a notebook. Note that resample_5m above is already the result of .last() (a plain Series), so ask for help on the resampler object itself (assuming you already ran the code above):

help(prices_1m.resample('5T', closed='right', label='right'))

Here's an augmented list of all the available methods...
```

aggregate(self, arg, *args, **kwargs)
| Apply aggregation function or functions to resampled groups, yielding
| most likely Series but in some cases DataFrame depending on the output
| of the aggregation function
|
| apply = aggregate(self, arg, *args, **kwargs)
| Apply aggregation function or functions to resampled groups, yielding
| most likely Series but in some cases DataFrame depending on the output
| of the aggregation function
|
| asfreq(self)
| return the values at the new freq,
| essentially a reindex with (no filling)
|
| backfill(self, limit=None)
| Backward fill the values
|
| bfill = backfill(self, limit=None)
| Backward fill the values
|
| count = f(self, _method='count')
| Compute count of group, excluding missing values
|
| ffill = pad(self, limit=None)
| Forward fill the values
|
| fillna(self, method, limit=None)
| Fill missing values
|
| first = f(self, _method='first')
| Compute first of group values
|
| interpolate(self, method='linear', axis=0, limit=None, inplace=False, limit_direction='forward', downcast=None, **kwargs)
| Interpolate values according to different methods.
|
| max = f(self, _method='max')
| Compute max of group values
|
| mean = f(self, _method='mean')
| Compute mean of groups, excluding missing values
|
| median = f(self, _method='median')
| Compute median of groups, excluding missing values
|
| min = f(self, _method='min')
| Compute min of group values
|
| nunique = f(self, _method='nunique')
| Returns number of unique elements in the group
|
| ohlc = f(self, _method='ohlc')
| Compute open, high, low and close values of a group, excluding missing values
| For multiple groupings, the result index will be a MultiIndex
|
| pad(self, limit=None)
| Forward fill the values
|
| prod = f(self, _method='prod')
| Compute prod of group values
|
| sem = f(self, _method='sem')
| Compute standard error of the mean of groups, excluding missing values
|
| size = f(self, _method='size')
| Compute group sizes
|
| std(self, ddof=1)
| Compute standard deviation of groups, excluding missing values
|
| sum = f(self, _method='sum')
| Compute sum of group values
|
| transform(self, arg, *args, **kwargs)
| Call function producing a like-indexed Series on each group and return
| a Series with the transformed values
```