Help: Notebook vs Algorithm

posted Jun 2, 2017

Hi,

I'm new to Quantopian. I've gotten pretty good at working with algos, but I'm struggling a little in the notebook, and I'm hoping to get some help! What I'd like to do is get the RSI data for a particular stock over the past X years, then iterate through the RSI data. Below is the code that shows how I'm doing it in the algo, and I believe this is correct (it prints values to the log).

Algo code:

import talib

def initialize(context):
    context.stock = symbol('LMT')
    context.x = 0  # counts how many times my_rebalance has run
    # Run once per trading day, 5 minutes after the open.
    schedule_function(my_rebalance, date_rules.every_day(), time_rules.market_open(minutes=5))

def my_rebalance(context, data):
    # Request 30 daily bars of close prices.
    prices = data.history(context.stock, "close", 30, "1d")
    RSI2 = talib.RSI(prices, 2)
    context.x += 1
    print(context.x, RSI2[-1])

My issue is that in the notebook (code below), the numbers don't match up. I understand that there is a warm-up period at the start of the RSI series, but in my case, the numbers aren't even close! I think there is something wrong with the way I am doing it in the notebook. What is the correct way for me to do this?

Notebook code:

import talib  
sec = 'LMT'  
sec_prices = get_pricing(sec, start_date='2011-01-03', end_date='2017-01-03', symbol_reference_date=None, frequency='daily', fields='close_price')  
RSI2 = talib.RSI(sec_prices, 2)  
for i in range(len(RSI2)):  # cover every row, including the last
    print(i, RSI2[i])

Thank you.

14 responses

Dan Whitnable

Jun 2, 2017

Not sure if this is the entire problem, but the 'data.history' method in the algo returns the most recent minute's close for today and then the closes for the previous 29 days. However, the notebook 'get_pricing' method simply gets the daily close prices. Another issue with the 'data.history' method is that it will return NaN for today's data if no trades were made during the previous bar. This wouldn't usually have a huge impact, but since you are only using an RSI window of 2, that last data point is half your data.

To make the two 'interchangeable', I'd suggest implementing this with pipeline instead of the 'data.history' method. See the attached notebook. You may need to change the 'symbols' method when porting it to an algorithm. Otherwise, the exact code will run in both environments.

MA K

Jun 2, 2017

Hi Dan,

Thanks very much for your reply. I will take a look at your notebook.

In the algo (using data.history), I was doing the backtest from 01-03-2011 to 01-03-2017. Since this is in the past, I assume the 'last minute close of today' would be the close of the day (since all days are past/complete), so I don't think that should be a problem.

Also, this: "Another issue with the 'data.history' method is that it will return NaN for today's data if no trades were made during the previous bar." is a little scary. Can you explain this further? Why is it skipping data? One would imagine that it would give you data for all the days you are requesting, not only provide data 'if you traded on the previous bar'.

MA K

Jun 2, 2017

I understand what you mean about data.history getting the last minute close of today, since I am getting this data at 5 minutes past the open. I will drop the most recent data point, so I am getting the closes from the previous full days. I guess before I try to compare the RSI, I will make sure the price data is consistent!

MA K

Jun 2, 2017

Doing some more looking into this and what I'm finding is not very encouraging.

In the notebook, using get_pricing, I am getting differing price data for the same days simply by changing the 'end date' parameter??? That is scary. How can we trust any of our analysis if we can't trust the data?

Attaching the notebook, but here is an example of what I mean:

sec = 'LMT'  
sec_prices = get_pricing(sec, start_date='2011-01-03', end_date='2011-02-10', symbol_reference_date=None, frequency='daily', fields='close_price')  
print(sec_prices)

Output (first few lines/days):

2011-01-03 00:00:00+00:00 69.83
2011-01-04 00:00:00+00:00 70.31
2011-01-05 00:00:00+00:00 71.92
2011-01-06 00:00:00+00:00 73.18
2011-01-07 00:00:00+00:00 73.63
2011-01-10 00:00:00+00:00 73.59

Now, simply change the 'end_date' to one year later:

sec = 'LMT'  
sec_prices = get_pricing(sec, start_date='2011-01-03', end_date='2012-02-10', symbol_reference_date=None, frequency='daily', fields='close_price')  
print(sec_prices)

Output for the same days (first few lines/days), different prices!!

2011-01-03 00:00:00+00:00 66.941
2011-01-04 00:00:00+00:00 67.401
2011-01-05 00:00:00+00:00 68.944
2011-01-06 00:00:00+00:00 70.152
2011-01-07 00:00:00+00:00 70.584
2011-01-10 00:00:00+00:00 70.545

scary?

MA K

Jun 2, 2017

BTW, the first example seems to match the data I get using data.history in the algo.

Baffled as to why changing the end date changes the price data.

Dan Whitnable

Jun 2, 2017

@MA K

A few things. You stated "In the algo (using data.history).... Since this is in the past, I assume the 'last minute close of today' would be the close of the day (since all days are past/complete), so I don't think that should be a problem."

Yes, that is the problem. As I stated previously, the 'data.history' method will always return the latest bar's data as the last value in the returned series.

hist = data.history(context.securities, 'close', 10, '1d')  
close_as_of_last_bar_in_backtest = hist.iloc[-1, 0]  
yesterdays_close = hist.iloc[-2, 0]

The last entry in the series is NOT yesterday's close. The second-to-last entry is yesterday's close. See the attached algorithm and look at the logs to see this in action.

You also stated " I am getting differing price data for the same days simply by changing the 'end date' parameter"

Yes, that is true because the prices are all split adjusted as of the 'end date' parameter. The 'get_pricing' method is point-in-time split- and dividend-adjusted. For these adjustments, the reference date, or the ‘date which you are looking back from’, is always the end_date of the 'get_pricing' call. See https://www.quantopian.com/posts/research-updates-get-pricing-and-jupyter-notebook-upgrade
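The effect of the reference date can be sketched in plain Python. This is only an illustration under stated assumptions: `adjust_as_of`, the prices, and the 2:1 split event are hypothetical and are not Quantopian's implementation. The point is that the same historical day gets a different adjusted close depending on whether the query's end date falls before or after a corporate action.

```python
from datetime import date

def adjust_as_of(raw_closes, adjustments, end_date):
    """Back-adjust raw closes for every event on or before end_date.

    raw_closes:  list of (date, close) pairs
    adjustments: list of (effective_date, ratio) pairs; closes strictly
                 before effective_date are multiplied by ratio
    """
    adjusted = []
    for day, close in raw_closes:
        factor = 1.0
        for effective, ratio in adjustments:
            # Only events the query "knows about" (<= end_date) are applied.
            if day < effective <= end_date:
                factor *= ratio
        adjusted.append((day, round(close * factor, 3)))
    return adjusted

# Hypothetical data: two raw closes and one 2:1 split in June 2011.
raw = [(date(2011, 1, 3), 100.0), (date(2011, 1, 4), 101.0)]
events = [(date(2011, 6, 1), 0.5)]

short_query = adjust_as_of(raw, events, end_date=date(2011, 2, 10))
long_query = adjust_as_of(raw, events, end_date=date(2012, 2, 10))

print(short_query[0][1], long_query[0][1])  # 100.0 50.0
```

The query ending before the event returns the raw close, while the query ending after it returns the adjusted one, which mirrors the two different outputs seen for the same January 2011 days.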

Regarding the 'data.history' method returning NaN if there is no data for that minute: take a look at the documentation (https://www.quantopian.com/help#api-data-history) for a description of how only certain fields (e.g. 'price') are forward filled. Other fields are not. However, it seems that this may only be true when requesting minute data. Looking at the logs of the attached algorithm, there aren't any NaN values for what should be a thinly traded stock.

Understanding those concepts should go a long way toward clearing up the confusion. Does that help?

MA K

Jun 2, 2017

Hi Dan,

Thank you so much for your explanation.

So, I solved the first problem by doing this:

hist = data.history(context.securities, 'close', 10, '1d')[:-1]

Thereby dropping the last, incomplete data point. Now, the last entry will be yesterday's close.

And I now understand why the prices are different when you use a different end_date for get_pricing. Thank you for the link. But I now have two follow-up questions.

1) How can I get the RSI data as it would have been in reality, historically? Not based on adjusted price, but the actual RSI value that would have been. I think this might be done using the approach you present in your first reply. If this is the case, how would I get RSI with my custom period? I don't see any period given in your pipeline code.

2) There are a lot of algorithms floating around on Q, many of them great, that use this approach to calculate RSI and make decisions on whether to buy or sell:

prices = data.history(context.stock, "close", 30, "1d")  
RSI2 = talib.RSI(prices, 2)

Is this incorrect/invalid?

MA K

Jun 2, 2017

Follow up to #2 above, for example, if there is a 7:1 price split, and data.history is just providing actual (non-adjusted) price, then surely the RSI calculated using this data would be incorrect, so the signal would be invalid?

Dan Whitnable

Jun 2, 2017

@MA K
Regarding #1 above. RSI is a ratio with values between 0 and 100. Since it's a ratio of prices, it will not be affected by multiplying the prices by a constant. Therefore it won't be affected by stock splits. See the calculation here https://en.wikipedia.org/wiki/Relative_strength_index. However, in a backtest, the data that an algorithm sees on a specific date is EXACTLY the data one would have seen in real life. If AAPL was selling for $99 on a specific date then one would see $99 in the backtest on that date.
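The scale-invariance claim can be checked with a small, self-contained sketch. `wilder_rsi` is a hypothetical pure-Python helper written for this illustration (it is not talib's code) and assumes Wilder's smoothing. The closes are the LMT values printed earlier in the thread, and halving every price stands in for a 2:1 split adjustment.

```python
def wilder_rsi(prices, period):
    """RSI with Wilder smoothing; entries are None during the warm-up."""
    changes = [b - a for a, b in zip(prices, prices[1:])]
    gains = [max(c, 0.0) for c in changes]
    losses = [max(-c, 0.0) for c in changes]
    out = [None] * len(prices)
    if len(changes) < period:
        return out
    # Seed with a simple average of the first `period` changes.
    avg_gain = sum(gains[:period]) / period
    avg_loss = sum(losses[:period]) / period
    for i in range(period, len(prices)):
        if i > period:
            # Wilder smoothing: blend the previous average with the new change.
            avg_gain = (avg_gain * (period - 1) + gains[i - 1]) / period
            avg_loss = (avg_loss * (period - 1) + losses[i - 1]) / period
        total = avg_gain + avg_loss
        # 50.0 for a completely flat window is an arbitrary choice for this sketch.
        out[i] = 100.0 * avg_gain / total if total else 50.0
    return out

raw = [69.83, 70.31, 71.92, 73.18, 73.63, 73.59]
halved = [p * 0.5 for p in raw]  # every price scaled by the same constant

rsi_raw = wilder_rsi(raw, 2)
rsi_halved = wilder_rsi(halved, 2)

for a, b in zip(rsi_raw, rsi_halved):
    print(a, b)
```

Both columns agree because the constant cancels out of the gain/loss ratio.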

You can get a custom window length for the RSI factor like this:

# set the custom period to 2  
rsi = RSI(window_length = 2)

Regarding #2. Using a window length of two is silly (in my humble opinion) or at least overkill. There can be only 2 values for the RSI in that case, either 0 or 100. All it's really telling you, if it's 100, is that the current price is greater than the previous close. If it's 0, then the current price is less than or equal to yesterday's close. As far as using 'data.history' and then using the 'talib' function to calculate signals, you are correct, there are a lot of algos on the forums that do that. Many of those algos pre-date the introduction of pipeline. Pipeline is generally the preferred method to get historical data now. If needed, you can append current minute data to the pipeline dataframe to keep all your data in one place. There are still times when the 'talib' functions and 'data.history' may be appropriate, but generally start with pipeline.

Regarding #2 (previous post). The prices and volumes returned by 'data.history' ARE split adjusted (see https://www.quantopian.com/help#api-data-history). One needs to be a little careful when using absolute values in any historical lookups. I won't go into that. However, as stated in #1 above, RSI is a ratio and therefore won't be affected by adjusting for splits. In fact, if prices weren't adjusted, the RSI values would be wrong because it would interpret a stock split as being a huge change in price when in reality it isn't. Make sense?
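A minimal sketch of why unadjusted prices would mislead any indicator built on price changes, RSI included. The prices and the 2:1 split are hypothetical, and `daily_returns` is a helper written only for this example.

```python
def daily_returns(closes):
    """Fractional change from each close to the next."""
    return [(b - a) / a for a, b in zip(closes, closes[1:])]

# Hypothetical closes; a 2:1 split takes effect before the third bar.
unadjusted = [100.0, 102.0, 51.5, 52.0]
split_ratio = 0.5

# Adjusting scales only the pre-split closes, so the series is continuous.
adjusted = [p * split_ratio for p in unadjusted[:2]] + unadjusted[2:]

print(daily_returns(unadjusted)[1])  # roughly -0.495: looks like a crash
print(daily_returns(adjusted)[1])    # roughly +0.010: the real move
```

On the unadjusted series the split bar registers as a loss of about half the price, which a short-window RSI would read as a strongly oversold signal even though nothing of the sort happened.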

I've attached the same notebook but with a custom window length for RSI. Note the dataframe output.

MA K

Jun 2, 2017

Hi Dan,

Thank you so much for taking the time to answer my questions, I really appreciate it.

I want to clarify three points:

1) You said two things which I find conflicting, so I just want to clarify. First you said that in a backtest, the data that an algorithm sees on a specific date is exactly the data one would have seen in real life. But then you said that the prices and volumes returned by 'data.history' ARE split adjusted. So when I get price data using data.history in the algorithm, is that split adjusted (because of the latter point above)? Or is it the exact price one would have seen in real life (because of the former point above)?

2) If I am using data.history to get the price in the algo, and this price is the actual price it was trading at on that day (not split adjusted), and then I use this data to calculate the RSI using talib (as is being done in many algos), does this mean that all the algos using this method are in fact invalid? It may not be a huge difference, but around the times of the stock split, the algo would have been trading on invalid signals? (The answer to this question also depends on the answer to number 1 above.)

3) About the values of RSI2 being either 0 or 100, please see this notebook example of calculating RSI 2 using talib. They are not only 0 or 100, they are everything between. I also just opened a charting software and pulled up some data, and threw on the RSI indicator with period 2 and I see all values between 0 and 100, not just 0 or 100. I guess I am unclear on something here.
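One way to see why intermediate values appear: assuming RSI is computed with Wilder's smoothing (believed to be what talib.RSI does, though the helper below is a hypothetical re-implementation and not talib itself), the average gain and average loss carry over from bar to bar, so a period of 2 does not reduce to a simple up/down flag. The prices are made up for illustration.

```python
def wilder_rsi(prices, period):
    """RSI with Wilder smoothing; entries are None during the warm-up."""
    changes = [b - a for a, b in zip(prices, prices[1:])]
    gains = [max(c, 0.0) for c in changes]
    losses = [max(-c, 0.0) for c in changes]
    out = [None] * len(prices)
    if len(changes) < period:
        return out
    # Seed with a simple average of the first `period` changes.
    avg_gain = sum(gains[:period]) / period
    avg_loss = sum(losses[:period]) / period
    for i in range(period, len(prices)):
        if i > period:
            # Earlier gains and losses decay gradually instead of dropping out.
            avg_gain = (avg_gain * (period - 1) + gains[i - 1]) / period
            avg_loss = (avg_loss * (period - 1) + losses[i - 1]) / period
        total = avg_gain + avg_loss
        # 50.0 for a completely flat window is an arbitrary choice for this sketch.
        out[i] = 100.0 * avg_gain / total if total else 50.0
    return out

# Hypothetical closes that alternate up and down.
prices = [10.0, 11.0, 10.5, 11.5, 11.0, 12.0]
rsi = wilder_rsi(prices, 2)

for i, value in enumerate(rsi):
    print(i, value)
```

With this series, every value after the warm-up lies strictly between 0 and 100. An RSI of exactly 0 or 100 only shows up when one of the two smoothed averages is zero, for example after an unbroken run of gains.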

Thanks again very much.

Jamie McCorriston

Jun 3, 2017

@MA K

I can help with 1) and 2).

All pricing data on Quantopian is split and dividend adjusted as of the current trading day. This means that the 'current' price of an equity on any given day in the backtester will be the price that it traded at that day. However, if you ask for a historical window on day N in the backtester, the returned price series will be adjusted for all the dividends and splits that occurred prior to day N. We do this to avoid lookahead bias. We don't want to adjust prices for splits and dividends that had not yet happened at that point in time in the backtest, to keep it as realistic as possible.
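The point-in-time behavior described here can be sketched as follows. This is a simplified model, not the backtester's actual code: `history_window`, the day-index convention, and the 2:1 split are all hypothetical.

```python
def history_window(raw_closes, splits, as_of, bar_count):
    """Last bar_count closes up to day index as_of, adjusted only for
    splits that took effect at or before as_of (no lookahead).

    splits: dict mapping day index -> ratio applied to all earlier closes
    """
    start = max(0, as_of - bar_count + 1)
    window = []
    for i in range(start, as_of + 1):
        factor = 1.0
        for effective, ratio in splits.items():
            # Skip splits that have not happened yet as of the current day.
            if i < effective <= as_of:
                factor *= ratio
        window.append(raw_closes[i] * factor)
    return window

# Hypothetical raw closes; a 2:1 split takes effect on day 2.
raw = [100.0, 102.0, 51.5, 52.0]
splits = {2: 0.5}

print(history_window(raw, splits, as_of=1, bar_count=2))  # [100.0, 102.0]
print(history_window(raw, splits, as_of=3, bar_count=4))  # [50.0, 51.0, 51.5, 52.0]
```

Before the split, the window shows the prices as they traded. After the split, the earlier bars are scaled so the window is continuous, while the last entry is still the price actually seen on the current day.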

Hopefully this clarifies 1) and alleviates your concern in 2).

Disclaimer

The material on this website is provided for informational purposes only and does not constitute an offer to sell, a solicitation to buy, or a recommendation or endorsement for any security or strategy, nor does it constitute an offer to provide investment advisory services by Quantopian. In addition, the material offers no opinion with respect to the suitability of any security or specific investment. No information contained herein should be regarded as a suggestion to engage in or refrain from any investment-related course of action as none of Quantopian nor any of its affiliates is undertaking to provide investment advice, act as an adviser to any plan or entity subject to the Employee Retirement Income Security Act of 1974, as amended, individual retirement account or individual retirement annuity, or give advice in a fiduciary capacity with respect to the materials presented herein. If you are an individual retirement or other investor, contact your financial advisor or other fiduciary unrelated to Quantopian about whether any given investment idea, strategy, product or service described herein may be appropriate for your circumstances. All investments involve risk, including loss of principal. Quantopian makes no guarantees as to the accuracy or completeness of the views expressed in the website. The views are subject to change, and may have become unreliable for various reasons, including changes in market conditions or economic circumstances.

MA K

Jun 4, 2017

Hi Jamie,

Thank you very much for your reply. I think I understand now: the prices in the backtester will always be the price you would have seen that day, and you can get this same data in the notebook using the pipeline. I have confirmed this (the price data I get from pipeline in the notebook matches the price in the backtester). This works perfectly if I request one stock from pipeline, like the examples above. But what I want to do is get the Q500US set (and the historical price data) using pipeline, and I am having some trouble. Please see the attached notebook. In it, I am simply choosing one stock as an example, and as you can see, it is not the complete data set. There is a gap between 2011-01-31 and 2013-07-01. Why is this happening? Is it because it was a part of the Q500US until 2011-01-31 and then was not anymore until 2013-07-01?

How would I get all price data going back X years for all stocks in the Q500US today, using pipeline? (so as a result, I have full price data going back X years for exactly 500 stocks)

What I was doing was getting the list of Q500US for one particular date (by setting the start and end date to be the same in run_pipeline()), then getting the price history for each stock in the list using get_pricing, but that didn't work for me because I want the price as I would get it in the backtester, not the end date split adjusted price. So now, I'm using pipeline to get the price history, but having the problem described above.

Thank you.

Jamie McCorriston

Jun 5, 2017

Pipeline was designed to get you data according to a fixed reference point (the current date in backtesting, and the current date in run_pipeline; see the note below). It is also designed to produce a single value per column per asset per day, which means you can't easily return a time series of data.

The best way to get the time series for each asset in the Q500US today would be to do as you suggested and get the constituents of the Q500US on a particular day, and pass these to get_pricing. get_pricing will return prices that are adjusted as of the end_date of the query.

I've attached a notebook that should get you what you're asking for (the notebook gets the Q500 for 2017-01-03). Let me know if this helps.

Note: The first sentence used to say that pipeline was adjusted as of the end_date. This was incorrect. Pipeline data is adjusted as of the 'current' date in the simulation or the 'current' date in run_pipeline.

Thomas Chang

Jul 26, 2017

Hi all,

I have a similar problem to the one MA K has. If one uses an indicator that deals with ratios, such as RSI, maybe this is only a small problem. But how about other indicators, such as MACD?
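As a rough illustration of the MACD case: unlike RSI, the MACD line is a difference of two price averages rather than a ratio, so it scales with the price level. The sketch below uses hypothetical prices and a simplified EMA seeded with the first value (libraries such as talib may seed differently), and shows that halving every price halves the MACD line.

```python
def ema(values, span):
    """Exponential moving average, seeded with the first value (a simplification)."""
    alpha = 2.0 / (span + 1)
    out = [values[0]]
    for v in values[1:]:
        out.append(alpha * v + (1 - alpha) * out[-1])
    return out

def macd_line(closes, fast=12, slow=26):
    """MACD line: fast EMA minus slow EMA of the closes."""
    return [f - s for f, s in zip(ema(closes, fast), ema(closes, slow))]

# Hypothetical closes: an upward drift with a bump every fifth bar.
closes = [100.0 + 0.5 * i + (3.0 if i % 5 == 0 else 0.0) for i in range(60)]
halved = [p * 0.5 for p in closes]  # the same series scaled by a constant

m_full = macd_line(closes)
m_half = macd_line(halved)

print(m_full[-1], m_half[-1])
```

Under these assumptions the sign of the MACD line and its zero crossings are unchanged by a constant rescaling, but its magnitude is not, so any fixed absolute threshold on MACD would depend on how the series was adjusted.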
