Indexing problem with get_pricing

Hello,
I am a student and I have been trying to run a linear regression between the "overnight returns" and the "returns of the first 30 minutes of trading" of the following day, for SPY over the period 2007-2017. This is what I wrote so far:

#overnight indexing  
start1 = '2007-01-01 15:59:00'  
end1 = '2017-01-01 15:59:00'  
SPYclosing = get_pricing('SPY', fields='price', start_date=start1, end_date=end1)

start2 = '2007-01-02 09:30:00'  
end2 = '2017-01-02 09:30:00'  
SPYinitial = get_pricing('SPY', fields='price', start_date=start2, end_date=end2)

rate_return_overnight=(SPYclosing-SPYinitial)/SPYinitial  
print rate_return_overnight  

and:

#first 30 minutes return indexing  
start3 = '2007-01-01 09:30:00'  
end3 = '2017-01-01 09:30:00'  
SPYopening = get_pricing('SPY', fields='price', start_date=start3, end_date=end3)

start4 = '2007-01-01 10:00:00'  
end4 = '2017-01-01 10:00:00'  
SPYfirst30mins = get_pricing('SPY', fields='price', start_date=start4, end_date=end4)

rate_return_first30mins=(SPYfirst30mins-SPYopening)/SPYopening  
print rate_return_first30mins  

The problem is that it keeps giving me a column of zeros, and when I try to use it in the linear regression as explained in the lessons:
https://www.quantopian.com/posts/quantopian-lecture-series-linear-regression
it just doesn't work and gives me a lot of NaNs in the summary of the results.

Can anyone help me understand where my mistake is?
Thanks a lot in advance


The issue you are experiencing is that Pandas is sometimes too smart. Pandas recognizes that the index is a datetime timestamp and aligns the two series by those timestamps. Behind the scenes, Pandas matches up the common datetimes and performs the operation on those, NOT simply on common positions (i.e. rows). Since you didn't specify a frequency, both calls return daily data and the time of day in your start and end strings is ignored, so both series hold the same daily prices at the same timestamps. Even though the two calls look offset, Pandas sees the same timestamps in both and treats them as the same dataset, just with a few extra datapoints. Therefore, when you subtract one from the other, you always get 0.
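To make that concrete, here is a minimal sketch with made-up numbers (plain Pandas, not Quantopian data):

import pandas as pd

# Two series whose values sit at different row positions but share timestamps.
# Arithmetic aligns on the timestamps, not on the row positions.
idx = pd.date_range('2007-01-03', periods=5, freq='D')
a = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0], index=idx)
b = pd.Series([2.0, 3.0, 4.0, 5.0], index=idx[1:])  # same values, shifted down one row

print(a - b)
# 2007-01-03    NaN  <- timestamp only exists in `a`
# 2007-01-04    0.0  <- 2.0 - 2.0, matched by timestamp, not by row position
# 2007-01-05    0.0
# 2007-01-06    0.0
# 2007-01-07    0.0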

If you are just trying to find the 30-minute return, then you could simply use the built-in 'pct_change' method.

start = '2007-01-01'  
end = '2017-01-01'  
spy_price = get_pricing('SPY', fields='price', start_date=start, end_date=end, frequency='minute')  
rate_return_30_min = spy_price.pct_change(30*60, freq='s')

This will return a whole series of rolling returns and not a single scalar value.

There is a little subtlety here. Again, Pandas is pretty smart. It understands times. So, the above code snippet takes the percent change from the price 30 minutes earlier. That means that during the first 30 minutes of each day there won't be a previous time (i.e. the markets weren't open), so those values will be NaN. However, if you really want to compare yesterday's close to today's open, then Pandas can do that too. Just don't specify a time frequency.

rate_return_30_ticks = spy_price.pct_change(30)
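For the overnight leg specifically, another option (my own sketch, not from the post above; field names as I recall them from the research environment) is to stay at daily frequency and use the 'open_price' and 'close_price' fields, shifting the closes forward one row so each session's open is compared with the previous session's close:

spy_open = get_pricing('SPY', fields='open_price', start_date=start, end_date=end)
spy_close = get_pricing('SPY', fields='close_price', start_date=start, end_date=end)

# shift(1) moves each close down to the next session's row, so the division
# lines up today's open with yesterday's close
overnight_return = spy_open / spy_close.shift(1) - 1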

Hope that helps.

Hey Dan,
Thanks for the answer. This is the first time I've seen the pct_change() function, so it was really helpful!
The problem is still there for me, though, since when I try to get the percentage change of the first 30 minutes of trading as you showed me:

start = '2007-01-01'  
end = '2017-01-01'  
spy_price = get_pricing('SPY', fields='price', start_date=start, end_date=end)  
rate_return_30_min = spy_price.pct_change(30*60, freq='s')  

it returns me:

2007-01-03 00:00:00+00:00 NaN
2007-01-03 00:30:00+00:00 NaN
2007-01-04 00:00:00+00:00 NaN
2007-01-04 00:30:00+00:00 NaN
2007-01-05 00:00:00+00:00 NaN
2007-01-05 00:30:00+00:00 NaN
2007-01-08 00:00:00+00:00 NaN
2007-01-08 00:30:00+00:00 NaN
...

2016-12-28 00:00:00+00:00 NaN
2016-12-28 00:30:00+00:00 NaN
2016-12-29 00:00:00+00:00 NaN
2016-12-29 00:30:00+00:00 NaN
2016-12-30 00:00:00+00:00 NaN
2016-12-30 00:30:00+00:00 NaN
Name: Equity(8554 [SPY]), dtype: float64

What could be the reason for it?
Thanks in advance for the help

First problem... you need to specify "frequency='minute'", otherwise the data will be daily. That may have been your only issue. Something like this:

spy_price = get_pricing('SPY', fields='price', start_date=start, end_date=end, frequency='minute')  

Another problem (which isn't really a problem) could be just in how it's displayed. You may have used the default display, which shows only a sample of the data, and it just so happens that all the displayed values are NaN. Misleading. There are percentages in there. Just drop all the NaNs and you will see them.
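For example:

rate_return_30_min.dropna().head(10)  # the non-NaN rolling returns show up once the leading NaNs are dropped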

Finally, it's easier (for me at least) to see what's going on if you change the datetime index to Eastern time instead of UTC. Use the following to do that.

rate_return_30_min_et = rate_return_30_min.tz_convert('US/Eastern')

And one more thing: you can add brackets around the symbol to get back a Pandas DataFrame rather than a Pandas Series. DataFrames and Series have a few different methods, so it may be advantageous to use one over the other at times. I often like using DataFrames simply because they render more nicely in notebooks.
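In other words, the difference is just the brackets around the symbol:

spy_series = get_pricing('SPY', fields='price', start_date=start, end_date=end, frequency='minute')    # Pandas Series
spy_frame = get_pricing(['SPY'], fields='price', start_date=start, end_date=end, frequency='minute')   # Pandas DataFrame with one column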

See attached notebook (it's always a good idea to attach a notebook or algorithm rather than just pasting code)

Hi Dan,
First of all, thanks for all the help you have given me so far. This is my notebook with my current results.

I feel like I am really, really close to having it right; I just can't find a way to make my matrices match so I can put them into the regression.
The biggest issue comes from the fact that the "overnight" matrix has a shape of (473,) while the "first_30minutes_of_trading" one has a shape of (520, 1).

The issue is not the number of rows, but the fact that the "overnight" matrix doesn't show a number of columns at all.
What should I do to solve the problem?

Thanks in advance,
Mattia
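
One possible way to reconcile shapes like those (a sketch of my own; `overnight` and `first_30` are hypothetical names for the two objects, and it assumes both carry the same kind of date stamps on their index):

overnight_df = overnight.to_frame('overnight')                 # Series (473,) -> DataFrame (473, 1)
combined = first_30.join(overnight_df, how='inner').dropna()   # keep only the dates present in both

# the two columns of `combined` now have the same length and can be fed to the regression
X = combined['overnight']
Y = combined.iloc[:, 0]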

Hi Dan,
I have been able to make the two matrices fit, and I got my linear regression results here:
https://www.quantopian.com/posts/overnight-returns-vs-first-30-minutes-of-trading-returns-linear-regression
Thanks a lot for all the help that you have given me with this notebook!
Mattia