Hi.
I wrote this code to run a cointegration test on a list of tickers and return, for every ticker in the list, the assets it is cointegrated with. I slightly modified the code from the example "How to Build a Pairs Trading Strategy on Quantopian?" because I am not very familiar with panels, so I am sticking with DataFrames for now.
Link to the example:
https://www.quantopian.com/research/notebooks/Cloned%20from%20%22How%20to%20Build%20a%20Pairs%20Trading%20Strategy%20on%20Quantopian%3F%22%202.ipynb
I get the same cointegrated pair as in the example: 'ABGB' and 'CSUN'.
But when I go on to plot the z-score, the plot looks totally different from the one in the example, even though the symbols and the start and end dates are the same.
Here is the code:
import pandas as pd
import pandas.io.data as web
import numpy as np
import matplotlib.pyplot as plt
import statsmodels
from statsmodels.tsa.stattools import coint
pd.options.display.mpl_style = 'default'
start= '2014-1-1'
end= '2015-1-1'
ticker_list = ['ABGB', 'ASTI', 'CSUN', 'DQ', 'FSLR','SPY']
allData = {}
for ticker in ticker_list:
    allData[ticker] = web.get_data_yahoo(ticker, start, end)
#just prices:
total_df = pd.DataFrame({tic:data['Close'] for tic, data in allData.iteritems()})
#percent variations:
daily_returns= total_df.pct_change()[1:]
#cumulative variations:
return_index= (1 + daily_returns).cumprod()
def cointegration_finder(ticker_list):
    '''
    populate the dictionary 'result' with each symbol as key
    and the list of symbols with which it is cointegrated as value,
    then convert it to a dataframe for printing
    '''
    result = {}
    for ticker in ticker_list:
        compare_ticker_to_this_list = [x for x in ticker_list if x != ticker]
        # keep the symbols whose cointegration test p-value is below 0.05
        cointegrated_tickers = [x for x in compare_ticker_to_this_list
                                if coint(total_df[x], total_df[ticker])[1] < .05]
        result[ticker] = cointegrated_tickers
    return pd.DataFrame.from_dict(result, orient='index')
df = cointegration_finder(ticker_list)
print df
def zscore(series):
    return (series - series.mean()) / np.std(series)
def visualize_spread(x, y):
    score, pvalue, _ = coint(x, y)
    diff_series = x - y
    # create the figure before plotting, otherwise plt.figure() opens a new empty one
    plt.figure(figsize=(15,8))
    zscore(diff_series).plot()
    plt.axhline(zscore(diff_series).mean(), color='black')
    plt.axhline(1.0, color='red', linestyle='--')
    plt.axhline(-1.0, color='green', linestyle='--')
    plt.show()
visualize_spread(total_df['ABGB'], total_df['CSUN'])
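One thing that might be worth checking is how sensitive the z-score plot is to the way the spread is defined. Below is a minimal sketch on synthetic data (not the real ABGB/CSUN prices): the seed, the series x and y, and the ddof parameter are assumptions made purely for illustration, and this is not the code from the example notebook. It compares the z-score of a price difference, as in visualize_spread, with the z-score of a price ratio.

```python
import numpy as np

def zscore(values, ddof=0):
    """Standardize values to zero mean and unit standard deviation."""
    values = np.asarray(values, dtype=float)
    return (values - values.mean()) / values.std(ddof=ddof)

# Synthetic prices, NOT market data: y is a noisy multiple of x.
rng = np.random.RandomState(0)
x = 10.0 + np.cumsum(rng.normal(0.0, 0.1, 250))
y = 2.0 * x + rng.normal(0.0, 0.2, 250)

z_diff = zscore(x - y)    # spread as a price difference, as in visualize_spread
z_ratio = zscore(x / y)   # spread as a price ratio

# How closely do the two z-score series track each other?
corr = np.corrcoef(z_diff, z_ratio)[0, 1]
max_gap = np.abs(z_diff - z_ratio).max()
print("correlation: %.3f, largest gap: %.3f" % (corr, max_gap))
```

If the two curves can differ this much for the same pair of series, then a different spread definition, or a different price column such as adjusted versus unadjusted close, might be enough to explain a plot that does not match the example.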