I have found a potential bug in this tutorial.
For the independent two-sample t-test (https://en.wikipedia.org/wiki/Student's_t-test) with unequal sample sizes and unequal variances, the degrees of freedom given by the Welch–Satterthwaite approximation should be
df = ((s_spy**2 / n_spy) + (s_aapl**2 / n_aapl))**2 / (((s_spy**2 / n_spy)**2 / (n_spy - 1)) + ((s_aapl**2 / n_aapl)**2 / (n_aapl - 1)))
rather than
df = ((s_spy**2 / n_spy) + (s_aapl**2 / n_aapl))**2 / (((s_spy**2 / n_spy)**2 / n_spy) + ((s_aapl**2 / n_aapl)**2 / n_aapl))
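For reference, this is how the Welch–Satterthwaite approximation is usually written in standard notation. The symbols are my own shorthand, not the tutorial's: s_1 and s_2 are the sample standard deviations (s_spy and s_aapl above), and n_1 and n_2 are the sample sizes. The only difference from the tutorial's code is the n_i - 1 in the two denominator terms.

```latex
\nu \approx
\frac{\left( \dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2} \right)^2}
     {\dfrac{\left( s_1^2 / n_1 \right)^2}{n_1 - 1}
      + \dfrac{\left( s_2^2 / n_2 \right)^2}{n_2 - 1}}
```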
For the corrected and original code, the test statistic, degrees of freedom, and p-value printed by these statements are as follows:
print 't test statistic: ', test_statistic
print 'Degrees of freedom (modified): ', df
print 'p-value: ', 2 * (1 - t.cdf(test_statistic, df))
Corrected
t test statistic: 0.0346940841618
Degrees of freedom (modified): 401.343250979
p-value: 0.972340926484
Original
t test statistic: 0.0346940841618
Degrees of freedom (modified): 402.948623983
p-value: 0.972340857787
For comparison, using stats.ttest_ind from scipy with equal_var=False (Welch's t-test):
from scipy import stats
stats.ttest_ind(returns_sample['SPY'], returns_sample['AAPL'], equal_var=False)
We have the following result:
Ttest_indResult(statistic=0.034694084161834164, pvalue=0.97234092648396842)
Notice that scipy's p-value matches the corrected version (0.972340926484), not the original (0.972340857787).
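To make the comparison easy to reproduce, here is a minimal sketch of the same check. It uses synthetic data rather than the tutorial's returns_sample, so the numbers will differ from those above. The names welch_df, welch_t_test, x, and y are my own illustrative choices, and I use t.sf(abs(t)) for the two-sided p-value so that it also works when the statistic is negative.

```python
import numpy as np
from scipy import stats


def welch_df(a, b, corrected=True):
    """Welch-Satterthwaite degrees of freedom for two samples.

    corrected=True divides by (n - 1) in the denominator terms;
    corrected=False divides by n, as in the tutorial's original code.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n_a, n_b = a.size, b.size
    # Variance of each sample mean, from the unbiased sample variance.
    v_a = a.var(ddof=1) / n_a
    v_b = b.var(ddof=1) / n_b
    d_a, d_b = (n_a - 1, n_b - 1) if corrected else (n_a, n_b)
    return (v_a + v_b) ** 2 / (v_a ** 2 / d_a + v_b ** 2 / d_b)


def welch_t_test(a, b, corrected=True):
    """Return (t statistic, degrees of freedom, two-sided p-value)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    se = np.sqrt(a.var(ddof=1) / a.size + b.var(ddof=1) / b.size)
    t_stat = (a.mean() - b.mean()) / se
    df = welch_df(a, b, corrected=corrected)
    p_value = 2 * stats.t.sf(abs(t_stat), df)
    return t_stat, df, p_value


# Synthetic "returns" with unequal sizes and variances (assumed values).
rng = np.random.default_rng(0)
x = rng.normal(0.0005, 0.01, size=250)
y = rng.normal(0.0010, 0.02, size=180)

t_corr, df_corr, p_corr = welch_t_test(x, y, corrected=True)
t_orig, df_orig, p_orig = welch_t_test(x, y, corrected=False)
result = stats.ttest_ind(x, y, equal_var=False)

print("corrected:", t_corr, df_corr, p_corr)
print("original: ", t_orig, df_orig, p_orig)
print("scipy:    ", result.statistic, result.pvalue)
```

With the (n - 1) denominators the p-value should agree with scipy to floating-point precision, while the n version gives a slightly larger df and a p-value that differs only in the later decimal places.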
Thank you very much and I hope to hear from you soon!
Best,
Rolando