I decided to have another look at the Sentdex dataset.
I was wondering why it only contains the sentiment rating field and not the "Volume of Mentions" field. The latter could be very useful, either on its own or in combination with the sentiment rating. Does anybody know why it is not included in the Quantopian dataset? They have it on their website.
Anyhow, I tried reproducing this example strategy:
http://sentdex.com/blog/back-testing-sentdex-sentiment-analysis-signals-for-stocks
It's a basic wisdom-of-the-crowds concept ("wisdom of the press?" haha yeah right). I made some minor changes, but I'm not sure how they got their backtest to outperform, except by luck. Even with slippage disabled, mine only underperforms, which is actually what I would expect. The business/financial press generally seems to be reactionary rather than insightful.
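For anyone who wants to play with the idea outside the backtester, here is a minimal sketch of the kind of rule I mean. It is not the code from the Sentdex post and not my Quantopian algorithm: the function name `backtest_sentiment`, the thresholds `enter_above` and `exit_below`, and all the price and sentiment numbers are made up for illustration. It goes long when the sentiment reading is at or above one threshold, goes flat when it is at or below another, ignores slippage and commissions, and compares the result to buy-and-hold.

```python
def backtest_sentiment(prices, sentiment, enter_above=3, exit_below=0):
    """Toy long-only backtest on one stock.

    Returns (strategy_growth, buy_and_hold_growth), each as a multiple
    of starting capital. Thresholds are illustrative assumptions, and
    slippage and commissions are ignored.
    """
    if len(prices) != len(sentiment) or len(prices) < 2:
        raise ValueError("need two aligned series with at least two points")

    invested = False
    equity = 1.0
    for t in range(len(prices) - 1):
        # Decide using only the sentiment known on day t (no lookahead).
        if not invested and sentiment[t] >= enter_above:
            invested = True
        elif invested and sentiment[t] <= exit_below:
            invested = False
        # The position chosen on day t earns the return from day t to t+1.
        if invested:
            equity *= prices[t + 1] / prices[t]

    return equity, prices[-1] / prices[0]


# Made-up data: sentiment turns positive only after the price has already
# risen, and negative only after it has already fallen.
prices = [100, 110, 100, 110, 100]
lagging_sentiment = [0, 6, -3, 6, -3]

strategy, buy_and_hold = backtest_sentiment(prices, lagging_sentiment)
print(f"strategy: {strategy:.3f}, buy and hold: {buy_and_hold:.3f}")
```

With a signal that lags the price like this, the rule buys after each rise and sells after each drop, so it trails buy-and-hold even with zero trading costs. That is only a toy illustration of the "reactionary press" suspicion, not evidence about the real dataset.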
I'm curious: has anybody found anything interesting in this dataset?