Hey everyone,
I've been working on a demonstration of how to combine multiple data sources in research. It uses Andrew Campbell's Factor Tear Sheet to analyze a Pipeline factor that combines multiple data sources. In this case, I chose Accern and Sentdex, since both measure news sentiment, albeit by different methods. Skip to the cell titled "1-Day News Sentiment Tear Sheet" to see the relevant tear sheets.
First, the notebook analyzes each data source individually as a Pipeline output. My findings show that each has low to no correlation with returns across the entire equities market. However, the factor tear sheet breaks the results down by sector, showing in which sectors a given Pipeline factor can reasonably predict price changes.
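As a rough illustration of that per-sector breakdown, here is a minimal pandas sketch that computes a rank information coefficient (IC, the Spearman correlation between factor values and forward returns) separately within each sector for a single cross-section of stocks. The column names `sector`, `factor`, and `fwd_return`, the helper names, and the synthetic data are all hypothetical. A tear sheet typically repeats this for every date and aggregates the results, so treat this as a simplified single-day version, not the notebook's code.

```python
import numpy as np
import pandas as pd


def spearman_ic(factor: pd.Series, fwd_returns: pd.Series) -> float:
    """Rank IC: Pearson correlation of the ranks, i.e. Spearman's rho."""
    return factor.rank().corr(fwd_returns.rank())


def ic_by_sector(df: pd.DataFrame) -> pd.Series:
    """Compute the rank IC within each sector for one cross-section.

    Expects hypothetical columns 'sector', 'factor', and 'fwd_return'.
    """
    return df.groupby("sector")[["factor", "fwd_return"]].apply(
        lambda g: spearman_ic(g["factor"], g["fwd_return"])
    )


# Synthetic example: returns follow the factor in Tech only and are
# pure noise in the other sectors.
rng = np.random.default_rng(0)
n = 300
sectors = np.repeat(["Tech", "Energy", "Financials"], n // 3)
factor = rng.normal(size=n)
noise = rng.normal(size=n)
fwd = np.where(sectors == "Tech", factor + 0.5 * noise, noise)

df = pd.DataFrame({"sector": sectors, "factor": factor, "fwd_return": fwd})
ics = ic_by_sector(df)
print(ics)
```

A factor can look useless market-wide while still ranking stocks well inside one sector; grouping before correlating is what surfaces that.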
As an aside, take a look at how well both factors predict returns in the financial sector. It's no surprise how little the news can predict stock price changes in the financial sector.
Next, I combine article sentiment from Accern and news sentiment from Sentdex into a single Pipeline factor. Both of these datasets are available from the Quantopian Data Store. With the two news sources combined, I found a stronger information coefficient (IC) in certain sectors.
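The post doesn't spell out how the two signals are blended. One common approach, shown here purely as an assumption and not as the notebook's actual method, is to z-score each source across stocks so that their different scales don't let one dominate, then take an equal-weight average. Stocks covered by only one source keep that source's score. The function names, tickers, and values below are hypothetical.

```python
import numpy as np
import pandas as pd


def zscore(s: pd.Series) -> pd.Series:
    """Cross-sectional z-score; NaN (no coverage) stays NaN."""
    return (s - s.mean()) / s.std()


def combine_sentiment(accern: pd.Series, sentdex: pd.Series) -> pd.Series:
    """Equal-weight blend of two standardized sentiment signals.

    Assumption for this sketch: standardize first, then average.
    DataFrame.mean skips NaN by default, so a stock with only one
    source's score gets that score; a stock with neither stays NaN.
    """
    both = pd.concat([zscore(accern), zscore(sentdex)], axis=1)
    return both.mean(axis=1)


# Hypothetical scores on deliberately different scales.
tickers = ["AAA", "BBB", "CCC", "DDD"]
accern = pd.Series([0.8, -0.2, 0.1, np.nan], index=tickers)
sentdex = pd.Series([6.0, -1.0, np.nan, 3.0], index=tickers)

combined = combine_sentiment(accern, sentdex)
print(combined)
```

Without the z-scoring step, the source with the wider numeric range would drive most of the combined ranking; other weighting schemes (for example, weighting by each source's historical IC) are equally plausible choices.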
Others are welcome to use this as a starting point for their own research on data sources. In the next analysis, I'll look at incorporating PsychSignal's Twitter sentiment index and will report my findings here.