Sentiment Analysis Algo

Hi everyone,

I've written my first algo and I'd love some feedback on how to make it more robust. This is really only a proof of concept. The algo is interesting because it depends so little on historical data, which makes it difficult to backtest.

Here's a little bit about what I did:
This algorithm is grounded in the idea that news drives markets, so I used news to predict immediate future market movements. To do this, I used an API that takes a string and returns a probability vector of sentiments (check it out: http://text-processing.com/docs/sentiment.html). Very often, these sentiments line up close to each other, and in that case we simply don't act on the data. Why act if you're not sure?
So how are we sure? We look for a significant difference between the positive and negative sentiment components. I arbitrarily chose 0.2 as the threshold, meaning the absolute difference between the positive and negative values must be at least 0.2. There's a caveat from the other component, neutrality. If the neutral component is very high, I consider the text "too noisy" and likewise choose not to act on it. The neutral component may be at most 0.75.
Once we've found something to trade on, we size the trade as the cash we have multiplied by the difference. This gives more significant differences a bigger representation, and since negative differences are also possible, we short those.
After a few hacky techniques, voilà: it's runnable.
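
To make the rule concrete, here's a minimal sketch of the filter and sizing described above. It assumes the API response has already been parsed into a dict with "pos", "neg", and "neutral" keys. The function names are hypothetical, and this is an illustration rather than the code from the attached algo.

```python
MIN_DIFF = 0.2       # minimum |pos - neg| required before acting
MAX_NEUTRAL = 0.75   # above this, the text is treated as too noisy


def sentiment_signal(probs, min_diff=MIN_DIFF, max_neutral=MAX_NEUTRAL):
    """Return the signed difference pos - neg, or 0.0 when we should not act.

    `probs` is assumed to be a dict with "pos", "neg" and "neutral" keys.
    """
    if probs["neutral"] > max_neutral:
        return 0.0  # too noisy
    diff = probs["pos"] - probs["neg"]
    if abs(diff) < min_diff:
        return 0.0  # positive and negative are too close to call
    return diff


def target_value(cash, signal):
    """Dollar amount to hold: positive means long, negative means short."""
    return cash * signal


# Made-up example values, not real API output.
example = {"pos": 0.7, "neg": 0.3, "neutral": 0.2}
print(target_value(10000.0, sentiment_signal(example)))
```

A signal of 0.0 means "no trade", so the caller can skip the order entirely in that case.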

So what's next?
I have a ticker parser ready to accept an article. It would be wonderful to get data from the web, but it's a steep challenge. Also, selling after a week would be prudent to implement, along with limiting the frequency of the algorithm.

Let me know what you think, and of course, happy hacking!

3 responses

Hi Ari,

Your algo looks sound, and of course I know you'll make it dynamic vs. hard-coded over time.

My only concern would be line 36. While you're ensuring not to over-leverage (nice),
the holdings balance per security is not only a function of the sentiment rating,
but also of the sequence of the symbols.

In an extreme case, say your first symbol has an 80% bullish sentiment.
You immediately allocate 80% of your cash to that one symbol in the for loop.
(Unless there's something pythonic that I'm missing.)

There are lots of ways to equalize this, but that's what jumped out at me.
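
One possible way to equalize it, as a rough sketch: compute every symbol's signal first, then scale them all together so the total exposure stays within a limit, instead of drawing down cash one symbol at a time. The function name and the max_gross parameter are made up for illustration and aren't part of the original algo.

```python
def normalized_weights(signals, max_gross=1.0):
    """Scale signed signals so the sum of absolute weights is <= max_gross.

    `signals` maps symbol -> signed sentiment difference (negative = short).
    The result does not depend on the order of the symbols.
    """
    gross = sum(abs(s) for s in signals.values())
    if gross == 0:
        return {sym: 0.0 for sym in signals}
    # Only scale down; leave the weights alone if we are already under the limit.
    scale = min(1.0, max_gross / gross)
    return {sym: s * scale for sym, s in signals.items()}


# Made-up example signals.
signals = {"AAPL": 0.8, "MSFT": 0.6, "XOM": -0.6}
weights = normalized_weights(signals)
for sym in sorted(weights):
    print(sym, round(weights[sym], 3))
```

Each weight is then a fraction of total portfolio value, so the 80% symbol gets the same share whether it comes first or last in the loop.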

Best wishes.

You could also set up a small VPS and scrape (for example) http://stocktwits.com/symbol/AAPL, which does its own sentiment analysis per "Twit" or "Tweet", as shown by the green "Bullish" and red "Bearish" boxes at the end of each message.

Then have Fetcher bring in the bear / bull / neutral stats by stock via your VPS.

Could be a workaround, food for thought.

I'm setting up an AWS instance right now, actually! It's going to be a Django app which queries the sentiment analysis engine and writes all the info into a CSV so we can use Fetcher. Thanks for the tips!
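
For the CSV step, something along these lines might work as a starting point. It uses only Python's standard csv module. The column layout (date, symbol, pos, neg, neutral) and the function name are assumptions for illustration, so check them against what Fetcher expects before relying on them.

```python
import csv
import io

# Hypothetical column layout, chosen for this sketch.
FIELDS = ["date", "symbol", "pos", "neg", "neutral"]


def write_sentiment_csv(rows, out):
    """Write one CSV row per (date, symbol) sentiment reading to `out`."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    for row in rows:
        writer.writerow(row)


# Made-up sample row; `out` could equally be an open file.
rows = [
    {"date": "2014-06-02", "symbol": "AAPL", "pos": 0.7, "neg": 0.3, "neutral": 0.2},
]
buf = io.StringIO()
write_sentiment_csv(rows, buf)
print(buf.getvalue())
```

Writing to a file-like object keeps the function easy to test and lets the Django view decide whether the output goes to disk or straight into an HTTP response.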