Accern Event-Driven (Earnings Focused) News and Blog Backtest Results (PDF Report Attached)

[Quantopian Update] - This algorithm is now outdated. Please visit this thread to see recent examples of Accern's data with pipeline and Quantopian 2.

Hello Quantopians,

We're at it again. With the help of Quantopian community members, we recently backtested over 600,000 news and blog articles related to company earnings, spanning 2.5 years. The results are very interesting. Have a look for yourself in our backtest report below :)

The news and blog dataset is designed by Accern, a big data media analytics firm headquartered in New York City (Wall Street). We monitor over 20 million news and blog sources on the web in real time and provide more than 25 analytics fields designed specifically for quantitative trading. Accern currently serves some of the largest multi-billion AUM hedge funds worldwide.

The purpose of the backtest was to measure the performance of trading on earnings information in real time and to identify which segment of earnings information generated the highest returns. We found that Financial Ratings (a segment of Company Earnings) generated the highest returns. We also wanted to measure the effect of Acquisitions, Corporate Governance (management decisions), and Contracts (deals, partnerships) on stock prices and how well these types of events can be used in quant trading.

Our data set contains 16 Event Groups, 78 Event Types (Sub-Groups), 1000+ events, and 30K event variations.

We utilized the following events in our backtest:

  1. Company Earnings (Event Group)
  2. Financial Results (Company Earnings - Event Type)
  3. Financial Forecast (Company Earnings - Event Type)
  4. Financial Ratings (Company Earnings - Event Type)
  5. Acquisitions (M&A - Event Type)
  6. Corporate Governance (Event Group)
  7. Contracts (Event Group)

We utilized the following fields of analytics in our backtest:

Story Sentiment (-1 to 1): This metric calculated the aggregated sentiment score of a specific story.

  • A positive sentiment score meant that the story was trending positively.
  • A negative sentiment score meant that the story was trending negatively.
  • This could be used as a directional trigger.

Article Sentiment (-1 to 1): This metric calculated the sentiment score of an article which was relevant to a company.

  • A positive sentiment score meant that the article was written in a positive tone towards a company.
  • A negative sentiment score meant that the article was written in a negative tone towards a company.
  • This could be used as a directional trigger.

Event Impact Score on Entity (1-100): This metric calculated if the article would have a greater-than-1% impact on the stock on the same day.

  • A high impact score meant that the article had a high probability of affecting the stock price by more than 1%.
  • A low impact score meant that the article had a low probability of affecting the stock price by more than 1%.
  • This could be used as a decision maker to execute an order / identify critical information to trade on.

Overall Source Rank (1-10): This metric calculated the timeliness and reposting of a source; could be used as a trust or viral factor.

  • A high overall source rank meant that "source x" was usually the first at releasing articles and other sources usually reposted the same information after "source x" had posted it.
  • A lower overall source rank meant that "source x" usually released articles later than other sources, and other sources rarely reposted the same information after "source x" had posted it.
  • This could be used as a trust filter to validate a story.

First Mention (TRUE/FALSE): This metric lets you know if a story hadn't been mentioned across 20 million sources within 2 weeks.

  • TRUE meant that the story hadn’t been mentioned across 20 million sources within a 2-week period.
  • FALSE meant that the story had been mentioned across 20 million sources within a 2-week period.
  • This could be used as a quick decision maker to execute an order.
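To make the roles of these fields concrete, here is a minimal sketch of how they might be combined into a trade signal. The field names, the thresholds (impact of at least 80, source rank of at least 6), and the rule that both sentiment scores must agree are all assumptions for illustration; they are not the conditions used in the backtest report.

```python
from dataclasses import dataclass


@dataclass
class Article:
    # Hypothetical field names; the real data set's column names may differ.
    entity: str
    story_sentiment: float    # -1 to 1
    article_sentiment: float  # -1 to 1
    impact_score: int         # 1 to 100
    source_rank: int          # 1 to 10
    first_mention: bool


def direction(a, min_impact=80, min_rank=6):
    """Return +1 (long), -1 (short), or 0 (no trade) for one article.

    Thresholds are illustrative assumptions, not Accern's settings.
    """
    # Decision filters: high impact, trusted source, fresh story.
    if a.impact_score < min_impact or a.source_rank < min_rank or not a.first_mention:
        return 0
    # Directional trigger: both sentiment scores must agree.
    if a.story_sentiment > 0 and a.article_sentiment > 0:
        return 1
    if a.story_sentiment < 0 and a.article_sentiment < 0:
        return -1
    return 0


articles = [
    Article("AAA", 0.6, 0.5, 85, 8, True),    # passes every filter, positive
    Article("BBB", -0.4, -0.3, 90, 9, True),  # passes every filter, negative
    Article("CCC", 0.2, -0.1, 40, 7, False),  # low impact, not a first mention
    Article("DDD", 0.7, 0.8, 95, 3, True),    # source rank too low
]
signals = {a.entity: direction(a) for a in articles}
```

Here the sentiment fields act as the directional trigger, while impact score, source rank, and first mention act as filters that decide whether to trade at all.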

The backtest report explains it in more detail. Please review the report and share it with anyone you like. We are currently working with Quantopian to make our historical data available on the platform, and we are also working on a retail Alpha Stream feed (Alpha Stream Lite) that you can use for live trading.

Request access by signing up here: Alpha Stream Lite Sign Up

Here is our backtest report: Accern Event-Driven Backtest Report

CSV File containing 600,000 news and blog articles we used: Accern News/Blog Data Set

Our previous backtest (Trend-Following Strategy): Quantopian Article and PDF Backtest Report

Contact me personally with any questions: [email protected]

Best,
Kumesh Aroomoogan
Co-Founder and CEO, Accern

16 responses

Kumesh,

Awesome. Thanks for posting. One question: Have you done research to find whether the number of story mentions (across the 20+ million sources) affects the returns generated by the algorithm? Or, a broader question: what impact does that have on the algorithm as a whole?

Seong

Disclaimer

The material on this website is provided for informational purposes only and does not constitute an offer to sell, a solicitation to buy, or a recommendation or endorsement for any security or strategy, nor does it constitute an offer to provide investment advisory services by Quantopian. In addition, the material offers no opinion with respect to the suitability of any security or specific investment. No information contained herein should be regarded as a suggestion to engage in or refrain from any investment-related course of action as none of Quantopian nor any of its affiliates is undertaking to provide investment advice, act as an adviser to any plan or entity subject to the Employee Retirement Income Security Act of 1974, as amended, individual retirement account or individual retirement annuity, or give advice in a fiduciary capacity with respect to the materials presented herein. If you are an individual retirement or other investor, contact your financial advisor or other fiduciary unrelated to Quantopian about whether any given investment idea, strategy, product or service described herein may be appropriate for your circumstances. All investments involve risk, including loss of principal. Quantopian makes no guarantees as to the accuracy or completeness of the views expressed in the website. The views are subject to change, and may have become unreliable for various reasons, including changes in market conditions or economic circumstances.

Thank you for the question, Seong. We categorize a story as a company and event pair that lasts for a 2-week period on average before people move on to something else. So Starbucks (company) and Product Recall (event) = 1 story. A story can have multiple articles associated with it. So in this event-driven example, 'Starbucks' and 'Company Earnings' is a story, and the Story Volume itself can be 100 (# of articles / mentions) after x days. Since we used Story Sentiment, we're aggregating the sentiment from the 100 articles/mentions on Starbucks and Company Earnings, so the sentiment is much more stable.

To answer the question: yes, the number of articles mentioned in a story can affect the returns, because 1) more articles = more chances of a condition being matched (e.g. source rank), and 2) more articles in a story = a more stable and reliable sentiment score on the story. A story can have 1 article which starts off negative, but as more articles come in, the views could change and the sentiment could trend into the positive.
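As a rough illustration of the story idea above, the sketch below keeps a running sentiment per (company, event) pair. The class and method names are hypothetical, and a plain running mean is an assumption; the thread does not say how Story Sentiment is actually aggregated, and the 2-week story lifetime is left out for brevity.

```python
from collections import defaultdict


class StoryTracker:
    """Running sentiment per story, where a story is a (company, event) pair.

    Uses a simple mean of article sentiments, which is an assumption;
    the real aggregation method is not described in the thread.
    """

    def __init__(self):
        self._total = defaultdict(float)
        self._count = defaultdict(int)

    def add_article(self, company, event, sentiment):
        """Record one article and return the updated story sentiment."""
        key = (company, event)
        self._total[key] += sentiment
        self._count[key] += 1
        return self._total[key] / self._count[key]

    def volume(self, company, event):
        """Number of articles seen so far for this story."""
        return self._count[(company, event)]


tracker = StoryTracker()
# One negative article arrives first, then more positive ones follow.
history = [
    tracker.add_article("Starbucks", "Company Earnings", s)
    for s in (-0.6, 0.2, 0.5, 0.7)
]
```

In this made-up sequence the story starts negative after the first article and ends positive after the fourth, and each additional article moves the average less than the one before it.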

Thanks for the prompt reply, Kumesh. I have a few more questions, if you don't mind, based on the following statement:

"A story can have 1 article which starts off negative but as more articles come in, their views could change and the sentiment could trend into the positive."

In what timescale is this kind of change observable? E.g. do you see this at the millisecond level or the minutely level?

Thanks

The change in story sentiment can take place in minutes to hours depending on how popular the event is. For example, let's say that as soon as Apple released a new product, some influential blogger found a fault and mentioned in an article that the product is terrible, giving the story a negative sentiment. However, other people who received the product did not find any faults and reported positive things about it. This changes and stabilizes the story sentiment from negative to positive within minutes or hours, depending on the popularity of the product release event.

Here is a backtest using the news and blogs data on Financial Rating events which returned 322%. Seems Financial Ratings is an excellent predictor for stock price movements.

Can articles be processed in real time? If so, how do we code that in? Is there a different CSV I have to use?

Hi Josh,

Each article in our dataset is timestamped to the second, but in this Quantopian backtest we round it to the nearest minute. When conducting the backtest, there is a 2-minute delay between the article being recognized by our trading condition and the trade actually executing. Unless you are trading at high frequency, the 2-minute trade execution time shouldn't be a problem.

Best,
Kumesh
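The timing described above can be sketched as follows. The function names are hypothetical, and treating 30 seconds as the round-up point is an assumption about what "nearest minute" means here; the backtest code may handle it differently.

```python
from datetime import datetime, timedelta

# The 2-minute delay mentioned in the reply above.
EXECUTION_DELAY = timedelta(minutes=2)


def round_to_minute(ts):
    """Round a timestamp to the nearest minute (30 seconds or more rounds up)."""
    floored = ts.replace(second=0, microsecond=0)
    if ts - floored >= timedelta(seconds=30):
        floored += timedelta(minutes=1)
    return floored


def earliest_fill(article_ts):
    """Earliest minute bar at which a trade on this article could execute."""
    return round_to_minute(article_ts) + EXECUTION_DELAY


published = datetime(2015, 3, 2, 14, 30, 41)  # made-up article timestamp
fill = earliest_fill(published)               # 14:33:00 on the same day
```

Rounding down can stamp an article up to 30 seconds before it was published, but the 2-minute delay still places the trade after the real publication time.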

I cloned the top algorithm by Kumesh to run it, but it always seems to time out with the message below:
"TimeoutException: Fetcher data not processed in 360 seconds There was a runtime error on line 127."
Can anyone help?

Is there any news on when data like this can be used in live trading?

I'm also interested when the data set will be available for backtesting and live trading. Also, the CSV provided in many of the examples is the full Accern sentiment data. It looks like only the AlphaOne stream (daily data and limited fields) will be available for live trading within Quantopian.

Hi folks,

We're actively working on making this data available for live trading. We'll let you know once we have news to share.

In backtesting, a fetcher file has 6 minutes (360 seconds) to be fetched from the server and loaded into the algo. It sounds like you're running into this time limit. Perhaps your internet connection is slow? Alternatively, you can break the file up locally into smaller chunks for faster uploading.

Alisa
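For anyone who wants to try the chunking suggestion, here is a minimal sketch of splitting a CSV into smaller pieces that each keep the header row. The function names, the column names in the sample, and the chunk size are all made up for illustration; how the algorithm then loads several files is not covered in this thread.

```python
import csv
import io


def _to_csv(header, rows):
    """Serialize a header plus rows back into CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


def split_csv(source, rows_per_chunk):
    """Yield CSV text chunks of at most rows_per_chunk data rows each.

    Every chunk repeats the header so it can be loaded on its own.
    """
    reader = csv.reader(source)
    header = next(reader, None)
    if header is None:  # empty input: nothing to yield
        return
    rows = []
    for row in reader:
        rows.append(row)
        if len(rows) == rows_per_chunk:
            yield _to_csv(header, rows)
            rows = []
    if rows:  # final partial chunk
        yield _to_csv(header, rows)


# Tiny in-memory sample with hypothetical columns.
sample = io.StringIO(
    "date,symbol,sentiment\n"
    "2015-01-02,AAA,0.4\n"
    "2015-01-02,BBB,-0.1\n"
    "2015-01-05,AAA,0.3\n"
)
chunks = list(split_csv(sample, rows_per_chunk=2))
```

With a real file, open it with `open(path, newline="")`, pass the file object to `split_csv`, and write each chunk to its own file.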

Hello,

I was looking into the premium data Accern can supply for $50/month, but it seems to return only two metrics. For example, the source rank is not given. Am I wrong here?

Regards

Emiel

The data that is now built into Quantopian provides impact_score and sentiment, rolled up at a daily frequency for US Equities.

Hi Emiel,

Yes, the Alpha One dataset contains only the two most commonly used metrics (impact_score and sentiment). It's a daily delivered dataset for retail customers.
Source_rank, along with several other metrics like story_saturation, social_shares, etc., is part of our real-time streaming product (Alpha Stream). Alpha Stream is aimed at bigger trading teams. We don't have support for Alpha Stream on Quantopian yet, and it can only be accessed using our API.

Regards,
Anshul
Co-founder, Accern

The first algo appears to be 187% in the chart. Adjusting for the ~$2M negative cash, the return is 68%, see PvR.
Leverage hits 3.25.

PvR beats the benchmark so don't let this sour you on the sentiment data and important work they are doing. Just -- eyes open.

Tried cloning this, but the Dropbox files posted above (backtest results and data files required for backtest) are missing.
Is there a newer version of the Accern algorithms posted, with current Dropbox files?

Also, if someone has run these algorithms over the last year, please point me to the results. My backtests stop in May 2016 as I don't have the paid subscription.