Missing Sentiment Data?

Hi All,

I've been trying to incorporate the alpha one sentiment data into my backtests, and there appears to be a lot of missing data starting around the end of September 2012. I noticed this because I started getting KeyErrors on the index of the pipeline results when pulling sentiment data for some of the stocks in my universe. For what it's worth, I am running the sentiment data through the RSI factor, in case that is contributing to the issue.
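For readers following along, here is a minimal sketch of how this kind of KeyError can arise and one way to guard against it. It assumes the pipeline results behave like a pandas DataFrame indexed by stock; the tickers, the column name `sentiment_rsi`, and the values are all made up for illustration and are not taken from the original algorithm.

```python
import pandas as pd

# Hypothetical stand-in for one day's pipeline results: rows are indexed by
# ticker, and only stocks with a sentiment value that day have a row.
results = pd.DataFrame(
    {"sentiment_rsi": [61.2, 48.7]},
    index=["AAPL", "MSFT"],
)

universe = ["AAPL", "MSFT", "XOM"]  # XOM has no row in today's results

# A direct label lookup raises KeyError for the missing stock.
try:
    results.loc["XOM", "sentiment_rsi"]
    lookup_failed = False
except KeyError:
    lookup_failed = True
    print("XOM is missing from today's results")

# Reindexing against the universe turns missing rows into NaN instead,
# so downstream code can test for NaN rather than catch exceptions.
aligned = results.reindex(universe)
missing = aligned.index[aligned["sentiment_rsi"].isna()].tolist()
print(missing)
```

Checking `stock in results.index` before each lookup would work as well; reindexing is just convenient when the whole universe is processed at once.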

Has anyone else experienced this, or does anyone know what the cause and a solution might be?

Best

9 responses

Hi Ben,

Would you mind letting us know what stocks specifically you're missing data for? There may be some securities that aren't populated with data continuously - i.e. they aren't necessarily covered by news and blog sources.

If it's more comfortable for you, you can send me a collaboration invite at [email protected] and I'd be happy to take a look personally at your issue.

Seong

Disclaimer

The material on this website is provided for informational purposes only and does not constitute an offer to sell, a solicitation to buy, or a recommendation or endorsement for any security or strategy, nor does it constitute an offer to provide investment advisory services by Quantopian. In addition, the material offers no opinion with respect to the suitability of any security or specific investment. No information contained herein should be regarded as a suggestion to engage in or refrain from any investment-related course of action as none of Quantopian nor any of its affiliates is undertaking to provide investment advice, act as an adviser to any plan or entity subject to the Employee Retirement Income Security Act of 1974, as amended, individual retirement account or individual retirement annuity, or give advice in a fiduciary capacity with respect to the materials presented herein. If you are an individual retirement or other investor, contact your financial advisor or other fiduciary unrelated to Quantopian about whether any given investment idea, strategy, product or service described herein may be appropriate for your circumstances. All investments involve risk, including loss of principal. Quantopian makes no guarantees as to the accuracy or completeness of the views expressed in the website. The views are subject to change, and may have become unreliable for various reasons, including changes in market conditions or economic circumstances.

Hi Seong,

Just sent you a collab invite. I'm working with the 10 or so largest companies in the S&P and have encountered this issue with all of them, which makes me think it isn't a lack of news.

If you take a look at some of my more recent backtests, I've been logging the output from pipeline so it should be easy to see exactly what I've been encountering.

Thanks for the help

Hi Ben,

This looks like a bug we've been working on fixing where the partner data coming through the Pipeline API seems to not get updated. We're trying to figure it out and solve it.

That said, it should be noted that each stock doesn't have a value in the data set for every day. I wrote up a quick notebook examining the number of sids with a score from Accern on any given day. Each dot represents a count for a particular day.
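The notebook itself is not reproduced here. As a rough sketch of the kind of per-day count it describes, assuming the sentiment records are available as a pandas DataFrame with one row per date and sid, something like the following would work. The column names, sids, and scores are invented for illustration and are not Accern's schema.

```python
import pandas as pd

# Hypothetical long-format sentiment records: one row per (date, sid).
records = pd.DataFrame({
    "date": pd.to_datetime([
        "2012-09-26", "2012-09-26",
        "2012-09-27",
        "2012-09-28", "2012-09-28", "2012-09-28",
    ]),
    "sid": [1, 2, 1, 1, 2, 3],
    "score": [0.4, -0.1, 0.2, 0.3, 0.0, -0.5],
})

# Count the distinct sids that have a non-null score on each day.
daily_counts = (
    records.dropna(subset=["score"])
    .groupby("date")["sid"]
    .nunique()
)
print(daily_counts)
```

Plotting `daily_counts` as a scatter against the date index would give one dot per day, and a drop in coverage would show up as a visible dip.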

Hi Josh,

Thanks for the follow-up. Understood that working with this much data from this many third parties presents its challenges, and appreciate your efforts to debug. Absolutely no worries. I am looking forward to updates as they come.

Duly noted about potential days of missing data. I am hoping to work with an "RSI sentiment", so to speak, using pipeline's built-in RSI factor, which I assume is robust enough to handle a couple of NaNs in the input data. I am working with the 50 largest components of the S&P 500, so I would imagine it's rare to go 15 days without news sentiment for these companies.
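To make the NaN concern concrete, here is a small standalone sketch of a simple-average RSI applied to a sentiment series. It is not the pipeline's built-in factor, and whether that factor skips NaNs is exactly the assumption in question; the function `simple_rsi`, its `skip_nan` flag, and the sample scores are hypothetical.

```python
import math

def simple_rsi(values, skip_nan=True):
    """Simple-average RSI over the whole input window (illustrative only)."""
    # Day-over-day changes; a single NaN input produces two NaN changes.
    diffs = [b - a for a, b in zip(values, values[1:])]
    if any(math.isnan(d) for d in diffs):
        if not skip_nan:
            return math.nan  # a plain average is poisoned by any NaN
        diffs = [d for d in diffs if not math.isnan(d)]
    if not diffs:
        return math.nan
    avg_gain = sum(d for d in diffs if d > 0) / len(diffs)
    avg_loss = sum(-d for d in diffs if d < 0) / len(diffs)
    if avg_loss == 0:
        # Convention chosen for this sketch: all gains -> 100, flat -> 50.
        return 100.0 if avg_gain > 0 else 50.0
    rs = avg_gain / avg_loss
    return 100.0 - 100.0 / (1.0 + rs)

# Hypothetical daily sentiment scores with one missing day.
scores = [0.1, 0.3, math.nan, 0.2, 0.5, 0.4]

strict = simple_rsi(scores, skip_nan=False)   # NaN propagates
lenient = simple_rsi(scores, skip_nan=True)   # NaN changes are dropped
print(strict, round(lenient, 2))
```

If the factor in use behaves like the strict variant, a single missing day anywhere in the window yields NaN for that stock, so it may be worth forward-filling or filtering the input first.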

Cheers,
Ben

A follow-up on this bug: we've shipped a fix to production. That's the good news. The data from partner data sets such as Accern, PsychSignal and Sentdex should now be accurate and more reliable in these scenarios.

The bad news: the fix brings higher volumes of data to your algorithms. This is causing some timeouts and increases the likelihood of hitting memory limits. We're currently working on these issues.

Hi Josh,

Thank you for the follow-up. I'm getting ready to deploy one of my algorithms to trade live, so the increased reliability is great to hear!

On that note, I have a couple of questions. Are live trading algorithms granted more server space to help prevent this from happening? In a backtest it's merely mildly annoying, but in a live trading algorithm it could cause actual issues.

Also, do live trading algorithms preserve state between days? I have a number of weight variables that need to be carried over from one day to the next. Does that happen in a live algo, or is it reset and launched again each day?

Thanks,
Ben

Ben, we're taking steps to try to mitigate the occurrence of the problem with partner data right now.

For your second question, that's typically what context is meant for -- passing values across functions and days.
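As a rough illustration of that pattern, the sketch below stores weights on a context object in `initialize` and updates them once per simulated day. The `SimpleNamespace` object and the driver loop are stand-ins so the example runs on its own; the weights and the update rule are hypothetical, and how the platform persists context between live trading sessions is not verified here.

```python
from types import SimpleNamespace

def initialize(context):
    # Runs once at start-up: create the state that later calls will update.
    context.weights = {"AAPL": 0.5, "MSFT": 0.5}
    context.days_seen = 0

def before_trading_start(context, data):
    # Runs once per day: read and update values carried over on context.
    context.days_seen += 1
    # Hypothetical rule: shift one percentage point of weight per day.
    context.weights["AAPL"] = round(context.weights["AAPL"] + 0.01, 4)
    context.weights["MSFT"] = round(context.weights["MSFT"] - 0.01, 4)

# Stand-in driver: one context object is reused across three "days".
context = SimpleNamespace()
initialize(context)
for _ in range(3):
    before_trading_start(context, data=None)

print(context.days_seen, context.weights)
```

Module-level globals, by contrast, are the thing to avoid for values like these, since attributes on context are what the callbacks share.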

Josh,

Is there any update on the timeout issue? I'm looking forward to getting back to work with the third-party data.

Hi Ben,

The timeout issue has been fixed. Let me know if you have any other questions.