XBRL is an XML-based format used for the financial data in SEC filings. The consortium that manages the standard is running a contest for innovative XBRL data mining: $20k goes to the "best open-source" application that consumes XBRL data. From the contest site:
What kind of apps are we looking for?
Tools that rely on XBRL data from public company financial statements
and that provide highly functional, strong analytics, e.g., performing
multi-company year-to-year comparisons and ratio analysis.
Criteria for judging:
Improves access - enables investor stakeholder access to corporate data
Usability - application quality, usability, and accessibility
Analytics - provides minimum multi-company comparison, year-to-year
comparisons, ratio analysis or other functions designed to help investor decision-making
Design - originality and creativity, cannot be drawn from an existing design
See http://xbrl.us/research/pages/challenge.aspx for details.
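To make the "ratio analysis" and "year-to-year comparison" criteria concrete, here is a minimal sketch in plain Python. The companies and figures are made up for illustration (they are not from any real filing), the dictionary keys only mimic US GAAP element names, and the helper names `ratios` and `year_over_year` are hypothetical.

```python
# Hypothetical figures in millions of USD; NOT real filing data.
# Keys mimic US GAAP element names purely for illustration.
FACTS = {
    ("ACME", 2011): {"Assets": 500.0, "Liabilities": 300.0,
                     "Revenues": 800.0, "NetIncomeLoss": 40.0},
    ("ACME", 2012): {"Assets": 550.0, "Liabilities": 310.0,
                     "Revenues": 880.0, "NetIncomeLoss": 55.0},
    ("GLOBEX", 2011): {"Assets": 900.0, "Liabilities": 700.0,
                       "Revenues": 1200.0, "NetIncomeLoss": 30.0},
    ("GLOBEX", 2012): {"Assets": 950.0, "Liabilities": 760.0,
                       "Revenues": 1140.0, "NetIncomeLoss": 19.0},
}


def ratios(facts):
    """Compute two common ratios from one company-year of facts."""
    equity = facts["Assets"] - facts["Liabilities"]
    return {
        "net_margin": facts["NetIncomeLoss"] / facts["Revenues"],
        "debt_to_equity": facts["Liabilities"] / equity,
    }


def year_over_year(company, tag, start, end, data=FACTS):
    """Fractional change in one reported value between two fiscal years."""
    old = data[(company, start)][tag]
    new = data[(company, end)][tag]
    return (new - old) / old


if __name__ == "__main__":
    # Multi-company comparison: same ratios and growth for each company.
    for company in ("ACME", "GLOBEX"):
        r = ratios(FACTS[(company, 2012)])
        growth = year_over_year(company, "Revenues", 2011, 2012)
        print(company, round(r["net_margin"], 4),
              round(r["debt_to_equity"], 4), round(growth, 4))
```

A real entry would pull these numbers from XBRL filings rather than a hard-coded dictionary.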
As we've written about previously, Zipline is our backtesting engine. We open-sourced Zipline at PyData NYC. So, maybe someone in the Quantopian community would like to build an XBRL data source in Zipline?
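As a possible starting point, here is a rough sketch of pulling numeric facts out of an XBRL instance document using only the Python standard library. The sample document is a heavily simplified, made-up instance (real filings carry many more contexts, units, and dimensions), and `extract_facts` is a hypothetical helper name. How the resulting rows would be fed into Zipline is deliberately left open, since that depends on Zipline's data source interface.

```python
import xml.etree.ElementTree as ET

# Namespace of the XBRL 2.1 instance schema.
XBRLI = "http://www.xbrl.org/2003/instance"

# Simplified, made-up instance document for illustration only.
SAMPLE = """<xbrli:xbrl
    xmlns:xbrli="http://www.xbrl.org/2003/instance"
    xmlns:iso4217="http://www.xbrl.org/2003/iso4217"
    xmlns:us-gaap="http://fasb.org/us-gaap/2012-01-31">
  <xbrli:context id="FY2012">
    <xbrli:entity>
      <xbrli:identifier scheme="http://www.sec.gov/CIK">0000000000</xbrli:identifier>
    </xbrli:entity>
    <xbrli:period>
      <xbrli:startDate>2012-01-01</xbrli:startDate>
      <xbrli:endDate>2012-12-31</xbrli:endDate>
    </xbrli:period>
  </xbrli:context>
  <xbrli:unit id="USD"><xbrli:measure>iso4217:USD</xbrli:measure></xbrli:unit>
  <us-gaap:Revenues contextRef="FY2012" unitRef="USD" decimals="-6">880000000</us-gaap:Revenues>
  <us-gaap:NetIncomeLoss contextRef="FY2012" unitRef="USD" decimals="-6">55000000</us-gaap:NetIncomeLoss>
</xbrli:xbrl>"""


def extract_facts(xml_text):
    """Return (element name, period end date, value) for each numeric fact."""
    root = ET.fromstring(xml_text)

    # Map each context id to its period end date (or instant date).
    periods = {}
    for ctx in root.findall(f"{{{XBRLI}}}context"):
        period = ctx.find(f"{{{XBRLI}}}period")
        end = period.find(f"{{{XBRLI}}}endDate")
        if end is None:
            end = period.find(f"{{{XBRLI}}}instant")
        periods[ctx.get("id")] = end.text.strip()

    facts = []
    for elem in root:
        ctx_id = elem.get("contextRef")
        if ctx_id is None:
            continue  # contexts and units are not facts
        name = elem.tag.split("}", 1)[-1]  # drop the "{namespace}" prefix
        facts.append((name, periods[ctx_id], float(elem.text)))
    return facts


if __name__ == "__main__":
    for fact in extract_facts(SAMPLE):
        print(fact)
```

Rows of (name, date, value) like these are the kind of time-stamped input a backtest could consume alongside price data.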
For further inspiration, check out this article in the New Scientist about mining company filings: http://www.newscientist.com/article/mg21628896.000-mine-your-language-software-decodes-company-reports.html
Here is a snippet:
Text-mining techniques generally concentrate on single words: counting
the number of negative or positive words in a body of text can give an
indication of the overall tone, for example. But it is impossible to
say whether certain words taken in isolation - such as "increased" -
are positive or negative, says team member Yuan-Chen Chang. So the
team designed an algorithm to recognise meaningful phrases instead
(arxiv.org/abs/1210.3865).
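To illustrate the distinction the snippet draws, here is a toy sketch contrasting single-word counting with phrase-level scoring. This is not the algorithm from the paper: the word lists and phrase scores below are tiny, hand-picked assumptions, and the phrase matcher simply looks up adjacent word pairs.

```python
import re

# Tiny hand-picked lexicons; assumptions for illustration only.
POSITIVE_WORDS = {"growth", "profit", "improved"}
NEGATIVE_WORDS = {"loss", "decline", "impairment"}

# "increased" is ambiguous alone, so it is scored only inside a phrase.
PHRASE_SCORES = {
    "increased revenue": 1,
    "increased profit": 1,
    "increased costs": -1,
    "increased debt": -1,
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def word_tone(text):
    """Positive word count minus negative word count."""
    tokens = tokenize(text)
    positive = sum(t in POSITIVE_WORDS for t in tokens)
    negative = sum(t in NEGATIVE_WORDS for t in tokens)
    return positive - negative


def phrase_tone(text):
    """Sum the scores of known two-word phrases found in the text."""
    tokens = tokenize(text)
    bigrams = [" ".join(pair) for pair in zip(tokens, tokens[1:])]
    return sum(PHRASE_SCORES.get(b, 0) for b in bigrams)


if __name__ == "__main__":
    sentence = "Increased costs hurt the quarter"
    print(word_tone(sentence), phrase_tone(sentence))
```

On a sentence like "Increased costs hurt the quarter", the single-word count sees nothing it recognises, while the phrase lookup picks up the negative sense of "increased costs".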