Hi Jamie -
Thousands of data sets? I'd be interested in your motivation, and how you'd expect individual users to use that many data sets productively. It ends up being one data set per listed company, basically. Say each data set has a teeny-tiny transient uncorrelated alpha. What would be the path to writing an algo that could get a $50M allocation? I can see how this might fit with the framework provided on https://blog.quantopian.com/a-professional-quant-equity-workflow/ and Pipeline, assuming there is enough alpha in daily data. Do you think a naive equal-weight alpha combination will work? Or will something more sophisticated be required to do the combination?
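For concreteness, here is a minimal sketch of what I mean by a naive equal-weight combination: standardize each alpha cross-sectionally, then average them. The function names and the toy alpha vectors are my own invention, not anything from Quantopian's API.

```python
import numpy as np

def zscore(a):
    # Standardize one alpha cross-sectionally so alphas are comparable.
    return (a - a.mean()) / a.std()

def combine_equal_weight(alphas):
    # Naive equal-weight combination: average the standardized alphas.
    standardized = [zscore(np.asarray(a, dtype=float)) for a in alphas]
    return np.mean(standardized, axis=0)

# Hypothetical example: three tiny, noisy alphas over five stocks.
a1 = [0.10, -0.20, 0.30, 0.00, -0.10]
a2 = [0.05, 0.00, 0.20, -0.10, -0.15]
a3 = [-0.10, -0.05, 0.25, 0.10, -0.20]
combined = combine_equal_weight([a1, a2, a3])
```

Anything smarter (weighting by recent IC, shrinkage, a learned combiner) would replace the `np.mean` step, which is exactly the part I'm asking about.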
On a related note, it sounds pretty daunting for a single Q user to sift through thousands of data sets, combining the good ones into a comprehensive, scalable algo. But say I picked one, and showed that there was a little bit of alpha there. How could I get paid, so that you could license my little gold nugget for the fund, and I could buy a sandwich?
One suggestion for a data set would be Internet health (e.g. https://www.akamai.com/us/en/solutions/intelligent-platform/visualizing-akamai/real-time-web-monitor.jsp). Daily data should be pretty easy to come by. And deriving a real-time minutely feed, I'd think, would be fairly straightforward. Even down to individual companies, it should be possible to get the data by writing a script that queries site availability (e.g. Amazon, Facebook, etc.). At some level, there must be alpha in such data sets, but if someone is already doing it (almost certainly), then minutely data may not be fast enough. You might run this by Fawce though, given his do-good tone on https://www.quantopian.com/posts/phasing-out-brokerage-integrations. In all likelihood, you could be profiting off criminal activity (but then, you are hooked up with Point 72, which has a very sketchy history, as portrayed in the book Black Edge). By the way, if you do end up using my Internet data idea, my one-time licensing fee is $500 cash ($20 bills would be nice).
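The availability-polling script I have in mind could be as simple as the sketch below, using only the Python standard library. The ticker-to-URL mapping and the one-probe-per-site design are assumptions for illustration; a real collector would want retries, multiple vantage points, and rate limiting.

```python
import time
from datetime import datetime, timezone
from urllib.error import URLError
from urllib.request import urlopen

# Hypothetical mapping of tickers to a site worth probing (assumption).
SITES = {
    "AMZN": "https://www.amazon.com",
    "FB": "https://www.facebook.com",
}

def probe(url, timeout=5):
    # Return (is_up, latency_seconds) -- a crude availability signal.
    start = time.monotonic()
    try:
        with urlopen(url, timeout=timeout) as resp:
            up = 200 <= resp.status < 400
    except (URLError, OSError):
        up = False
    return up, time.monotonic() - start

def record(writer):
    # Append one timestamped row per site to a csv.writer-like object.
    ts = datetime.now(timezone.utc).isoformat()
    for ticker, url in SITES.items():
        up, latency = probe(url)
        writer.writerow([ts, ticker, int(up), round(latency, 3)])
```

Run `record` once a minute from a scheduler and you have a minutely per-company availability feed; daily aggregates fall out of that for free.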
For the Q fund, would there be any way to publish data that would allow users to do the kind of algo viability analyses that presumably you can do? For example, say I'm working on writing a new algo. I'd like to know the degree to which it might be accretive, so that I know I'm not wasting my time (and your platform resources). Presently, it is a total open-loop time suck. Building a crowd-sourced fund without giving the crowd access to the fund as it is built would seem to be counter-productive. But alas, ironically, the whole crowd-sourced, collective, "we are all in this together" concept seems to be totally lost on you guys, in my opinion. What is your sense from the inside (we can start another thread, if you want)?