I was quite excited about the 101 Alphas Project and the paper "101 Formulaic Alphas", so I wrote a compiler that takes an alpha equation and generates a Quantopian Pipeline factor. Attached is a notebook with 77 of the 101 alphas; see below for why there are only 77. I have tested the following alphas on Quantopian: 1-20, 23, 33, 41, 57, and 101. I could use some help testing the others and spotting bugs, and any other useful feedback is welcome. (If you think the notion of alphas, or long-short equity, is a waste of time, please save it for another thread.)
The biggest problem is correlation. 51 of the 101 alphas contain the correlation operator, and although I did implement it properly, it is slow: one month of data takes 14 minutes to process. The real problem is that even though I set the screen to 500 or 1500 equities, the custom factor still processes over 8000 equities. Perhaps I am not using the screen correctly? From what I have read, this is the intended behavior. There seems to be a workaround that I may try to speed things up.
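For anyone who wants to experiment with the correlation cost outside Quantopian, here is a minimal NumPy sketch of a correlation computed for every equity at once, with no loop over equities. It is not the code from the notebook. The function name `columnwise_corr` and the (days, equities) array layout are assumptions made for illustration, and the sketch assumes the inputs contain no NaNs.

```python
import numpy as np

def columnwise_corr(x, y):
    """Pearson correlation of each column of x with the same column of y.

    x, y: float arrays of shape (window_days, n_equities), assumed NaN-free.
    Returns shape (n_equities,); NaN where either series is constant.
    """
    xd = x - x.mean(axis=0)          # demean each equity's window
    yd = y - y.mean(axis=0)
    cov = (xd * yd).sum(axis=0)      # unnormalized covariance per equity
    denom = np.sqrt((xd ** 2).sum(axis=0) * (yd ** 2).sum(axis=0))
    # Suppress divide-by-zero warnings; constant series are mapped to NaN.
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(denom > 0, cov / denom, np.nan)

# Example with made-up data: a 10-day window for 4 equities.
rng = np.random.default_rng(0)
x = rng.normal(size=(10, 4))
y = rng.normal(size=(10, 4))
corr = columnwise_corr(x, y)         # one correlation per equity
```

The cost of this version still grows with the number of columns, so it does not by itself fix the factor being run over 8000 equities instead of the screened set.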
Things I didn't implement (why there are 77 and not 101 alphas):
-Run-time time series: Most alphas involve some time-series operator, and for most of these operators the number of days the operator is applied over is known at compile time. However, there are nine alphas (71, 73, 76, 77, 82, 87, 88, 92, 96) with a time-series operator whose number of days is not known until run time. This could be done, but it would involve a nasty for-loop that iterates over every equity. If you look at each of these, you will notice that the dynamic window always sits on a max() or min(). The notes say that max() = ts_max(), but I suspect that in these examples max() actually returns the largest of its arguments. I contacted the author of the paper, but he couldn't say anything that was not already in the paper.
-IndNeutralize: I simply didn't get around to this. See Alpha#97. I think it is very valuable. Any ideas?
-Fractional time-series days: A handful of time-series operators use a number of days that is not a whole number, such as 9.991009. See Alpha#62 for an example. For these I simply round the number to the nearest integer.
-Logical to floating point: There are a few places where the result of a logical operation is used as a floating-point vector. See Alpha#95. Here I was not sure what value to assign to False (assuming 1.0 for True): 0.0 or -1.0?
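To make the open questions in the list concrete, here is a small NumPy sketch of the two possible readings of max(), the rounding of a fractional window, and the two candidate encodings of False. The arrays and helper names (`ts_max`, `elementwise_max`) are made up for illustration and are not taken from the notebook or the paper. The sketch does not settle which reading or encoding is correct.

```python
import numpy as np

# Hypothetical data: rows are days (oldest first), columns are equities.
a = np.array([[1.0, 5.0],
              [4.0, 2.0],
              [3.0, 6.0]])
b = np.array([[2.0, 2.0],
              [2.0, 7.0],
              [5.0, 1.0]])

def ts_max(x, d):
    """Reading 1: largest value of each equity over the last d days."""
    return x[-d:].max(axis=0)

def elementwise_max(x, y):
    """Reading 2: larger of the two arguments, equity by equity."""
    return np.maximum(x, y)

print(ts_max(a, 2))                    # [4. 6.]
print(elementwise_max(a[-1], b[-1]))   # [5. 6.]

# Fractional window: round to the nearest whole number of days.
days = int(round(9.991009))            # 10

# Logical to float: the two candidate encodings of False.
cond = a[-1] > b[-1]                   # [False  True]
as_zero_one = cond.astype(float)       # False -> 0.0
as_minus_one = np.where(cond, 1.0, -1.0)  # False -> -1.0
```

Under the second reading of max(), both arguments are ordinary vectors, so there would be no run-time window and no loop over equities.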
A note on the distasteful for-loops: I originally wrote the for-loops as a way to reason about how the data needed to be arranged for the proper time-series operations, with the intention of later rewriting them as matrix operations. For-loops are slower than matrix operations, because in a matrix operation the looping is done in C or Fortran. However, what I noticed was that these for-loops are not that bad. There is never a for-loop over the equities; the loops are always over days, which are usually in the single digits. Even an alpha with five embedded loops, like Alpha#29, does not appear to take much longer than an alpha with no loops. For the time being I have no plans to remove the for-loops.
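As a rough illustration of why a loop over days is cheap, here is a sketch of a ts_max-style operator written both ways. The names `ts_max_loop` and `ts_max_matrix` and the (days, equities) layout are assumptions for this example, not the notebook's actual code. The loop body runs d - 1 times regardless of how many equities there are, and each iteration is a single vectorized operation across all equities.

```python
import numpy as np

def ts_max_loop(x, d):
    """Max of each equity over the last d days, looping over days only."""
    out = x[-1].copy()
    for k in range(2, d + 1):          # d - 1 iterations, independent of n_equities
        out = np.maximum(out, x[-k])   # one vectorized step across all equities
    return out

def ts_max_matrix(x, d):
    """Same result as a single matrix reduction."""
    return x[-d:].max(axis=0)

# Made-up data: 20 days of values for 1000 equities.
rng = np.random.default_rng(1)
prices = rng.normal(size=(20, 1000))
loop_result = ts_max_loop(prices, 5)
matrix_result = ts_max_matrix(prices, 5)
```

Both versions return the same array. A loop over equities would instead run once per column, which is the case that gets expensive.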