Efficient N Days Ago Factor Research and Algo Templates

I spent the last day or so working to implement a research notebook and a corresponding algo inspired by Grant's clustering notebook and David's contribution regarding a more efficient pipeline that avoids recalculating values.

In the templates, I specified arbitrary factors: two custom factors, and one fundamental factor (that I implemented as a CustomFactor to work with the code setup). I then combined the factors by scaling all features and simply adding them. In the algo specifically, I did my best to mimic Grant's method of meeting the contest criteria while using the Optimize API. There are arbitrary parameters there that you can feel free to change.
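For a flavor of the ordering step, here is a minimal sketch of what a contest-style rebalance with the Optimize API can look like. The constraint bounds are arbitrary placeholders rather than the exact values from the algo, and rebalance_sketch is a hypothetical helper name:

import quantopian.algorithm as algo
import quantopian.optimize as opt

def rebalance_sketch(context, combined_alpha):
    # combined_alpha: a pandas Series of alpha values indexed by asset.
    objective = opt.MaximizeAlpha(combined_alpha)
    constraints = [
        opt.MaxGrossExposure(1.0),   # at most fully invested
        opt.DollarNeutral(),         # balance the long and short books
        opt.PositionConcentration.with_equal_bounds(min=-0.02, max=0.02),
    ]
    algo.order_optimal_portfolio(objective=objective, constraints=constraints)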

There is no hypothesis for the template (i.e. there's no reason why I chose the factors that I did or how I chose to combine them). Rather, it's a demonstration of how you can implement your own alpha factors and include previous days' factor data. It's up to you to find meaningful signals. :)

In theory, you should be able to simply edit the make_factors() function to your liking and copy the changes into the algo version. The same goes for how you combine your alpha factors.
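As a rough illustration of that editing surface, here is a hedged sketch. MeanReversion and combine_alphas are hypothetical stand-ins (not the template's actual factors), and whether the dictionary holds classes or instances depends on the template's plumbing:

from quantopian.pipeline.data import USEquityPricing
from quantopian.pipeline.factors import CustomFactor

class MeanReversion(CustomFactor):
    # Negative 5-day return: recent losers score higher.
    inputs = [USEquityPricing.close]
    window_length = 6

    def compute(self, today, assets, out, close):
        out[:] = -(close[-1] / close[0] - 1.0)

def make_factors():
    # Keys become the column-name prefixes ('mean_rev-0', 'mean_rev-1', ...),
    # so avoid hyphens (and ideally extra spaces) in them.
    return {
        'mean_rev': MeanReversion,
    }

def combine_alphas(factor_df):
    # "Scaling all features and simply adding them": z-score each factor
    # column cross-sectionally, then take an equal-weighted sum.
    scaled = (factor_df - factor_df.mean()) / factor_df.std()
    return scaled.sum(axis=1)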


EDIT: See a few posts below for a slightly changed template. I adjusted the two lines that David points out in the next post to support more combinations of factor names. Basically, it should work provided your factor name doesn't have a hyphen in it.

And here's the corresponding algo, run for the same time period. Not the best returns! ;)

Hi Kyle,
Could you explain these two lines to me:

df['factor'] = df['alphas'].str.extract(r'(\w+\s*\w+)')
df['day'] = df['alphas'].str.extract(r'(\d+)').astype('int32')

Then one could use a uint8 for the days counter (though that really saves next to nothing :-).

Thanks,
David

Hi David,

I'm using string pattern matching to dynamically pick out the factor names and dates associated with them before pivoting into the final table format.

The first line you reference would match any of these factors (as specified by the dictionary key):

factors = {
    'Direction': Direction,
    'mean_rev': mean_rev,
    'fcf': fcf,
}

The second line picks out the days-ago number associated with the factor. Because the initial pipeline spits out columns named "fcf-0", "fcf-1", etc., I needed to extract that number so the data could be aligned in time and combined with the second pipeline.
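Here's a tiny standalone illustration with made-up column names (expand=False keeps extract returning a Series on newer pandas):

import pandas as pd

df = pd.DataFrame({'alphas': ['fcf-0', 'fcf-1', 'mean_rev-0', 'Direction-2']})
df['factor'] = df['alphas'].str.extract(r'(\w+\s*\w+)', expand=False)
df['day'] = df['alphas'].str.extract(r'(\d+)', expand=False).astype('int32')
print(df)
#         alphas     factor  day
# 0        fcf-0        fcf    0
# 1        fcf-1        fcf    1
# 2   mean_rev-0   mean_rev    0
# 3  Direction-2  Direction    2
# Note: a digit inside a factor name (e.g. 'momentum10') would confuse the
# '(\d+)' line, which is part of what the updated pattern mentioned earlier
# works around.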

You're right that I could have used uint8 for the days counter. I don't foresee anyone running the pipeline with more than 255 days of lookback.

Hope that helps!

I understood that you were picking out the name and the days...

But then I don't understand the \w+\s*\w+ and \d+.

In my version, I "stupidly" rebuild the series by searching for the '-' character and taking what comes before as the factor name and what comes after as the day. I did it that way to be waterproof against factor names such as "momentum10", "momentum20", or worse, "mom20 var2"... OK, I know spaces should be avoided, but I like white space ;-).
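Roughly like this (a quick sketch, not my exact code; here I split on the last '-' with rsplit):

import pandas as pd

alphas = pd.Series(['momentum10-0', 'momentum20-1', 'mom20 var2-3'])
parts = alphas.str.rsplit('-', n=1, expand=True)
factor = parts[0]               # everything before the final '-'
day = parts[1].astype('int32')  # everything after it
print(factor.tolist())  # ['momentum10', 'momentum20', 'mom20 var2']
print(day.tolist())     # [0, 1, 3]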

I don't think your way is "stupid". It's probably more robust than what I'm doing here...

Regarding the \d+, that just matches the day number. The way I've set it up, it's equivalent to taking what comes after the '-', provided there are no other hyphens in the factor name.

Regarding the \w+\s*\w+, I found it encompasses the factor names I was using. The \w+ matches letters, numbers, and '_'. The \s* allows an optional run of whitespace inside the name. My setup is limited in that it can handle at most one whitespace break in a factor name. But for control's sake, I felt safer specifying something to match than taking whatever comes before the '-' as you did. I don't think either way is necessarily better, but I wanted a little more control/filtering. :)
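A quick standalone demonstration of what the pattern does and doesn't capture:

import re

for name in ['mean_rev', 'Direction', 'mom20 var2', 'mom20 var 2']:
    m = re.search(r'\w+\s*\w+', name)
    print(name, '->', m.group(0) if m else None)
# mean_rev -> mean_rev
# Direction -> Direction
# mom20 var2 -> mom20 var2
# mom20 var 2 -> mom20 var   (the second break is dropped)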

Here's the updated algo with slightly different pattern matching. I updated the notebook in the original post as well.

Thanks for the details!