Build Alpha Factors with Cointegrated Pairs

Today, I would like to share a research notebook with a couple of examples of building alpha factors from cointegrated pairs. The pairs trading strategy has been around for a long time, and Quantopian has lectures that introduce the idea and show how to implement it. Here, we want to introduce a different way to use cointegrated pairs. The basic idea is that if some event happens to one leg of a pair, such as an earnings announcement that beats estimates, the market is likely to price in a higher chance that the same thing will happen to the other leg; in this case, that the other leg will also beat estimates.

The attached notebook illustrates this idea by importing a set of cointegrated pairs via the self serve data tool and using the predetermined set of pairs to build two example factors. Use them as a guide and a starting point, but we encourage you to use your creativity to come up with novel ideas and share the tearsheets below in this thread.

About the self-serve dataset: we looked through the stocks in the Quantopian Tradable Universe (QTU) to see whether any of them are cointegrated. We ran the cointegration test on the assets in the QTU every month from 2012-01-04 to 2019-05-29 (the methodology is mainly based on the lecture Introduction to Pairs Trading). We then converted the list of pairs into the Q self-serve data format for you to use. To learn how to use self-serve data, please refer to Upload Your Custom Datasets and Signals with Self-Serve Data and Analyzing a Signal and Creating a Contest Algorithm with Self-Serve Data. You could use all or some of the pairs we provide to generate your alpha factors. Of course, you could also generate the pairs with your own method and use them as input data.

Disclaimer

The material on this website is provided for informational purposes only and does not constitute an offer to sell, a solicitation to buy, or a recommendation or endorsement for any security or strategy, nor does it constitute an offer to provide investment advisory services by Quantopian. In addition, the material offers no opinion with respect to the suitability of any security or specific investment. No information contained herein should be regarded as a suggestion to engage in or refrain from any investment-related course of action as none of Quantopian nor any of its affiliates is undertaking to provide investment advice, act as an adviser to any plan or entity subject to the Employee Retirement Income Security Act of 1974, as amended, individual retirement account or individual retirement annuity, or give advice in a fiduciary capacity with respect to the materials presented herein. If you are an individual retirement or other investor, contact your financial advisor or other fiduciary unrelated to Quantopian about whether any given investment idea, strategy, product or service described herein may be appropriate for your circumstances. All investments involve risk, including loss of principal. Quantopian makes no guarantees as to the accuracy or completeness of the views expressed in the website. The views are subject to change, and may have become unreliable for various reasons, including changes in market conditions or economic circumstances.

14 responses

It is convenient to move the pipeline code from the Q research environment to the algorithm environment. Please use this algo as a reference for you to design your own algo.

@Rene, thanks for this...interesting!!

In trying to run-it-without-thinking, I get the error:

# from quantopian.pipeline.data.user_[user_ID] import [dataset name]  
from quantopian.pipeline.data.user_57e2b12557e9c947ce001019 import pairs_self_serve_dataset  
ImportError Traceback (most recent call last)  
<ipython-input-4-cdb381e4d13e> in <module>()  
      1 # from quantopian.pipeline.data.user_[user_ID] import [dataset name]  
----> 2 from quantopian.pipeline.data.user_57e2b12557e9c947ce001019 import pairs_self_serve_dataset

/build/src/qexec_repo/qexec/algo/safety.py in __call__(self, name, globals, locals, fromlist, level)
    265         # this is a whitelisted import  
    266         return self._import_safety.make_safe(  
--> 267             self._import(name, globals, locals, fromlist, level),  
    268         )  
    269 

ImportError: No module named user_57e2b12557e9c947ce001019  

This probably has to do with my account not having a copy of the pairs_self_serve_dataset.csv file?
Any pointers will help.
Thanks!
alan

@Alan To upload the pairs_self_serve_dataset.csv file, please navigate to the Data tab on your account page and click "Add Dataset". You may want to refer to step 1 (Upload the Daily Lists of Pairs to Q Platform) in the notebook when selecting the Primary Date and Primary Asset fields and declaring the data types of the other fields. After the dataset is uploaded, it will have a corresponding information page. You can find your user_id there.

Please feel free to let me know if you have any further questions!

@Rene,
Are you going to supply the file pairs_self_serve_dataset.csv on Google Drive or DropBox, or are we to create it? Thanks,
alan

@Alan,
It looks like it's linked under the line "a set of cointegrated pairs" in the original post.

@Kyle,
Ahh...ok...mystery solved!
Thanks!
alan

Just wondering if anyone played around with this and either ran into problems or came up with something interesting we might want to license!

@Rene,
Thanks for this post!
I believe there is lots of play in the cointegrated space, yet it's hard to understand, so please bear with my questions.

I instrumented your backtest a bit and changed it so that it goes to "Cash" if there are not enough pairs to play with.

The main question I have is how the long/short allocation of a pair works... they are out of balance (see the Activity-ShortLever-LongLever traces).

A secondary question relates to how to create the "Pairs" .csv file. I'm assuming that you are using something like J. Larkin's co-integrated clustering method published previously in this forum.

I've included the backtest that shows the best of what I've been able to see over the past two-year period.
alan

@Alan Thanks for the questions! About question 1, the number of daily long positions may not equal the number of daily short positions. To simplify the problem, let us consider just one pair (A, B) and treat earnings data as fresh if it is no more than 1 day old. Assume stock A and stock B have their earnings announcements on different dates. After stock A releases its earnings, we compute its earnings surprise and assign this value to stock B as stock B's score (i.e. its factor value). So only B is assigned a score here. If A's earnings surprise is positive, B has a positive score and we would like to go long B. If A's earnings surprise is negative, B has a negative score and we would like to short B. Therefore, the number of daily long positions is not necessarily equal to the number of daily short positions. About question 2, the way I generated the list of pairs is based on this lecture (Introduction to Pairs Trading).
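
To make the scoring rule above concrete, here is a minimal plain-Python sketch. It is not the notebook's pipeline code: the function name `partner_scores`, the tickers, the surprise values, and the `(announcement_date, surprise)` layout are all made up for illustration. It only shows how a fresh surprise on one leg becomes the score of the other leg, and why the long and short counts can differ.

```python
from datetime import date


def partner_scores(pairs, surprises, today, max_age_days=1):
    """Return {ticker: score}: each ticker inherits the fresh earnings
    surprise of its pair partner (hypothetical helper, for illustration)."""
    scores = {}
    for a, b in pairs:
        # Check both directions: A's news scores B, and B's news scores A.
        for source, target in ((a, b), (b, a)):
            if source not in surprises:
                continue
            announced, surprise = surprises[source]
            age = (today - announced).days
            # "Fresh" means the announcement is at most max_age_days old.
            if 0 <= age <= max_age_days:
                scores[target] = surprise
    return scores


# Made-up example data: two pairs, two announcements.
pairs = [("A", "B"), ("C", "D")]
surprises = {
    "A": (date(2019, 5, 28), 0.12),   # announced yesterday, so fresh
    "C": (date(2019, 5, 20), -0.30),  # more than 1 day old, so ignored
}

scores = partner_scores(pairs, surprises, today=date(2019, 5, 29))
longs = [t for t, s in scores.items() if s > 0]
shorts = [t for t, s in scores.items() if s < 0]
print(scores, longs, shorts)
```

In this example only B receives a score, so the day has one long candidate and no shorts. Note that if a ticker belongs to several pairs, this sketch simply keeps the last score written; a real factor would need an explicit rule for combining them.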

How did you produce the CSV/DataFrame with the pairs encoded like in the OP? If I have a list of pairs, is there a quick way to encode it like so in pandas or did you do it manually?

@Robert Rose I wrote a notebook for you to show one method to encode the pairs into the self-serve format with a simple example. Please use it as a reference and check if it actually makes sense. If I may ask, what is your use case? How are you generating pairs now?
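
For readers without access to that notebook, here is a rough sketch of one way to flatten a list of pairs into rows. The column names `trade_date_minus_one`, `trade_date`, and `ticker` are taken from elsewhere in this thread; everything else is an assumption on my part, including the single numeric `group` column holding a pair id, the one-row-per-ticker layout, the tickers, and the use of the previous calendar day rather than the previous trading day. It may not match the attached dataset exactly.

```python
import csv
import io
from datetime import date, datetime, timedelta

# Assumed column layout; "group" is a hypothetical name for the pair id.
FIELDS = ["trade_date_minus_one", "trade_date", "ticker", "group"]


def encode_pairs(pairs_by_date):
    """Flatten {trade_date: [(ticker_a, ticker_b), ...]} into one row per ticker."""
    rows = []
    for trade_date, pairs in sorted(pairs_by_date.items()):
        # Assumption: previous calendar day; a real version would look up
        # the previous trading day from an exchange calendar.
        prev_day = trade_date - timedelta(days=1)
        stamp = datetime.combine(trade_date, datetime.min.time())
        for group_id, pair in enumerate(pairs, start=1):
            for ticker in pair:
                rows.append({
                    "trade_date_minus_one": prev_day.isoformat(),  # date only
                    "trade_date": stamp.isoformat(sep=" "),        # datetime
                    "ticker": ticker,
                    "group": group_id,  # both legs share the same pair id
                })
    return rows


def to_csv(rows):
    """Serialize the rows to CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Hypothetical tickers, one trading day, two pairs.
rows = encode_pairs({date(2019, 5, 29): [("AAA", "BBB"), ("CCC", "DDD")]})
csv_text = to_csv(rows)
print(csv_text)
```

The two legs of each pair share a group number, so a factor can later look up a stock's partner by matching on the date and the group value.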

Is anybody else having trouble uploading the data? When I set trade_date to be primary date, I get this error

Primary date contains datetime values.

When I set trade_date_minus_one to be primary date the upload works but then when it's processed it fails with

TypeError('data type not understood',)

Not sure if this is an error on Quantopian's end or if there is something wrong with the date columns; I didn't modify the dataset.

@Robin Gane-McCalla The format of Primary Date has to be date, not datetime.

When uploading pairs_self_serve_dataset.csv as a self-serve dataset, please set trade_date_minus_one as Primary Date and ticker as Primary Asset. The column types of trade_date_minus_one, trade_date, ticker, and group_i are date, datetime, string, and number, respectively.
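
If it helps, a quick local check before uploading can show whether a column holds date-only values or datetimes. This is only a sketch: the helper names `classify` and `column_kind` are made up, and the two format strings are assumptions about how the values are written in the CSV, so adjust them to match your file.

```python
from datetime import datetime

# Assumed formats; change these if your CSV writes dates differently.
FORMATS = (
    ("%Y-%m-%d", "date"),
    ("%Y-%m-%d %H:%M:%S", "datetime"),
)


def classify(value):
    """Return 'date', 'datetime', or 'other' for one string cell."""
    for fmt, label in FORMATS:
        try:
            datetime.strptime(value, fmt)
            return label
        except ValueError:
            continue  # try the next format
    return "other"


def column_kind(values):
    """Return the single kind shared by all cells, or 'mixed'."""
    kinds = {classify(v) for v in values}
    return kinds.pop() if len(kinds) == 1 else "mixed"


# Made-up sample cells for the two date-like columns.
print(column_kind(["2019-05-28", "2019-05-29"]))                    # date-only column
print(column_kind(["2019-05-29 00:00:00", "2019-05-30 00:00:00"]))  # datetime column
```

A column that comes back as "datetime" is the kind that triggered the "Primary date contains datetime values" error when chosen as Primary Date.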

Thanks so much for the quick response - my issue was that I wasn't distinguishing between date and datetime.