1st attempt: finding co-fluctuating stocks

This algo should, in theory, find stocks that tend to fluctuate with each other. It's based on an example from the sklearn website.
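
Stripped of the backtester scaffolding, the core of the idea is roughly the sketch below - synthetic series rather than real prices, just to show what the GraphLassoCV and affinity_propagation calls are doing:

    from sklearn import cluster, covariance
    import numpy as np

    # two groups of series that co-fluctuate, plus one independent series
    np.random.seed(0)
    base_a = np.random.randn(250)
    base_b = np.random.randn(250)
    variation = np.array([base_a + 0.1 * np.random.randn(250),   # group A
                          base_a + 0.1 * np.random.randn(250),   # group A
                          base_b + 0.1 * np.random.randn(250),   # group B
                          base_b + 0.1 * np.random.randn(250),   # group B
                          np.random.randn(250)])                 # on its own

    # estimate a sparse covariance structure, then cluster on it
    edge_model = covariance.GraphLassoCV()
    X = variation.copy().T
    X /= X.std(axis=0)
    edge_model.fit(X)
    _, labels = cluster.affinity_propagation(edge_model.covariance_)
    for g in range(labels.max() + 1):
        members = ", ".join(str(i) for i in np.where(labels == g)[0])
        print("Cluster %i: %s" % (g + 1, members))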

My program seems to hang sometimes for no reason - perhaps someone can help?

With a couple of stocks it seems to work fine:

2011-06-01 handle_data:45 INFO Start date: 2011-06-01 00:00:00+00:00  
2012-05-30 handle_data:61 INFO Finished recording data : 251 days  
2012-05-30 handle_data:65 INFO Have 7 complete histories  
2012-05-30 handle_data:81 INFO 3 groups found:  
2012-05-30 PRINT Cluster 1: 4707, 5061, 20486, 3149  
2012-05-30 PRINT Cluster 2: 24  
2012-05-30 PRINT Cluster 3: 18522, 5885  

One problem is not being able to look up the names of SIDs, and being limited to 10 SIDs in total means that more general analysis can't be done.

Interesting all the same :)

Perhaps someone could check it with a bunch of unrelated stocks and a couple known to co-fluctuate?


Hello James,

I was able to add three more securities and did not have a problem with the backtest hanging:

    c.sids = []  
    c.sids.append(sid(24))  
    c.sids.append(sid(18522))  
    c.sids.append(sid(5061))  
    c.sids.append(sid(20486))  
    c.sids.append(sid(5885))  
    c.sids.append(sid(4707))  
    c.sids.append(sid(3149)) #any more than 7 seems to make it hang?  
    c.sids.append(sid(698))  
    c.sids.append(sid(5923))  
    c.sids.append(sid(7792))  

Also, so that all references to sids are in initialize(context), I replaced line 52 of your code above with:

if timedelta(weeks=52) > (data[c.sids[0]].datetime - c.startDate):  

However, the backtest hangs if I then try these securities instead:

    c.sids = [sid(700),sid(8229),sid(4283),sid(1267),sid(698),sid(3951),sid(5923),sid(3496),sid(7792),sid(7883)]  

Hi Grant,

Good idea with line 52. Unfortunately, the backtester now hangs whatever I do with this algo - even the version that originally worked.

@James,

First, thanks for a great share. This is a really interesting approach.

Sorry for the hiccough with your backtest. A few of us just ran through the logs, and I think we must be swallowing an exception from your algorithm. Would you mind clicking the feedback link while your backtest is hung? It captures a bit of information about the context that might help us.

I was able to reproduce the hang running a backtest from 7/30/2010 to 7/30/2012, so hopefully we can pinpoint the problem soon. We're on it!

thanks,
fawce

@James,

When I reproduce the hang, our system logs a KeyError for line 96 of the algorithm. The error should be reported to you in the UI, and we need to figure out why it isn't, but in the meantime I thought this information might get you past the hang-up.

thanks,
fawce

Hi fawce,

I've submitted the feedback for you anyway. I'll have a think about the algo and see if I can work out why it's broken.

cheers
James

Hi James,

I'm not certain, but I don't think the block of code at lines 92-96 of your algo is doing quite what you intend. I think if you replace those five lines with this:

                groups[labels[i]].append(int(s))

and delete the grp_idx variable initialization from a couple of lines above, it'll have the intended effect.
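
Spelled out - assuming the enclosing loop is something like "for i, s in enumerate(c.sids):" - the whole filtering step would then read roughly:

                # assign each sid to the cluster that its label points at
                for i, s in enumerate(c.sids):
                    groups[labels[i]].append(int(s))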

Going one step further, since the indices you're using into "groups" are sequential ascending integers, you should probably make it a list instead of a dictionary. You can do that by replacing this:

            groups = dict()  
            for x in range(numGroups):  
                groups[x] = []

with this:

            groups = []  
            for x in range(numGroups):  
                groups.append([])
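
Or, a bit more compactly, a list comprehension builds the same structure:

            groups = [[] for _ in range(numGroups)]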

Hope this helps. We're still looking into why the error from the algo didn't bubble back to your browser.

Hi again, James,

We've identified the bug in our application that is preventing the exception in your code from being reported back to you in the browser as it should be. We've got a fix in hand and it'll be in the next release we push.

Regards,

Jonathan Kamens

Hi Jonathan,

Thanks for your updates. I had started to wonder whether the code at the end was actually going into an infinite loop or something, and indeed it isn't the best-written code, especially for Python. I'm a C programmer at heart, so I'm not used to thinking at the higher level yet ;)

This works fine:

from sklearn import cluster, covariance  
from datetime import timedelta  
import numpy as np

# based on the example at:  
# http://scikit-learn.org/stable/auto_examples/applications/plot_stock_market.html

# use in quick backtester with 12 months worth of data

def initialize(context):  
    c = context  
    c.started = False  
    c.stopped = False  
    c.startDate = None  
    c.sids = []  
    c.history = dict()      # place to store the data  
    c.incomplete = set()  
    c.days = 0  
    # some sids to look at  
    c.sids = [sid(4697),sid(18522),sid(5061),sid(20486),sid(5885),sid(4707),sid(3149),sid(35920),sid(5005),sid(13797)]  
    # create a list for each sid record  
    for s in c.sids:  
        c.history[s] = []

def handle_data(context, data):  
    c = context  
    # more init on first call  
    if c.started == False:  
        if c.sids[0] not in data:  
            log.error("no starting date")  
        else:  
            c.startDate = data[c.sids[0]].datetime  
            log.info("Start date: %s" % ( c.startDate ))  
            c.started = True  
    # normal case  
    if c.started == True:  
        if c.stopped == False:  
            # record everything for a period of 12 months  
            if timedelta(weeks=52) > (data[c.sids[0]].datetime - c.startDate):  
                c.days += 1  
                for s in c.sids:  
                    if s in data:  
                        # add the day's open-to-close change to the list for this sid  
                        c.history[s].append(data[s].close - data[s].open)  
                    else:  
                        log.error("%s sid data not found!" % (str(s)))  
            else:  
                log.info("Finished recording data : %s days" % (c.days))  
                c.stopped = True  
                numHistories = len(c.history)  
                log.info("Have %s complete histories" % (numHistories))

                # create a variation matrix  
                # each row represents the time-series of (close - open) prices  
                variation = np.array([ c.history[v] for v in c.history ]).astype(np.float)  
                # tell it we're looking for a graph structure  
                edge_model = covariance.GraphLassoCV()  
                X = variation.copy().T  
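                # normalise each series to unit variance so no single stock dominates the covariance estimate  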
                X /= X.std(axis=0)  
                edge_model.fit(X)  
                # now process into clusters based on co-fluctuation  
                _, labels = cluster.affinity_propagation(edge_model.covariance_)  
                numGroups = labels.max() + 1

                log.info("%i groups found:" % (numGroups))  
                # create structure to store groups  
                groups = []  
                for x in range(numGroups):  
                    groups.append([])  
                # filter the sids into the groups  
                for i, grp_idx in enumerate(labels):  
                    groups[grp_idx].append( int(c.sids[i]) )  
                # display stock sids that co-fluctuate:  
                for g in range(numGroups):  
                    print 'Cluster %i: %s' % (g + 1, ", ".join([str(s) for s in groups[g]]))  

Cheers
J