Artificial Neural Networks for Prediction

Hello Everyone,

I'm a bioinformatician with a background in analyzing large genetic datasets. I'm primarily interested in taking biologically inspired algorithms that work extremely well on genetic data and applying them to market data. As a new Quantopian member, I am a little confused about how to implement an ANN fed by PCA. One of the first steps I need to accomplish is to transform historical stock data for all listed companies, over let's say the last 5 years, into "independent/non-correlated" principal components, which I would then feed into the ANN for feature selection, optimization and eventually prediction. I consider this a vital step because I'm not interested in any one company, just a cluster of companies whose relationships with one another are a statistically significant predictor of the future price of another company's stock. Perhaps I'm not thinking about this in the right way, but how would I run the PCA on all historical stock data if I am not allowed access to the raw values of all the stock prices? Thank you in advance for your time and help.
-Gaurav

4 responses

Hello,

Most of us probably know what ANNs are but I didn't know what PCA is - I now know it's Principal Component Analysis but that's all I know!

There are probably several ways to approach the data access. This example gives you approximately 5 years of data for two equities which then rolls forward daily:

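def initialize(context):
    # Two securities, referenced by their Quantopian security IDs.
    context.sids = [sid(2), sid(24)]

# Will be called on every trade event for the securities you specify.
def handle_data(context, data):
    prices = get_prices(data, context)
    # Nothing to print until the batch transform has data to return.
    if prices is None:
        return
    print(prices.head())
    print(prices.tail())

# Trailing window of 1260 trading days (about 5 years), refreshed daily.
@batch_transform(window_length=1260, refresh_period=1)
def get_prices(datapanel, context):
    prices = datapanel['price']
    return prices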

The output begins:

2007-01-05PRINT 24 2  
2002-01-04 00:00:00+00:00 11.845 37.31  
2002-01-07 00:00:00+00:00 11.455 38.02  
2002-01-08 00:00:00+00:00 11.320 37.34  
2002-01-09 00:00:00+00:00 10.830 36.62  
2002-01-10 00:00:00+00:00 10.615 35.93

2007-01-05PRINT 24 2  
2006-12-28 00:00:00+00:00 80.876 29.99  
2006-12-29 00:00:00+00:00 84.780 29.96  
2007-01-03 00:00:00+00:00 83.800 29.31  
2007-01-04 00:00:00+00:00 85.640 29.11  
2007-01-05 00:00:00+00:00 85.080 28.75


2007-01-08PRINT 24 2  
2002-01-07 00:00:00+00:00 11.455 38.02  
2002-01-08 00:00:00+00:00 11.320 37.34  
2002-01-09 00:00:00+00:00 10.830 36.62  
2002-01-10 00:00:00+00:00 10.615 35.93  
2002-01-11 00:00:00+00:00 10.530 35.65

2007-01-08PRINT 24 2  
2006-12-29 00:00:00+00:00 84.780 29.96  
2007-01-03 00:00:00+00:00 83.800 29.31  
2007-01-04 00:00:00+00:00 85.640 29.11  
2007-01-05 00:00:00+00:00 85.080 28.75  
2007-01-08 00:00:00+00:00 85.459 28.51  

One potential problem for you is that you cannot persist data between runs of an algorithm: everything is regenerated each time. You can import data from CSV using 'fetch', or you could use zipline, the open-source backtester that Quantopian is based on. With zipline, however, you do not have access to the Quantopian data.

P.
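
The rolling behaviour visible in the output above (each day the oldest row drops out and the newest one is appended) can be sketched outside Quantopian in plain Python. This is only an illustration of a fixed-length trailing window, not how batch_transform is implemented. The TrailingWindow class is hypothetical, and the sample rows are the first four dates and sid 24 prices from the output.

```python
from collections import deque


class TrailingWindow(object):
    """Hypothetical helper: keeps only the most recent `length` rows."""

    def __init__(self, length):
        self.length = length
        # A deque with maxlen discards the oldest row automatically.
        self.rows = deque(maxlen=length)

    def update(self, row):
        """Append one day's row; return the window, or None while it is still filling."""
        self.rows.append(row)
        if len(self.rows) < self.length:
            return None
        return list(self.rows)


# (date, price) rows copied from the start of the output above.
days = [("2002-01-04", 11.845), ("2002-01-07", 11.455),
        ("2002-01-08", 11.320), ("2002-01-09", 10.830)]

window = TrailingWindow(length=3)
snapshots = [window.update(day) for day in days]
```

The first two entries of snapshots are None, which mirrors the `if prices is None` check in the algorithm; after that, each snapshot is the previous one shifted forward by one day.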

Peter, thank you for your help. It is helpful, but I think your point at the end, about not being able to persist data between runs of algorithms, is partly what I was trying to articulate earlier. I really like this project, so I'm going to try to work around the limitation, possibly by splitting the analysis and importing CSVs into Quantopian.
-G
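
One way the split described above might look: run the heavy analysis offline, write one row per date to a CSV file, and import that file into the algorithm. The sketch below covers only the offline half, using Python's standard csv module. The column names (date, pc1, pc2) and the values are made up for illustration, and the layout the Quantopian importer actually expects should be checked against the help page.

```python
import csv
import io

# Made-up component scores keyed by date (placeholders, not real results).
rows = [
    {"date": "2007-01-04", "pc1": 0.012, "pc2": -0.003},
    {"date": "2007-01-05", "pc1": -0.007, "pc2": 0.001},
]

# Write to an in-memory buffer; swap in open("scores.csv", "w") for a real file.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["date", "pc1", "pc2"])
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()

# Read the text back to confirm the round trip; values come back as strings.
loaded = list(csv.DictReader(io.StringIO(csv_text)))
```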

Hello Gaurav,

You might have a look at:

https://www.quantopian.com/posts/1st-attempt-finding-co-fluctuating-stocks

https://www.quantopian.com/posts/batch-transform-version-of-scikits-learn-example-finding-co-fluctuating-stocks

https://www.quantopian.com/posts/quick-bug-fix-of-batch-transform-version-of-scikits-learn-example-finding-co-fluctuating-stocks

There may be others. Note that I found these using Google (the built-in Quantopian search appears to be limited).

Generally, you can analyze up to 100 individually specified securities. Alternatively, there is the set_universe functionality, which you can read about on the help page. With it, you can analyze more securities, but the universe is not fixed over time, since it is filtered by dollar volume.

Grant

Hi Gaurav,

Very interesting project, I've been working on something very similar. If you're not familiar with Python, you might want to have a look at the PCA function in matplotlib and the PCA capabilities in the scikit-learn package.

How do you intend to implement the PCA for feeding the ANN? For example, would you do a PCA on the time series of returns, or a PCA on a moving average of the returns? I've tried a few different things but haven't had much luck with using PCA and ANNs.

Aidan
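
For readers following the PCA-on-returns option raised above, here is a minimal sketch using only NumPy and synthetic random-walk prices (no Quantopian data; the sizes, seed and choice of k are arbitrary assumptions). It centres a days-by-stocks matrix of daily returns and takes its SVD, which is the same decomposition that library routines such as scikit-learn's PCA perform. Note that the resulting components are uncorrelated, which is weaker than independent. A moving average of the returns would be an extra smoothing step applied before centring.

```python
import numpy as np

rng = np.random.RandomState(0)

# Synthetic daily prices: 250 days x 5 hypothetical stocks (random walks).
n_days, n_stocks = 250, 5
log_steps = rng.normal(0.0, 0.01, size=(n_days, n_stocks))
prices = 100.0 * np.exp(np.cumsum(log_steps, axis=0))

# Daily simple returns: one row per day, one column per stock.
returns = prices[1:] / prices[:-1] - 1.0

# Centre each column, then take the SVD; rows of vt are the principal axes.
centered = returns - returns.mean(axis=0)
u, s, vt = np.linalg.svd(centered, full_matrices=False)

# Project the centred returns onto the axes to get the component scores.
scores = centered.dot(vt.T)

# Fraction of total variance explained by each component (descending order).
explained = s ** 2 / np.sum(s ** 2)

# Keep the first k components as candidate inputs for a downstream model.
k = 2
features = scores[:, :k]
```

Each row of features is one day's position along the first k components; whether those are useful ANN inputs is exactly the open question in this thread.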