Where to get free or cheap survivorship-bias-free data?

Hi guys,

I've been poking around here, and it appears that I cannot implement my strategies on Q. (The reasons are that I need all historical data on 5k+ stocks at a time, and I need much more than 1 minute of processing time.)

I have a system that has returned excellent results with slippage and fees included. One of the problems I'm facing is that I am currently backtesting on survivorship-biased data (surviving stocks only) downloaded from Yahoo Finance. The results from my simulator are good enough that I would like to retest it on higher-quality data. Any help greatly appreciated!

Quantopian staff: I have a working strategy. It is amazing. I wonder if you could help me figure out a way to test my algorithm. Thank you.

4 responses

Hi Toan,

How long of a history window do you need on those 5k+ stocks? As well, what type of calculations are you looking to make that exceed the 1 minute limit? Have you tried using pipeline to get historical data on all stocks? It works daily, before market open, and allows you to pick a selection of stocks to trade each day.

Jamie, thanks for the reply. For any given stock, I need 100-1000 end-of-day data points: open, high, low, close, adj close, and volume. My model is not currently written for intraday data (though I suspect it could be adapted). Although I have not actually tried loading any of my code, I've written my own backtester, and it takes about 2-10 minutes to analyse each end-of-day sample, depending on how wide the data sample is (how many stocks). Since I'm using free data, looking back to 1990, I only have 500 or so stocks to look at, such as IBM, KO, and AAPL, and that's where the 2-minute run times come from. With non-survivorship-biased data, I think it will take longer than 10 minutes, unless you guys have implemented some very clever code parallelizer that doesn't look at the code. I am not sure, but I think Pipeline is the opposite of my approach. I think it allows you to whittle down the entire data set to a few stocks using set params and filters?

I have not proceeded with the Quantopian platform because I absolutely need data on 5k stocks, and I need numpy and pandas.

Hi Toan,

I'm not quite sure exactly what you're asking for, but it sounds very similar to Pipeline. Pipeline allows you to select stocks and compute factors using OHLCV data from a trailing window on all 8000+ US equities. The idea behind Pipeline is that you can use these factors to narrow down the list of securities to those that you are interested in trading. The result of any pipeline is a pandas DataFrame, and numpy and pandas are available on the platform, so I'd recommend giving it a shot!
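
To make the idea concrete, here is a rough sketch in plain pandas of what "compute factors over a trailing window on every stock, then narrow down the list" can look like. It does not use the Quantopian Pipeline API; the tickers, the synthetic prices, the 20-day window, and the two factors (momentum and average dollar volume) are all assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
dates = pd.bdate_range("2020-01-01", periods=60)
symbols = [f"SYM{i:03d}" for i in range(50)]  # hypothetical tickers

# Synthetic daily closes and volumes, one column per stock (illustration only).
close = pd.DataFrame(
    100 + rng.normal(0, 1, size=(len(dates), len(symbols))).cumsum(axis=0),
    index=dates,
    columns=symbols,
)
volume = pd.DataFrame(
    rng.integers(1_000, 1_000_000, size=close.shape),
    index=dates,
    columns=symbols,
)

window = 20  # assumed trailing window, in trading days

# Factors computed for every stock at once from the trailing window.
momentum = close.iloc[-1] / close.iloc[-window] - 1.0
avg_dollar_volume = (close * volume).iloc[-window:].mean()

# One row per stock, one column per factor.
factors = pd.DataFrame(
    {"momentum": momentum, "avg_dollar_volume": avg_dollar_volume}
)

# Narrow down: keep the more liquid half, then the 10 strongest by momentum.
liquid = factors[factors["avg_dollar_volume"] >= factors["avg_dollar_volume"].median()]
selected = liquid.nlargest(10, "momentum")
print(selected)
```

The filtering step is optional in this sketch: `factors` still holds a value for every stock, so a model that wants the whole cross-section can consume that DataFrame directly.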

Jamie, sorry if I'm not clear.

Let's say, hypothetically, I have a data preprocessor that takes in 8k stocks and applies various mappings (multiplication, subtraction, normalization, SVD/PCA), a neural net that takes in this data and spits out another data set, and an SVM that makes predictions. Would I be able to push this to Pipeline?

OHLCV -> preprocessor -> NN -> svm -> prediction
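
For readers trying to picture the chain above, here is a minimal numpy-only sketch of the same shape of computation. It is not the poster's system: the data is synthetic, the "NN" stage is a single tanh layer with fixed random (untrained) weights standing in for a real network, and the SVM is a simple linear one trained by subgradient descent on the hinge loss. All function names and parameters are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def preprocess(X, n_components):
    """Normalize each feature, then project onto the top principal components."""
    Xc = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-12)
    # Rows of Vt are the principal directions, ordered by explained variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

def nn_features(Z, W, b):
    """Stand-in for the neural net: one tanh layer with fixed, untrained weights."""
    return np.tanh(Z @ W + b)

def fit_linear_svm(H, y, lam=0.01, lr=0.1, epochs=200):
    """Linear SVM via full-batch subgradient descent on the regularized hinge loss."""
    w = np.zeros(H.shape[1])
    b = 0.0
    n = len(y)
    for _ in range(epochs):
        margins = y * (H @ w + b)
        active = margins < 1  # samples inside or on the wrong side of the margin
        grad_w = lam * w - (y[active][:, None] * H[active]).sum(axis=0) / n
        grad_b = -y[active].sum() / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def predict(H, w, b):
    """Return +1 / -1 class labels."""
    return np.where(H @ w + b >= 0, 1, -1)

# Synthetic stand-in for features derived from OHLCV: 200 samples x 12 features.
X = rng.normal(size=(200, 12))
y = np.where(X[:, 0] + 0.5 * X[:, 1] > 0, 1, -1)  # made-up labels

Z = preprocess(X, n_components=5)      # OHLCV features -> preprocessor
W_hidden = rng.normal(size=(5, 8))
b_hidden = rng.normal(size=8)
H = nn_features(Z, W_hidden, b_hidden)  # -> NN
w, b = fit_linear_svm(H, y)             # -> SVM
pred = predict(H, w, b)                 # -> prediction
print("in-sample accuracy:", (pred == y).mean())
```

Each stage here is just a function from one array to another, so the whole chain works on any 2-D numpy array of per-stock values, whatever platform supplies it.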