In this great demonstration example shared by Lucy Wu, is it not preferable to scrape using tickers from QTradableStocksUS()? Over a given timeframe, am I correct in thinking that tickers from QTradableStocksUS() will likely be more representative of live trading conditions, since survivorship bias is mitigated?
To thoroughly backtest a factor that is ranked using 10-K data, is it not vital to expose the model to companies that have since gone out of business?
I am currently making a Scrapy crawler that follows a similar process to the one suggested by Lucy, but at the moment the program scrapes all 10-K documents (~11500 companies), which is obviously unnecessary. Assuming the process in the attached notebook is correct, it suggests that ~4700 companies need to be scraped, as opposed to ~6900 using the Nasdaq company data.
The issue I'm facing in making this possible is mapping a ticker name to its corresponding Central Index Key (CIK), since Quantopian doesn't support the requests library. Is it possible/permissible to export the ticker symbols from the attached notebook locally, so that I can scrape only QTradableStocksUS() companies, please?
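
For context, the local mapping step I have in mind looks roughly like the sketch below. It assumes I have already saved a tab-separated ticker-to-CIK file locally (one "ticker<TAB>cik" pair per line, which I believe is the layout of the ticker list SEC EDGAR publishes) and that the QTradableStocksUS() tickers have been exported to a plain list. The function names parse_ticker_cik and ciks_for are my own placeholders, not part of any library.

```python
def parse_ticker_cik(text):
    """Build a {TICKER: zero-padded CIK} dict from tab-separated text.

    Assumed input layout (an assumption, not confirmed): one
    "ticker<TAB>cik" pair per line. Malformed lines are skipped.
    """
    mapping = {}
    for line in text.splitlines():
        parts = line.strip().split("\t")
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        ticker, cik = parts
        if not ticker or not cik.isdigit():
            continue  # skip rows without a numeric CIK
        # EDGAR CIKs are commonly written zero-padded to 10 digits.
        mapping[ticker.upper()] = cik.zfill(10)
    return mapping


def ciks_for(tickers, mapping):
    """Split tickers into those with a known CIK and those without."""
    found = {}
    missing = []
    for ticker in tickers:
        key = ticker.upper()
        if key in mapping:
            found[key] = mapping[key]
        else:
            missing.append(key)
    return found, missing


if __name__ == "__main__":
    # Tiny illustrative input; a real run would read the saved file instead.
    sample = "aapl\t320193\nmsft\t789019\n"
    exported_tickers = ["AAPL", "MSFT", "ZZZZ"]  # hypothetical export
    found, missing = ciks_for(exported_tickers, parse_ticker_cik(sample))
    print(found)
    print(missing)
```

Tickers with no match are returned separately rather than dropped silently. I suspect delisted companies may be absent from a current mapping file, and those are exactly the ones that matter for the survivorship-bias point above, so they would need to be resolved some other way.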
Thanks in advance,
Joseph.