html scrape with pandas

Hi there Quantopian,

I'm a newbie here, and I've been wondering whether you have any plans to include the .read_html() function offered by pandas (based on html5lib). Why am I asking? My strategy involves building a smaller universe from fundamentals (let's call it SUF), then using that universe to scrape some outside source, writing some Python to filter the SUF again, and then trading. The key point is that I either need to scrape some pages from within the algorithm, or somehow export the SUF tickers somewhere (local drive, cloud, whatever), do the scraping locally, and use fetch_csv. And yes, I've thought of manually copying the tickers from the SUF and ... but that is sort of convoluted and far from automatic.

Cheers,
Victor

3 responses

You could run code on a server outside Quantopian, using e.g. BeautifulSoup, to parse the HTML and generate and publish CSV files, then import the CSV via Fetcher. BeautifulSoup is not on the list of packages Quantopian allows you to import, so the parsing has to happen outside the algorithm.
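As a rough illustration of the "parse the HTML and make CSV files" part, here is a minimal sketch that would run on the outside server. It uses only the standard library's html.parser as a stand-in for BeautifulSoup, and the sample table, its column names, and the helper names (TableParser, html_table_to_csv) are all made up for the example.

```python
import csv
import io
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows
        self._row = None    # row currently being read, or None
        self._cell = None   # text pieces of the current cell, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def html_table_to_csv(html):
    """Return the table rows found in `html` as CSV text."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(parser.rows)
    return out.getvalue()


# Hypothetical page content; a real page would be downloaded first.
SAMPLE_HTML = """
<table>
  <tr><th>symbol</th><th>score</th></tr>
  <tr><td>AAPL</td><td>0.91</td></tr>
  <tr><td>MSFT</td><td>0.84</td></tr>
</table>
"""

print(html_table_to_csv(SAMPLE_HTML))
```

The resulting CSV text would then be written to a file or served over HTTP so that Fetcher can pull it in.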

Thank you for the quick reply, Andre, but that is precisely what I am trying to avoid. Maybe I explained my predicament badly. Let's say my system is divided into 4 steps:

  1. Build a smaller universe by using fundamentals.
  2. Use the tickers found in step 1 to parse a publicly available data source and use some logic to order the tickers.
  3. Use the ticker order/rank from step 2 to build an even smaller universe.
  4. Trade logic for the universe from step 3.
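To make steps 2 and 3 concrete: once a score exists for each ticker, they boil down to a sort followed by a cut-off. The sketch below assumes the scraped data has already been reduced to one number per ticker; the tickers, scores, and the function name rank_and_truncate are hypothetical.

```python
def rank_and_truncate(tickers, scores, top_n):
    """Order tickers by score (highest first) and keep the top_n best.

    Tickers with no score are dropped rather than guessed at.
    """
    scored = [t for t in tickers if t in scores]
    ranked = sorted(scored, key=lambda t: scores[t], reverse=True)
    return ranked[:top_n]


# Step 1 (hypothetical result of the fundamentals filter).
suf = ["AAPL", "MSFT", "IBM", "GE"]
# Step 2 (hypothetical scores scraped from the outside source).
scraped_scores = {"AAPL": 0.91, "MSFT": 0.84, "IBM": 0.40}
# Step 3: the even smaller universe that step 4 would trade.
final_universe = rank_and_truncate(suf, scraped_scores, top_n=2)
print(final_universe)
```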

So as you can see I have a problem with step 2. Any suggestions on how to circumvent this?

Cheers!

Have your algorithm tell your HTML2CSV server which tickers you want.

Currently, there is no way for Q algorithms to send anything, but they can make requests.

Have your server respond to Fetcher requests by parsing the HTML data for the given ticker symbols, storing it for later, and returning the corresponding CSV. Do you think this would work?