Scrape the web for new datasets with kimono

Hi, I’m Pratap - co-founder of kimono. We specialize in making web scraping painless. We’d love to hear how you use (or want to use) data scraped from the web and what datasets would be most useful that you don't have access to today.
Also, depending on your interest, we might work with our friends at Quantopian to make scraped data more easily accessible for you. Shoot me an email at pratap at kimonolabs dot com and I can share more about what kimono can do and which datasets you can get access to.

Thanks!

Pratap

19 responses

Great job Pratap. Will email you

Does it work on AJAX-based content, which is usually hard to scrape?

Hi Bharath, yes, we work on AJAX pages. A few AJAX-heavy pages can be a bit tricky, but we're releasing new features soon that will expand our coverage even further. Which pages are you looking to get data from, for example? Thanks

http://www.nseindia.com/products/content/derivatives/equities/historical_fo.htm

This is the url I'm trying to get data from. But it involves filling a form and downloading data.

Thanks for sharing - this page requires form submission via POST. We don't support it just yet, but it's on our roadmap and we'll be supporting it soon. Will send out an email update and tweet it out (@kimonolabs) when that's ready. Other sites / datasets that would be useful?
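For readers unfamiliar with the distinction, here is a minimal sketch of what a form submission via POST looks like compared with a plain URL fetch. The endpoint and the field names (`symbol`, `from`, `to`) are hypothetical placeholders, not the real fields of the NSE form, and the request is only built here, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_form_post(url, fields):
    """Build (but do not send) a form-encoded POST request.

    Passing a body via `data` is what makes urllib issue a POST
    instead of a GET.
    """
    body = urlencode(fields).encode("ascii")
    headers = {"Content-Type": "application/x-www-form-urlencoded"}
    return Request(url, data=body, headers=headers)


# Hypothetical endpoint and form fields, for illustration only.
req = build_form_post(
    "https://example.com/historical-data",
    {"symbol": "NIFTY", "from": "01-Jan-2015", "to": "31-Jan-2015"},
)
print(req.get_method(), req.data)
```

Sending it would be a matter of passing `req` to `urllib.request.urlopen`. A scraper that only follows URLs never produces the request body, which is why pages like this need explicit form support.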

Dividends and splits from nasdaq.com !

Hey Simon - great feedback, thanks! I made a few APIs quickly with kimono to get dividends and stock splits from the NASDAQ. You can check out the Dividends API (sign up for a free kimono account to clone and modify it).
Here's the JSON data: https://www.kimonolabs.com/api/ondemand/dxm4dago?apikey=883b0a03282352f8bfeda5e906bcfd9c
And as a CSV: https://www.kimonolabs.com/api/csv/ondemand/dxm4dago?apikey=883b0a03282352f8bfeda5e906bcfd9c
It's set for AAPL right now, but you can pass in any ticker and get data for that stock (or you can set up a crawl to get data for a set of stocks you're interested in). For example, for MSFT just add &kimpath2=msft, like this: https://www.kimonolabs.com/api/ondemand/dxm4dago?apikey=883b0a03282352f8bfeda5e906bcfd9c&kimpath2=msft
Also, here's a stock split API: https://www.kimonolabs.com/apis/4au5zs4e
CSV data is here: https://www.kimonolabs.com/api/csv/4au5zs4e?apikey=883b0a03282352f8bfeda5e906bcfd9c
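The ticker override described above can be sketched as a small URL builder. This only reproduces the URL pattern quoted in the post (`apikey`, `kimpath2`, and the `/csv/` path segment); the function names are made up for illustration, `YOUR_API_KEY` is a placeholder, and nothing here is taken from official kimono documentation.

```python
from urllib.parse import urlencode

BASE = "https://www.kimonolabs.com/api"


def ondemand_url(api_id, api_key, ticker=None, as_csv=False):
    """Build an on-demand URL following the pattern quoted in the post."""
    prefix = f"{BASE}/csv/ondemand" if as_csv else f"{BASE}/ondemand"
    params = {"apikey": api_key}
    if ticker is not None:
        # The post passes the ticker in lowercase via kimpath2.
        params["kimpath2"] = ticker.lower()
    return f"{prefix}/{api_id}?{urlencode(params)}"


def ondemand_urls(api_id, api_key, tickers, as_csv=False):
    """Map each ticker to its own on-demand URL."""
    return {t: ondemand_url(api_id, api_key, t, as_csv) for t in tickers}


for ticker, url in ondemand_urls("dxm4dago", "YOUR_API_KEY", ["AAPL", "MSFT"]).items():
    print(ticker, url)
```

Building one URL per ticker is a simple alternative to setting up a crawl when you only need a handful of stocks.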

Cool thanks!

@Pratap

Is it possible to get anything out of this one?

http://www.nseindia.com/content/fo/fo_contractsdata.htm

Don't bother trying if this is not a big use case for the product.

Hey Bharath, do you have an example query I can test on this page?

Hey Bharath, I can't access any data from this or any other query, even via the browser - it just returns a Not Found. Do you need an account or a local IP to access data from NSEIndia? Or is there another query that works? Thanks

Will check if there is an alternative source

Also, we just published a post you may find interesting: http://blog.kimonolabs.com/2015/04/17/monkeylearn-news-analysis/ - it shows you how to easily crawl news data and extract a signal by analyzing topics & entities. Let me know if this type of data is valuable

Hi Pratap,

Big project you've undertaken, scraping data. I'm looking for insider buying, names, dates, amounts. Best page I've seen for this is finviz.com

Hey Easan, you can create an API to pipe out data from Finviz on a schedule. For example, I made one here: https://www.kimonolabs.com/apis/5vpgzur6 (you can clone it and edit it after signing up for kimono - it's free). You can also check out the CSV data directly without signing up here: https://www.kimonolabs.com/api/csv/5vpgzur6?apikey=ewRZOgAjSfl7FrL6CennivIEp94nsp4Y. Let me know if there are other data sets you'd find useful, or if I can help tweak this one for you.
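As a rough sketch of consuming a CSV endpoint like the one above: the snippet assumes the response is plain CSV whose first row is a header, which the post does not confirm. The column names and the sample rows are invented placeholders, not real Finviz or kimono output.

```python
import csv
import io
from urllib.request import urlopen


def parse_rows(csv_text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def fetch_rows(url, timeout=30):
    """Download a CSV endpoint and parse it (not called in this sketch)."""
    with urlopen(url, timeout=timeout) as resp:
        return parse_rows(resp.read().decode("utf-8"))


def buys_only(rows, column="transaction", value="Buy"):
    """Keep rows whose transaction column matches `value` (hypothetical column)."""
    return [r for r in rows if r.get(column) == value]


# Made-up sample in an assumed layout, for illustration only.
SAMPLE = (
    "ticker,owner,date,transaction,value\n"
    "XYZ,Jane Doe,2015-04-01,Buy,1000\n"
    "ABC,John Roe,2015-04-02,Sale,2500\n"
)

rows = parse_rows(SAMPLE)
print(buys_only(rows))
```

Keeping the parsing separate from the download makes it easy to test the filter on saved CSV files before pointing `fetch_rows` at a live URL.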

Thanks, Pratap, for the lightning-fast and complete response. I look forward to working with you.

Would love data scraping to help backtest scripts on international markets (such as Australia).

www.netqoute.com.au comes to mind or gurufocus?

Hi Ciaran, you will be able to log in to netquote and get this data using kimono's Auth capability. Here's a tutorial on setting up an Auth API that logs in for you on a schedule and gets the data you need. Let me know if this works. If you run into trouble, let me know - I'd personally love to help you get set up and running.