Data Scientist at Toptal and Exegetic Analytics.
Accepted Talks:
Web Scraping: Unleash your Internet Viking
Description
Often the data you want are available somewhere on the internet. They might all be on one page (if you're lucky!) or distributed across many pages (possibly hundreds or thousands of pages!). But you want those data consolidated locally. Not on a server in some distant land, but right here on your hardware. And in a convenient format. CSV or JSON, perhaps? Certainly not HTML!
What would Ragnar do? He'd go out, grab those data and bring them home. The contemporary Internet Viking uses web scraping techniques to systematically extract information from web pages.
This tutorial will demonstrate the process of web scraping. This is the battle plan:
The first two components will be fairly brief, covering this material at a high level. We'll dig much deeper into the latter topics.
By the end of the tutorial you should be able to easily (and confidently) pillage and plunder large swathes of the internet. Come along and make Ragnar proud. Tyr! Odin owns you all!
This tutorial will be suitable for Vikings with low to moderate levels of Python experience.
Software Requirements
For this workshop I'll be using Python 3 with the following packages:
beautifulsoup4==4.5.3
lxml==3.8.0
pandas==0.20.2
Pillow==4.0.0
PySocks==1.6.7
requests==2.18.1
Scrapy==1.4.0
selenium==3.5.0
I'll also be making extensive use of Jupyter Notebooks.
Please make sure that you have all of the above installed before you arrive.
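One way to confirm that everything is in place before you arrive is a quick version check. The sketch below is not part of the workshop material: it assumes Python 3.8 or newer (for the standard library's importlib.metadata), and the helper name check_requirements is made up for illustration. The pinned versions are copied from the list above.

```python
from importlib.metadata import PackageNotFoundError, version

# Pinned versions copied from the Software Requirements list.
REQUIRED = {
    "beautifulsoup4": "4.5.3",
    "lxml": "3.8.0",
    "pandas": "0.20.2",
    "Pillow": "4.0.0",
    "PySocks": "1.6.7",
    "requests": "2.18.1",
    "Scrapy": "1.4.0",
    "selenium": "3.5.0",
}


def check_requirements(required):
    """Return {package: (wanted, installed)} for each missing or mismatched package.

    `installed` is None when the package is not installed at all.
    (Hypothetical helper, named here for illustration only.)
    """
    problems = {}
    for name, wanted in required.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None
        if installed != wanted:
            problems[name] = (wanted, installed)
    return problems


if __name__ == "__main__":
    # Report every package that is absent or at a different version.
    for name, (wanted, installed) in check_requirements(REQUIRED).items():
        print(f"{name}: wanted {wanted}, found {installed or 'not installed'}")
```

A different version is not necessarily a problem; the check only flags where your environment departs from the one used in the workshop.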