Web Scrape to “Make UI Great Again”
6 December 2017
Gavin Wiener
Goal
With web scraping, you rarely have to put up with an unfriendly
user interface
What is Web Scraping?
“Web scraping, web harvesting, or web data extraction is data
scraping used for extracting data from websites”
https://en.wikipedia.org/wiki/Web_scraping
Tools
● Python + BeautifulSoup (minimal prior knowledge required)
○ https://www.python.org/
○ https://www.crummy.com/software/BeautifulSoup/bs4/doc/
● Mobile-friendly CSS: Spectre.css
○ https://picturepan2.github.io/spectre/
● One or more badly designed websites
○ https://myciti.org.za/en/home/
○ https://myciti.org.za/en/timetables/route-stop-timetables/
● (Optional) Hosting
○ I essentially created a website
MyCiti: Initial
Simulator: iPhone 5
MyCiti: Goal
Simulator: iPhone 5
Identify Structure: Inspecting Elements
Get Data: Investigate URLs
Encoded
https://myciti.org.za/en/timetables/route-stop-timetables/?timetable%5Bweekday%5D=sunday&timetable%5Bstation%5D=493&timetable%5Broute%5D=&timetable%5Bdirection%5D=
Decoded
https://myciti.org.za/en/timetables/route-stop-timetables/?timetable[weekday]=sunday&timetable[station]=493&timetable[route]=&timetable[direction]=
Tool: https://meyerweb.com/eric/tools/dencoder/
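The same decoding can be done in Python instead of an online tool. The sketch below is only an illustration: the function names (`decode_url`, `query_params`, `build_timetable_url`) are made up for this example, and it assumes the four `timetable[...]` parameters seen in the URL above are all the page needs.

```python
from urllib.parse import parse_qs, unquote, urlencode, urlsplit

# Base URL and query string copied from the slide.
BASE_URL = "https://myciti.org.za/en/timetables/route-stop-timetables/"
ENCODED_URL = (
    BASE_URL
    + "?timetable%5Bweekday%5D=sunday&timetable%5Bstation%5D=493"
    + "&timetable%5Broute%5D=&timetable%5Bdirection%5D="
)


def decode_url(url):
    """Turn percent-escapes such as %5B and %5D back into [ and ]."""
    return unquote(url)


def query_params(url):
    """Return the query parameters of a URL as a plain dict."""
    # keep_blank_values keeps empty parameters such as route and direction.
    parsed = parse_qs(urlsplit(url).query, keep_blank_values=True)
    return {key: values[0] for key, values in parsed.items()}


def build_timetable_url(weekday, station, route="", direction=""):
    """Build a timetable URL for a stop (hypothetical helper)."""
    query = urlencode({
        "timetable[weekday]": weekday,
        "timetable[station]": station,
        "timetable[route]": route,
        "timetable[direction]": direction,
    })
    return BASE_URL + "?" + query


if __name__ == "__main__":
    print(decode_url(ENCODED_URL))
    print(query_params(ENCODED_URL))
    print(build_timetable_url("sunday", 493))
```

Once the parameters are known, changing `station` or `weekday` is enough to request a different timetable without touching the original form.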
I Can Haz Your Data - Code
Getting the timetable of a stop
I Can Haz Your Data - Raw
Getting the timetable of a stop
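The code on the original slides was shown as images and is not reproduced here. As a rough stand-in, the sketch below pulls the rows out of a `<table>` using only Python's standard library (`html.parser`) rather than BeautifulSoup, so it runs without installing anything. The class and function names are invented for this example, the sample HTML is a made-up placeholder and not real MyCiti data, and `fetch_html` assumes the page is served as UTF-8.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class TableRowParser(HTMLParser):
    """Collect the text of every <th>/<td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None    # cells of the row being read, or None
        self._cell = None   # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
            self._cell = None
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None
            self._cell = None


def extract_rows(html):
    """Return all table rows in the HTML as lists of cell strings."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    return parser.rows


def fetch_html(url):
    """Download a page as text (assumes UTF-8; needs network access)."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


# Made-up placeholder markup, not copied from the MyCiti site.
SAMPLE_HTML = """
<table>
  <tr><th>Route</th><th>Departure</th></tr>
  <tr><td>101</td><td>06:15</td></tr>
  <tr><td>101</td><td>06:45</td></tr>
</table>
"""

if __name__ == "__main__":
    for row in extract_rows(SAMPLE_HTML):
        print(row)
```

On a real page, `extract_rows(fetch_html(url))` would return every table row on the page, so some filtering by position or header text is likely needed.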
Create a New Interface
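One possible way to build the new interface is to render the scraped rows into a small, mobile-friendly HTML page styled with Spectre.css. The sketch below is an assumption about how that could look, not the author's actual page: `render_page` is a hypothetical helper, the stylesheet URL is assumed to be the unpkg CDN link, and the `container`, `table` and `table-striped` classes are Spectre.css class names.

```python
from html import escape

# Assumed CDN location of the Spectre.css stylesheet.
SPECTRE_CSS = "https://unpkg.com/spectre.css/dist/spectre.min.css"


def render_page(title, header, rows):
    """Render a header and rows of strings as a standalone HTML page."""
    # Escape every value so scraped text cannot break the markup.
    head_cells = "".join(f"<th>{escape(cell)}</th>" for cell in header)
    body_rows = "\n".join(
        "<tr>" + "".join(f"<td>{escape(cell)}</td>" for cell in row) + "</tr>"
        for row in rows
    )
    return f"""<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>{escape(title)}</title>
<link rel="stylesheet" href="{SPECTRE_CSS}">
</head>
<body>
<div class="container">
<h1>{escape(title)}</h1>
<table class="table table-striped">
<thead><tr>{head_cells}</tr></thead>
<tbody>
{body_rows}
</tbody>
</table>
</div>
</body>
</html>
"""


if __name__ == "__main__":
    # Placeholder data for illustration only.
    page = render_page(
        "Stop 493, Sunday",
        ["Route", "Departure"],
        [["101", "06:15"], ["101", "06:45"]],
    )
    with open("timetable.html", "w", encoding="utf-8") as handle:
        handle.write(page)
```

The viewport meta tag is what lets the page scale properly on a phone-sized screen such as the iPhone 5 simulator used in the earlier slides.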
Summary
1. Find a website, e.g. MyCiti
2. Identify the structure and the interesting components, e.g. <table>
3. Identify how to reach the data, e.g. URLs and their query parameters
4. ‘Scrape’ the data with code, e.g. Python + BeautifulSoup
5. Create your new interface, e.g. with Spectre.css
And You Have a Website
gavinwiener@gmail.com
http://github.com/divisionMax/
