Web Scraping with Google Gemini 2.0
By - Tamanna
NextGen_Outlier 1
Overview
What is Web Scraping?
Why Google Gemini 2.0?
Setup
Workflow
Example: E-Commerce Scraping
Example: Airbnb Reviews
Advanced Tips
Use Cases
Limitations
Conclusion
What is Web Scraping?
Automatically extracting data from websites
Converts unstructured web content into structured formats (e.g., JSON, CSV)
Examples:
Product prices from e-commerce sites
Customer reviews from Airbnb
Traditionally requires coding (e.g., BeautifulSoup, Scrapy)
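For contrast with the AI-based approach, here is a minimal sketch of the traditional, rule-based route: a parser walks the HTML and copies known fields into a structured record that can be serialized as JSON. To stay dependency-free it uses Python's built-in `html.parser` rather than BeautifulSoup. The sample HTML and the `product-name` / `product-price` class names are hypothetical.

```python
import json
from html.parser import HTMLParser


class ProductParser(HTMLParser):
    """Collects text from elements whose CSS class maps to a known field."""

    # Hypothetical mapping from CSS class to output field name
    FIELDS = {"product-name": "name", "product-price": "price"}

    def __init__(self):
        super().__init__()
        self.current_field = None
        self.record = {}

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        for cls in classes:
            if cls in self.FIELDS:
                self.current_field = self.FIELDS[cls]

    def handle_endtag(self, tag):
        # Resets on any closing tag, so nested markup inside a field is not handled
        self.current_field = None

    def handle_data(self, data):
        if self.current_field and data.strip():
            self.record[self.current_field] = data.strip()


html = """
<div class="product">
  <h1 class="product-name">Adrienne Trek Jacket</h1>
  <span class="product-price">$89.99</span>
</div>
"""

parser = ProductParser()
parser.feed(html)
print(json.dumps(parser.record))  # {"name": "Adrienne Trek Jacket", "price": "$89.99"}
```

Every field needs a hand-written rule tied to the page's markup, which is exactly the maintenance burden the Gemini approach aims to avoid.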
Why Use Google Gemini 2.0?
No Coding Required: Use natural language or voice commands
Dynamic Websites: Handles JavaScript-heavy pages when working from the rendered page (screen sharing or a headless browser)
Structured Output: Returns data in JSON or CSV
Cost-Effective: ~$0.075 per million input tokens (Gemini 1.5 Flash pricing)
Beginner-Friendly: Ideal for marketers, researchers, non-coders
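The per-token price translates into a quick back-of-the-envelope estimate. This sketch assumes the quoted ~$0.075 applies to input tokens and uses the common rule of thumb of roughly 4 characters per token for Gemini models; the 200,000-character page size is hypothetical.

```python
def estimate_cost_usd(num_chars, usd_per_million_tokens=0.075, chars_per_token=4):
    """Rough input-token cost of sending num_chars characters to the model.

    Assumptions: ~4 characters per token (a common rule of thumb for
    Gemini models) and the per-million-token price quoted on the slide.
    """
    tokens = num_chars / chars_per_token
    return tokens / 1_000_000 * usd_per_million_tokens


# Hypothetical page size: 200,000 characters of raw HTML
per_page = estimate_cost_usd(200_000)
print(f"Per page: ${per_page:.5f}")                    # Per page: $0.00375
print(f"Per 10,000 pages: ${per_page * 10_000:.2f}")  # Per 10,000 pages: $37.50
```

Output tokens are billed separately, so real costs will be somewhat higher than this input-only figure.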
Setting Up Gemini 2.0
1. Access Google AI Studio: https://aistudio.google.com/apikey
2. Generate and secure your API key
3. Enable screen sharing for voice-based scraping (optional)
4. Install the Python libraries from a terminal:
pip install requests beautifulsoup4 markdownify google-generativeai
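Before running the examples, it can help to confirm the setup. This sketch checks that each library can be imported and that the API key is available from an environment variable; the variable name `GEMINI_API_KEY` is an assumption, so use whatever name your own code reads.

```python
import importlib.util
import os

# pip package name -> import name for the libraries installed above
REQUIRED = {
    "requests": "requests",
    "beautifulsoup4": "bs4",
    "markdownify": "markdownify",
    "google-generativeai": "google.generativeai",
}


def missing_packages():
    """Return the pip names of required packages that cannot be imported."""
    missing = []
    for pip_name, module_name in REQUIRED.items():
        try:
            found = importlib.util.find_spec(module_name) is not None
        except ModuleNotFoundError:  # parent package (e.g. "google") is absent
            found = False
        if not found:
            missing.append(pip_name)
    return missing


def api_key_present(var="GEMINI_API_KEY"):
    """True if the (assumed) API-key environment variable is set and non-empty."""
    return bool(os.environ.get(var))


if __name__ == "__main__":
    print("Missing packages:", missing_packages() or "none")
    print("API key set:", api_key_present())
```

Keeping the key in an environment variable also keeps it out of notebooks and version control.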
Web Scraping Workflow
Step | Tools Used
Fetch Webpage | requests
Parse HTML | BeautifulSoup
Clean Content | markdownify
AI Extraction | google-generativeai
Save Output | json, pandas
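The "Save Output" step can be sketched with the standard library alone. This version swaps pandas for Python's built-in `csv` module and writes the same records to both JSON and CSV; joining list values such as sizes with `|` is an assumption made so each record fits in one CSV row.

```python
import csv
import json


def save_records(records, json_path, csv_path):
    """Write extracted records to a JSON file and a CSV file."""
    with open(json_path, "w", encoding="utf-8") as f:
        json.dump(records, f, indent=2, ensure_ascii=False)

    # Union of all keys, so records with missing fields still fit the header
    fieldnames = sorted({key for record in records for key in record})
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        for record in records:
            # Flatten list values (e.g. sizes, colors) into one cell
            writer.writerow({
                k: "|".join(v) if isinstance(v, list) else v
                for k, v in record.items()
            })
```

For example, `save_records([{"sku": "ATJ-001", "sizes": ["S", "M"]}], "product.json", "product.csv")` writes both files; with pandas installed, `pandas.DataFrame(records).to_csv(...)` is the equivalent one-liner.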
Example: E-Commerce Scraping
Goal: Extract product details (name, price, etc.) from a webpage
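import os
import requests
import google.generativeai as genai
from bs4 import BeautifulSoup
from markdownify import markdownify

# Read the key from an environment variable rather than hard-coding it
genai.configure(api_key=os.environ["GEMINI_API_KEY"])
url = "https://www.scrapingcourse.com/ecommerce/..."
response = requests.get(url, timeout=30)
response.raise_for_status()  # stop on HTTP errors such as 403 or 404

soup = BeautifulSoup(response.content, "html.parser")
main = soup.select_one("#main")  # container holding the product details
if main is None:
    raise ValueError("No #main element found on the page")
main_markdown = markdownify(str(main))  # Markdown costs fewer tokens than raw HTML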
E-Commerce Scraping: Prompt and Output
Prompt:
Extract data in JSON format: sku, name, price, description, sizes, colors
CONTENT: {main_markdown}
Output (product.json):
{
"sku": "ATJ-001",
"name": "Adrienne Trek Jacket",
"price": "$89.99",
"description": "Lightweight, water-resistant jacket...",
"sizes": ["S", "M", "L", "XL"],
"colors": ["Black", "Blue", "Green"]
}
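The slides show the prompt and the resulting JSON but not the API call that connects them. A minimal sketch follows, assuming the `google-generativeai` SDK configured earlier with `genai.configure()`, the model name `gemini-2.0-flash` (an assumption; substitute any available model), and JSON mode via `response_mime_type`. The fence-stripping in `parse_model_json` is a defensive extra in case the reply arrives wrapped in a Markdown code fence.

```python
import json

FENCE = "`" * 3  # Markdown code-fence marker a model may wrap around JSON


def build_prompt(main_markdown):
    """Assemble the extraction prompt from the slide."""
    return (
        "Extract data in JSON format: sku, name, price, description, sizes, colors\n"
        f"CONTENT: {main_markdown}"
    )


def parse_model_json(text):
    """Parse the model's reply, tolerating an optional fenced wrapper."""
    text = text.strip()
    if text.startswith(FENCE):
        text = text.split("\n", 1)[1] if "\n" in text else ""  # drop opening fence line
        text = text.rsplit(FENCE, 1)[0]                        # drop closing fence
    return json.loads(text)


def extract_product(main_markdown, model_name="gemini-2.0-flash"):
    """Send the prompt to Gemini and return the parsed product dict."""
    import google.generativeai as genai  # assumes genai.configure() was already called

    model = genai.GenerativeModel(model_name)
    response = model.generate_content(
        build_prompt(main_markdown),
        # Request JSON directly instead of free-form text
        generation_config={"response_mime_type": "application/json"},
    )
    return parse_model_json(response.text)


def save_product(product, path="product.json"):
    """Write the extracted product to disk, as in the product.json output above."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(product, f, indent=2, ensure_ascii=False)
```

Usage: `save_product(extract_product(main_markdown))`. The model's answer should still be spot-checked, since fields can be misread on complex layouts.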
Example: Airbnb Reviews
Open an Airbnb listing and enable screen sharing in Google AI Studio
Voice command: "Extract all reviews visible on the screen in JSON"
Scroll to load more reviews
Sample Output:
{
"reviews": [
{"name": "Maria", "date": "March 2023", "rating": "5 stars", ...},
{"name": "John", "date": "April 2023", "rating": "4 stars", ...}
]
}
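Because each scroll-and-extract pass returns whatever reviews are on screen, consecutive JSON replies will overlap. This sketch merges them, assuming each reply has the `reviews` shape shown above and that `name` plus `date` identify a review (a hypothetical key: two guests with the same name in the same month would collide). It also converts ratings such as "5 stars" into numbers.

```python
import json
import re


def merge_review_batches(batches):
    """Merge overlapping review lists from several extraction passes."""
    seen = set()
    merged = []
    for batch in batches:
        for review in json.loads(batch)["reviews"]:
            key = (review.get("name"), review.get("date"))  # assumed identity
            if key in seen:
                continue  # already captured in an earlier pass
            seen.add(key)
            # "5 stars" -> 5.0; None if no number is present
            match = re.search(r"\d+(?:\.\d+)?", str(review.get("rating", "")))
            review["rating"] = float(match.group()) if match else None
            merged.append(review)
    return merged


first = '{"reviews": [{"name": "Maria", "date": "March 2023", "rating": "5 stars"}]}'
second = ('{"reviews": [{"name": "Maria", "date": "March 2023", "rating": "5 stars"},'
          ' {"name": "John", "date": "April 2023", "rating": "4 stars"}]}')
print(merge_review_batches([first, second]))
```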
Advanced Tips
CAPTCHAs: Use Crawlbase Smart Proxy
Rate Limits: Add delays between requests (e.g., time.sleep(1))
Dynamic Content: Render JavaScript with a headless browser such as Puppeteer or Playwright before extraction
Cost Optimization: Convert HTML to Markdown to reduce the token count
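The rate-limit tip amounts to pausing between requests. A minimal sketch, in which the fetching function is passed in so any HTTP client works; the helper name `fetch_politely` and the one-second default are illustrative assumptions.

```python
import time


def fetch_politely(urls, fetch, delay_seconds=1.0):
    """Fetch each URL in turn, sleeping between requests to respect rate limits.

    `fetch` is any callable that takes a URL and returns page content,
    e.g. lambda u: requests.get(u, timeout=30).text
    """
    results = {}
    for i, url in enumerate(urls):
        if i > 0:
            time.sleep(delay_seconds)  # pause before every request after the first
        results[url] = fetch(url)
    return results
```

A fixed delay is the simplest policy; sites that return HTTP 429 may additionally call for backing off with longer waits.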
Common Use Cases
Use Case | Example
E-Commerce | Scrape Amazon prices
Market Research | Gather Airbnb reviews
Real Estate | Extract Zillow listings
News Aggregation | Scrape news headlines
Limitations to Consider
Speed: Slower than traditional parsers, since every page requires a model API call
Cost: Token usage adds up for large HTML
Accuracy: May misinterpret complex layouts
Legal: Check robots.txt and terms of service
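Checking robots.txt can be automated with Python's standard `urllib.robotparser`. This sketch parses rules from a string so it runs offline; the rules and `example.com` URLs are hypothetical, and in practice you would download `https://<site>/robots.txt` (for instance with `RobotFileParser.set_url()` and `read()`). Note that robots.txt does not replace reading the site's terms of service.

```python
from urllib.robotparser import RobotFileParser


def allowed_to_fetch(robots_txt, url, user_agent="*"):
    """Return True if the robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt content
rules = """
User-agent: *
Disallow: /checkout/
"""

print(allowed_to_fetch(rules, "https://example.com/products/jacket"))  # True
print(allowed_to_fetch(rules, "https://example.com/checkout/cart"))    # False
```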
Conclusion
Google Gemini 2.0 simplifies web scraping for all users
No coding needed with natural language or voice commands
Ideal for e-commerce, research, and more
Start today at https://aistudio.google.com
Thank you!!