Parsing tables and XML with BeautifulSoup
Last Updated :
12 Jan, 2024
Web scraping is an essential skill: it lets a programmer pull data out of a website or a file and reuse it in other ways. In this article, we will learn how to extract tables and parse XML files with Beautiful Soup. Here, we will scrape data using the Beautiful Soup Python module.
Prerequisites:
Modules Required
- bs4: Beautiful Soup is a Python library for pulling data out of HTML and XML files.
- lxml: It is a Python library that allows us to handle XML and HTML files.
- requests: It allows you to send HTTP/1.1 requests extremely easily.
pip install bs4
pip install lxml
pip install requests
Extract Tables With BeautifulSoup in Python
Below are the steps in which we will see how to extract tables with beautiful soup in Python:
Step 1: Import the Library and Define Target URL
Firstly, we need to import modules and then assign the URL.
Python3
# import required modules
import bs4 as bs
import requests
# assign URL
URL = 'https://www.geeksforgeeks.org/python-list/'
Step 2: Create Object for Parsing
In this step, we create a BeautifulSoup object that we will use to parse the page and extract the tables.
Python3
# parsing
url_link = requests.get(URL)
file = bs.BeautifulSoup(url_link.text, "lxml")
Step 3: Locating and Extracting Table Data
In this step, we are finding the table and its rows.
Python3
# find the table and its rows
find_table = file.find('table', class_='numpy-table')
rows = find_table.find_all('tr')
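Note that `find()` returns `None` when nothing matches, so calling `find_all('tr')` on the result would raise an `AttributeError` if the table is missing. A minimal sketch (using an invented HTML snippet with no table) of a defensive check:

```python
from bs4 import BeautifulSoup

# a made-up page that contains no table at all
html = "<html><body><p>No tables here</p></body></html>"
soup = BeautifulSoup(html, "html.parser")

table = soup.find("table", class_="numpy-table")
if table is None:
    rows = []          # nothing to iterate over
else:
    rows = table.find_all("tr")

print(rows)  # []
```

This keeps the later row-extraction loop safe even when the page layout changes.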
Step 4: Extracting Text from Table Cell
Now create a loop to find all the td tags in the table and then print all the table data tags.
Python3
# display tables
for i in rows:
    table_data = i.find_all('td')
    data = [j.text for j in table_data]
    print(data)
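The loop above can be tried offline on a small hand-written table (the HTML below is invented for illustration, so no network access is needed):

```python
from bs4 import BeautifulSoup

html = """
<table class="numpy-table">
  <tr><th>Function</th><th>Description</th></tr>
  <tr><td>append()</td><td>Add an element</td></tr>
  <tr><td>pop()</td><td>Remove an element</td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")
rows = soup.find("table", class_="numpy-table").find_all("tr")

extracted = []
for row in rows:
    # each row's <td> cells become one list of strings
    cells = [td.text for td in row.find_all("td")]
    extracted.append(cells)

print(extracted)
# the header row holds only <th> cells, so it comes out as an empty list
```

Filtering out the empty header list (or reading `th` tags separately) is a common follow-up step.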
Complete Code
Below is the complete implementation of the above steps. In this code, we scrape a specific table (with the numpy-table class) from a GeeksforGeeks page about Python lists. After locating the table rows, we iterate through each row to extract and print the cell data.
Python3
# import required modules
import bs4 as bs
import requests

# assign URL
URL = 'https://www.geeksforgeeks.org/python-list/'

# parsing
url_link = requests.get(URL)
file = bs.BeautifulSoup(url_link.text, "lxml")

# find the table and its rows
find_table = file.find('table', class_='numpy-table')
rows = find_table.find_all('tr')

# display tables
for i in rows:
    table_data = i.find_all('td')
    data = [j.text for j in table_data]
    print(data)
Output:

Parsing and Extracting XML files With BeautifulSoup
Below are the steps by which we can parse the XML files using BeautifulSoup in Python:
Step 1: Creating XML File
test1.xml: Before moving on, you can create your own XML file, or simply copy and paste the code below and save it as test1.xml on your system.
<?xml version="1.0" ?>
<books>
<book>
<title>Introduction of Geeksforgeeks V1</title>
<author>Gfg</author>
<price>6.99</price>
</book>
<book>
<title>Introduction of Geeksforgeeks V2</title>
<author>Gfg</author>
<price>8.99</price>
</book>
<book>
<title>Introduction of Geeksforgeeks V3</title>
<author>Gfg</author>
<price>9.35</price>
</book>
</books>
Step 2: Creating a Python File
In this step, we will create a Python file and start writing our code. Now we will import modules.
Python3
# import required modules
from bs4 import BeautifulSoup
Step 3: Reading the XML Content
In this step, we will read the content of the XML.
Python3
# reading content
with open("test1.xml", "r") as file:
    contents = file.read()
Step 4: Parse the Content of the XML
In this step, we will parse the content of the XML.
Python3
# parsing
soup = BeautifulSoup(contents, 'xml')
titles = soup.find_all('title')
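The same `find_all` call works for any tag in the document, and the extracted text can be converted to a usable type. A small sketch with an inline XML string (invented for illustration; `html.parser` is used here so it runs without lxml, while the article's code uses the lxml-backed 'xml' parser):

```python
from bs4 import BeautifulSoup

xml = """
<books>
  <book><title>Introduction of Geeksforgeeks V1</title><price>6.99</price></book>
  <book><title>Introduction of Geeksforgeeks V2</title><price>8.99</price></book>
</books>
"""
soup = BeautifulSoup(xml, "html.parser")

# pull every <price> tag and cast its text to float
prices = [float(p.text) for p in soup.find_all("price")]
print(prices)  # [6.99, 8.99]
```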
Step 5: Display the Content
In this step, we will display the text of each extracted title tag.
Python3
# display content
for data in titles:
    print(data.get_text())
Complete Code
Below is the complete implementation of the above steps. In this code, we read an XML file named "test1.xml" and parse its content using BeautifulSoup with the XML parser. We then extract all <title> tags from the XML and print their text content.
Python3
# import required modules
from bs4 import BeautifulSoup

# reading content
with open("test1.xml", "r") as file:
    contents = file.read()

# parsing
soup = BeautifulSoup(contents, 'xml')
titles = soup.find_all('title')

# display content
for data in titles:
    print(data.get_text())
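Going one step further, each `<book>` element can be turned into a dictionary by iterating over the books and reading their child tags. A sketch using an inline XML string (invented data; `html.parser` is used here so the example runs without lxml installed):

```python
from bs4 import BeautifulSoup

xml = """
<books>
  <book><title>Introduction of Geeksforgeeks V1</title><author>Gfg</author><price>6.99</price></book>
  <book><title>Introduction of Geeksforgeeks V2</title><author>Gfg</author><price>8.99</price></book>
</books>
"""
soup = BeautifulSoup(xml, "html.parser")

books = []
for book in soup.find_all("book"):
    # each child tag becomes one field of the record
    books.append({
        "title": book.find("title").get_text(),
        "author": book.find("author").get_text(),
        "price": float(book.find("price").get_text()),
    })

print(books[0]["title"])  # Introduction of Geeksforgeeks V1
```

Structuring the records this way makes it easy to hand the data to `json.dump` or a CSV writer afterwards.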
Output:
