Selenium WebDriver allows users to automate web browsers for testing and data extraction tasks. When working with Selenium, it is often necessary to obtain the HTML source of a specific web element rather than of the entire page.
Overview
What is the HTML Source of a Web Element?
It is the exact HTML markup that defines a particular element on a web page, including its tags, attributes, and nested content.
How to Get HTML Source of a Web Element in Selenium WebDriver Using Python?
- Locate the target web element using Selenium’s element-finding methods.
- Retrieve the element’s outer HTML, which includes the element’s tag and all its inner content.
- Optionally, get the inner HTML to extract only the content inside the element’s tags.
- Use the extracted HTML for validation, debugging, or feeding into other test steps.
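The steps above can be sketched as a small helper. This is a minimal sketch assuming the Selenium 4 Python API, where find_element takes a locator strategy and a value. The string "xpath" is the value of By.XPATH, so the helper needs no Selenium import of its own. The name get_element_html is hypothetical.

```python
def get_element_html(driver, xpath):
    """Return (outer_html, inner_html) for the first element matching xpath.

    `driver` is assumed to be a Selenium 4 WebDriver (or any object with the
    same find_element API); "xpath" is the string value of By.XPATH.
    """
    # Step 1: locate the target element
    element = driver.find_element("xpath", xpath)
    # Step 2: the element's own tag plus everything inside it
    outer_html = element.get_attribute("outerHTML")
    # Step 3 (optional): only the content between the opening and closing tags
    inner_html = element.get_attribute("innerHTML")
    return outer_html, inner_html
```

With a real browser, the helper would be called after driver = webdriver.Chrome() and driver.get() have opened the page, for example outer, inner = get_element_html(driver, "//div[@class='container']").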
Understanding how to extract an element’s HTML is useful for debugging, verifying content, or further automation steps. Read this article to learn how to get the HTML source of a web element in Selenium WebDriver.
What is HTML Source?
This refers to the HTML code underlying a particular web element on a web page. Since HTML is the foundation of any web page, testing HTML code, both in a single browser and in cross-browser testing scenarios, is vital. Note that this is unrelated to the HTML <source> tag.
What is a Web Element?
Anything that appears on a web page is a web element. Most obviously, this refers to text boxes, checkboxes, buttons, or any other fields that display or require data from the user. Web elements can also mean the tags within the web page’s HTML code. Essentially, interaction with the HTML code is interaction with a web element. Such elements usually have unique identifiers, such as ID, name, or unique classes.
For example, to highlight text on a page, one would have to interact with the “body”, a “div” and perhaps even a “p” element.
It is common for web elements to be nested within other web elements. In Selenium, mechanisms such as XPath or CSS selectors can be used to locate them; for example, you can find an element by XPath.
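Because elements are nested, a search can be scoped to a parent element. In Selenium 4, a WebElement exposes the same find_element method as the driver, so a child lookup made on the parent only searches inside it. A minimal sketch; the helper name find_nested and the selectors in the usage note are hypothetical.

```python
def find_nested(driver, parent_css, child_xpath):
    """Locate a child element inside a parent element (Selenium 4 API assumed).

    "css selector" and "xpath" are the string values of By.CSS_SELECTOR and
    By.XPATH, so no Selenium import is needed here.
    """
    parent = driver.find_element("css selector", parent_css)
    # A relative XPath beginning with "." searches only within `parent`
    return parent.find_element("xpath", child_xpath)
```

For example, find_nested(driver, "div.container", ".//p"). The leading "." matters: an XPath starting with "//" searches the whole document even when it is called on an element.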
How to get HTML source of a web element using Python?
To start with, download the Python bindings for Selenium WebDriver.
- One can do this from the PyPI page for the Selenium package.
- Alternatively, use pip to install the Selenium package. Python 3.6 ships with pip by default. Install Selenium with pip using the following command:
pip install selenium
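It is also possible to use virtualenv to create isolated Python environments. Python 3.6 also includes the built-in venv module (run as python -m venv), which is quite similar to virtualenv; the older pyvenv script is deprecated as of Python 3.6.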
Notes for Windows users
- Install Python 3.6 using the installer provided on the python.org download page.
- Open a command prompt (cmd.exe) and run pip as shown below to install Selenium, adjusting the path to match your Python installation directory.
C:\Python35\Scripts\pip.exe install selenium
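Now, assuming wd is a WebDriver instance (for example, wd = webdriver.Chrome()), here’s how to get a web element. The find_element_by_* helpers were removed in recent Selenium 4 releases, so use find_element with a By locator:
from selenium.webdriver.common.by import By
elem = wd.find_element(By.CSS_SELECTOR, '#my-id')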
Here’s how to get the HTML source for the full page:
wd.page_source
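When debugging a failing test, it can help to save the full page source to a file and inspect it later in an editor or browser. A minimal sketch; save_page_source and the output path are hypothetical names.

```python
from pathlib import Path


def save_page_source(driver, path):
    """Write the current page's HTML (driver.page_source) to a UTF-8 file.

    Returns the number of characters written.
    """
    html = driver.page_source  # the page's HTML as a str
    Path(path).write_text(html, encoding="utf-8")
    return len(html)
```

For example, save_page_source(wd, "failed_page.html") right before an assertion that is known to be flaky.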
How to Get HTML Page Source in Selenium WebDriver Using Python?
To get the HTML page source in Selenium WebDriver using Python, there are several methods available such as:
- driver.page_source
- driver.execute_script
- XPath querying
Each method offers a distinct approach for retrieving the page source, depending on the test requirements or the element being accessed.
The sections below discuss each of these in detail, with examples from the bstackdemo.com site.
Get HTML Page Source using driver.page_source
The driver.page_source attribute allows retrieval of the entire HTML source of the current page as a string. This method is ideal when the entire page source is needed, regardless of specific elements.
Syntax:
page_source = driver.page_source
This example retrieves the entire HTML content of bstackdemo.com using the driver.page_source attribute.
Example Code:
from selenium import webdriver
# Initialize WebDriver
driver = webdriver.Chrome()
# Open bstackdemo.com
driver.get("https://bstackdemo.com/")
# Get the page source
page_source = driver.page_source
# Output the page source (truncated for brevity)
print(page_source[:500])  # Print first 500 characters of the source
# Close the browser
driver.quit()
Output:
<!DOCTYPE html><html xmlns="http://www.w3.org/1999/xhtml" lang="en"><head><meta charset="utf-8"/><meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no"/><meta name="description" content="This is a sample app to showcase BrowserStack Automate"/><meta name="author" content="BrowserStack"/><title>BrowserStack Demo App</title><link rel="shortcut icon" href="favicon.ico"/><link href="css/bootstrap.min.css" rel="stylesheet"/><link href="css/fontawesome.min.css" rel="stylesheet"/><link href="cs
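Once the page source is a plain string, it can be checked without the browser. As a sketch (this is not part of Selenium), Python's standard html.parser module can pull out the <title> text for an assertion; the helper names are hypothetical.

```python
from html.parser import HTMLParser


class _TitleParser(HTMLParser):
    """Collects the text inside the first <title> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = None

    def handle_starttag(self, tag, attrs):
        if tag == "title" and self.title is None:
            self._in_title = True
            self.title = ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def page_title(html):
    """Return the first <title> text in html, or None if there is none."""
    parser = _TitleParser()
    parser.feed(html)
    return parser.title
```

Selenium also exposes the title directly as driver.title; parsing the source is mainly useful for HTML that was saved or captured earlier.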
Get HTML Page Source using driver.execute_script
The driver.execute_script method allows executing JavaScript on the page. It can retrieve the entire HTML source by executing JavaScript code that returns the HTML content of the page.
Syntax:
page_source = driver.execute_script("return document.documentElement.outerHTML;")
This method is useful when you want the browser’s own serialization of the live DOM, for example after JavaScript has modified the page or loaded content dynamically. Here, the JavaScript expression document.documentElement.outerHTML returns the HTML of the root <html> element and everything inside it.
Example Code:
from selenium import webdriver
# Initialize WebDriver
driver = webdriver.Chrome()
# Open bstackdemo.com
driver.get("https://bstackdemo.com/")
# Execute JavaScript to get page source
page_source = driver.execute_script("return document.documentElement.outerHTML;")
# Output the page source (truncated for brevity)
print(page_source[:500])  # Print first 500 characters of the source
# Close the browser
driver.quit()
Output:
<!DOCTYPE html><html xmlns="http://www.w3.org/1999/xhtml" lang="en"><head><meta charset="utf-8"/><meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no"/><meta name="description" content="This is a sample app to showcase BrowserStack Automate"/><meta name="author" content="BrowserStack"/><title>BrowserStack Demo App</title><link rel="shortcut icon" href="favicon.ico"/><link href="css/bootstrap.min.css" rel="stylesheet"/><link href="css/fontawesome.min.css" rel="stylesheet"/><link href="cs
Note: document.documentElement is the <html> element, so the string returned by this script starts at the <html> tag and does not include the <!DOCTYPE html> declaration shown above; the exact serialization may also differ slightly from driver.page_source.
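If the source is read too early, it may not yet reflect the finished page. A minimal sketch, assuming a Selenium 4 driver: poll document.readyState through execute_script until it is "complete", then read the DOM. The helper name source_when_ready and the timeout values are hypothetical. Selenium’s own WebDriverWait (in selenium.webdriver.support.ui) is the usual tool for this; the plain loop just makes the logic visible.

```python
import time


def source_when_ready(driver, timeout=10.0, poll=0.25):
    """Wait until document.readyState is "complete", then return the live DOM.

    Raises TimeoutError if the page is not ready within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while driver.execute_script("return document.readyState;") != "complete":
        if time.monotonic() >= deadline:
            raise TimeoutError("page did not finish loading in time")
        time.sleep(poll)
    return driver.execute_script("return document.documentElement.outerHTML;")
```

Keep in mind that "complete" only means the initial document and its resources have loaded; content fetched later by scripts may still be arriving, so waiting for a specific element is often more reliable.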
Get HTML Page Source Using XPath
XPath can select specific elements on the page and retrieve their HTML content. This method is helpful when only a specific section of the page, such as a particular div or element, needs to be captured.
Syntax:
from selenium.webdriver.common.by import By
element_html = driver.find_element(By.XPATH, "your_xpath_expression").get_attribute("outerHTML")
Scenario:
In this example, an XPath expression is used to retrieve the HTML of a specific element on the page (div.container). This is useful when you want to extract HTML for a specific element without retrieving the entire page.
Example Code:
from selenium import webdriver
from selenium.webdriver.common.by import By
# Initialize WebDriver
driver = webdriver.Chrome()
# Open bstackdemo.com
driver.get("https://bstackdemo.com/")
# Example XPath to get the HTML of a specific element
# (find_element(By.XPATH, ...) replaces find_element_by_xpath, removed in recent Selenium 4 releases)
element_html = driver.find_element(By.XPATH, "//div[@class='container']").get_attribute("outerHTML")
# Output the HTML of the element (truncated for brevity)
print(element_html[:500])  # Print first 500 characters of the element's HTML
# Close the browser
driver.quit()
Output:
<div class="container"> <header class="site-header"> <nav class="navbar navbar-expand-lg navbar-dark bg-dark"> <a class="navbar-brand" href="/">BrowserStack Demo App</a> <button class="navbar-toggler" type="button" data-toggle="collapse" data-target="#navbarNav" aria-controls="navbarNav" aria-expanded="false" aria-label="Toggle navigation"> <span class="navbar-toggler-icon"></span> </button> <div class="collapse navbar-collapse" id="navbarNav"> <ul class="navbar-nav ml-auto"> <li class="nav-item"> <a class="nav-link" href="/https/www.browserstack.com/home">Home</a>
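find_element returns only the first match and raises NoSuchElementException when nothing matches. To capture the HTML of every matching element, find_elements can be used instead; it returns an empty list when there is no match. A minimal sketch assuming the Selenium 4 API, with a hypothetical helper name.

```python
def outer_html_of_all(driver, xpath):
    """Return the outerHTML of every element matching xpath (possibly [])."""
    # "xpath" is the string value of By.XPATH; find_elements never raises
    # for zero matches, it just returns an empty list
    return [el.get_attribute("outerHTML") for el in driver.find_elements("xpath", xpath)]
```

For example, outer_html_of_all(driver, "//div[@class='container']") would return one string per matching div.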
How to retrieve the HTML source of a web element using Selenium?
There are two main ways to retrieve the HTML source of a specific web element in Selenium: reading its innerHTML attribute or its outerHTML attribute.
These methods allow extracting the HTML content of elements, depending on whether the element’s content or the entire element (including the tag itself) is required.
Method 1: Get HTML Source in Selenium with innerHTML attribute
The innerHTML attribute retrieves the HTML content inside the selected element, excluding the element’s tag itself. This method is practical when extracting the contents inside an element (e.g., the text, child elements, etc.), but not the element’s tag.
Syntax:
from selenium.webdriver.common.by import By
element_inner_html = driver.find_element(By.XPATH, "your_xpath_expression").get_attribute("innerHTML")
Scenario:
In this example, the innerHTML attribute is used to extract the HTML content inside a div element with the class name site-header from bstackdemo.com. This method will return the content inside the header, without the <div> tag itself.
Example Code:
from selenium import webdriver
from selenium.webdriver.common.by import By
# Initialize WebDriver
driver = webdriver.Chrome()
# Open bstackdemo.com
driver.get("https://bstackdemo.com/")
# Get the inner HTML of the header section
# (find_element(By.XPATH, ...) replaces find_element_by_xpath, removed in recent Selenium 4 releases)
header_inner_html = driver.find_element(By.XPATH, "//div[@class='site-header']").get_attribute("innerHTML")
# Output the inner HTML content (truncated for brevity)
print(header_inner_html[:500])  # Print first 500 characters of the content inside the header
# Close the browser
driver.quit()
Output:
<nav class="navbar navbar-expand-lg navbar-dark bg-dark"> <a class="navbar-brand" href="/">BrowserStack Demo App</a> <button class="navbar-toggler" type="button" data-toggle="collapse" data-target="#navbarNav" aria-controls="navbarNav" aria-expanded="false" aria-label="Toggle navigation"> <span class="navbar-toggler-icon"></span> </button> <div class="collapse navbar-collapse" id="navbarNav"> <ul class="navbar-nav ml-auto"> <li class="nav-item"> <a class="nav-link" href="/https/www.browserstack.com/home">Home</a>
Read the innerHTML attribute to get the source of the element’s content. innerHTML is a property of a DOM element whose value is the HTML between the opening tag and ending tag.
For example, the innerHTML property of the element below has the value " a text ", that is, the text between the tags including its surrounding spaces:
<p> a text </p>
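To make the definition concrete, the sketch below slices out the content between the outermost tags of a snippet like the one above. This is a string simplification for illustration only: browsers compute innerHTML by serializing the live DOM, and this naive version breaks on attribute values that contain ">". The name naive_inner_html is hypothetical.

```python
def naive_inner_html(outer_html):
    """Approximate innerHTML for a snippet with a single root element.

    Assumes the first '>' closes the root's opening tag and the last '</'
    starts its closing tag.
    """
    start = outer_html.index(">") + 1  # just after the opening tag
    end = outer_html.rindex("</")      # start of the closing tag
    return outer_html[start:end]
```

Applied to "<p> a text </p>", it returns " a text ", matching the value described above.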
This property can also be used to retrieve or dynamically insert content on a web page. However, when it is used for anything beyond inserting simple text, its behavior may differ across browsers, so it is good practice to test your website across browsers and devices.
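From Selenium, content can be inserted the same way by assigning innerHTML through execute_script, for example to set up test data. A minimal sketch assuming a Selenium 4 driver; the helper name set_inner_html is hypothetical. Selenium passes the extra arguments to the script as arguments[0], arguments[1], and so on. The script returns the resulting innerHTML because the browser may normalize the assigned markup (for example, by closing unclosed tags).

```python
def set_inner_html(driver, element, html):
    """Replace an element's content by assigning innerHTML in the browser.

    Returns the element's innerHTML after the assignment, as the browser
    serializes it, so callers can verify what was actually inserted.
    """
    return driver.execute_script(
        "arguments[0].innerHTML = arguments[1]; return arguments[0].innerHTML;",
        element,
        html,
    )
```

Changing the page under test also changes what is being tested, so this is best limited to setup steps.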
innerHTML was first implemented in Internet Explorer 5.
It has been part of the standard and has existed as a property of HTMLElement and HTMLDocument since HTML 5.
Implement the innerHTML attribute to get the HTML source in Selenium with the following syntax:
Python:
element.get_attribute('innerHTML')
Java:
elem.getAttribute("innerHTML");
C#:
element.GetAttribute("innerHTML");
Ruby:
element.attribute("innerHTML")
JS:
element.getAttribute('innerHTML');
PHP:
$elem->getAttribute('innerHTML');
Method 2: Get HTML Source in Selenium with outerHTML
The outerHTML attribute retrieves the entire HTML of the selected element, including the element’s tag itself. This method is useful when the full HTML of an element, including its tag, is required.
Syntax:
from selenium.webdriver.common.by import By
element_outer_html = driver.find_element(By.XPATH, "your_xpath_expression").get_attribute("outerHTML")
Scenario:
In this example, the outerHTML attribute is used to retrieve the complete HTML, including the div tag, of the div element with the class name site-header from bstackdemo.com. This is helpful when the full HTML structure of the element needs to be captured.
Example Code:
from selenium import webdriver
from selenium.webdriver.common.by import By
# Initialize WebDriver
driver = webdriver.Chrome()
# Open bstackdemo.com
driver.get("https://bstackdemo.com/")
# Get the outer HTML of the header section
# (find_element(By.XPATH, ...) replaces find_element_by_xpath, removed in recent Selenium 4 releases)
header_outer_html = driver.find_element(By.XPATH, "//div[@class='site-header']").get_attribute("outerHTML")
# Output the outer HTML content (truncated for brevity)
print(header_outer_html[:500])  # Print first 500 characters of the element's full HTML
# Close the browser
driver.quit()
Output:
<div class="site-header"> <nav class="navbar navbar-expand-lg navbar-dark bg-dark"> <a class="navbar-brand" href="/">BrowserStack Demo App</a> <button class="navbar-toggler" type="button" data-toggle="collapse" data-target="#navbarNav" aria-controls="navbarNav" aria-expanded="false" aria-label="Toggle navigation"> <span class="navbar-toggler-icon"></span> </button> <div class="collapse navbar-collapse" id="navbarNav"> <ul class="navbar-nav ml-auto"> <li class="nav-item"> <a class="nav-link" href="/https/www.browserstack.com/home">Home</a>
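outerHTML is a DOM property rather than an HTML attribute, so besides get_attribute it can also be read with get_property or with JavaScript. The sketch below, assuming the Selenium 4 Python API, reads it all three ways; they should normally return the same string. The function name is hypothetical.

```python
def outer_html_variants(driver, element):
    """Read an element's outerHTML three ways (Selenium 4 Python API assumed)."""
    # get_attribute returns the DOM property if it exists, else the HTML attribute
    via_attribute = element.get_attribute("outerHTML")
    # get_property reads the DOM property directly
    via_property = element.get_property("outerHTML")
    # execute_script passes the element to the script as arguments[0]
    via_script = driver.execute_script("return arguments[0].outerHTML;", element)
    return via_attribute, via_property, via_script
```

The JavaScript route is handy when a page needs a custom expression anyway, for example to read several properties in one call.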
Read the outerHTML attribute to get the source including the current element itself. outerHTML is an element property whose value is the element’s own opening and closing tags together with all the HTML between them.
For example, the outerHTML of the div below contains the <div> tags themselves as well as the <span> nested inside them:
<div> <span>Hello there!</span> </div>
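A quick way to see the difference is to list the tags each string contains. The sketch below uses the standard library’s html.parser (not Selenium) on the example above: its outerHTML yields div and span, while its innerHTML yields only span. The helper names are hypothetical.

```python
from html.parser import HTMLParser


class _TagCollector(HTMLParser):
    """Records start tags in document order."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def tags_in(html):
    """Return the start tags found in an HTML fragment, in order."""
    collector = _TagCollector()
    collector.feed(html)
    collector.close()
    return collector.tags
```

Comparing tags_in(outer) with tags_in(inner) is a cheap check that a locator returned the element you expected rather than one of its children.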
Implement the outerHTML attribute to get the HTML source in Selenium with the following syntax:
ele.get_attribute("outerHTML")
Implementing the code detailed above makes automated Selenium testing more efficient and result-driven. It lets you easily capture the HTML source of specific web elements so they can be examined for anomalies. Identifying anomalies quickly leads to equally quick debugging, helping teams ship websites that deliver optimal user experiences in less time.
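One practical way to put this into a test suite is to attach the element’s outerHTML to assertion failures, so the anomaly is visible directly in the test report. A minimal sketch assuming a Selenium 4 WebElement; the helper name is hypothetical. It raises AssertionError explicitly instead of using assert, so the check still runs when Python is started with -O.

```python
def assert_element_contains(element, expected):
    """Fail with the element's outerHTML in the message, to speed up debugging."""
    html = element.get_attribute("outerHTML")
    if expected not in html:
        raise AssertionError(
            f"expected {expected!r} in element HTML, got:\n{html}"
        )
```

For example, assert_element_contains(button, "Add to cart") would report the button’s full markup if the label were missing or misspelled.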
Importance of Testing on Real Device Cloud with BrowserStack
Extracting a web element’s HTML source is essential for debugging, content verification, and capturing dynamic data in automated tests. Selenium WebDriver makes it easy to retrieve and validate element details.
BrowserStack’s real device cloud runs your Selenium tests on actual browsers and devices, not emulators, providing accurate user experience insights. It offers seamless cross-browser, cross-device testing with key advantages.
Why Test on Real Devices with BrowserStack Automate
- Testing in Real User Conditions: Detect UI glitches and behavior issues that only appear on actual devices.
- Maximum Coverage: Access thousands of real devices and browser combinations to ensure broad compatibility.
- Faster Test Cycles: Instantly run and scale tests without managing physical hardware.
- Remote Collaboration: Test anytime, anywhere, enabling efficient teamwork across distributed teams.
Conclusion
Getting the HTML source of a web element in Selenium WebDriver is essential for validating page content and checking element structures.
Testing on a real device cloud, such as BrowserStack, further ensures accurate results across different devices and browsers, leading to more reliable web applications.
Useful Resources for Selenium and Python
- Selenium Python Tutorial (with Example)
- Headless Browser Testing With Selenium Python
- How to Press Enter without Element in Selenium Python?
- How to install GeckoDriver for Selenium Python?
- How to perform Web Scraping using Selenium and Python
- How to Create and Use Action Class in Selenium Python
- Using Selenium Wire Proxy in Python
- Get Current URL in Selenium using Python: Tutorial
- How to read Config Files in Python using Selenium
- Page Object Model and Page Factory in Selenium Python
- How to perform Scrolling Down in Selenium with Python?
- How to install Selenium Python on macOS?
- How to Maximize Browser Window in Selenium with Python
- How to use Python WebDriver Manager for Selenium Testing?
- UI Automation using Python and Selenium: Tutorial
- How to handle dropdown in Selenium Python?
- Start Selenium Testing with Python: Automated Testing of a User Signup Form
- How to Switch Tabs in Selenium For Python
- How to Double Click on an Element in Selenium Python?
- How to take Screenshots using Python and Selenium
- How to download a file using Selenium and Python