jBrowserDriver
jBrowserDriver is a programmable, embeddable web browser driver compatible with the Selenium WebDriver specification. It is headless, WebKit-based, and written in pure Java. The project is open source and licensed under the Apache License v2.0. jBrowserDriver can be used like any other Selenium WebDriver or RemoteWebDriver, and it works with Selenium Server and Selenium Grid. To run it through a remote Selenium server, start the server(s) and connect from client code with a RemoteWebDriver. To build from source, install and configure Maven v3.x and run mvn clean compile install from the project root. To use it in Eclipse, either import the existing Java project from the root directory or import the Maven project file (pom.xml).
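As a rough illustration of local (non-remote) use, the sketch below follows the pattern shown in the jBrowserDriver README: build a Settings object, open a page, and print its status code and source. The class name JBrowserDriverExample, the helper exampleSettings, the time zone, and the URL are placeholders chosen for this example. It assumes the jBrowserDriver and Selenium jars are on the classpath.

```java
import com.machinepublishers.jbrowserdriver.JBrowserDriver;
import com.machinepublishers.jbrowserdriver.Settings;
import com.machinepublishers.jbrowserdriver.Timezone;

public class JBrowserDriverExample {
    // Hypothetical helper: builds driver settings with an arbitrary example time zone.
    static Settings exampleSettings() {
        return Settings.builder().timezone(Timezone.AMERICA_NEWYORK).build();
    }

    public static void main(String[] args) {
        JBrowserDriver driver = new JBrowserDriver(exampleSettings());
        try {
            // Blocks until the page has loaded (placeholder URL).
            driver.get("http://example.com");
            // HTTP status code of the response, then the page source.
            System.out.println(driver.getStatusCode());
            System.out.println(driver.getPageSource());
        } finally {
            // Always release the browser process and its resources.
            driver.quit();
        }
    }
}
```

Because JBrowserDriver implements the standard WebDriver interface, the driver variable could equally be typed as org.openqa.selenium.WebDriver when the jBrowserDriver-specific getStatusCode call is not needed.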
Selenium
Selenium automates browsers. That's it! What you do with that power is entirely up to you. Primarily it is for automating web applications for testing purposes, but it is certainly not limited to just that. Boring web-based administration tasks can (and should) be automated as well. If you want to create robust, browser-based regression automation suites and tests, and scale and distribute scripts across many environments, then you want to use Selenium WebDriver, a collection of language-specific bindings to drive a browser the way it is meant to be driven. If you want to create quick bug reproduction scripts or scripts to aid in automation-aided exploratory testing, then you want to use Selenium IDE, a Chrome and Firefox add-on that does simple record-and-playback of interactions with the browser. If you want to scale by distributing and running tests on several machines and manage multiple environments from a central point, then you want to use Selenium Grid.
jsoup
jsoup is a Java library that simplifies working with real-world HTML and XML. It offers an easy-to-use API for URL fetching, data parsing, extraction, and manipulation using DOM API methods, CSS, and XPath selectors. jsoup implements the WHATWG HTML5 specification and parses HTML to the same DOM as modern browsers. With jsoup, you can scrape and parse HTML from a URL, file, or string; find and extract data using DOM traversal or CSS selectors; manipulate HTML elements, attributes, and text; clean user-submitted content against a safelist to prevent XSS attacks; and output tidy HTML. jsoup is designed to deal with all varieties of HTML found in the wild, from pristine and validating to invalid tag-soup, creating a sensible parse tree. For example, you can fetch the Wikipedia homepage, parse it to a DOM, and select the headlines from the "In the news" section into a list of elements.
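The Wikipedia example mentioned above might look roughly like the following sketch. The selector #mp-itn b a mirrors the one used in jsoup's own introductory example and depends on Wikipedia's current markup, so treat it as an assumption that may stop matching. The class name NewsHeadlines and the helper headlineTitles are made up for illustration. It assumes the jsoup jar is on the classpath and, for main, network access.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class NewsHeadlines {
    // Assumed CSS selector for links in the "In the news" box; Wikipedia's markup may change.
    static final String HEADLINE_SELECTOR = "#mp-itn b a";

    // Hypothetical helper: collects the title attribute of every matching link.
    static List<String> headlineTitles(Document doc) {
        List<String> titles = new ArrayList<>();
        for (Element link : doc.select(HEADLINE_SELECTOR)) {
            titles.add(link.attr("title"));
        }
        return titles;
    }

    public static void main(String[] args) throws IOException {
        // Fetch the page over HTTP and parse it into a DOM.
        Document doc = Jsoup.connect("https://en.wikipedia.org/").get();
        System.out.println(doc.title());
        for (String title : headlineTitles(doc)) {
            System.out.println(title);
        }
    }
}
```

Keeping the selection logic in a method that takes a Document means the same code also works on HTML parsed from a file or string with Jsoup.parse, without touching the network.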
Jaunt
Jaunt is a Java library designed for web scraping, web automation, and JSON querying. It provides a fast, ultra-light headless browser that enables Java programs to perform tasks such as web scraping, form handling, and interfacing with REST APIs. Jaunt supports parsing of HTML, XHTML, XML, and JSON, and offers features like HTTP header and cookie manipulation, proxy support, and customizable caching. Jaunt does not execute JavaScript; for automating JavaScript-enabled browsers, Jauntium is recommended instead. Jaunt is available under the Apache License as a monthly edition that expires periodically, so users must download the latest version once their copy expires. The library is suitable for parsing and extracting data from web pages, filling out and submitting forms, and handling HTTP requests and responses. Tutorials and documentation are available to help users get started.