Jsoup Java example

Jsoup is a Java library for easy and quick HTML parsing and processing.

It offers a wide range of features. For example, you can find and replace fragments of a document, manipulate the DOM in any way you want, and parse HTML and XML from a URL, a file, or a string.

Now let’s create an example program that parses a web page. It will showcase Jsoup’s basic features.

Jsoup parse example

Create a Maven project and add the Jsoup dependency:

<dependency>
    <groupId>org.jsoup</groupId>
    <artifactId>jsoup</artifactId>
    <version>1.7.2</version>
</dependency>

In this example we will parse the main page of this blog, 10minbasics.com. Then we will select all article headers with a CSS selector and print them out.

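import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
// Fetch the page (get() throws IOException) and print the text of each header link
Document doc = Jsoup.connect("http://10minbasics.com/").get();
Elements articlesLinks = doc.select(".entry-header a");
for (Element article : articlesLinks) {
    System.out.println(article.text());
}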
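
The same selector can also be tried without a network connection by parsing an HTML string. The sketch below is only an illustration: the class name HeaderTitles, the helper headerTitles, and the sample markup are made up for this example. Only the ".entry-header a" selector comes from the snippet above.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class HeaderTitles {

    // Hypothetical helper: collects the text of every link inside an .entry-header element.
    public static List<String> headerTitles(String html) {
        Document doc = Jsoup.parse(html);
        List<String> titles = new ArrayList<>();
        for (Element link : doc.select(".entry-header a")) {
            titles.add(link.text());
        }
        return titles;
    }

    public static void main(String[] args) {
        // Made-up markup that mimics the structure of a blog index page.
        String html = "<div class=\"entry-header\"><a href=\"/first\">First post</a></div>"
                + "<div class=\"entry-header\"><a href=\"/second\">Second post</a></div>"
                + "<p><a href=\"/about\">About</a></p>";
        for (String title : headerTitles(html)) {
            System.out.println(title);
        }
    }
}
```

Here the "About" link is skipped because it is not inside an element with the entry-header class.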

Jsoup offers many more methods for working with the DOM. You can find them all in the official Jsoup documentation.
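
As a small taste of those methods, here is a minimal sketch that reads a link’s text and resolves its href attribute to an absolute URL. The class name LinkDetails, the helper describeFirstLink, the sample markup, and the example.com base URI are all assumptions made for illustration.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class LinkDetails {

    // Hypothetical helper: describes the first link of a page as "text -> absolute URL".
    public static String describeFirstLink(String html, String baseUri) {
        // The base URI lets Jsoup resolve relative links found in the markup.
        Document doc = Jsoup.parse(html, baseUri);
        Element link = doc.select("a[href]").first();
        if (link == null) {
            return ""; // no link with an href attribute in the document
        }
        // attr("href") would return the raw value; absUrl("href") resolves it against the base URI.
        return link.text() + " -> " + link.absUrl("href");
    }

    public static void main(String[] args) {
        // Made-up markup and base URI, used only for this illustration.
        String html = "<h2 class=\"entry-header\"><a href=\"/jsoup-example/\">Jsoup example</a></h2>";
        System.out.println(describeFirstLink(html, "http://example.com/"));
    }
}
```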

Let’s go deeper with one more example. This time we will parse the same page and replace every link in the HTML with its text. Then we will print the result so we can see the difference in the console.

Jsoup replace links with text example

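import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.TextNode;
import org.jsoup.select.Elements;
Document doc = Jsoup.connect("http://10minbasics.com/").get();
Elements links = doc.select("a");
for (Element link : links) {
    // Swap the <a> element for a plain text node; the second argument is the base URI
    link.replaceWith(new TextNode(link.text(), ""));
}
System.out.println(doc.html());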

We used the “a” selector to get all links from the document. Then we iterate over them and replace each one with a new Node. In our case this is a TextNode containing the text of the link.

When you run this Jsoup code example, you will see the new HTML in the console. Compare it with the original page to see that the link tags have disappeared.
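
To see the effect on a small input, the same replacement can be run on an HTML string. This is a sketch under stated assumptions: the class name LinkStripper, the helper stripLinks, and the sample markup are made up. It also assumes a recent Jsoup release in which TextNode has a single-argument constructor. With version 1.7.2, keep the two-argument form shown above.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.TextNode;

public class LinkStripper {

    // Hypothetical helper: returns the body HTML with every <a> element replaced by its text.
    public static String stripLinks(String html) {
        Document doc = Jsoup.parse(html);
        for (Element link : doc.select("a")) {
            // Assumption: recent Jsoup. On 1.7.2 use new TextNode(link.text(), "") instead.
            link.replaceWith(new TextNode(link.text()));
        }
        return doc.body().html();
    }

    public static void main(String[] args) {
        // Made-up markup with a single link.
        String html = "<p>Read the <a href=\"/docs\">documentation</a> first.</p>";
        System.out.println(stripLinks(html));
    }
}
```

The printed paragraph keeps the word "documentation" but no longer contains an a tag.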
