PDFBox parse PDF form example


PDFBox is an open-source Java library for working with PDF documents, developed by the Apache Software Foundation. It is simple to understand and to integrate.

In this article I will show how to parse forms in PDF files.

Integration with PDFBox

Since PDFBox is a regular Java library, you can add it to your project in whatever way you are used to. One option is the Maven dependency org.apache.pdfbox:pdfbox (the code below was written against version 1.8.10).

Parsing a PDF form

With the library on the classpath, you can use the PDFBox API to parse and manipulate the structure of PDF documents.

Prerequisite: a PDF file containing a form somewhere on your file system.

Let’s implement some parsing logic. We will open a document, print the names of all form fields and set their values to “1” (just to try out the library). Here is the code listing for this example:

import org.apache.pdfbox.exceptions.COSVisitorException;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDDocumentCatalog;
import org.apache.pdfbox.pdmodel.interactive.form.PDAcroForm;
import org.apache.pdfbox.pdmodel.interactive.form.PDField;

import java.io.IOException;

public class Parser {
    public static void main(String[] args) throws IOException, COSVisitorException {
        PDDocument pdfDoc = PDDocument.load("C:\\test.pdf"); // paste your pdf-file location here
        try {
            PDDocumentCatalog docCatalog = pdfDoc.getDocumentCatalog();
            PDAcroForm acroForm = docCatalog.getAcroForm();
            if (acroForm == null) { // the document has no interactive form
                System.out.println("No form found in the document");
                return;
            }

            // getFields() returns the top-level fields as a raw List in PDFBox 1.8, hence the cast
            for (Object fieldObj : acroForm.getFields()) {
                PDField field = (PDField) fieldObj;
                System.out.println(field.getFullyQualifiedName()); // print the field's name
                field.setValue("1"); // set the field's value to "1"
            }

            pdfDoc.save("C:\\test2.pdf"); // save changes to another file
        } finally {
            pdfDoc.close(); // release the resources held by the document
        }
    }
}

When you run this example, the field names are printed to the console and a new document is created with every field's value set to “1”.
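To check the result, you can read the values back from the saved file. The following is a minimal sketch, assuming PDFBox 1.8.x on the classpath; the class FormReader and its helper readFieldValues are hypothetical names introduced here for illustration. It only visits the top-level fields returned by getFields(), and some field types (for example signature fields) may not support reading their value as a string.

```java
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.interactive.form.PDAcroForm;
import org.apache.pdfbox.pdmodel.interactive.form.PDField;

import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;

public class FormReader {

    // Hypothetical helper: maps each top-level field's fully qualified name
    // to its current value, keeping the order in which the form lists them.
    static Map<String, String> readFieldValues(PDAcroForm acroForm) throws IOException {
        Map<String, String> values = new LinkedHashMap<String, String>();
        for (Object fieldObj : acroForm.getFields()) { // raw List in PDFBox 1.8
            PDField field = (PDField) fieldObj;
            values.put(field.getFullyQualifiedName(), field.getValue());
        }
        return values;
    }

    public static void main(String[] args) throws IOException {
        // Defaults to the file written by the Parser example above
        PDDocument pdfDoc = PDDocument.load(args.length > 0 ? args[0] : "C:\\test2.pdf");
        try {
            PDAcroForm acroForm = pdfDoc.getDocumentCatalog().getAcroForm();
            if (acroForm == null) {
                System.out.println("No form found in the document");
                return;
            }
            for (Map.Entry<String, String> entry : readFieldValues(acroForm).entrySet()) {
                System.out.println(entry.getKey() + " = " + entry.getValue());
            }
        } finally {
            pdfDoc.close(); // release the resources held by the document
        }
    }
}
```

Run against the newly created file, each printed line should pair a field name with the value “1”.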

