PDFBox parse PDF form example

Apache PDFBox is an open-source Java library for working with PDF documents, released under the Apache License. It is simple to understand and to integrate.

In this article I will show how to parse forms in PDF files.

Integration with PDFBox

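Since PDFBox is a regular Java library, you can add it to your project in any way you like. One option is a Maven dependency with groupId org.apache.pdfbox and artifactId pdfbox. The code listed below was written against version 1.8.10; the API changed in later major versions, so the listing may not compile unchanged there.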

Parsing a PDF form

Now you can use the PDFBox API to parse and manipulate the structure of PDF documents.

Prerequisite: you need a PDF file containing a form on your file system.

Let’s implement some parsing logic. We will open a document, print the names of all form fields, and set their values to “1” (just to try out the library). Here is the code listing for this example:

import org.apache.pdfbox.exceptions.COSVisitorException;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.interactive.form.PDAcroForm;
import org.apache.pdfbox.pdmodel.interactive.form.PDField;

import java.io.IOException;

public class Parser {
    public static void main(String[] args) throws IOException, COSVisitorException {
        PDDocument pdfDoc = PDDocument.load("C:\\test.pdf"); // paste your pdf-file location here
        try {
            PDAcroForm acroForm = pdfDoc.getDocumentCatalog().getAcroForm();
            if (acroForm == null) { // the document contains no form
                System.out.println("No form found in the document");
                return;
            }

            // getFields() returns the top-level fields as a raw List in PDFBox 1.8.x
            for (Object fieldObj : acroForm.getFields()) {
                PDField field = (PDField) fieldObj;
                // print the field's full name; getAlternateFieldName() would return
                // the optional tooltip text, which is often null
                System.out.println(field.getFullyQualifiedName());
                field.setValue("1"); // set value of field to 1
            }

            pdfDoc.save("C:\\test2.pdf"); // save changes to another file
        } finally {
            pdfDoc.close(); // release the document even if something above fails
        }
    }
}

When you run this example, you will see the field names printed to the console, and a new document will be created with all field values set to “1”.
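The listing above only visits the fields returned by getFields(), which are the top-level entries of the form. Fields nested under a parent field are reachable through getKids(). Below is a minimal sketch of a recursive walk that collects every field name, assuming PDFBox 1.8.x. The class name FieldNameLister and the method collectNames are made up for illustration, and the input path is taken from the first command-line argument rather than hard-coded.

```java
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.common.COSObjectable;
import org.apache.pdfbox.pdmodel.interactive.form.PDAcroForm;
import org.apache.pdfbox.pdmodel.interactive.form.PDField;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of PDFBox.
public class FieldNameLister {

    // Returns the fully qualified names of all fields, including nested ones.
    public static List<String> collectNames(PDDocument doc) throws IOException {
        List<String> names = new ArrayList<String>();
        PDAcroForm acroForm = doc.getDocumentCatalog().getAcroForm();
        if (acroForm == null) {
            return names; // the document has no form at all
        }
        List<?> roots = acroForm.getFields(); // raw List in 1.8.x
        if (roots != null) {
            for (Object root : roots) {
                collect((PDField) root, names);
            }
        }
        return names;
    }

    // Adds this field's name, then descends into any child fields.
    private static void collect(PDField field, List<String> names) throws IOException {
        names.add(field.getFullyQualifiedName());
        List<COSObjectable> kids = field.getKids();
        if (kids == null) {
            return; // terminal field without children
        }
        for (COSObjectable kid : kids) {
            // kids can also be widget annotations; only fields are followed
            if (kid instanceof PDField) {
                collect((PDField) kid, names);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        PDDocument doc = PDDocument.load(args[0]); // path to a PDF with a form
        try {
            for (String name : collectNames(doc)) {
                System.out.println(name);
            }
        } finally {
            doc.close();
        }
    }
}
```

Running it against the saved copy (C:\test2.pdf in the example above) is a quick way to check which fields the document really contains before deciding which ones to fill.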
