PDFBox: PDDocument to byte array

In this chapter we will do something simple with the PDFBox API: convert a PDDocument object to a byte array.

PDDocument is the class that represents a PDF file. Sometimes you need the whole document as raw bytes rather than as a file on disk.

PDDocument has no method that returns the PDF data directly as an array of bytes, but there is another class in this library that can help us: PDStream.

Here is an example of converting a PDDocument to a byte array in just two lines.

Convert PDDocument to byte array

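// Wrap the document in a PDStream and read its bytes.
// Note: in PDFBox 2.0.8 the method is named toByteArray(), not getByteArray().
PDStream pdStream = new PDStream(pdfDoc);
byte[] data = pdStream.getByteArray();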

Yes, it is that simple!
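A reader comment below reports that a stream created this way can come back empty. One alternative worth trying is to let the document write itself into an in-memory stream, since PDDocument offers save(OutputStream). The sketch below shows only that capture pattern. So that it runs without PDFBox on the classpath, it uses a hypothetical StreamWriter interface as a stand-in for the document; the names DocumentBytes, StreamWriter and toByteArray are illustrative assumptions, not part of PDFBox.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

class DocumentBytes {

    /**
     * Hypothetical stand-in for anything that can write itself to a stream,
     * such as PDDocument.save(OutputStream).
     */
    @FunctionalInterface
    interface StreamWriter {
        void writeTo(OutputStream out) throws IOException;
    }

    /** Lets the writer fill an in-memory buffer and returns the collected bytes. */
    static byte[] toByteArray(StreamWriter writer) throws IOException {
        try (ByteArrayOutputStream buffer = new ByteArrayOutputStream()) {
            writer.writeTo(buffer);
            return buffer.toByteArray();
        }
    }

    public static void main(String[] args) throws IOException {
        // Fake "document" so the sketch runs on its own.
        byte[] data = toByteArray(
                out -> out.write("%PDF-1.4\n".getBytes(StandardCharsets.US_ASCII)));
        System.out.println(data.length + " bytes captured");

        // Assumption: with PDFBox available and pdfDoc being a PDDocument,
        // the call would look like this:
        // byte[] pdf = toByteArray(pdfDoc::save);
    }
}
```

Without the helper, the same idea is three lines: create a ByteArrayOutputStream, pass it to pdfDoc.save(...), then call toByteArray() on the stream. Remember to close the document when you are done with it.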

If you are interested in other features of PDFBox, let me know in the comments.



1 year 23 days ago

1. In PDFBox 2.0.8 it's `pdStream.toByteArray()`, not `pdStream.getByteArray()`.
2. The stream returned by `new PDStream(pdfDoc)` is empty: `pdStream.getLength()` returns 0.