Sunday, November 27, 2011

Concatenating PDFs

Concatenating PDF files should be pretty straightforward.  On Linux, several tools can do this, including pdftk, pdf2ps and ImageMagick's convert, the last two of which rely on Ghostscript under the hood.  Unfortunately, I had a batch of files that I wanted to concatenate for ease of use on my tablet, and none of these tools worked.  pdftk failed repeatedly with the oh-so-useful error message:


Error: Failed to open PDF file: 
   input.pdf

Using pdf2ps did create a merged copy of the input files, but it was HUGE: it consisted of bitmap images of the pages, and the text was lost in the process.  ImageMagick's convert never ran to completion; I killed it after it had eaten over 3GB of memory, presumably while rendering the text into images.

Ultimately, I was able to create a high-quality merged copy of my files by invoking Ghostscript directly:

gs -q -sPAPERSIZE=letter -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -sOutputFile=output.pdf *.pdf

The resulting file is actually smaller than the combined total of the input files, storing text as text, rather than horrid, pre-rendered bitmaps.  Ghostscript used a sane amount of memory, and it ran to completion in a sensible amount of time.
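
One catch with the command above: if output.pdf already exists from an earlier run, *.pdf will pick it up as an input too.  Here is a rough sketch of a wrapper that guards against that.  The function name merge_pdfs and the GS_BIN variable are my own inventions for illustration, not anything Ghostscript provides; the gs flags are the same ones used above.

```shell
#!/bin/sh
# merge_pdfs OUTPUT INPUT...
# Concatenate the INPUT files into OUTPUT using Ghostscript's pdfwrite device.
# GS_BIN is a hypothetical override for the Ghostscript binary (defaults to "gs").
merge_pdfs() {
    if [ "$#" -lt 2 ]; then
        echo "usage: merge_pdfs output.pdf input1.pdf [input2.pdf ...]" >&2
        return 2
    fi

    out=$1
    shift

    # Refuse to run if the output file is also listed as an input.
    # This is a plain string comparison, so it will not notice that
    # ./output.pdf and output.pdf are the same file.
    for f in "$@"; do
        if [ "$f" = "$out" ]; then
            echo "merge_pdfs: $out is also listed as an input" >&2
            return 1
        fi
    done

    # Same flags as the one-liner above; inputs are merged in the order given.
    "${GS_BIN:-gs}" -q -sPAPERSIZE=letter -dNOPAUSE -dBATCH \
        -sDEVICE=pdfwrite -sOutputFile="$out" "$@"
}
```

Something like "merge_pdfs merged.pdf chapter-*.pdf" (file names made up) would then merge the matching files in the order the shell expands the glob, which is normally sorted by name.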

To Ghostscript, bravo!  To the others, a big "Why?"
