I have a PDF with A4 pages. Each page contains two identical A5 pages for printing reasons. What I want to do in my Java program is to split these pages and use each unique
A possibly less clunky solution, for the record, using pdfjam-related bits. If test.pdf is an A4 landscape doc to be split into A5 portrait:
1) extract left half-pages:
pdfcrop --bbox "0 0 421 595" --clip --papersize "a5" test.pdf test-left.pdf
Note: --bbox "<left> <bottom> <right> <top>" works in bp units (big points, 1/72 inch), so an A4 landscape page is 842 x 595.
2) extract right half-pages:
pdfcrop --bbox "421 0 842 595" --clip --papersize "a5" test.pdf test-right.pdf
3) collate pages as desired, e.g.
pdfjoin test-left.pdf test-right.pdf "1" --outfile test-collated.pdf
4) reglue:
pdfnup --nup 2x1 test-collated.pdf --a4paper --outfile test-done.pdf
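For what it's worth, the bbox numbers in steps 1 and 2 are just the A4 landscape sheet (297 x 210 mm) converted to bp, with the width halved. A minimal Java sketch of that arithmetic follows; the class and method names (HalfPageBoxes, mmToBp, bbox) are made up for illustration and are not part of pdfcrop or pdfjam:

```java
public class HalfPageBoxes {
    // 1 inch = 25.4 mm = 72 bp, so mm -> bp is mm / 25.4 * 72 (rounded to whole bp).
    static long mmToBp(double mm) {
        return Math.round(mm / 25.4 * 72.0);
    }

    // Formats a box in the "<left> <bottom> <right> <top>" order that --bbox expects.
    static String bbox(long left, long bottom, long right, long top) {
        return left + " " + bottom + " " + right + " " + top;
    }

    public static void main(String[] args) {
        long width = mmToBp(297);   // A4 landscape width in bp
        long height = mmToBp(210);  // A4 landscape height in bp
        long mid = width / 2;       // vertical cut line between the two A5 halves

        System.out.println("left:  " + bbox(0, 0, mid, height));
        System.out.println("right: " + bbox(mid, 0, width, height));
    }
}
```

Running it should give the same two boxes as above, "0 0 421 595" and "421 0 842 595". For a different sheet size, only the two millimetre values would need to change.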
I once did something like that with camlpdf. In my case, I had a PDF in which each physical A4 page consisted of two logical A5 pages, and I wanted to get a normal PDF with A5 pages (i.e. one where logical and physical pages were the same).
This was in OCaml (camlpdf also exists for F#) and my code was the following:
let pdf = Pdfread.pdf_of_file None in_file ;;
let pdf =
  let (pdf,_perms) = Pdfcrypt.decrypt_pdf "" pdf in
  match pdf with
  | Some pdf -> pdf
  | None -> failwith "Could not decrypt"
;;
let pdf = Pdfmarks.remove_bookmarks pdf ;;
let pages = Pdfdoc.pages_of_pagetree pdf ;;
let pages = List.fold_right (fun page acc ->
    let (y1,x1,y2,x2) = Pdf.parse_rectangle page.Pdfdoc.mediabox in
    let box y1 x1 y2 x2 = Pdf.Array
        [ Pdf.Real y1; Pdf.Real x1; Pdf.Real y2; Pdf.Real x2 ]
    in
    let xm = x1 *. 0.5 +. x2 *. 0.5 in
    let pagel = {page with Pdfdoc.mediabox = box y1 x1 y2 xm}
    and pager = {page with Pdfdoc.mediabox = box y1 xm y2 x2}
    in pagel::pager::acc
  ) pages [] ;;
let pdf = Pdfdoc.change_pages false pdf pages ;;
Pdf.remove_unreferenced pdf ;;
Pdfwrite.pdf_to_file pdf out_file ;;
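If iText offers similar abstractions, perhaps you can do something like this. The procedure is the following: read the PDF (decrypting it if necessary), remove the bookmarks, replace each page with two copies of itself whose media boxes cover the two halves of the original, then drop unreferenced objects and write the result.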
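To make the page-doubling step concrete on the Java side, here is a rough sketch of just the box arithmetic. It deliberately does not call iText or any other PDF library: each page is stood in for by a plain {llx, lly, urx, ury} array, and the names SplitMediaBoxes and splitPages are hypothetical. It also assumes the cut runs at the midpoint of the x range, as in the pdfcrop example; depending on how your pages are laid out, you may need to cut along the other axis instead.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SplitMediaBoxes {
    // Each input box is {llx, lly, urx, ury} in PDF units (a stand-in for a real page object).
    // Every page yields two boxes, left half first, so the output order is L1, R1, L2, R2, ...
    static List<double[]> splitPages(List<double[]> mediaBoxes) {
        List<double[]> out = new ArrayList<>();
        for (double[] b : mediaBoxes) {
            double xm = 0.5 * b[0] + 0.5 * b[2];           // midpoint of the x range
            out.add(new double[] {b[0], b[1], xm, b[3]});  // left half
            out.add(new double[] {xm, b[1], b[2], b[3]});  // right half
        }
        return out;
    }

    public static void main(String[] args) {
        List<double[]> pages = new ArrayList<>();
        pages.add(new double[] {0, 0, 842, 595});  // one A4 landscape page
        for (double[] box : splitPages(pages)) {
            System.out.println(Arrays.toString(box));
        }
    }
}
```

With a real library you would presumably apply each resulting box as the media box (or crop box) of a copy of the original page, which is what the OCaml fold does with its page records.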
Try the iText library: http://itextpdf.com/. It can use an existing PDF file as a template, and it can edit, rotate and split existing documents. You can find useful samples here: http://www.1t3xt.info/examples/browse/