I am trying to merge two tagged PDFs with the iTextPDF 5.4.4 jar. After doing all the operations, closing the document at the line document.close(); throws a NullPointerException.
This looks like a bug in the current iText versions.
@Bruno, maybe someone should look into this.
PdfCopy has a method fixTaggedStructure which tries to fix the tagged structure that has been somewhat garbled by copying tagged pages. Up to and including the current iText 5.4.6-SNAPSHOT you find the following code
PdfDictionary dict = (PdfDictionary)iobj.object;
PdfIndirectReference pg = (PdfIndirectReference)dict.get(PdfName.PG);
//if pg is real page - do nothing, else set correct pg and remove first MCID if exists
if (!pageReferences.contains(pg) && !pg.equals(currPage)){
    dict.put(PdfName.PG, currPage);
    PdfArray kids = dict.getAsArray(PdfName.K);
    if (kids != null) {
        PdfObject firstKid = kids.getDirectObject(0);
        if (firstKid.isNumber()) kids.remove(0);
    }
}
for a StructElem tagged element dict taken from some array. By calling pg.equals(currPage), this code implicitly assumes that the dictionary dict contains an entry for the key PdfName.PG. Unfortunately that entry is optional; e.g. the sample document provided by the OP contains such StructElem dictionaries, referenced from some array, without a Pg entry. For those, dict.get(PdfName.PG) returns null, so pg.equals(currPage) throws the NPE in question.
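To see the failure mode in isolation, here is a minimal plain-Java sketch with no iText dependency. It assumes a HashMap is a fair stand-in for the StructElem PdfDictionary (like PdfDictionary.get, Map.get returns null for a missing key) and uses Strings in place of PdfIndirectReference; MissingPgDemo and originalCondition are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Plain-Java sketch (no iText) of why the original condition fails.
 * A HashMap stands in for the StructElem PdfDictionary and Strings stand in
 * for PdfIndirectReference; class and method names are hypothetical.
 */
public class MissingPgDemo {

    // Same shape as the original check: equals is invoked on pg itself
    static boolean originalCondition(Map<String, Object> dict, List<Object> pageReferences, Object currPage) {
        Object pg = dict.get("Pg"); // null when the optional Pg entry is absent
        return !pageReferences.contains(pg) && !pg.equals(currPage);
    }

    public static void main(String[] args) {
        List<Object> pageReferences = new ArrayList<>();
        pageReferences.add("page1");

        // StructElem whose Pg points to a page that is not among the copied pages
        Map<String, Object> withPg = new HashMap<>();
        withPg.put("Pg", "oldPage");
        System.out.println(originalCondition(withPg, pageReferences, "page2")); // true

        // StructElem lacking the optional Pg entry
        Map<String, Object> withoutPg = new HashMap<>();
        withoutPg.put("S", "P");
        try {
            originalCondition(withoutPg, pageReferences, "page2");
        } catch (NullPointerException e) {
            // ArrayList.contains(null) merely returns false; the NPE comes from pg.equals(...)
            System.out.println("NullPointerException: no Pg entry");
        }
    }
}
```

The first call succeeds because pg is non-null; the second reaches pg.equals(...) with pg == null, which is exactly the situation in the OP's document.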
In this case it suffices to change the order in the equals call, i.e. instead of
if (!pageReferences.contains(pg) && !pg.equals(currPage)){
one should use
if (!pageReferences.contains(pg) && !currPage.equals(pg)){
or
if (pg != null && !pageReferences.contains(pg) && !pg.equals(currPage)){
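depending on the actual program logic here: with the first variant (assuming pageReferences holds no null entries), a StructElem without a Pg entry is treated like one pointing to a foreign page, i.e. it gets currPage as its Pg and a leading MCID kid is removed, while the second variant leaves such elements untouched.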
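A plain-Java sketch of both conditions, again without iText, makes that difference testable. Strings stand in for PdfIndirectReference, pg == null models a StructElem without a Pg entry, and PgConditionVariants, swappedEquals and explicitNullCheck are hypothetical names; the sketch assumes currPage is non-null and pageReferences holds no null entries.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Plain-Java sketch (no iText) of the two proposed conditions.
 * Strings stand in for PdfIndirectReference; pg == null models a StructElem
 * without a Pg entry. Class and method names are hypothetical.
 */
public class PgConditionVariants {

    // Variant 1: equals is called on currPage (assumed non-null), so a null pg is safe
    static boolean swappedEquals(Object pg, List<Object> pageReferences, Object currPage) {
        return !pageReferences.contains(pg) && !currPage.equals(pg);
    }

    // Variant 2: elements without Pg are skipped entirely
    static boolean explicitNullCheck(Object pg, List<Object> pageReferences, Object currPage) {
        return pg != null && !pageReferences.contains(pg) && !pg.equals(currPage);
    }

    public static void main(String[] args) {
        List<Object> pageReferences = new ArrayList<>(Arrays.asList("page1", "page2"));
        Object currPage = "page3";

        // Missing Pg: neither variant throws, but they disagree
        System.out.println(swappedEquals(null, pageReferences, currPage));     // true
        System.out.println(explicitNullCheck(null, pageReferences, currPage)); // false

        // Pg pointing to a page outside pageReferences: both enter the fix-up branch
        System.out.println(swappedEquals("oldPage", pageReferences, currPage));     // true
        System.out.println(explicitNullCheck("oldPage", pageReferences, currPage)); // true
    }
}
```

For every element that does carry a Pg entry the two variants agree; they only differ for the null case, which is why the choice depends on what fixTaggedStructure is supposed to do with such elements.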
@Bruno Please check which variant is semantically correct; I'm not really into this tagged structure stuff after all...
The code was written in C#:
using System;
using System.IO;
using iTextSharp.text;
using iTextSharp.text.pdf;

public static byte[] mergeTest(byte[] pdf) {
    PdfReader reader = null;
    try {
        reader = new PdfReader(pdf);
        using (MemoryStream stream = new MemoryStream()) {
            Document doc = new Document();
            PdfCopy copy = new PdfCopy(doc, stream);
            // Keep the structure tree only if the source document is tagged
            bool tagged = reader.IsTagged();
            if (tagged)
                copy.SetTagged();
            doc.Open();
            for (int x = 1; x <= reader.NumberOfPages; x++) {
                copy.AddPage(copy.GetImportedPage(reader, x, tagged));
            }
            copy.FreeReader(reader);
            // Closing the document also closes the PdfCopy; this is where the NPE is thrown
            doc.Close();
            // MemoryStream.ToArray still works after the writer has closed the stream
            return stream.ToArray();
        }
    } finally {
        if (reader != null)
            reader.Close();
    }
}