Merging Tagged PDF without ruining the tags


I am trying to merge two tagged PDFs with the iTextPDF 5.4.4 jar. After doing all the operations, the problem occurs while closing the document, on the line document.close(). It thro


2 answers

广开言路

2021-01-25 02:45
This looks like a bug in the current iText versions.

@Bruno maybe someone should look into this

PdfCopy has a method fixTaggedStructure, which tries to repair the tagged structure after it has been somewhat garbled by copying tagged pages. Up to and including the current iText 5.4.6-SNAPSHOT, it contains the following code:
```
PdfDictionary dict = (PdfDictionary)iobj.object;
PdfIndirectReference pg = (PdfIndirectReference)dict.get(PdfName.PG);
//if pg is real page - do nothing, else set correct pg and remove first MCID if exists
if (!pageReferences.contains(pg) && !pg.equals(currPage)){
    dict.put(PdfName.PG, currPage);
    PdfArray kids = dict.getAsArray(PdfName.K);
    if (kids != null) {
        PdfObject firstKid = kids.getDirectObject(0);
        if (firstKid.isNumber()) kids.remove(0);
    }
}
```
This code is applied to a StructElem dictionary dict taken from some array. By calling pg.equals(currPage), it implicitly assumes that dict has an entry for the key PdfName.PG. Unfortunately that entry is optional; the sample document provided by the OP, for example, contains StructElem dictionaries referenced from an array that have no Pg entry. For those, pg is null and the call throws the NPE in question.

In this case it suffices to change the order in the equals call, i.e. instead of
```
if (!pageReferences.contains(pg) && !pg.equals(currPage)){
```
one should use
```
if (!pageReferences.contains(pg) && !currPage.equals(pg)){
```
or
```
if (pg != null && !pageReferences.contains(pg) && !pg.equals(currPage)){
```
depending on the actual program logic here.
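
The difference between the two variants can be seen in isolation. The following is only a minimal sketch: the class PgCheckDemo and its nested Ref type are hypothetical stand-ins (not iText classes), and pageReferences is assumed to behave like a HashSet. It shows that, for a missing Pg entry, the swapped equals call enters the repair branch while the null guard skips it.

```java
import java.util.HashSet;
import java.util.Set;

class PgCheckDemo {
    // Hypothetical stand-in for an indirect page reference; NOT the iText class.
    static final class Ref {
        final int number;
        Ref(int number) { this.number = number; }
        @Override public boolean equals(Object o) {
            return o instanceof Ref && ((Ref) o).number == number;
        }
        @Override public int hashCode() { return Integer.hashCode(number); }
    }

    // Original condition: dereferences pg, so it fails when pg is null.
    static boolean original(Set<Ref> pageReferences, Ref pg, Ref currPage) {
        return !pageReferences.contains(pg) && !pg.equals(currPage);
    }

    // Variant 1: swapped equals call; a null pg counts as "not the current page".
    static boolean swapped(Set<Ref> pageReferences, Ref pg, Ref currPage) {
        return !pageReferences.contains(pg) && !currPage.equals(pg);
    }

    // Variant 2: explicit null guard; a null pg never enters the branch.
    static boolean guarded(Set<Ref> pageReferences, Ref pg, Ref currPage) {
        return pg != null && !pageReferences.contains(pg) && !pg.equals(currPage);
    }

    public static void main(String[] args) {
        Set<Ref> pages = new HashSet<>();
        pages.add(new Ref(1));
        Ref currPage = new Ref(2);
        Ref missingPg = null; // models a StructElem without a Pg entry

        System.out.println("swapped: " + swapped(pages, missingPg, currPage));
        System.out.println("guarded: " + guarded(pages, missingPg, currPage));
        try {
            original(pages, missingPg, currPage);
        } catch (NullPointerException e) {
            System.out.println("original: NullPointerException");
        }
    }
}
```

With variant 1 the dictionary would get its Pg entry set to currPage; with variant 2 it would be left untouched. Which of the two is wanted is exactly the open question about the program logic.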

@Bruno Please check which variant is semantically correct; I'm not really into this tagged structure stuff after all...

伪装坚强ぢ

2021-01-25 02:47

The following code is written in C#:

```
using System.IO;
using iTextSharp.text;
using iTextSharp.text.pdf;

public static byte[] mergeTest(byte[] pdf) {
    PdfReader reader = null;
    try {
        reader = new PdfReader(pdf);
        using (MemoryStream stream = new MemoryStream()) {
            Document doc = new Document();
            PdfCopy copy = new PdfCopy(doc, stream);

            // Switch the copy to tagged mode before opening the document.
            bool tagged = reader.IsTagged();
            if (tagged)
                copy.SetTagged();

            doc.Open();

            // Import every page, keeping its tag structure if the source is tagged.
            for (int x = 1; x <= reader.NumberOfPages; x++) {
                copy.AddPage(copy.GetImportedPage(reader, x, tagged));
            }

            copy.FreeReader(reader);
            doc.Close();
            copy.Close();

            // ToArray still works after the stream has been closed.
            return stream.ToArray();
        }
    } finally {
        // Exceptions are no longer swallowed; the reader is always released.
        if (reader != null)
            reader.Close();
    }
}
```
