Merging tagged PDFs without ruining the tags

慢半拍i 2021-01-25 02:20

I am trying to merge two tagged PDFs with the iTextPDF 5.4.4 jar. After all the operations are done, when the document is closed on the line document.close();, it thro…

2 Answers
  • 2021-01-25 02:45

    This looks like a bug in the current iText versions.

    @Bruno, maybe someone should look into this.

    PdfCopy has a method, fixTaggedStructure, that tries to repair the tagged structure, which copying tagged pages leaves somewhat garbled. Up to and including the current iText 5.4.6-SNAPSHOT, it contains the following code

    PdfDictionary dict = (PdfDictionary)iobj.object;
    PdfIndirectReference pg = (PdfIndirectReference)dict.get(PdfName.PG);
    //if pg is real page - do nothing, else set correct pg and remove first MCID if exists
    if (!pageReferences.contains(pg) && !pg.equals(currPage)){
        dict.put(PdfName.PG, currPage);
        PdfArray kids = dict.getAsArray(PdfName.K);
        if (kids != null) {
            PdfObject firstKid = kids.getDirectObject(0);
            if (firstKid.isNumber()) kids.remove(0);
        }
    }
    

    for a StructElem dictionary dict taken from some array. By calling pg.equals(currPage), this code implicitly assumes that dict contains an entry for the key PdfName.PG. Unfortunately, that entry is optional; e.g. the sample document provided by the OP contains such StructElem dictionaries, referenced from some array, without a Pg entry. For those, dict.get(PdfName.PG) returns null, pageReferences.contains(null) is false, and so pg.equals(currPage) is evaluated on a null pg. This causes the NPE in question.
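    A minimal sketch of the failure outside iText, under stated assumptions: a plain java.util.Map stands in for PdfDictionary, Strings stand in for PdfIndirectReference values, and the names MissingPgDemo and needsFixOriginal are hypothetical. Only the guard condition is reproduced, with the original operand order.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins: a Map plays the role of iText's PdfDictionary,
// and Strings play the role of PdfIndirectReference values.
public class MissingPgDemo {

    // Mirrors the guard from fixTaggedStructure with the original
    // operand order: pg.equals(currPage).
    static boolean needsFixOriginal(Map<String, Object> dict,
                                    List<Object> pageReferences,
                                    Object currPage) {
        Object pg = dict.get("Pg"); // null when the optional Pg entry is absent
        return !pageReferences.contains(pg) && !pg.equals(currPage);
    }

    public static void main(String[] args) {
        List<Object> pageReferences = new ArrayList<>();
        pageReferences.add("page-1");
        Object currPage = "page-2";

        // StructElem whose Pg points to a page that is not a real target page
        Map<String, Object> withPg = new HashMap<>();
        withPg.put("Pg", "page-9");
        System.out.println(needsFixOriginal(withPg, pageReferences, currPage)); // true

        // StructElem without a Pg entry: pg is null, so pg.equals(...) throws
        Map<String, Object> withoutPg = new HashMap<>();
        try {
            needsFixOriginal(withoutPg, pageReferences, currPage);
        } catch (NullPointerException e) {
            System.out.println("NPE for a StructElem without Pg");
        }
    }
}
```

    The contains check does not protect against the null pg: an ArrayList without null elements simply answers false, so evaluation reaches pg.equals(currPage).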

    In this case it suffices to swap the operands of the equals call, i.e. instead of

    if (!pageReferences.contains(pg) && !pg.equals(currPage)){
    

    one should use

    if (!pageReferences.contains(pg) && !currPage.equals(pg)){
    

    or

    if (pg != null && !pageReferences.contains(pg) && !pg.equals(currPage)){
    

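    depending on the intended program logic. The two variants differ when Pg is missing: with the first, the fix still runs (Pg is set to currPage and a leading MCID is removed if present); with the second, such dictionaries are left untouched.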
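    A minimal sketch contrasting the two proposed guards when Pg is missing, assuming, as above, that Strings stand in for page references and a List stands in for pageReferences; the class and method names are hypothetical. Both guards avoid the NPE, but they disagree on whether the fix should run.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins: Map for PdfDictionary, String for page references.
public class PgGuardVariants {

    // Variant 1: swapped operands; a missing Pg (null) passes the guard,
    // so the fix would set Pg to currPage.
    static boolean swappedOperands(Object pg, List<Object> pageReferences, Object currPage) {
        return !pageReferences.contains(pg) && !currPage.equals(pg);
    }

    // Variant 2: explicit null check; a missing Pg skips the fix entirely.
    static boolean nullChecked(Object pg, List<Object> pageReferences, Object currPage) {
        return pg != null && !pageReferences.contains(pg) && !pg.equals(currPage);
    }

    public static void main(String[] args) {
        List<Object> pageReferences = new ArrayList<>();
        pageReferences.add("page-1");
        Object currPage = "page-2";

        Map<String, Object> dict = new HashMap<>(); // StructElem without a Pg entry
        Object pg = dict.get("Pg");                 // null

        System.out.println(swappedOperands(pg, pageReferences, currPage)); // true
        System.out.println(nullChecked(pg, pageReferences, currPage));     // false
    }
}
```

    When Pg is present, both guards return the same result; only the missing-Pg case differs, which is why the choice depends on whether such StructElems should be attached to the current page.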

    @Bruno, please check which variant is semantically correct; I'm not really familiar with this tagged-structure code, after all.

  • 2021-01-25 02:47

    The code was written in C#:

    public static byte[] mergeTest(byte[] pdf) {
        PdfReader reader = null;
        try {
            reader = new PdfReader(pdf);
            using (MemoryStream stream = new MemoryStream()) {
                Document doc = new Document();
                PdfCopy copy = new PdfCopy(doc, stream);
                bool tagged = reader.IsTagged();

                // Tagging has to be enabled on the writer before the document is opened.
                if (tagged)
                    copy.SetTagged();

                doc.Open();

                // Import every page, keeping the structure tree when the source is tagged.
                for (int x = 1; x <= reader.NumberOfPages; x++) {
                    copy.AddPage(copy.GetImportedPage(reader, x, tagged));
                }

                copy.FreeReader(reader);

                // Closing the Document also closes the PdfCopy. Exceptions thrown here
                // (such as the NPE in question) now reach the caller instead of being
                // swallowed by an empty catch block.
                doc.Close();

                // MemoryStream.ToArray() still works after the stream has been closed.
                return stream.ToArray();
            }
        } finally {
            if (reader != null)
                reader.Close();
        }
    }
    