Is it possible to create a tagged PDF (PDF/UA) with PDFBox? It looks like PDFBox has an API for that (package org.apache.pdfbox.pdmodel.documentinterchange.taggedpdf), but I can't find any tutorials or code examples.
Using the code below, I generated a PDF file containing an image, and the screen reader NVDA (in my case) recognizes it and reads '... graphic Alternate Description'. However, the accessibility checker PAC 2 shows an error: 'Image object not tagged'.
PDDocument doc = new PDDocument();
PDPage page = new PDPage();
doc.addPage(page);
PDDocumentCatalog documentCatalog = doc.getDocumentCatalog();
PDImageXObject pdImage = PDImageXObject.createFromFile(imagePath, doc);
PDPageContentStream contents = new PDPageContentStream(doc, page);
contents.drawImage(pdImage, 100, 600, pdImage.getWidth() / 2, pdImage.getHeight() / 2);
contents.close();
PDStructureTreeRoot treeRoot = new PDStructureTreeRoot();
PDStructureElement structureElement = new PDStructureElement(StandardStructureTypes.Figure, treeRoot);
structureElement.setPage(page);
PDMarkedContent markedImg = new PDMarkedContent(COSName.IMAGE, new COSDictionary());
markedImg.addXObject(pdImage);
structureElement.appendKid(markedImg);
structureElement.setAlternateDescription("Alternate Description");
treeRoot.appendKid(structureElement);
documentCatalog.setStructureTreeRoot(treeRoot);
// ....
doc.save(fileName);
Can you provide some explanations and/or code examples on this subject?
I put up a working example which demonstrates creating an accessible PDF using PDFBox 2: https://github.com/martinlovell/accessible-pdfbox-example
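A few things are missing from the code in the question: the image drawing is never wrapped in a marked-content sequence in the page content stream, the marked content needs alt text, and I believe that marked content also needs an MCID (marked-content ID).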
The example project demonstrates in more detail what you need.
It would be something like this:
PDPageContentStream contents = new PDPageContentStream(doc, page);
// the content in the stream needs an id
int mcid = 5;
COSDictionary dictionary = new COSDictionary();
dictionary.setInt(COSName.MCID, mcid);
// wrap image drawing in marked content
contents.beginMarkedContent(COSName.IMAGE, PDPropertyList.create(dictionary));
contents.drawImage(pdImage, 100, 600, pdImage.getWidth() / 2, pdImage.getHeight() / 2);
contents.endMarkedContent();
contents.close();
PDStructureTreeRoot treeRoot = new PDStructureTreeRoot();
documentCatalog.setStructureTreeRoot(treeRoot);
PDStructureElement structureElement = new PDStructureElement(StandardStructureTypes.Figure, treeRoot);
structureElement.setPage(page);
structureElement.setAlternateDescription("Alternate Description");
// Set alt text on marked content for structure.
// This is the dictionary with the mcid used in beginMarkedContent.
dictionary.setString(COSName.ALT, "Alternate Description");
PDMarkedContent markedImg = new PDMarkedContent(COSName.IMAGE, dictionary);
markedImg.addXObject(pdImage);
structureElement.appendKid(markedImg);
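The snippet above hard-codes `int mcid = 5`. As far as I understand, an MCID only has to be unique within a page's content stream, so once a page carries more than one tagged item it is easier to hand the IDs out from a counter. Below is a minimal sketch of such a counter using only the JDK; `McidAllocator` and its `next` method are hypothetical names for illustration and are not part of PDFBox.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical helper (not a PDFBox class): hands out sequential
 * marked-content IDs, counting separately for each page.
 */
final class McidAllocator {
    // Next unused MCID for each page; any object can serve as the page key.
    private final Map<Object, Integer> nextByPage = new HashMap<>();

    /** Returns the next unused MCID for the given page, starting at 0. */
    int next(Object page) {
        int mcid = nextByPage.getOrDefault(page, 0);
        nextByPage.put(page, mcid + 1);
        return mcid;
    }
}
```

With this in place, the literal in the answer could be replaced by something like `int mcid = allocator.next(page);`, where `allocator` is a single `McidAllocator` instance shared across the document.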
Source: https://stackoverflow.com/questions/39872854/tagged-pdf-with-pdfbox