Extracting Additional Metadata from a PDF using iTextSharp

老子叫甜甜 提交于 2019-12-18 05:21:30

问题


I've seen the extraction of basic metadata (ie. author, title) using iTextSharp and it usually looks something like this:

var pdfReader = new PdfReader(pdfData);
var author = pdfReader.Info["author"]

However, in my case I'm after something a bit more exotic, the additional "advanced" metadata that the document may contain.

Pardon the paint highlights, but here is a screenshot from within Adobe Acrobat showing the data in question:

In this case, it doesn't seem like this data is available through the Info dictionary. Using a different library (PDFKit by TallComponents) this data is exposed, but I'm wondering if there is any way get it using iItext

I'm currently playing with iText 4.1.6 due to licensing restrictions, but I wouldn't be opposed to buying the commercial license for 5.0.6 if that adds required functionality.


回答1:


not sure if it will get exactly what you need, but to get the XMP metadata try something like this:

PdfReader reader = new PdfReader(YOUR_PDF);
byte[] b = reader.Metadata;
if (b != null) {
  string xml = new UTF8Encoding().GetString(b);
}

notice you get back a XML string.

IIRC the code will work with 4.1.6.



来源:https://stackoverflow.com/questions/4928441/extracting-additional-metadata-from-a-pdf-using-itextsharp

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!