Extracting text marked for redaction in a PDF document using .NET [closed]

Submitted by 北慕城南 on 2020-01-05 07:09:29

Question


I am working on an Adobe Acrobat add-on product, and one of the requirements is to extract the text marked for redaction in a given PDF document.

Assuming you know what "redaction" is (if you don't, please read http://acrobatusers.com/tutorials/redacting-pdf-files-survey-tools), please suggest how I can discover the coordinates of the text that has been "marked" for redaction in any PDF and then extract that exact text.

Please ask for more details if you think you can point me to the right answer. I have tried the iTextSharp and Aspose.PDF libraries for this without much success.


Answer 1:


When you mark text for redaction with Acrobat, it creates redaction annotations. A redaction annotation has its /Subtype key set to /Redact, and the area to be redacted is defined by the /QuadPoints key in the annotation dictionary (if /QuadPoints is absent, the /Rect entry defines the area instead). I do not know whether iTextSharp or Aspose supports redaction annotations directly, but with iTextSharp you can use the low-level (COS) object API to retrieve the raw PDF objects and inspect the ones you need.
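Once you have the /QuadPoints array of each /Redact annotation (from the page's /Annots array), the remaining step is to keep only the text whose position falls inside those areas. The sketch below illustrates that geometry in plain Python so it stays library-free; it is not a .NET or iTextSharp API. It assumes you already have text pieces with bounding boxes in the same default user space as the annotation (page rotation and CropBox offsets are not handled). `TextChunk` and `redacted_text` are hypothetical names, and treating a chunk as redacted when its bounding-box center lies inside a redaction area is a heuristic assumption; partially covered words may need glyph-level checks.

```python
from dataclasses import dataclass
from typing import Iterable, List, Sequence, Tuple

# Axis-aligned rectangle in PDF user space: (llx, lly, urx, ury).
Rect = Tuple[float, float, float, float]


def quadpoints_to_rects(quadpoints: Sequence[float]) -> List[Rect]:
    """Convert a flat /QuadPoints array (8 numbers per quad) into bounding boxes.

    Taking min/max over the four corners makes the result independent of
    the order in which a producer lists the corners.
    """
    if len(quadpoints) % 8 != 0:
        raise ValueError("/QuadPoints length must be a multiple of 8")
    rects: List[Rect] = []
    for i in range(0, len(quadpoints), 8):
        xs = quadpoints[i:i + 8:2]      # x1, x2, x3, x4
        ys = quadpoints[i + 1:i + 8:2]  # y1, y2, y3, y4
        rects.append((min(xs), min(ys), max(xs), max(ys)))
    return rects


@dataclass
class TextChunk:
    # Hypothetical: a piece of text plus its bounding box, as reported by
    # whatever position-aware text extractor you use.
    text: str
    bbox: Rect


def center_inside(bbox: Rect, rect: Rect) -> bool:
    """Heuristic: the chunk counts as covered if its center is in the rect."""
    cx = (bbox[0] + bbox[2]) / 2
    cy = (bbox[1] + bbox[3]) / 2
    return rect[0] <= cx <= rect[2] and rect[1] <= cy <= rect[3]


def redacted_text(chunks: Iterable[TextChunk], quadpoints: Sequence[float]) -> List[str]:
    """Return the text of every chunk covered by one of the redaction quads."""
    rects = quadpoints_to_rects(quadpoints)
    return [c.text for c in chunks if any(center_inside(c.bbox, r) for r in rects)]


if __name__ == "__main__":
    # One quad, corners listed top-left, top-right, bottom-left, bottom-right.
    quads = [100, 720, 200, 720, 100, 700, 200, 700]
    chunks = [
        TextChunk("Secret", (105, 702, 150, 716)),
        TextChunk("Public", (250, 702, 300, 716)),
    ]
    print(redacted_text(chunks, quads))  # ['Secret']
```

Using the axis-aligned bounding box of each quad keeps the check simple and tolerates different corner orderings; for rotated text you would need a proper point-in-quadrilateral test instead.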



Source: https://stackoverflow.com/questions/12107565/extracting-text-marked-for-redaction-in-a-pdf-document-using-net
