How to check if a checkbox is checked or not on a non-form PDF using C#?

僤鯓⒐⒋嵵緔 submitted on 2019-12-12 01:58:36

Question


Using C#, I want to find out whether a specific check box is checked on a PDF page. The PDF file is not a form (it has no form fields).

The PDF could look something like this:

A sample file is here: MDS30ResidentP2.pdf (in this sample file, I want to detect that check box "E" in question A1000 is checked. Again: the PDF is not in "form" format!).

PS: none of the following posts solved my problem:

  • PDF Parsing extract CheckBox Fields value
  • iTextSharp: reading radio button, check box states from a non-form PDF

Answer 1:


OCR is probably the only way. From the PDF's perspective, there is a rectangle, and some of those rectangles have two lines drawn through them. They are not even images but actual vector drawing commands. You could look for that extra drawing of an "x", but it is unrelated to the text that appears beside it, so you'd have to write some fuzzy logic to estimate which "x" goes with which text, and I think you'd end up with a bunch of false positives. If you've got a lot of these PDFs it might be worth writing something; otherwise, use OCR or manual entry.

If you want to parse the PDF, you can try something like this, which is a little ugly, but if you're parsing the same PDF layout over and over again it might work OK. If you want something more generic and reusable, I would check out the post by the creator of iText here. His post is about optional content groups, but it should give you some ideas to start with.
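
The "fuzzy logic" mentioned in this answer could be sketched roughly as follows. This is not the code the answer links to, and it does not read a PDF itself: it assumes the box outlines, stroked line segments, and label positions have already been pulled out of the page's content stream by whatever PDF library you use. All type and method names here (`Box`, `Segment`, `Label`, `CheckboxHeuristic`) are hypothetical, and the tolerance and distance values are guesses that would need tuning per document layout.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical input types: coordinates are in PDF user space (points).
public record Point(double X, double Y);
public record Segment(Point A, Point B);
public record Label(string Text, Point Position);
public record Box(double Left, double Bottom, double Right, double Top)
{
    public Point Center => new Point((Left + Right) / 2, (Bottom + Top) / 2);

    // True when p lies inside the box, allowing a small tolerance.
    public bool Contains(Point p, double tol) =>
        p.X >= Left - tol && p.X <= Right + tol &&
        p.Y >= Bottom - tol && p.Y <= Top + tol;
}

public static class CheckboxHeuristic
{
    // Assumption: a checked box has one rising and one falling diagonal
    // stroke inside it (the two lines of the "x").
    public static bool IsChecked(Box box, IEnumerable<Segment> strokes, double tol = 1.0)
    {
        bool rising = false, falling = false;
        foreach (var s in strokes)
        {
            // Ignore strokes that are not fully inside this box.
            if (!box.Contains(s.A, tol) || !box.Contains(s.B, tol)) continue;

            double dx = s.B.X - s.A.X;
            double dy = s.B.Y - s.A.Y;

            // Ignore near-horizontal and near-vertical strokes (box edges).
            if (Math.Abs(dx) < tol || Math.Abs(dy) < tol) continue;

            if (dx * dy > 0) rising = true; else falling = true;
        }
        return rising && falling;
    }

    // Pairs every checked box with the nearest label within maxDistance.
    // This pairing is the part most likely to produce false positives.
    public static List<string> CheckedLabels(
        IEnumerable<Box> boxes,
        IReadOnlyCollection<Segment> strokes,
        IReadOnlyCollection<Label> labels,
        double maxDistance = 50.0)
    {
        var result = new List<string>();
        foreach (var box in boxes.Where(b => IsChecked(b, strokes)))
        {
            var nearest = labels
                .Select(l => new { l.Text, Distance = Distance(box.Center, l.Position) })
                .Where(x => x.Distance <= maxDistance)
                .OrderBy(x => x.Distance)
                .FirstOrDefault();
            if (nearest != null) result.Add(nearest.Text);
        }
        return result;
    }

    private static double Distance(Point a, Point b) =>
        Math.Sqrt((a.X - b.X) * (a.X - b.X) + (a.Y - b.Y) * (a.Y - b.Y));
}
```

Matching on plain nearest distance is the simplest choice. For a fixed form that is parsed repeatedly, hard-coding the known box rectangles per question would likely be more robust than guessing labels by proximity.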



Source: https://stackoverflow.com/questions/25210668/how-to-check-if-a-checkbox-is-checked-or-not-on-a-non-form-pdf-using-c
