Question
Using C#, I want to check whether a specific check box is checked on a PDF page. The PDF file is not a form PDF.
The sample file is here: MDS30ResidentP2.pdf. In this sample file, I want to somehow figure out that check box "E" in question A1000 is checked. Again: the PDF is not in "form" format!
PS: none of the following posts solved my problem:
- PDF Parsing extract CheckBox Fields value
- iTextSharp: reading radio button, check box states from a non-form PDF
Answer 1:
OCR is probably the only way. From the PDF's perspective, there are rectangles, and some of those rectangles have two lines drawn through them. They're not even images but actual vector drawing commands. You could look for that extra drawing of an "x", but it is unrelated to the text that appears beside it, so you'd have to write some fuzzy logic to estimate which "x" belongs to which text, and I think you'd end up with a lot of false positives. If you've got a lot of these PDFs it might be worth writing something; otherwise, use OCR or manual entry.
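To make the "look for that extra drawing of an x" idea concrete, here is a minimal sketch in Python (the question asks for C#, but the logic does not depend on any library and ports directly). It assumes you have already pulled the page's content stream out of the PDF and decompressed it to plain text, that each box is drawn with the `re` operator and each stroke of the "x" with `m`/`l`, and that no coordinate transformation (`cm`) is in effect. Its tokenizer also ignores text strings, names and arrays. `Rect`, `parse_ops`, `checked_boxes` and the `SAMPLE` stream are hypothetical; a real file may draw its boxes differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """An axis-aligned box from a `re` operator (x, y, width, height)."""
    x: float
    y: float
    w: float
    h: float

    def contains(self, px: float, py: float, tol: float = 0.5) -> bool:
        # Normalise negative widths/heights, then test with a small tolerance.
        x0, x1 = sorted((self.x, self.x + self.w))
        y0, y1 = sorted((self.y, self.y + self.h))
        return x0 - tol <= px <= x1 + tol and y0 - tol <= py <= y1 + tol


def parse_ops(stream: str):
    """Yield (operator, numeric operands) pairs from a decoded content stream.

    Sketch only: splits on whitespace and treats every non-numeric token as an
    operator, so names, arrays and text strings are not parsed properly.
    """
    operands = []
    for tok in stream.split():
        try:
            operands.append(float(tok))
        except ValueError:
            yield tok, operands
            operands = []


def checked_boxes(stream: str, tol: float = 0.5) -> list:
    """Return the rectangles that have an "x" (two crossing diagonals) inside."""
    rects, segments, current = [], [], None
    for op, args in parse_ops(stream):
        if op == "re" and len(args) >= 4:
            rects.append(Rect(*args[-4:]))
        elif op == "m" and len(args) >= 2:
            current = (args[-2], args[-1])          # start a new subpath
        elif op == "l" and len(args) >= 2 and current is not None:
            end = (args[-2], args[-1])
            segments.append((current, end))         # straight line segment
            current = end

    checked = []
    for r in rects:
        directions = set()
        for (x0, y0), (x1, y1) in segments:
            if r.contains(x0, y0, tol) and r.contains(x1, y1, tol):
                dx, dy = x1 - x0, y1 - y0
                if dx != 0 and dy != 0:             # diagonal, not a box edge
                    directions.add((dx > 0) == (dy > 0))  # True: "/", False: "\"
        if directions == {True, False}:             # both strokes of the "x"
            checked.append(r)
    return checked


# Hypothetical, hand-written stream: two 10x10 boxes, the first one crossed out.
SAMPLE = """
100 700 10 10 re S
100 700 m 110 710 l S
100 710 m 110 700 l S
130 700 10 10 re S
"""

print(checked_boxes(SAMPLE))
```

Only the first box is reported, because only it contains both a rising and a falling diagonal. The sketch still says nothing about which answer a box belongs to; that is the fuzzy part the answer warns about.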
If you want to parse the PDF, you can try something like this, which is a little ugly, but if you're parsing the same PDF over and over again it might work OK. If you want something more generic and reusable, I would check out the post by iText's creator here. His post is about optional content groups, but it should give you some ideas to start with.
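When the same form layout is parsed over and over, a simple proximity rule may be enough to tie a checked box to its label. The Python sketch below assumes you already have each checked box's rectangle and the page's text fragments with their baseline start coordinates (from your PDF library's text extraction), and that each label begins just to the right of its box on the same line. `Box`, `TextRun`, `label_for_box`, the `max_gap`/`line_tol` tolerances and the sample labels and coordinates are all hypothetical and would need tuning against the real form.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Box:
    """A checked box in PDF user space (x, y = lower-left corner)."""
    x: float
    y: float
    w: float
    h: float


@dataclass(frozen=True)
class TextRun:
    """A text fragment with the start point of its baseline."""
    text: str
    x: float
    y: float


def label_for_box(box: Box, runs: list, max_gap: float = 40.0,
                  line_tol: float = 6.0) -> Optional[str]:
    """Return the nearest run that starts right of the box on the same line."""
    right_edge = box.x + box.w
    mid_y = box.y + box.h / 2
    best, best_dist = None, math.inf
    for run in runs:
        gap = run.x - right_edge                  # horizontal distance to the label
        if 0 <= gap <= max_gap and abs(run.y - mid_y) <= line_tol:
            dist = math.hypot(gap, run.y - mid_y)
            if dist < best_dist:
                best, best_dist = run.text, dist
    return best


# Hypothetical layout: labels and coordinates are made up for illustration.
runs = [
    TextRun("A1000", 60, 702),        # left of the box: ignored
    TextRun("Option E", 114, 702),    # right next to the box: chosen
    TextRun("Option F", 150, 702),    # same line but farther away
    TextRun("Option G", 114, 688),    # next line down: outside line_tol
]
print(label_for_box(Box(100, 700, 10, 10), runs))  # prints "Option E"
```

Choosing the nearest run rather than the first match keeps a neighbouring option from winning. Loosening `max_gap` or `line_tol` trades missed labels for exactly the false positives the answer mentions.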
Source: https://stackoverflow.com/questions/25210668/how-to-check-if-a-checkbox-is-checked-or-not-on-a-non-form-pdf-using-c