OpenCV - Detecting handwritten marks in questionnaire checkboxes

Question

I'm working with a large number of patient intake questionnaires; a scanned example is shown below. I need to process them and store the results in a database, but I'm having trouble detecting the handwritten marks:

[Image: scanned Patient Intake Questionnaire]

The questionnaires contain different types of marks. Some checkboxes are filled in black; others contain a tick or a cross. All of these marks mean the checkbox is selected. I need to use OpenCV (cv2) to recognize which boxes are checked.
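One common way to treat all three mark types (filled, tick, cross) the same is to measure how much ink lies inside each box: an empty box is mostly white inside, while any mark darkens part of the interior. The sketch below assumes each checkbox has already been cropped to a grayscale NumPy array; the function name `is_checked` and the values of `border`, `ink_threshold` and `min_fill` are hypothetical and would need calibrating on real scans.

```python
import numpy as np


def is_checked(roi_gray, border=3, ink_threshold=128, min_fill=0.08):
    """Return True if a grayscale checkbox crop appears to contain a mark.

    roi_gray: 2-D uint8 array (0 = black ink, 255 = white paper) cropped
    around one checkbox, including its printed outline.
    border: pixels trimmed from every side so the printed outline is ignored.
    ink_threshold, min_fill: hypothetical tuning values.
    """
    h, w = roi_gray.shape
    # Keep only the interior of the box; the outline itself is always dark.
    inner = roi_gray[border:h - border, border:w - border]
    if inner.size == 0:
        return False
    # Fraction of interior pixels dark enough to count as ink.
    fill = np.count_nonzero(inner < ink_threshold) / inner.size
    return fill >= min_fill
```

A filled box gives a fill ratio near 1, a tick or cross a smaller but clearly non-zero ratio, and an empty box a ratio near 0, so a single threshold separates "selected" from "unselected" without caring about the shape of the mark.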

I've tried optical character recognition (OCR), but the results don't really help: the marks vary so much in shape that OCR recognizes them as different characters. I need to figure out which boxes in a questionnaire are checked. I suspect cv2 could solve this, but I don't know where to start.

# Expected input: an image of the questionnaire

# Expected output:
Have you seen other health care providers for your problems of dizziness 
and/or imbalance? [selected] Yes [unselected] No

Have you been through a program of Vestibular and Balance Rehabilitation 
Therapy? [selected] Yes [unselected] No

=============================
[unselected] Vertigo
[unselected] Falling
...
[selected] Drunk-like

=============================
[selected] Vertigo
[selected] Falling
[selected] Fatigue
[selected] Wooziness
[selected] Spinning
[unselected] Disconnected
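The expected output lists the boxes of each group in reading order (Vertigo, Falling, Fatigue, then Wooziness, Spinning, Disconnected), which matches the form's rows of three checkboxes. Assuming the box coordinates are already known and the labels come from a fixed form template in the same order, the ordering and the output format might be sketched as follows; `reading_order`, `format_results` and `row_tol` are hypothetical names.

```python
def reading_order(boxes, row_tol=10):
    """Sort (x, y, w, h) boxes top-to-bottom, then left-to-right within a row.

    Boxes whose top edge is within row_tol pixels of the first box in the
    current row are treated as the same row (hypothetical tolerance).
    """
    rows = []
    for box in sorted(boxes, key=lambda b: b[1]):
        if rows and abs(box[1] - rows[-1][0][1]) <= row_tol:
            rows[-1].append(box)
        else:
            rows.append([box])
    # Flatten the rows, ordering each one by its x coordinate.
    return [b for row in rows for b in sorted(row, key=lambda b: b[0])]


def format_results(labels, checked):
    """Render parallel lists of labels and booleans in the expected-output style."""
    return [f"[{'selected' if c else 'unselected'}] {label}"
            for label, c in zip(labels, checked)]
```

Grouping by a y tolerance instead of sorting on exact coordinates matters because boxes in one scanned row rarely share the same pixel row.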

My previous attempt used the Python pytesseract OCR package (code, then its output):

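from PIL import Image
import pytesseract

path = "page1.jpg"
img = Image.open(path)
# --psm 6: treat the page as a single uniform block of text;
# preserve_interword_spaces=1 keeps the column spacing in the output
text = pytesseract.image_to_string(img, lang='eng', config='-c preserve_interword_spaces=1 --psm 6')
print(text)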

O Vertigo           O Falling              O Fatigue                 W Vertigo          YA Falling             y[ Fatigue
[ Wooziness     O Spinning         O Disconnected       A \Wooziness     Q Spinning         [ Disconnected
O Imbalance      B Drunk-like        O Swirling             O Imbalance      O Drunk-like       @ Swirling      ;
O Faint            [ Rocking        O Can’tfocus         M Faint           4 Rocking          O Can’t focus
O Lightheaded O Swaying -~ . -0 Unsteady       O Lightheaded O Swaying       N Unsteady
O “onaboat” O Swimming sensation                      Weonaboat” @ Swimming sensation
O Other:                                                        0 Other:

My idea was this: if OCR reads a rectangular checkbox as the letter 'O' or the digit '0', the box is unselected; otherwise it is selected. With that rule I could infer the handwritten marks from the OCR results. I'm not sure it will work, so I'll test it on a few samples and measure the precision. If it works, I'll report back in this post.
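The rule above could be written as a small parser over tesseract's text output. This sketch assumes, hypothetically, that neighbouring entries are separated by two or more spaces (as in the output above) and that the first token of each entry is the glyph OCR produced for the box; `parse_ocr_line` and `EMPTY_GLYPHS` are hypothetical names.

```python
import re

# Glyphs that, under the rule above, mean OCR saw an empty box.
EMPTY_GLYPHS = {"O", "0"}


def parse_ocr_line(line):
    """Split one OCR output line into (label, selected) pairs."""
    results = []
    # Entries are assumed to be separated by runs of 2+ spaces.
    for entry in re.split(r"\s{2,}", line.strip()):
        parts = entry.split(maxsplit=1)
        if len(parts) < 2:
            continue  # stray glyph with no label, e.g. ";"
        glyph, label = parts
        results.append((label, glyph not in EMPTY_GLYPHS))
    return results
```

Applied to the right-hand group of the output above, this reports Vertigo, Falling and Fatigue as selected, matching the expected output. It also shows the rule's weakness: OCR read the box next to Disconnected as `[`, so the rule reports it as selected, while the expected output has Disconnected unselected. Entries separated by a single space (such as `O Lightheaded O Swaying`) are also not split.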

Source: https://stackoverflow.com/questions/56011081/opencv-detecting-handwritten-mark-of-checkboxes-from-questionnaire

Tags

python

OpenCV

image-processing

computer-vision

OMR