OpenCV - Detecting handwritten mark of checkboxes from questionnaire

冷暖自知 submitted on 2020-01-01 16:43:33

Question


I'm processing a large number of patient intake questionnaires; a scanned example is shown below. I need to extract the answers and store them in a database, but I'm having trouble detecting these handwritten marks:

[Image: scanned Patient Intake Questionnaire]

The questionnaires contain different kinds of marks: some checkboxes are filled in solid black, while others contain a tick or a cross. All of these mean the box is selected. I'd like to use OpenCV (cv2) to recognize which boxes are checked.
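One common OpenCV approach is to binarize the page (for example cv2.threshold with the cv2.THRESH_BINARY_INV and cv2.THRESH_OTSU flags), locate the roughly square box outlines with cv2.findContours, and then decide whether each box is checked by measuring how much ink lies inside it. A fill, a tick and a cross all add dark pixels to the interior, so the shape of the mark stops mattering. The sketch below covers only that last step, in plain NumPy, and assumes each box has already been cropped and inverted so that ink is 1 and paper is 0. The function names and the margin and threshold defaults are hypothetical and would need tuning on real scans.

```python
import numpy as np


def is_checked(box: np.ndarray, margin: int = 2, threshold: float = 0.05) -> bool:
    """Return True if a cropped checkbox contains a mark.

    `box` is a 2-D binary array (1 = ink, 0 = paper) cropped to the
    checkbox outline. `margin` pixels are trimmed from every side so the
    printed border is not counted as ink. Both defaults are illustrative,
    not tuned values.
    """
    h, w = box.shape
    if h <= 2 * margin or w <= 2 * margin:
        raise ValueError("crop is too small for the given margin")
    interior = box[margin:h - margin, margin:w - margin]
    # Fraction of interior pixels that are ink; fills, ticks and crosses all raise it.
    fill_ratio = np.count_nonzero(interior) / interior.size
    return bool(fill_ratio > threshold)


def empty_box(size: int = 20) -> np.ndarray:
    """Synthetic unmarked checkbox: just the printed 1-pixel outline."""
    box = np.zeros((size, size), dtype=np.uint8)
    box[0, :] = box[-1, :] = box[:, 0] = box[:, -1] = 1
    return box


blank = empty_box()
cross = empty_box()
for i in range(20):
    cross[i, i] = cross[i, 19 - i] = 1  # two 1-pixel diagonal strokes
filled = empty_box()
filled[3:17, 3:17] = 1  # solid black fill

print(is_checked(blank), is_checked(cross), is_checked(filled))  # False True True
```

The synthetic boxes stand in for real crops. On scanned pages, the threshold has to sit above the speckle noise of an empty box but below the ink of the faintest genuine tick.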

I've tried Optical Character Recognition (OCR), but the results don't really help: the marks come in so many shapes that OCR reads them as various different characters. I need to determine which boxes in a questionnaire are checked. I suspect OpenCV could solve this, but I don't know where to start.

# Expected input: an image of the questionnaire

# Expected output:
Have you seen other health care providers for your problems of dizziness 
and/or imbalance? [selected] Yes [unselected] No

Have you been through a program of Vestibular and Balance Rehabilitation 
Therapy? [selected] Yes [unselected] No

=============================
[unselected] Vertigo
[unselected] Falling
...
[selected] Drunk-like

=============================
[selected] Vertigo
[selected] Falling
[selected] Fatigue
[selected] Wooziness
[selected] Spinning
[unselected] Disconnected
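Once every box has a selected/unselected flag, producing the expected output is mostly bookkeeping. A minimal sketch, assuming the detection step returns, per question, an ordered mapping from option label to a boolean; format_question, format_group and SEPARATOR are hypothetical names.

```python
from __future__ import annotations

SEPARATOR = "=" * 29  # separator line between checklist groups


def mark(checked: bool) -> str:
    return "[selected]" if checked else "[unselected]"


def format_question(question: str, options: dict[str, bool]) -> str:
    """Yes/No style question: all options on one line after the question text."""
    answers = " ".join(f"{mark(checked)} {label}" for label, checked in options.items())
    return f"{question} {answers}"


def format_group(options: dict[str, bool]) -> str:
    """Symptom checklist: a separator line, then one option per line."""
    lines = [SEPARATOR] + [f"{mark(checked)} {label}" for label, checked in options.items()]
    return "\n".join(lines)


print(format_question("Have you been through a program of Vestibular and Balance "
                      "Rehabilitation Therapy?", {"Yes": True, "No": False}))
print(format_group({"Vertigo": True, "Falling": True, "Disconnected": False}))
```

Keeping the detector's output as plain label-to-boolean mappings also makes it straightforward to write the same data to the database instead of printing it.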

My previous attempt, using the pytesseract Python package:

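from PIL import Image
import pytesseract

path = "page1.jpg"
img = Image.open(path)
# --psm 6: treat the page as a single uniform block of text;
# preserve_interword_spaces=1 keeps the column spacing in the output
text = pytesseract.image_to_string(img, lang='eng', config='-c preserve_interword_spaces=1 --psm 6')
print(text)

Output:
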
O Vertigo           O Falling              O Fatigue                 W Vertigo          YA Falling             y[ Fatigue
[ Wooziness     O Spinning         O Disconnected       A \Wooziness     Q Spinning         [ Disconnected
O Imbalance      B Drunk-like        O Swirling             O Imbalance      O Drunk-like       @ Swirling      ;
O Faint            [ Rocking        O Can’tfocus         M Faint           4 Rocking          O Can’t focus
O Lightheaded O Swaying -~ . -0 Unsteady       O Lightheaded O Swaying       N Unsteady
O “onaboat” O Swimming sensation                      Weonaboat” @ Swimming sensation
O Other:                                                        0 Other:

My idea: if OCR recognizes a rectangular checkbox as the letter 'O' or the digit '0', the box should be unselected; otherwise it should be selected. Based on that rule, I could infer the handwritten marks from the OCR results. I'm not sure whether this is workable, but I'll test it on a few samples to measure its precision and report back here.
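The 'O'/'0' rule can be prototyped directly on the OCR text: for each known option label, look at the token immediately before it. A minimal sketch, assuming the list of labels is known in advance; parse_ocr_line and UNCHECKED_TOKENS are hypothetical names, and the sample line is copied from the output above.

```python
import re
from typing import List, Tuple

# Tokens the rule expects OCR to produce for an empty checkbox.
UNCHECKED_TOKENS = {"O", "0"}


def parse_ocr_line(line: str, labels: List[str]) -> List[Tuple[str, bool]]:
    """Classify every occurrence of each label by the OCR token just before it.

    A label preceded by 'O' or '0' is treated as unselected; any other
    token is treated as a mark. Results are returned in left-to-right order.
    """
    found = []
    for label in labels:
        # (\S+) captures the whitespace-separated token right before the label
        pattern = re.compile(r"(\S+)\s+" + re.escape(label))
        for m in pattern.finditer(line):
            found.append((m.start(), label, m.group(1) not in UNCHECKED_TOKENS))
    found.sort()  # order by position in the line
    return [(label, selected) for _, label, selected in found]


sample = ("O Vertigo           O Falling              O Fatigue"
          "                 W Vertigo          YA Falling             y[ Fatigue")
print(parse_ocr_line(sample, ["Vertigo", "Falling", "Fatigue"]))
# [('Vertigo', False), ('Falling', False), ('Fatigue', False), ('Vertigo', True), ('Falling', True), ('Fatigue', True)]
```

The sample output already contains cases this rule gets wrong: some boxes that should be unselected come back as '[' (for example '[ Disconnected'), and 'A \Wooziness' has no space before the label, so this sketch would not find that occurrence at all. Measuring the ink inside cropped box regions is likely to be more reliable than trusting the OCR token.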

Source: https://stackoverflow.com/questions/56011081/opencv-detecting-handwritten-mark-of-checkboxes-from-questionnaire
