Question
I'm working with a large number of patient intake questionnaires. Here is a scanned example of the questionnaire. I need to process them and store the results in a database, but I'm having trouble detecting these handwritten marks:
[Image: Patient Intake Questionnaire]
The questionnaires contain several kinds of marks. Some checkboxes are filled in black; others have a tick or a cross. All of these marks mean that the checkbox is selected. I need to use OpenCV (cv2) to recognize which boxes are checked.
I've tried optical character recognition (OCR), but the result doesn't really help. The marks come in too many shapes, so OCR recognizes them as different characters. I need to figure out which boxes are checked in a questionnaire. cv2 can probably solve this problem, but I have no idea where to start.
# Expected input: an image of the questionnaire
# Expected output:
Have you seen other health care providers for your problems of dizziness
and/or imbalance? [selected] Yes [unselected] No
Have you been through a program of Vestibular and Balance Rehabilitation
Therapy? [selected] Yes [unselected] No
=============================
[unselected] vertigo
[unselected] falling
...
[selected] Drunk-like
=============================
[selected] Vertigo
[selected] Falling
[selected] Fatigue
[selected] Wooziness
[selected] Spinning
[unselected] Disconnected
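My previous attempt, using the Python pytesseract OCR package:
from PIL import Image
import pytesseract

# Path to the scanned questionnaire page
path = "page1.jpg"
img = Image.open(path)
# --psm 6 treats the page as a single uniform block of text;
# preserve_interword_spaces=1 keeps the spacing between words
text = pytesseract.image_to_string(img, lang='eng', config='-c preserve_interword_spaces=1 --psm 6')
print(text)
# Actual OCR output: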
O Vertigo O Falling O Fatigue W Vertigo YA Falling y[ Fatigue
[ Wooziness O Spinning O Disconnected A \Wooziness Q Spinning [ Disconnected
O Imbalance B Drunk-like O Swirling O Imbalance O Drunk-like @ Swirling ;
O Faint [ Rocking O Can’tfocus M Faint 4 Rocking O Can’t focus
O Lightheaded O Swaying -~ . -0 Unsteady O Lightheaded O Swaying N Unsteady
O “onaboat” O Swimming sensation Weonaboat” @ Swimming sensation
O Other: 0 Other:
My thought was: if OCR recognizes an empty checkbox as the letter 'O' or the digit '0', the checkbox is unselected; otherwise it is selected. With that rule I could detect the handwritten marks from the OCR results. I'll test it on a few samples to check the precision, though I'm not sure it's feasible. If it works, I'll report back in this post.
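The rule above could be sketched roughly as follows. This is only a minimal sketch, not a tested solution: the `LABELS` list, the `EMPTY_MARKERS` set, and the `classify_line` helper are hypothetical names introduced for illustration, and the sketch assumes the option labels are known in advance and survive OCR intact.

```python
import re

# Hypothetical: option labels assumed to be known in advance from the form.
LABELS = ["Vertigo", "Falling", "Fatigue", "Wooziness", "Spinning", "Disconnected"]

# Characters OCR is assumed to emit for an empty checkbox (the rule above).
EMPTY_MARKERS = {"O", "0"}

LABEL_RE = re.compile("|".join(re.escape(label) for label in LABELS))


def classify_line(line):
    """Return (label, selected) pairs for one line of OCR text."""
    results = []
    previous_end = 0
    for match in LABEL_RE.finditer(line):
        # Whatever OCR produced between the previous label and this one
        # is treated as the checkbox marker for this label.
        marker = line[previous_end:match.start()].strip()
        results.append((match.group(), marker not in EMPTY_MARKERS))
        previous_end = match.end()
    return results


if __name__ == "__main__":
    sample = "O Vertigo O Falling O Fatigue W Vertigo YA Falling y[ Fatigue"
    for label, selected in classify_line(sample):
        print("[selected]" if selected else "[unselected]", label)
```

On the first OCR line this gives three unselected options followed by three selected ones, which agrees with the expected output. On the second line, however, the final "[ Disconnected" would be classified as selected even though the expected output lists Disconnected as unselected, so the rule appears to break whenever OCR reads an empty box as something other than 'O' or '0'.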
Source: https://stackoverflow.com/questions/56011081/opencv-detecting-handwritten-mark-of-checkboxes-from-questionnaire