Does anyone know the meaning of the output of the image_to_data and image_to_osd methods of pytesseract?

I'm trying to extract the data from an image using pytesseract. This module has image_to_data and image_to_osd methods. These two m

2 answers

刺人心

2021-01-22 11:18
Column level: the depth of the item in the layout hierarchy (the numbers that do not apply to an item are 0):
1. Page: item with no block_num, par_num, line_num, or word_num
2. Block: item with block_num but no par_num, line_num, or word_num
3. Paragraph: item with block_num and par_num but no line_num or word_num
4. Line: item with block_num, par_num, and line_num but no word_num
5. Word: item with all of those numbers
Column block_num: block number of the detected text or item
Column par_num: paragraph number of the detected text or item
Column line_num: line number of the detected text or item
Column word_num: word number of the detected text or item

These four columns are interconnected, though. When an item starts a new line, word_num starts counting again (the row for the line itself has word_num 0); it does not continue from the last word number of the previous line. The same goes for line_num, par_num, and block_num within their parent items.

Check out the image below for reference.
1st column: block_num
2nd column: par_num
3rd column: line_num
4th column: word_num
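
Because the numbering restarts inside each parent item, a word is only identified by the combination of block_num, par_num, and line_num. The sketch below shows one way to use that to rebuild lines of text. The `sample` dict is hypothetical data shaped like the result of `image_to_data(..., output_type=Output.DICT)`, not real OCR output, and `group_words_by_line` is a helper name made up for this illustration. It assumes a single-page image, so page_num is ignored.

```python
from collections import defaultdict

# Hypothetical data shaped like image_to_data(..., output_type=Output.DICT).
# The values are invented for illustration; they are not real OCR output.
sample = {
    "level":     [1, 2, 3, 4, 5, 5, 4, 5],
    "block_num": [0, 1, 1, 1, 1, 1, 1, 1],
    "par_num":   [0, 0, 1, 1, 1, 1, 1, 1],
    "line_num":  [0, 0, 0, 1, 1, 1, 2, 2],
    "word_num":  [0, 0, 0, 0, 1, 2, 0, 1],
    "text":      ["", "", "", "", "Hello", "world", "", "again"],
}


def group_words_by_line(data):
    """Join word-level rows (level 5) that share block, paragraph, and line numbers."""
    lines = defaultdict(list)
    for i, level in enumerate(data["level"]):
        if level != 5:
            continue  # skip page, block, paragraph, and line rows
        text = data["text"][i].strip()
        if not text:
            continue  # skip empty detections
        key = (data["block_num"][i], data["par_num"][i], data["line_num"][i])
        lines[key].append(text)
    return {key: " ".join(words) for key, words in lines.items()}


lines = group_words_by_line(sample)
for key, text in sorted(lines.items()):
    print(key, text)
```

Note that word_num is 1 for both "Hello" and "again" in the sample: on its own it does not tell you which line a word belongs to.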
误落风尘

2021-01-22 11:25
my_image.jpg

For example, running image_to_data on my_image.jpg with the code below gives results like those in results.png.

results.png
- level: 1/2/3/4/5, the level of the current item.
- page_num: the page index of the current item. In most cases an image has only one page.
- block_num: the block index of the current item. When Tesseract runs OCR on an image, it splits the image into several blocks according to the PSM (page segmentation mode) parameter and its layout rules. The words of a line usually fall in the same block.
- par_num: the paragraph index of the current item. It comes from the page layout analysis.
- line_num: the line index of the current item. It comes from the page layout analysis.
- word_num: the word index of the current item within its line.
- left/top/width/height: the top-left coordinate and the width and height of the current item's bounding box, in pixels.
- conf: the confidence of the current word, in the range -1 to 100. A value of -1 means there is no recognized text in this row; 100 is the highest value.
- text: the OCR result for the word.
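
As a rough illustration of how the conf, text, and bounding-box columns fit together, the sketch below keeps only the words above a confidence threshold and turns left/top/width/height into corner coordinates. The `sample` dict is hypothetical data in the shape of an `Output.DICT` result, and `confident_words` and the threshold of 60 are arbitrary choices for this example. Depending on the pytesseract version, conf may arrive as a number or as a string, so the sketch converts it with `float()`.

```python
# Hypothetical data shaped like image_to_data(..., output_type=Output.DICT).
# The values are invented for illustration; they are not real OCR output.
sample = {
    "level":  [4, 5, 5, 5],
    "left":   [10, 10, 60, 120],
    "top":    [20, 20, 20, 22],
    "width":  [150, 40, 50, 30],
    "height": [18, 18, 18, 15],
    "conf":   ["-1", "96.5", "42", "88"],
    "text":   ["", "Total", "l2", "45"],
}


def confident_words(data, min_conf=60.0):
    """Return (text, conf, (x1, y1, x2, y2)) for rows at or above min_conf."""
    words = []
    for i, text in enumerate(data["text"]):
        conf = float(data["conf"][i])  # conf may be a str or a number
        if conf < min_conf or not text.strip():
            continue  # drops conf == -1 rows and low-confidence words
        x1, y1 = data["left"][i], data["top"][i]
        x2, y2 = x1 + data["width"][i], y1 + data["height"][i]
        words.append((text, conf, (x1, y1, x2, y2)))
    return words


words = confident_words(sample)
for text, conf, box in words:
    print(text, conf, box)
```

The corner coordinates are in the form that drawing functions such as `cv2.rectangle` take, should you want to visualize the boxes on the image.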
The meaning of the results from image_to_osd:
- Page number: the page index of the current item. In most cases an image has only one page.
- Orientation in degrees: the clockwise rotation angle of the text in the current image relative to its reading orientation. The possible values are 0, 90, 180, and 270.
- Rotate: the angle by which the current image must be rotated clockwise to make the text readable. The possible values are 0, 90, 180, and 270. It is complementary to the [Orientation in degrees] value.
- Orientation confidence: the confidence of the detected [Orientation in degrees] and [Rotate] values. The greater the confidence, the more credible the detection result, but no explanation of its value range has been found so far.
- Script: the script (writing system) of the text in the current image, for example Latin.
- Script confidence: the confidence of the detected script.
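
By default image_to_osd returns these fields as plain text, one "name: value" pair per line. The sketch below shows one way to turn that text into a dict. `SAMPLE_OSD` is a made-up string in that layout rather than output from a real image, and `parse_osd` is a hypothetical helper written for this example.

```python
# Made-up text in the "name: value" layout that image_to_osd returns;
# the numbers are invented for illustration.
SAMPLE_OSD = """Page number: 0
Orientation in degrees: 270
Rotate: 90
Orientation confidence: 1.23
Script: Latin
Script confidence: 2.34
"""


def parse_osd(osd_text):
    """Split each 'name: value' line of the OSD report into a dict entry."""
    fields = {}
    for line in osd_text.splitlines():
        name, sep, value = line.partition(":")
        if not sep:
            continue  # ignore lines without a colon
        fields[name.strip()] = value.strip()
    return fields


osd = parse_osd(SAMPLE_OSD)
rotate = int(osd["Rotate"])
print(osd["Script"], "text, rotate the image by", rotate, "degrees")
```

In this sample the two angles add up to 360, which is what "complementary" means in the list above.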
```
from pytesseract import Output
import pytesseract
import cv2

image = cv2.imread("my_image.jpg")

# OpenCV stores images in BGR order by default, while pytesseract assumes RGB,
# so swap the color channels before running OCR.
rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

# Point pytesseract at the Tesseract executable (needed when it is not on PATH).
pytesseract.pytesseract.tesseract_cmd = r'C:\mypath\tesseract.exe'
# Restrict recognition to digits; --psm 6 assumes a single uniform block of text.
custom_config = r'-c tessedit_char_whitelist=0123456789 --psm 6'
results = pytesseract.image_to_data(rgb, output_type=Output.DICT, lang='eng', config=custom_config)
print(results)
```