How do I use the Tesseract API to iterate over words?

I'm trying to learn Python in parallel with the Tesseract API. My end goal is to learn how to use the Tesseract API to be able to read a document and do some basic error ch

1 Answer

天命终不由人

2021-01-01 06:40

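import tesserocr
from PIL import Image

# Assumes `api` is a tesserocr.PyTessBaseAPI with an image already set,
# and `img` is that same image as a NumPy array.
api.SetVariable("save_blob_choices", "T")  # set variables before recognition runs
api.Recognize()
ri = api.GetIterator()
level = tesserocr.RIL.WORD
boxes = api.GetComponentImages(tesserocr.RIL.TEXTLINE, True)
text_list = []
print('Found {} textline image components.'.format(len(boxes)))
for i, r in enumerate(tesserocr.iterate_level(ri, level)):
    symbol = r.GetUTF8Text(level)        # text of the current word
    conf = r.Confidence(level)           # confidence for that word
    bbox = r.BoundingBoxInternal(level)  # (left, top, right, bottom)
    # Crop the word out of the image (rows are y, columns are x) and save it.
    im = Image.fromarray(img[bbox[1]:bbox[3], bbox[0]:bbox[2]])
    im.save("../out/" + str(i) + ".tif")
    text_list.append(symbol + " " + str(conf) + "\n")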

I think r.BoundingBoxInternal(level) gives the bounding box of the detected word as a (left, top, right, bottom) tuple, which is what the slicing above relies on.
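If you only need the words, confidences and boxes (without saving crops), the same loop can be wrapped in a small helper. This is just a sketch under a few assumptions: `Word`, `collect_words` and `main` are names I made up for illustration, the image path is whatever you pass in, and I'm assuming tesserocr raises RuntimeError when a position has no text.

```python
from collections import namedtuple

# Hypothetical container for one recognized word (not part of tesserocr).
Word = namedtuple("Word", ["text", "confidence", "bbox"])


def collect_words(items, level):
    """Collect text, confidence and bounding box for each iterator position.

    `items` is meant to be tesserocr.iterate_level(ri, level), but any
    iterable of objects exposing the same three methods will work.
    """
    words = []
    for r in items:
        try:
            text = r.GetUTF8Text(level)
        except RuntimeError:
            # Assumption: raised when there is no text at this position.
            continue
        words.append(Word(text.strip(),
                          r.Confidence(level),
                          r.BoundingBoxInternal(level)))
    return words


def main(image_path):
    # Imported here so collect_words can be used without tesserocr installed.
    import tesserocr

    with tesserocr.PyTessBaseAPI() as api:
        api.SetImageFile(image_path)
        api.Recognize()
        level = tesserocr.RIL.WORD
        ri = api.GetIterator()
        words = collect_words(tesserocr.iterate_level(ri, level), level)

    for w in words:
        print("{}\t{:.1f}\t{}".format(w.text, w.confidence, w.bbox))
    return words
```

Calling main() with the path to your scanned page should print one line per word. Keeping the loop in collect_words means you can try it on stand-in objects before pointing it at a real document.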
