I'm trying to learn Python in parallel with the Tesseract API. My end goal is to use the Tesseract API to read a document and do some basic error ch
# Set the variable before recognition so that it applies to this run.
api.SetVariable("save_blob_choices", "T")
api.Recognize()
ri=api.GetIterator()
level=tesserocr.RIL.WORD
boxes = api.GetComponentImages(tesserocr.RIL.TEXTLINE, True)
text_list = []
print('Found {} textline image components.'.format(len(boxes)))
i = 0
for r in tesserocr.iterate_level(ri, level):
    # Text and confidence of the current word (level is RIL.WORD).
    symbol = r.GetUTF8Text(level)
    conf = r.Confidence(level)
    bbox = r.BoundingBoxInternal(level)
    # Crop the word from the image array: rows are y, columns are x.
    im = Image.fromarray(img[bbox[1]:bbox[3], bbox[0]:bbox[2]])
    im.save("../out/" + str(i) + ".tif")
    text_list.append(symbol + " " + str(conf) + "\n")
    i += 1
I think the function r.BoundingBoxInternal(level) will give the bounding box of the detected word.
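To make the indexing explicit, here is a minimal sketch of how such a box maps onto array slicing. It assumes the box is a (left, top, right, bottom) tuple in pixel coordinates with the origin at the top-left corner, which is how the cropping in my loop treats it. The helper name bbox_to_slices is hypothetical and not part of tesserocr, and the nested list only stands in for a real image.

```python
def bbox_to_slices(bbox, width, height):
    """Convert a (left, top, right, bottom) box into (row_slice, col_slice).

    Assumption: pixel coordinates with the origin at the top-left corner.
    Coordinates are clamped to the image so the slices never run out of range.
    """
    left, top, right, bottom = bbox
    left = max(0, min(left, width))
    right = max(left, min(right, width))
    top = max(0, min(top, height))
    bottom = max(top, min(bottom, height))
    # Array-style images are indexed [row, column], i.e. [y, x].
    return slice(top, bottom), slice(left, right)


# Example with a plain nested list standing in for a 4x6 image.
image = [[10 * y + x for x in range(6)] for y in range(4)]
rows, cols = bbox_to_slices((1, 1, 4, 3), width=6, height=4)
crop = [row[cols] for row in image[rows]]
print(crop)
```

With a NumPy array the same slices can be applied directly as img[rows, cols], which is equivalent to img[bbox[1]:bbox[3], bbox[0]:bbox[2]] when the box lies inside the image.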