How do I use the Tesseract API to iterate over words?

暗喜 2021-01-01 06:13

I'm trying to learn Python in parallel with the Tesseract API. My end goal is to learn how to use the Tesseract API to be able to read a document and do some basic error ch

1 Answer
  • 2021-01-01 06:40
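    import os
    import numpy as np
    import tesserocr
    from PIL import Image

    # "page.png" is a placeholder input path
    pil_img = Image.open("page.png")
    img = np.array(pil_img)  # same pixels as a NumPy array, for slicing
    os.makedirs("../out", exist_ok=True)

    with tesserocr.PyTessBaseAPI() as api:
        api.SetImage(pil_img)
        # Set variables before Recognize() so they apply to this pass
        api.SetVariable("save_blob_choices", "T")
        api.Recognize()

        boxes = api.GetComponentImages(tesserocr.RIL.TEXTLINE, True)
        print('Found {} textline image components.'.format(len(boxes)))

        ri = api.GetIterator()
        level = tesserocr.RIL.WORD
        text_list = []
        for i, r in enumerate(tesserocr.iterate_level(ri, level)):
            word = r.GetUTF8Text(level)  # recognized text of the word
            conf = r.Confidence(level)   # confidence score (0-100)
            # (left, top, right, bottom) in original-image pixel coordinates
            left, top, right, bottom = r.BoundingBox(level)
            im = Image.fromarray(img[top:bottom, left:right])
            im.save(os.path.join("../out", str(i) + ".tif"))
            text_list.append(word + " " + str(conf) + "\n")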
    

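    r.BoundingBoxInternal(level) gives the bounding box of the detected word as (left, top, right, bottom), but in the coordinates of Tesseract's internal working image. r.BoundingBox(level) returns the same box in the original image's coordinates, so it is the safer choice when cropping the input image.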
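    The crop in the loop swaps the box order: the box is (left, top, right, bottom), but a row-major image is indexed by row (y) first, so the slice is [top:bottom, left:right]. Tesseract places box coordinates on the cracks between pixels, so right and bottom are exclusive, which matches Python slicing. Below is a minimal stdlib-only sketch of that mapping; the `crop` helper and the toy image are hypothetical illustrations, not part of tesserocr.

```python
from typing import List, Tuple

# (left, top, right, bottom), the order returned by BoundingBox()
Box = Tuple[int, int, int, int]


def crop(image: List[List[int]], box: Box) -> List[List[int]]:
    """Crop a row-major image (a list of rows) to a Tesseract-style box.

    Rows are indexed by y and columns by x, so (left, top, right, bottom)
    becomes image[top:bottom] rows, each sliced to [left:right].
    Right and bottom are exclusive, exactly like Python slices.
    """
    left, top, right, bottom = box
    return [row[left:right] for row in image[top:bottom]]


# Hypothetical 4-row x 5-column image; each pixel stores 10*y + x.
toy = [[10 * y + x for x in range(5)] for y in range(4)]

# Columns x = 1..2 and rows y = 2..3.
print(crop(toy, (1, 2, 3, 4)))  # [[21, 22], [31, 32]]
```

    With a NumPy array the same box becomes img[top:bottom, left:right], as in the loop above.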
