Recognizing Image CAPTCHAs with Tesseract

Tesseract

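Tesseract is widely regarded as one of the most accurate open-source OCR (Optical Character Recognition) engines. It is highly flexible and can be trained to recognize virtually any font.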

Installation

Windows:

https://github.com/tesseract-ocr/tesseract

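Setting environment variables

After installation, you need to set an environment variable if you want to use Tesseract from the command line. On Mac and Linux this is already done by default during installation. On Windows, add the directory that contains tesseract.exe to the Path environment variable.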
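A quick way to confirm that the Path change took effect is to look up the executable from Python. The following is a minimal sketch that uses only the standard library; the helper name find_tesseract is made up for illustration.

```python
import shutil


def find_tesseract(executable="tesseract"):
    """Return the full path of the Tesseract executable, or None if it is not on PATH."""
    # shutil.which searches the directories listed in the PATH environment variable
    return shutil.which(executable)


if __name__ == "__main__":
    path = find_tesseract()
    if path is None:
        print("tesseract was not found on PATH; check the environment variable")
    else:
        print("tesseract found at:", path)
```

On Windows, open a new terminal after editing the environment variables, since terminals that are already open keep the old Path.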

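One more environment variable is required: the path to the trained data files must also be stored in an environment variable.

Add the following variable to the environment variables:

TESSDATA_PREFIX=

Set its value to the trained data path, as shown here:

[Image: screenshot of the TESSDATA_PREFIX environment variable]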
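To check the variable from Python, something like the sketch below may help. It assumes that TESSDATA_PREFIX points directly at the directory holding the .traineddata files (some older Tesseract versions expect the parent directory instead); the helper name list_trained_languages is hypothetical.

```python
import os
from pathlib import Path


def list_trained_languages(tessdata_dir):
    """Return the language codes of the .traineddata files in a directory, sorted."""
    directory = Path(tessdata_dir)
    if not directory.is_dir():
        return []
    # Each language is stored as <code>.traineddata, e.g. chi_sim.traineddata
    return sorted(p.stem for p in directory.glob("*.traineddata"))


if __name__ == "__main__":
    prefix = os.environ.get("TESSDATA_PREFIX")
    if not prefix:
        print("TESSDATA_PREFIX is not set")
    else:
        print("TESSDATA_PREFIX =", prefix)
        print("languages found:", list_trained_languages(prefix))
```

If chi_sim is missing from the printed list, a call with lang='chi_sim' is likely to fail until that trained data file is placed in the directory.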

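Using tesseract on the command line to recognize images

Command: tesseract image_path output_file_path (give the output name without an extension; .txt is appended automatically)

Example: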

tesseract a.png a

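This recognizes the text in a.png and writes it to a.txt. If you would rather print the result to the terminal than write it to a file, pass stdout as the output name: tesseract a.png stdout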
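The same command can be driven from a script. The sketch below builds the argument list and runs it with the standard subprocess module. It relies on Tesseract treating the output name stdout as "print to the terminal", and on the -l option for choosing a language; the function names are made up for illustration.

```python
import subprocess


def build_tesseract_command(image_path, output_base=None, lang=None):
    """Build the argument list for a tesseract command-line call."""
    # "stdout" as the output name makes tesseract print the text instead of writing a file
    command = ["tesseract", image_path, output_base or "stdout"]
    if lang:
        command += ["-l", lang]
    return command


def run_tesseract(image_path, output_base=None, lang=None):
    """Run tesseract and return whatever it printed to standard output."""
    result = subprocess.run(
        build_tesseract_command(image_path, output_base, lang),
        capture_output=True,
        encoding="utf-8",
        check=True,  # raise CalledProcessError if tesseract exits with an error
    )
    return result.stdout


if __name__ == "__main__":
    print(build_tesseract_command("a.png", "a"))
    print(build_tesseract_command("a.png"))
```

Calling run_tesseract("a.png") would then return the recognized text as a string, provided tesseract is on the Path.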

Using tesseract from code to recognize images

(1) Installation

pip3 install pytesseract --default-timeout=1000

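Reading the image also requires a third-party library called Pillow (install it with pip3 install Pillow if it is not already present).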

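(2) Usage

import pytesseract
from PIL import Image

# Point pytesseract at the Tesseract executable (only needed if tesseract is not on Path)
pytesseract.pytesseract.tesseract_cmd = r'D:\Program Files (x86)\Tesseract-OCR\tesseract.exe'

# Open the image to recognize
image = Image.open('1.png')

# Run OCR; 'chi_sim' selects the Simplified Chinese trained data
text = pytesseract.image_to_string(image, lang='chi_sim')

print(text)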
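For CAPTCHAs specifically, the same call is often combined with a restricted character set and a single-line page segmentation mode. The sketch below is one possible arrangement and is not taken from the original post: it assumes the CAPTCHA is a single line of Latin letters and digits, uses the Tesseract options --psm 7 and tessedit_char_whitelist (which some Tesseract versions may ignore), and the function names and the file name captcha.png are hypothetical.

```python
import re
import string


def build_captcha_config(whitelist=string.digits + string.ascii_letters):
    """Build a Tesseract config string for a single line of captcha characters."""
    # --psm 7 treats the image as a single text line;
    # tessedit_char_whitelist limits the characters Tesseract may output
    return f"--psm 7 -c tessedit_char_whitelist={whitelist}"


def clean_captcha_text(raw):
    """Remove the spaces and line breaks that OCR output usually contains."""
    return re.sub(r"\s+", "", raw)


def recognize_captcha(image_path, lang="eng"):
    """Recognize a captcha image; requires pytesseract, Pillow and Tesseract itself."""
    import pytesseract
    from PIL import Image

    # Convert to grayscale first, since colored noise is common in captchas
    image = Image.open(image_path).convert("L")
    raw = pytesseract.image_to_string(image, lang=lang, config=build_captcha_config())
    return clean_captcha_text(raw)


if __name__ == "__main__":
    print(build_captcha_config("0123456789"))
    # print(recognize_captcha("captcha.png"))  # needs a real image and a Tesseract install
```

Heavily distorted CAPTCHAs usually need further preprocessing or custom training before results like these become reliable.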


Source: CSDN

Author: The_North

Link: https://blog.csdn.net/LoveStarbucks/article/details/104491635
