1. pip3 install pytesseract
2. pip3 install pillow (or easy_install Pillow)
3. Install tesseract-ocr from http://jaist.dl.sourceforge.net/project/tesseract-ocr-alt/tesseract-ocr-setup-3.02.02.exe and install it under C:\Program Files\
4. Python must be installed in its default location on the C: drive
5. Find pytesseract.py and change the setting to tesseract_cmd = 'C:/Program Files/Tesseract-OCR/tesseract.exe'
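As an alternative to editing pytesseract.py by hand in step 5, the path can usually be set from your own script. The following is a minimal sketch, assuming a pytesseract version that exposes the module-level `pytesseract.pytesseract.tesseract_cmd` setting. The helper name `find_tesseract` is hypothetical and not part of any library; it only checks PATH and the install location named in step 5.

```python
import os
import shutil


def find_tesseract(candidates, name="tesseract"):
    """Return the first usable Tesseract path, or None if none is found."""
    # Prefer an executable that is already on PATH.
    on_path = shutil.which(name)
    if on_path:
        return on_path
    # Otherwise fall back to the explicit install locations.
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None


if __name__ == "__main__":
    # Path taken from step 5; adjust it if Tesseract was installed elsewhere.
    exe = find_tesseract(["C:/Program Files/Tesseract-OCR/tesseract.exe"])
    if exe is None:
        print("tesseract executable not found")
    else:
        try:
            import pytesseract
            # Assumption: this module-level setting is what pytesseract
            # reads when it launches Tesseract.
            pytesseract.pytesseract.tesseract_cmd = exe
            print("using", exe)
        except ImportError:
            print("pytesseract is not installed")
```

Setting the path at runtime keeps the installed package unmodified, so the change is not lost when pytesseract is upgraded.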
Code:
#!/usr/bin/python3.4
# -*- coding: utf-8 -*-
import pytesseract
from PIL import Image

# Open the CAPTCHA image and run it through Tesseract
image = Image.open('../jpg/code.png')
code = pytesseract.image_to_string(image)
print(code)
If the following error occurs:
'str' does not support the buffer interface
change the following statements in `pytesseract.py` (the commented-out line is the original):
lines = error_string.splitlines()
# error_lines = tuple(line for line in lines if line.find('Error') >= 0)
error_lines = tuple(line.decode('utf-8') for line in lines if line.find(b'Error') >= 0)
if len(error_lines) > 0:
    return '\n'.join(error_lines)
else:
    return error_string.strip()
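The error comes from Python 3 separating `bytes` from `str`: the output captured from Tesseract is `bytes`, so searching it with the `str` pattern `'Error'` raises a `TypeError` (the exact message differs between Python versions). The sketch below reproduces the patched logic as a standalone function so it can be tried in isolation. The name `extract_errors` and the sample input are hypothetical, and decoding the fallback return value is an addition not present in the patch above.

```python
def extract_errors(error_bytes):
    """Return only the lines mentioning 'Error', or the whole text if none do."""
    lines = error_bytes.splitlines()
    # Search with a bytes pattern (b'Error'), then decode each match to str.
    error_lines = tuple(
        line.decode('utf-8') for line in lines if line.find(b'Error') >= 0
    )
    if error_lines:
        return '\n'.join(error_lines)
    # Decode here as well so the function always returns str
    # (an addition; the patch above returns the stripped bytes).
    return error_bytes.decode('utf-8').strip()


# Made-up sample resembling Tesseract's stderr output.
sample = b"Tesseract Open Source OCR Engine\nError opening data file\n"
print(extract_errors(sample))  # prints: Error opening data file
```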
To recognize text in more languages, select all languages when installing tesseract-ocr, which comes to about 1.3 GB.
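Once the extra language data is installed, a language can be selected per call through the `lang` argument of `pytesseract.image_to_string`. Tesseract accepts several codes joined with `+`. The sketch below assumes the `eng` and `chi_sim` language packs were installed; `build_lang` and `ocr_with_languages` are hypothetical helper names used only for illustration.

```python
def build_lang(codes):
    """Join Tesseract language codes into one string, e.g. 'eng+chi_sim'."""
    return "+".join(codes)


def ocr_with_languages(image_path, codes):
    """Run OCR on image_path using the given language codes."""
    # Imported lazily so build_lang can be used without the OCR stack installed.
    import pytesseract
    from PIL import Image

    image = Image.open(image_path)
    return pytesseract.image_to_string(image, lang=build_lang(codes))


print(build_lang(["eng", "chi_sim"]))  # prints: eng+chi_sim
```

With the image from the code above, a call would look like `ocr_with_languages('../jpg/code.png', ['eng', 'chi_sim'])`.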
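Recognition accuracy is not very high. Either that, or current CAPTCHAs are so heavily distorted that even a person cannot tell what they say.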
Recommended reading on machine learning for CAPTCHAs: http://www.cnblogs.com/beer/p/5672678.html
Source: https://www.cnblogs.com/TTyb/p/5996847.html