Question
I am trying to deploy an app with an OCR function on Google App Engine. I installed tesseract with Homebrew and use pytesseract as the Python wrapper. The OCR function works on my local system, but not after I upload the app to Google App Engine.
I copied the tesseract folder from usr/local/cellar/tesseract into my app's working directory, and uploaded the tesseract files together with the pytesseract files to App Engine. I set the tesseract path based on os.getcwd() so that pytesseract can find it. Nevertheless, this does not work: App Engine cannot find the file to execute, since it is not in the same directory (os.getcwd()).
Code from pytesseract.py
cmda = os.getcwd()
# CHANGE THIS IF TESSERACT IS NOT IN YOUR PATH, OR IS NAMED DIFFERENTLY
def find_all(name, path):
    result = []
    for root, dirs, files in os.walk(path):
        if name in files:
            result.append(os.path.join(root, name))
    return result
founds = find_all("tesseract",cmda)
tesseract_cmd = founds[0]
The error from Google App Engine is:
tesseract is not installed on your path.
Answer 1:
The Google App Engine Standard environment is not suitable for your use case. The pytesseract and Pillow libraries can be installed via pip, but they also require the tesseract-ocr and libtesseract-dev platform packages, which are not included in the base image of the App Engine Standard Python 3.7 runtime. This is what produces the error you are getting.
The solution is to use Cloud Run, which runs your application in a Docker container and lets you customize the runtime. I have modified this Quickstart guide to run a sample application on Cloud Run that converts an image to text using pytesseract.
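Whether the platform package is present can be checked at startup instead of on the first OCR request. The following is a minimal sketch using only the Python standard library; the helper names find_tesseract and tesseract_version are hypothetical and not part of pytesseract.

```python
import shutil
import subprocess


def find_tesseract(binary_name="tesseract"):
    """Return the absolute path of the binary if it is on PATH, else None."""
    return shutil.which(binary_name)


def tesseract_version(binary_path):
    """Return the first line printed by `<binary_path> --version`."""
    completed = subprocess.run(
        [binary_path, "--version"],
        stdout=subprocess.PIPE,
        # Merge stderr into stdout, since some programs report their
        # version there instead of on stdout.
        stderr=subprocess.STDOUT,
        universal_newlines=True,
        check=True,
    )
    lines = completed.stdout.splitlines()
    return lines[0] if lines else ""


if __name__ == "__main__":
    path = find_tesseract()
    if path is None:
        print("tesseract not found on PATH; install the tesseract-ocr package")
    else:
        print(path, "->", tesseract_version(path))
```

On a runtime without the tesseract-ocr package, find_tesseract() returns None, which matches the "not installed on your path" error. Note also that a binary copied from a Homebrew installation is built for macOS, so it would not execute on a Linux runtime even if its path were located.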
My folder structure:
sample
├── requirements.txt
├── Dockerfile
├── app.py
└── test.png
Here is the Dockerfile:
# Use the official Python image.
# https://hub.docker.com/_/python
FROM python:3.7
# Copy local code to the container image.
ENV APP_HOME /app
WORKDIR $APP_HOME
COPY . ./
# Install production dependencies.
RUN pip install Flask gunicorn
RUN pip install -r requirements.txt
# Install tesseract
RUN apt-get update -qqy && apt-get install -qqy \
    tesseract-ocr \
    libtesseract-dev
# Run the web service on container startup. Here we use the gunicorn
# webserver, with one worker process and 8 threads.
# For environments with multiple CPU cores, increase the number of workers
# to be equal to the cores available.
CMD exec gunicorn --bind :$PORT --workers 1 --threads 8 app:app
The contents of app.py:
import os

from flask import Flask
from PIL import Image
import pytesseract

# gunicorn is started with `app:app` in the Dockerfile, so it looks for an
# object called `app` in `app.py`.
app = Flask(__name__)


@app.route('/')
def hello():
    # Run OCR on the bundled sample image and return the recognized text.
    return pytesseract.image_to_string(Image.open('test.png'))


if __name__ == "__main__":
    # Local development entry point; in the container, gunicorn binds to $PORT.
    app.run(debug=True, host='0.0.0.0', port=int(os.environ.get('PORT', 8080)))
The requirements.txt:
Flask==1.1.1
pytesseract==0.3.0
Pillow==6.2.0
Now, to containerize and deploy your application, run:
gcloud builds submit --tag gcr.io/<PROJECT_ID>/helloworld
to build the container image and submit it to Container Registry, and then:
gcloud beta run deploy --image gcr.io/<PROJECT_ID>/helloworld --platform managed
to deploy the container to Cloud Run.
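Once the deploy command reports a service URL, the endpoint can be smoke-tested from Python. This is a sketch using only the standard library; fetch_text is a hypothetical helper, and the URL is passed on the command line because the actual service URL depends on your project.

```python
import sys
import urllib.request


def fetch_text(url, timeout=30):
    """GET the URL and return (status_code, body decoded as UTF-8)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        body = response.read().decode("utf-8", errors="replace")
        return response.status, body


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (script name is an assumption): python smoke_test.py <service URL>
    status, text = fetch_text(sys.argv[1])
    print(status)
    print(text)
```

A 200 status with the text recognized from test.png in the body would indicate that tesseract is installed in the container and reachable by pytesseract.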
Source: https://stackoverflow.com/questions/57869385/can-not-make-tesseract-work-in-google-app-engine-with-python3