google vision API returns empty bounding box vertexes, instead it returns normalised_vertexes

问题

I am using vision.enums.Feature.Type.DOCUMENT_TEXT_DETECTION to extract some dense text in a pdf document. Here is my code:

from google.cloud import vision

def extract_text(bucket, filename, mimetype):
    print('Looking for text in PDF {}'.format(filename))
    # BATCH_SIZE; How many pages should be grouped into each json output file.
    # """OCR with PDF/TIFF as source files on GCS"""

    # Detect text
    feature = vision.types.Feature(
        type=vision.enums.Feature.Type.DOCUMENT_TEXT_DETECTION)
    # Extract text from source bucket
    gcs_source_uri = 'gs://{}/{}'.format(bucket, filename)
    gcs_source = vision.types.GcsSource(uri=gcs_source_uri)
    input_config = vision.types.InputConfig(
        gcs_source=gcs_source, mime_type=mimetype)

    request = vision.types.AnnotateFileRequest(features=[feature], input_config=input_config)

    print('Waiting for the ORC operation to finish.')
    ocr_response = vision_client.batch_annotate_files(requests=[request])

    print('OCR completed.')

In the response, I am expecting to find into ocr_response.responses[1...n].pages[1...n].blocks[1...n].bounding_box a list of vertices filled in, but this list is empty. Instead, there is a normalized_vertices list which are the normalised vertices between 0 and 1. Why is that so? why the vertices structure is empty? I am following this article, and the author there uses vertices, but I don't understand why I don't get them. To convert them to the non normalised form, I am multiplying the normalised vertex by height and width, but the result is awful, the boxes are not well positioned.

回答1:

To convert Normalized Vertex to Vertex you should multiply the x field of your NormalizedVertex with the width value to get the x field of the Vertex and multiply the y field of your NormalizedVertex with the height value to get the y of the Vertex.

The reason why you get Normalized Vertex, and the author of Medium article get Vertex is because the TEXT_DETECTION and DOCUMENT_TEXT_DETECTION models have been upgraded to newer versions since May 15, 2020, and medium article was written on Dec 25, 2018.

To use legacy models for results, you must specify "builtin/legacy_20190601" in the model field of a Feature object to get the old model results.

But the Google's doc mention that after November 15, 2020 the old models will not longer be offered.

来源：https://stackoverflow.com/questions/62347921/google-vision-api-returns-empty-bounding-box-vertexes-instead-it-returns-normal

标签

python-3.x

google-cloud-platform

google-cloud-vision

google-vision