Question
I am using the Google Cloud Vision API with Python to detect text on hoarding boards, the signboards usually found above a shop or store. So far I have been able to detect individual words and the coordinates of their bounding polygons. Is there a way to group the detected words based on their relative positions and sizes?
For example, the name of the store is usually written in the same size and the words are aligned. Does the API provide any functions that group words which are probably part of a bigger sentence (the store name, the address, etc.)?
If the API does not provide such functions, what would be a good approach to group them? Here is an example of what I have done so far:
Vision API output excerpt:
description: "SHOP"
bounding_poly {
vertices {
x: 4713
y: 737
}
vertices {
x: 5538
y: 737
}
vertices {
x: 5538
y: 1086
}
vertices {
x: 4713
y: 1086
}
}
, description: "OVOns"
bounding_poly {
vertices {
x: 6662
y: 1385
}
vertices {
x: 6745
y: 1385
}
vertices {
x: 6745
y: 1402
}
vertices {
x: 6662
y: 1402
}
}
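For context, a minimal sketch of how word-level output like the excerpt above is obtained with the google-cloud-vision Python client (the image file name is a placeholder):

from google.cloud import vision

client = vision.ImageAnnotatorClient()
with open("hoarding.jpg", "rb") as f:  # hypothetical input image
    image = vision.Image(content=f.read())

response = client.text_detection(image=image)
# text_annotations[0] holds the full detected text; the remaining entries are individual words
for word in response.text_annotations[1:]:
    print(word.description, [(v.x, v.y) for v in word.bounding_poly.vertices])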
Answer 1:
I suggest you take a look at the TextAnnotation response format that is returned when you use the DOCUMENT_TEXT_DETECTION feature in your OCR request. This response contains detailed information about the image metadata and the detected text content, which can be used to group the text by block, paragraph, word, etc., as described in the public documentation:
TextAnnotation contains a structured representation of OCR extracted text. The hierarchy of an OCR extracted text structure is like this: TextAnnotation -> Page -> Block -> Paragraph -> Word -> Symbol
Additionally, you can follow this useful example, which shows how to organize the text extracted from a receipt image by processing the fullTextAnnotation response content.
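As a rough sketch (assuming the google-cloud-vision Python client and a placeholder image path), the hierarchy described above can be walked like this to rebuild the text block by block:

from google.cloud import vision

client = vision.ImageAnnotatorClient()
with open("hoarding.jpg", "rb") as f:  # hypothetical input image
    image = vision.Image(content=f.read())

response = client.document_text_detection(image=image)
document = response.full_text_annotation

# Walk TextAnnotation -> Page -> Block -> Paragraph -> Word -> Symbol
for page in document.pages:
    for block in page.blocks:
        block_words = []
        for paragraph in block.paragraphs:
            for word in paragraph.words:
                block_words.append("".join(s.text for s in word.symbols))
        # Each block is a candidate group (store name, address, etc.)
        print(" ".join(block_words), block.bounding_box)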
Source: https://stackoverflow.com/questions/52383178/how-to-group-blocks-that-are-part-of-a-bigger-sentences-in-google-cloud-vision-a