Question
I am using Google's Vision OCR API to extract two types of data from an image: 1) handwritten text from text-boxes (marked with red circles below) and 2) ticks or 'x' marks from check-boxes (marked with green circles below). I will be entering this data into a database, so I need a string returned for both types of data.
Currently, when I pass this image into the API I get a string with all of the data:
Secondary School Study Student Perception of Computers LO 13 . Are any of your family members working in computing / IT ? If so , what family member ( s ) is it ( eg , parent , guardian , brother , sister etc . ) brother 14 . Have you any previous computing experience ( even attended a single day ) ? Select one or many areas : U CODER DOJO IN SCHOOL CAMP VSELF TAUGHT JOTHER If you selected any from Q14 , was the general experience : GOOD NEITHER GOOD OR BAD BAD BAD And why ( short answer , under 4 words ) learned new skills To be completed after the camp . NewsLRY 1 . I would now consider a career in computing / IT . Strongly Agree Agree No Opinion Disagree Strongly Disagree 2 . The camp showed me what a career in computing / IT really was . ? Strongly Agree Agree No Opinion Disagree Strongly Disagree 3 . The camp showed / highlighted that I was no good at programming or computing . Strongly Agree Agree No Opinion Disagree Strongly Disagree 4 . Give two things that you did not know about computing / programming until after the camp ? java Language Eclipse IDE va 5 . I was better than I first thought ( before the camp ) at programming / computing . ? Agree No Opinion Disagree Strongly Disagree ? O Strongly Agree 6 . Any feedback / comments about the camp ( good or bad ) ? good camp , Learned a lot . Thank you for taking this survey . Page 2 of 2
My code as it stands:
    using System;
    using System.Linq;
    using Google.Cloud.Vision.V1;

    public static class Program
    {
        public static void Main(string[] args)
        {
            // Point the client library at the service-account key file
            string credential_path = @"C:\Users\35385\nodal.json";
            System.Environment.SetEnvironmentVariable("GOOGLE_APPLICATION_CREDENTIALS", credential_path);
            // Instantiates a client
            var client = ImageAnnotatorClient.Create();
            // Load the image file into memory
            var image = Image.FromFile("stack.jpg");
            // Performs document text detection on the image file
            var response = client.DetectDocumentText(image);
            string words = "";
            foreach (var page in response.Pages)
            {
                foreach (var block in page.Blocks)
                {
                    // Bounding box of the block (computed but not used yet)
                    string box = string.Join(" - ", block.BoundingBox.Vertices.Select(v => $"({v.X}, {v.Y})"));
                    foreach (var paragraph in block.Paragraphs)
                    {
                        // Bounding box of the paragraph (computed but not used yet)
                        box = string.Join(" - ", paragraph.BoundingBox.Vertices.Select(v => $"({v.X}, {v.Y})"));
                        foreach (var word in paragraph.Words)
                        {
                            // Rebuild each word from its symbols and append it to the output
                            words += $" {string.Join("", word.Symbols.Select(s => s.Text))}";
                        }
                    }
                }
            }
            Console.WriteLine(words);
        }
    }
So my questions:
- How can I extract the data from each red box (i.e. the first text-box should return 'brother' and the second should return 'learned new skills')?
- How can I extract which check-box is marked for each green question (i.e. question 13 should return 'YES', question 14 should return 'SELF TAUGHT', etc.)?
Answer 1:
I have only used the API from PHP scripts, but I think your problem does not depend on the programming language. You need to use the coordinates of the detected words (bounding boxes with four vertices, to be precise). With them you can locate the printed elements of your questionnaire relative to what the participant wrote. A good entry point for me was this script:
https://www.leanx.eu/tutorials/use-google-cloud-vision-api-to-process-invoices-and-receipts
You can use it "as is" on any PHP-enabled web space, and it gives a well-structured overview of how to retrieve the boxes that the API returns.
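To make the idea concrete in C#, here is a minimal sketch (my own, not taken from the linked script). If you know where each text-box sits on the form, you can keep only the words whose bounding-box center falls inside that area. The names `OcrWord`, `Region` and `RegionReader` and all coordinates are hypothetical; in practice each `OcrWord` would be filled from the minimum and maximum X/Y of a word's four vertices. It assumes C# 9 or later, a single-line text-box, and a scan that is not rotated or skewed.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample data standing in for words returned by the API.
var words = new List<OcrWord>
{
    new OcrWord("sister", 300, 100, 360, 120),   // printed question text
    new OcrWord("brother", 120, 140, 200, 165),  // handwriting inside the text-box
    new OcrWord("14", 40, 200, 60, 220),         // start of the next printed question
};

// Hypothetical template coordinates of the first text-box, in image pixels.
var firstTextBox = new Region(100, 130, 500, 175);

Console.WriteLine(RegionReader.TextInRegion(words, firstTextBox)); // prints "brother"

// Axis-aligned rectangle with a containment test.
record Region(int Left, int Top, int Right, int Bottom)
{
    public bool Contains(double x, double y) =>
        x >= Left && x <= Right && y >= Top && y <= Bottom;
}

// One recognized word plus the extent of its bounding box.
record OcrWord(string Text, int Left, int Top, int Right, int Bottom)
{
    public double CenterX => (Left + Right) / 2.0;
    public double CenterY => (Top + Bottom) / 2.0;
}

static class RegionReader
{
    // Joins the words whose centers fall inside the region, ordered left to
    // right (assumes the region holds a single line of text).
    public static string TextInRegion(IEnumerable<OcrWord> words, Region region) =>
        string.Join(" ", words
            .Where(w => region.Contains(w.CenterX, w.CenterY))
            .OrderBy(w => w.Left)
            .Select(w => w.Text));
}
```

Fixed template coordinates only hold if every scan has the same size and alignment. Otherwise the region could be placed relative to a printed anchor word, such as the question number.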
Given those boxes and the known text of your questionnaire, it should be fairly easy to locate the checkmarks your participants made, provided Google detects them. Checkmark detection might not always work with Google Vision, since a single "character" is not always found by Google's OCR.
Source: https://stackoverflow.com/questions/58736076/extracting-data-from-specific-image-locations-using-google-vision-ocr-api
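One possible way to sketch the check-box part in C#, under several assumptions: the printed option labels have already been matched against the OCR words, any leftover symbol that is not part of the printed form is treated as a possible mark, and each check-box sits directly to the left of its label. The names `Box` and `CheckBoxReader`, the `maxGap` threshold and all coordinates are hypothetical. The method returns null when no mark is close enough, which also covers the case where the OCR misses the tick or fuses it into the label (as in "VSELF TAUGHT" in your output).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical boxes for the printed option labels of question 14.
var labels = new List<Box>
{
    new Box("CODER DOJO", 100, 300, 200, 320),
    new Box("IN SCHOOL", 260, 300, 350, 320),
    new Box("CAMP", 410, 300, 460, 320),
    new Box("SELF TAUGHT", 520, 300, 630, 320),
    new Box("OTHER", 690, 300, 750, 320),
};

// A stray symbol that is not part of the printed form, e.g. a tick read as "V".
var marks = new List<Box> { new Box("V", 495, 298, 515, 322) };

Console.WriteLine(CheckBoxReader.SelectedOption(labels, marks, maxGap: 40)
                  ?? "(no mark detected)"); // prints "SELF TAUGHT"

// Text plus the extent of its bounding box.
record Box(string Text, int Left, int Top, int Right, int Bottom)
{
    public double CenterY => (Top + Bottom) / 2.0;
}

static class CheckBoxReader
{
    // Returns the label whose left edge is closest to a mark on the same
    // line, or null when no mark lies within maxGap pixels of any label.
    public static string? SelectedOption(
        IReadOnlyList<Box> labels, IReadOnlyList<Box> marks, int maxGap)
    {
        var candidates =
            from label in labels
            from mark in marks
            // Same text line: the mark's vertical center lies within the label's height.
            where mark.CenterY >= label.Top && mark.CenterY <= label.Bottom
            // Horizontal distance between the mark's right edge and the label's left edge.
            let gap = Math.Abs(label.Left - mark.Right)
            where gap <= maxGap
            orderby gap
            select label.Text;
        return candidates.FirstOrDefault();
    }
}
```

For questions that allow several answers, the same query could return every label within `maxGap` instead of only the closest one. When the OCR returns no usable mark at all, you would need another approach for that check-box, which is outside the scope of this sketch.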