Extracting data from specific image locations using Google Vision OCR API

Submitted by 吃可爱长大的小学妹 on 2019-12-23 05:26:05

Question


I am using Google's Vision OCR API to try to extract two types of data from an image: 1) handwritten text from text-boxes (marked with red circles below), and 2) ticks or 'x' marks from check-boxes (marked with green circles below). I will be entering this data into a database, so I need a string returned for both types of data.

Currently, when I pass this image into the API I get a string with all of the data:

Secondary School Study Student Perception of Computers LO 13 . Are any of your family members working in computing / IT ? If so , what family member ( s ) is it ( eg , parent , guardian , brother , sister etc . ) brother 14 . Have you any previous computing experience ( even attended a single day ) ? Select one or many areas : U CODER DOJO IN SCHOOL CAMP VSELF TAUGHT JOTHER If you selected any from Q14 , was the general experience : GOOD NEITHER GOOD OR BAD BAD BAD And why ( short answer , under 4 words ) learned new skills To be completed after the camp . NewsLRY 1 . I would now consider a career in computing / IT . Strongly Agree Agree No Opinion Disagree Strongly Disagree 2 . The camp showed me what a career in computing / IT really was . ? Strongly Agree Agree No Opinion Disagree Strongly Disagree 3 . The camp showed / highlighted that I was no good at programming or computing . Strongly Agree Agree No Opinion Disagree Strongly Disagree 4 . Give two things that you did not know about computing / programming until after the camp ? java Language Eclipse IDE va 5 . I was better than I first thought ( before the camp ) at programming / computing . ? Agree No Opinion Disagree Strongly Disagree ? O Strongly Agree 6 . Any feedback / comments about the camp ( good or bad ) ? good camp , Learned a lot . Thank you for taking this survey . Page 2 of 2

My code as it stands:

    using System;
    using System.Linq;
    using System.Text;
    using Google.Cloud.Vision.V1;

    public static class Program
    {
        public static void Main(string[] args)
        {
            // Point the client library at the service-account key file
            string credentialPath = @"C:\Users\35385\nodal.json";
            Environment.SetEnvironmentVariable("GOOGLE_APPLICATION_CREDENTIALS", credentialPath);

            // Instantiates a client
            var client = ImageAnnotatorClient.Create();
            // Load the image file into memory
            var image = Image.FromFile("stack.jpg");
            // Performs text detection on the image file
            var response = client.DetectDocumentText(image);

            var words = new StringBuilder();

            foreach (var page in response.Pages)
            {
                foreach (var block in page.Blocks)
                {
                    foreach (var paragraph in block.Paragraphs)
                    {
                        // Corner coordinates of the paragraph; computed here but not used yet
                        string box = string.Join(" - ", paragraph.BoundingBox.Vertices.Select(v => $"({v.X}, {v.Y})"));
                        foreach (var word in paragraph.Words)
                        {
                            // A word is returned as individual symbols; join them back together
                            words.Append(' ').Append(string.Join("", word.Symbols.Select(s => s.Text)));
                        }
                    }
                }
            }

            // Prints every detected word as one space-separated string
            Console.WriteLine(words.ToString());
        }
    }

So my questions:

  1. How can I extract data from each red box (i.e. the first text-box will return 'brother', the 2nd should return 'learned new skills')?
  2. How can I extract which check-box is marked for each green question (i.e. question 13 should return 'YES', question 14 should return 'SELF TAUGHT', etc.)?

Answer 1:


I have only used the API from PHP scripts, but I think your problem does not depend on the programming language. You need to use the coordinates (boxes with four vertices, to be precise) of the detected words. Then you can locate the elements of your questionnaire relative to the participant's handwriting. A good entry point for me was this script:

https://www.leanx.eu/tutorials/use-google-cloud-vision-api-to-process-invoices-and-receipts

You can use it "as is" on any PHP-enabled web space, and it gives a well-structured overview of how to retrieve the boxes that the API returns.
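
To make the coordinate idea concrete for the first question (text-boxes), here is a minimal C# sketch. It assumes you have measured the pixel rectangle of each text-box on a blank copy of the form and that every scan is aligned the same way. WordBox, Region and FormReader are hypothetical names invented for this illustration; they are not part of the Vision client library.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper type: one OCR word plus the axis-aligned bounds of its four vertices.
public record WordBox(string Text, int Left, int Top, int Right, int Bottom)
{
    public double CenterX => (Left + Right) / 2.0;
    public double CenterY => (Top + Bottom) / 2.0;

    // Builds a WordBox from the (x, y) vertices reported for a word.
    public static WordBox FromVertices(string text, IEnumerable<(int X, int Y)> vertices)
    {
        var pts = vertices.ToList();
        return new WordBox(text,
            pts.Min(p => p.X), pts.Min(p => p.Y),
            pts.Max(p => p.X), pts.Max(p => p.Y));
    }
}

// Hypothetical form region; the coordinates are an assumption you measure yourself.
public record Region(int Left, int Top, int Right, int Bottom)
{
    // A word counts as inside when the centre of its box falls within the region.
    public bool Contains(WordBox w) =>
        w.CenterX >= Left && w.CenterX <= Right &&
        w.CenterY >= Top && w.CenterY <= Bottom;
}

public static class FormReader
{
    // Joins the words inside the region, keeping the order in which they were passed in.
    public static string TextIn(Region region, IEnumerable<WordBox> words) =>
        string.Join(" ", words.Where(region.Contains).Select(w => w.Text));
}
```

Inside the innermost loop of the question's code, a WordBox could be built with WordBox.FromVertices(string.Join("", word.Symbols.Select(s => s.Text)), word.BoundingBox.Vertices.Select(v => (v.X, v.Y))). Testing the centre of the box rather than all four corners is a deliberate choice here, since handwriting often spills slightly over the printed frame.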

With those boxes and the known text of your questionnaire, it should be fairly easy to locate the checkmarks that your participants made, provided Google detects them. Detection of a checkmark might not always work with Google Vision, since a single "character" is not always found by Google's OCR.
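
For the check-boxes, one possible way to go from a detected mark to an answer is to pick the printed option label whose box lies closest to the mark. The C# sketch below assumes the mark was actually returned by the OCR as some symbol, which, as noted, is not guaranteed. OcrBox and CheckboxMatcher are hypothetical names for this illustration, and the distance threshold is an assumption you would tune for your own scans.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical type: a piece of OCR text with the centre of its bounding box.
public record OcrBox(string Text, double CenterX, double CenterY);

public static class CheckboxMatcher
{
    // Finds the label nearest to the mark. Returns false when there are no labels
    // or when the nearest one is farther away than maxDistance (in pixels).
    public static bool TryFindNearestLabel(
        OcrBox mark, IReadOnlyCollection<OcrBox> labels, double maxDistance, out string label)
    {
        label = "";
        if (labels.Count == 0) return false;

        var nearest = labels.OrderBy(l => Distance(mark, l)).First();
        if (Distance(mark, nearest) > maxDistance) return false;

        label = nearest.Text;
        return true;
    }

    // Euclidean distance between the centres of two boxes.
    private static double Distance(OcrBox a, OcrBox b) =>
        Math.Sqrt(Math.Pow(a.CenterX - b.CenterX, 2) + Math.Pow(a.CenterY - b.CenterY, 2));
}
```

The labels would be the words of the detected text that match the known option names of a question (for example GOOD and BAD), and the mark would be any leftover symbol near them. The threshold keeps a stray symbol elsewhere on the page from being attributed to a question it does not belong to.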



Source: https://stackoverflow.com/questions/58736076/extracting-data-from-specific-image-locations-using-google-vision-ocr-api