amazon-textract

How to use Amazon Textract with PDF files

可紊 submitted on 2020-08-10 08:42:52
Question: I can already use Textract with JPEG files, and I would like to use it with PDF files. I have the code below:

    import boto3

    # Document
    documentName = "Path to document in JPEG"

    # Read document content
    with open(documentName, 'rb') as document:
        imageBytes = bytearray(document.read())

    # Amazon Textract client
    textract = boto3.client('textract')
    documentText = ""

    # Call Amazon Textract
    response = textract.detect_document_text(Document={'Bytes': imageBytes})
    #print(response)

    # Print detected
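One possible route for PDFs, sketched under stated assumptions: Textract's asynchronous operations read the document from S3 rather than from raw bytes, so a multi-page PDF is typically uploaded to a bucket first, submitted with start_document_text_detection, and then polled with get_document_text_detection. The helper name detect_pdf_text, its parameters, and the bucket and key shown in the usage comment are hypothetical and not part of the original question.

```python
import time


def detect_pdf_text(textract, bucket, key, poll_seconds=5, max_polls=60):
    """Return the LINE texts Textract detects in a PDF stored in S3.

    `textract` is a boto3 Textract client; `bucket` and `key` are
    placeholders (assumptions) for wherever the PDF was uploaded.
    """
    # Start the asynchronous job; the PDF must already be in S3.
    job = textract.start_document_text_detection(
        DocumentLocation={"S3Object": {"Bucket": bucket, "Name": key}}
    )
    job_id = job["JobId"]

    # Poll until the job leaves the IN_PROGRESS state.
    for _ in range(max_polls):
        response = textract.get_document_text_detection(JobId=job_id)
        if response["JobStatus"] != "IN_PROGRESS":
            break
        time.sleep(poll_seconds)
    else:
        raise TimeoutError(f"Textract job {job_id} did not finish in time")

    if response["JobStatus"] != "SUCCEEDED":
        raise RuntimeError(f"Textract job ended with status {response['JobStatus']}")

    # Collect LINE blocks, following NextToken across result pages.
    lines = []
    while True:
        lines += [b["Text"] for b in response["Blocks"] if b["BlockType"] == "LINE"]
        token = response.get("NextToken")
        if not token:
            return lines
        response = textract.get_document_text_detection(JobId=job_id, NextToken=token)


# Usage (needs AWS credentials and a PDF already uploaded to S3):
#   import boto3
#   print(detect_pdf_text(boto3.client("textract"), "my-bucket", "scan.pdf"))
```

Passing the client in as an argument keeps the polling logic separate from credential setup. Polling is only the simplest way to wait for the job; it is used here to keep the sketch short.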

Parse / extract a table from a messy .csv file?

ⅰ亾dé卋堺 submitted on 2020-07-06 04:38:03
Question: I am parsing an image (PNG) with Amazon Textract and extracting the tables. Here is an example of such a CSV file when I open it with open(file_name, "r") and read its lines:

    ['Table: Table_1\n',
     '\n',
     'Test Name ,Result ,Flag ,Reference Range ,Lab ,\n',
     'HEPATIC FUNCTION PANEL PROTEIN, TOTAL ,6.1 ,,6.1-8.1 g/dL ,EN ,\n',
     'ALBUMIN ,4.3 ,,3.6-5.1 g/dL ,EN ,\n',
     'GLOBULIN ,1.8 ,LOW ,1.9-3.7 g/dL (calc) ,EN ,\n',
     'ALBUMIN/GLOBULIN RATIO ,2.4 ,,1.0-2.5 (calc) ,EN ,\n',
     'BILIRUBIN, TOTAL ,0.6 ,,0.2-1
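A minimal sketch of one way to split such lines, assuming the pattern visible in the sample holds for the whole file: a comma inside a cell is followed by a space ("PROTEIN, TOTAL"), while a delimiter comma is not. Under that assumption, splitting on commas that are not followed by a space recovers the columns. The names split_row and parse_table are hypothetical, and only the complete lines from the sample are used.

```python
import re

# Complete lines copied from the sample; the truncated last line is omitted.
raw_lines = [
    'Table: Table_1\n',
    '\n',
    'Test Name ,Result ,Flag ,Reference Range ,Lab ,\n',
    'HEPATIC FUNCTION PANEL PROTEIN, TOTAL ,6.1 ,,6.1-8.1 g/dL ,EN ,\n',
    'ALBUMIN ,4.3 ,,3.6-5.1 g/dL ,EN ,\n',
    'GLOBULIN ,1.8 ,LOW ,1.9-3.7 g/dL (calc) ,EN ,\n',
    'ALBUMIN/GLOBULIN RATIO ,2.4 ,,1.0-2.5 (calc) ,EN ,\n',
]


def split_row(line):
    """Split one line on commas that are NOT followed by a space (assumed delimiter rule)."""
    cells = re.split(r",(?! )", line.rstrip("\n"))
    # Every row ends with a delimiter, which leaves one empty trailing cell.
    if cells and cells[-1] == "":
        cells.pop()
    return [cell.strip() for cell in cells]


def parse_table(lines):
    """Skip blank lines and the 'Table:' title, then map each row onto the header."""
    rows = [
        split_row(line)
        for line in lines
        if line.strip() and not line.startswith("Table:")
    ]
    header, body = rows[0], rows[1:]
    return [dict(zip(header, row)) for row in body]


records = parse_table(raw_lines)
for record in records:
    print(record)
```

The rule is a heuristic inferred from these few rows. A cell whose comma is not followed by a space, or an empty cell written with a space before the next comma, would be split incorrectly.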

Parse / extract a table from a messy .csv file?

末鹿安然 submitted on 2020-07-06 04:36:33
Question: I am parsing an image (PNG) with Amazon Textract and extracting the tables. Here is an example of such a CSV file when I open it with open(file_name, "r") and read its lines:

    ['Table: Table_1\n',
     '\n',
     'Test Name ,Result ,Flag ,Reference Range ,Lab ,\n',
     'HEPATIC FUNCTION PANEL PROTEIN, TOTAL ,6.1 ,,6.1-8.1 g/dL ,EN ,\n',
     'ALBUMIN ,4.3 ,,3.6-5.1 g/dL ,EN ,\n',
     'GLOBULIN ,1.8 ,LOW ,1.9-3.7 g/dL (calc) ,EN ,\n',
     'ALBUMIN/GLOBULIN RATIO ,2.4 ,,1.0-2.5 (calc) ,EN ,\n',
     'BILIRUBIN, TOTAL ,0.6 ,,0.2-1

Other options for the AWS Textract .NET SDK

空扰寡人 submitted on 2020-01-25 06:43:13
Question: I am working on a C# MVC solution that needs to support uploading thousands of scanned PDF survey forms to a system and then extracting the data from each survey. To extract handwritten checkboxes, I need to use the AWS Textract API. More information on my project can be found here: AWS textract with hand-written checkboxes. My problem is that when I downloaded the AWS SDK for .NET, I noticed that Textract is not fully available for .NET at the moment. My question is, is there any

AWS textract with hand-written checkboxes

北战南征 submitted on 2020-01-23 11:53:09
Question: I have thousands of survey forms that I need to scan and then upload to my C# system in order to extract the data and enter it into a database. The surveys are a mix of handwritten 1) text boxes and 2) checkboxes. I am currently using the Azure Read API to extract handwritten text, which should work fine, e.g. question #4 below returns 'Python' and 'coding'. So my question: will AWS Textract give me the capability to extract which checkbox is marked? E.g. see question #1 below
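As a hedged illustration of what Textract returns for checkboxes: when analyze_document is called with FeatureTypes=["FORMS"], the response can contain SELECTION_ELEMENT blocks whose SelectionStatus is SELECTED or NOT_SELECTED, linked to their labels through KEY_VALUE_SET blocks. The sketch below is in Python rather than C#, the helper name checkbox_states is hypothetical, and the sample response is hand-written and trimmed to the fields the helper reads. It does not show how reliably handwritten marks are detected.

```python
def checkbox_states(response):
    """Map each form key's text to True/False when its value is a checkbox."""
    blocks = {block["Id"]: block for block in response["Blocks"]}

    def related(block, rel_type):
        # Resolve the block IDs listed under one relationship type.
        ids = []
        for rel in block.get("Relationships", []):
            if rel["Type"] == rel_type:
                ids.extend(rel["Ids"])
        return [blocks[i] for i in ids]

    states = {}
    for block in response["Blocks"]:
        is_key = block["BlockType"] == "KEY_VALUE_SET" and "KEY" in block.get("EntityTypes", [])
        if not is_key:
            continue
        # The key's label is the text of its child WORD blocks.
        label = " ".join(
            child["Text"] for child in related(block, "CHILD") if child["BlockType"] == "WORD"
        )
        # The checkbox, if any, is a SELECTION_ELEMENT child of the VALUE block.
        for value in related(block, "VALUE"):
            for child in related(value, "CHILD"):
                if child["BlockType"] == "SELECTION_ELEMENT":
                    states[label] = child["SelectionStatus"] == "SELECTED"
    return states


# Hand-written sample (an assumption, not real Textract output).
sample = {"Blocks": [
    {"Id": "k1", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["KEY"],
     "Relationships": [{"Type": "VALUE", "Ids": ["v1"]}, {"Type": "CHILD", "Ids": ["w1"]}]},
    {"Id": "v1", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["VALUE"],
     "Relationships": [{"Type": "CHILD", "Ids": ["s1"]}]},
    {"Id": "w1", "BlockType": "WORD", "Text": "Yes"},
    {"Id": "s1", "BlockType": "SELECTION_ELEMENT", "SelectionStatus": "SELECTED"},
]}

print(checkbox_states(sample))  # prints {'Yes': True}

# With real data the response would come from something like:
#   boto3.client("textract").analyze_document(
#       Document={"Bytes": image_bytes}, FeatureTypes=["FORMS"])
```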