I am trying to extract information from a range of different receipts using a combination of OpenCV, Tesseract and Keras. The end result of the project is that I should be able
It's a good idea to use the image, because you will lose the structure of the document if you just use plain OCR. I think you are on the right track. I would segment the bill into regions (header, total amount, line items) and train an image classifier on those regions. Then you can use the predicted region type to clean and extract the relevant information you need from the OCR text of each region.
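A minimal sketch of that last step, turning classified regions into structured fields. It assumes segmentation, the image classifier, and per-region OCR have already run. The `Region` class, the labels `header`/`line_item`/`total`, and the price regex are hypothetical illustrations, not part of the OpenCV, Tesseract, or Keras APIs.

```python
import re
from dataclasses import dataclass
from decimal import Decimal

# Hypothetical record for one segmented region of a receipt image.
# In a real pipeline, `label` would come from the trained image classifier
# and `text` from running Tesseract on just that crop.
@dataclass
class Region:
    label: str   # assumed labels: "header", "line_item", "total"
    bbox: tuple  # (x, y, w, h) of the crop in the original image
    text: str    # OCR output for this crop only

# Hypothetical pattern: a trailing price such as "12.50" or "1,299.00".
PRICE_RE = re.compile(r"(\d{1,3}(?:,\d{3})*|\d+)\.(\d{2})\s*$")

def parse_price(s):
    """Return the trailing price in s as a Decimal, or None if absent."""
    m = PRICE_RE.search(s.strip())
    if m is None:
        return None
    return Decimal(m.group(1).replace(",", "") + "." + m.group(2))

def extract_fields(regions):
    """Route each region's OCR text to a parser chosen by its predicted label."""
    result = {"header": [], "items": [], "total": None}
    # Visit regions top to bottom (by y coordinate) so header lines keep their order.
    for reg in sorted(regions, key=lambda r: r.bbox[1]):
        if reg.label == "header":
            result["header"].append(reg.text.strip())
        elif reg.label == "line_item":
            price = parse_price(reg.text)
            if price is not None:
                # Item name is whatever remains once the trailing price is removed.
                name = PRICE_RE.sub("", reg.text.strip()).strip()
                result["items"].append((name, price))
        elif reg.label == "total":
            result["total"] = parse_price(reg.text)
    return result
```

Because OCR runs on each region separately, the parser for a field only ever sees that field's text, so a line-item price cannot be mistaken for the total. This is the structural information you lose with plain full-page OCR.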