I have 55,000 image files (in both JPG and TIFF format), which are pictures from a book.
The structure of each page is this:
some text
You might want to try John Resig's "OCR and Neural Nets in JavaScript".