I am creating a database from historical records, which I have as photographed pages from books (100K+ pages). I wrote some Python code to do some image processing before I OCR e
"align the text in the image" I suppose means to deskew the image so that text lines have the same baseline.
I thoroughly enjoyed reading the scientific answers to this rather overengineered task. The answers are great, but is it really necessary to spend so much time (a very precious resource) implementing this? There are plenty of tools that do this without writing a single line of code (unless the OP is a CS student who wants to practice the science, but the OP is obviously doing this out of necessity to get all the images processed). These methods took me back to my college years, but today I would use different tools to process this batch quickly and efficiently, as I do daily. I work for a high-volume document conversion and data extraction service bureau and OCR consulting company.
Here is the result of a basic open-and-deskew step in ABBYY FineReader, a commercial desktop OCR package. Deskewing alone was more than sufficient for further OCR processing.
And I did not need to recreate and program my own browser just to post this answer.