Take a look at this library: https://pypi.python.org/pypi/pypdfocr
But keep in mind that a PDF file can also contain images. Some scanners break a single scanned page up into several images, so you won't get any text out of it with Ghostscript. You may be able to analyse the page content streams to find out which kind of page you have.
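
As a rough sketch of what analysing a content stream could look like: the snippet below counts text-showing operators (`Tj`, `TJ`, `'`, `"`) and image-painting operators (`Do`, `BI`) in an already decoded content stream. The function names `count_operators` and `looks_scanned` are made up for this example, and it assumes you have already extracted and decompressed the stream with a PDF library of your choice. It is only a heuristic: `Do` can also paint form XObjects rather than images, nested parentheses inside strings are not handled, and a scan that already has an invisible OCR text layer will show up as having text.

```python
import re

# Literal strings "(...)" with backslash escapes; nested parentheses are NOT handled.
_LITERAL = re.compile(rb"\((?:\\.|[^\\()])*\)", re.DOTALL)
# Hex strings "<48656C6C6F>"; dictionary delimiters "<<" and ">>" do not match.
_HEX = re.compile(rb"<[0-9A-Fa-f\s]*>")

TEXT_OPS = {b"Tj", b"TJ", b"'", b'"'}   # operators that show text
IMAGE_OPS = {b"Do", b"BI"}              # paint XObject / begin inline image


def count_operators(stream: bytes):
    """Return (text_ops, image_ops) for one decoded content stream."""
    # Blank out string contents so that text like "(Do it)" is not miscounted.
    cleaned = _LITERAL.sub(b" ", stream)
    cleaned = _HEX.sub(b" ", cleaned)
    # Array brackets may touch an operator, e.g. "[(A)]TJ", so split them off.
    cleaned = cleaned.replace(b"[", b" ").replace(b"]", b" ")
    tokens = cleaned.split()
    text_ops = sum(1 for t in tokens if t in TEXT_OPS)
    image_ops = sum(1 for t in tokens if t in IMAGE_OPS)
    return text_ops, image_ops


def looks_scanned(stream: bytes) -> bool:
    """Heuristic: the page paints images but never shows any text."""
    text_ops, image_ops = count_operators(stream)
    return image_ops > 0 and text_ops == 0


# Hand-written sample streams, not taken from a real file.
text_page = b"BT /F1 12 Tf 72 720 Td (Hello Do world) Tj ET"
scan_page = b"q 612 0 0 396 0 396 cm /Im0 Do Q q 612 0 0 396 0 0 cm /Im1 Do Q"

print(count_operators(text_page), looks_scanned(text_page))
print(count_operators(scan_page), looks_scanned(scan_page))
```

Pages for which `looks_scanned` returns `True` are the ones you would send through OCR instead of plain text extraction.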