Python text extraction does not work on some PDFs
I am trying to read a PDF from a URL. Following several Stack Overflow suggestions, I used PyPDF2's PdfFileReader to extract text from the PDF. My code looks like this:

    from urllib2 import Request, urlopen
    from StringIO import StringIO

    import PyPDF2

    url = "http://kat.kar.nic.in:8080/uploadedFiles/C_13052015_ch1_l1.pdf"
    #url = "http://kat.kar.nic.in:8080/uploadedFiles/C_06052015_ch1_l1.pdf"

    # Download the PDF and wrap the raw bytes in a file-like object
    f = urlopen(Request(url)).read()
    fileInput = StringIO(f)

    pdf = PyPDF2.PdfFileReader(fileInput)
    print pdf.getNumPages()
    print pdf.getDocumentInfo()
    # Page indices are zero-based, so this reads the second page
    print pdf.getPage(1).extractText()

I am able to successfully extract text for the first link. But if I use the same program for the second PDF, I do