Extracting text from certain areas of a PDF page?
Question

I am trying to parse a PDF book, but I only need the main body, without footers, headers, or footnotes. I looked through the pdfminer documentation but have not succeeded yet. Here is the code I use to get the text:

    from pdfminer.converter import TextConverter
    from pdfminer.pdfinterp import PDFPageInterpreter
    from pdfminer.pdfinterp import PDFResourceManager
    from pdfminer.pdfpage import PDFPage

    with open(pdfname, 'rb') as fh:
        for page in PDFPage.get_pages(fh, caching=True, check_extractable