Error: cannot import name 'PDFDocument' from 'pdfminer.pdfparser'

Submitted by 我们两清 on 2019-12-10 23:38:57


I need to extract text from PDF files and have used pdfminer.six successfully, extracting both text paragraphs and tables. But now I get an error on this line:

from pdfminer.pdfparser import PDFParser, PDFDocument

ImportError: cannot import name 'PDFDocument' from 'pdfminer.pdfparser' (C:\Users[username]\Anaconda3\lib\site-packages\pdfminer\

I'm using Jupyter under Anaconda with Python 3.7.3 and the package pdfminer.six-20181108.

The code I'm using is based on this: How to read pdf file using pdfminer3k?

Based on advice given below, I've tried uninstalling and reinstalling Anaconda, pdfminer.six, and other packages several times. A week ago it suddenly worked, but now I get the error again.

Since I'm working on Win10 I also tried using Linux Ubuntu as described here:

Same error.

Then, based on the webpage below, I thought it was worth trying to split the import of PDFParser and PDFDocument, changing it from

from pdfminer.pdfparser import PDFParser, PDFDocument

to

from pdfminer.pdfparser import PDFParser
from pdfminer.pdfdocument import PDFDocument
from pdfminer.pdfpage import PDFPage

But that created new errors later on in the code.

The start of my code looks like this:

path = [name and path of file]
fp = open(path, 'rb')
from pdfminer.pdfparser import PDFParser, PDFDocument
from pdfminer.pdfinterp import PDFResourceManager, PDFPageInterpreter
from pdfminer.converter import PDFPageAggregator
from pdfminer.layout import LAParams, LTTextBox, LTTextLine

I expect to be able to run the code and extract the text from the PDF file, but the code stops at the ImportError for PDFDocument from pdfminer.pdfparser.

Any advice on what I should do is much appreciated! Might it have something to do with how pdfminer.six is installed?


I got help from Notodden Serit. Change this:

from pdfminer.pdfparser import PDFParser, PDFDocument

to this:

from pdfminer.pdfparser import PDFParser
from pdfminer.pdfdocument import PDFDocument
from pdfminer.pdfpage import PDFPage
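
To check which layout the installed package actually uses, a small probe like the one below may help. It is only a sketch: `locate_name` is a hypothetical helper, not part of pdfminer, and the assumption is that the older pdfminer3k-style layout keeps PDFDocument in pdfminer.pdfparser while pdfminer.six keeps it in pdfminer.pdfdocument.

```python
import importlib


def locate_name(name, candidates):
    """Return the first module in `candidates` that exposes `name`, else None."""
    for module_name in candidates:
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            # Module (or the whole package) is not installed; try the next one.
            continue
        if hasattr(module, name):
            return module_name
    return None


# Prints the module that provides PDFDocument in this environment,
# or None if neither candidate can be imported.
print(locate_name("PDFDocument", ["pdfminer.pdfdocument", "pdfminer.pdfparser"]))
```

If this prints pdfminer.pdfdocument, the split imports above should be the ones to use. If it prints pdfminer.pdfparser, a different pdfminer variant is probably installed in that environment.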

Then pass the parser to the PDFDocument constructor, changing

doc = PDFDocument()

to

doc = PDFDocument(parser)

And finally change the page loop from

for page in doc.get_pages():

to

for page in PDFPage.create_pages(doc):
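
Putting the three changes together, the start of the script might look roughly like the sketch below. It assumes pdfminer.six is installed. The function name `extract_text_boxes` and the choice to collect only LTTextBox elements are illustrative and do not come from the original code.

```python
def extract_text_boxes(path):
    """Return the text of every LTTextBox in the PDF at `path`, in page order."""
    # Imports are local so that a missing or mismatched pdfminer install
    # raises when the function is called, not when the cell is first run.
    from pdfminer.pdfparser import PDFParser
    from pdfminer.pdfdocument import PDFDocument
    from pdfminer.pdfpage import PDFPage
    from pdfminer.pdfinterp import PDFResourceManager, PDFPageInterpreter
    from pdfminer.converter import PDFPageAggregator
    from pdfminer.layout import LAParams, LTTextBox

    texts = []
    with open(path, "rb") as fp:
        parser = PDFParser(fp)
        doc = PDFDocument(parser)  # the parser is passed to the constructor

        rsrcmgr = PDFResourceManager()
        device = PDFPageAggregator(rsrcmgr, laparams=LAParams())
        interpreter = PDFPageInterpreter(rsrcmgr, device)

        # PDFPage.create_pages(doc) replaces the older doc.get_pages()
        for page in PDFPage.create_pages(doc):
            interpreter.process_page(page)
            layout = device.get_result()
            for element in layout:
                if isinstance(element, LTTextBox):
                    texts.append(element.get_text())
    return texts
```

Calling `extract_text_boxes("example.pdf")` (a hypothetical file name) should return a list of strings, one per text box. The file is opened in a `with` block so it is closed even if parsing fails partway through.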

