How can I get the total number of pages of a PDF using pdfminer in Python?

旧巷少年郎

In PyPDF2, pdfreader.getNumPages() gives me the total number of pages in a PDF file.

How can I get this using pdfminer?

4 Answers

死守一世寂寞

2021-02-06 17:31
Using pdfminer.six, you just need to import the high-level function extract_pages, convert the generator into a list, and take its length.
```
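from pdfminer.high_level import extract_pages

pdf_file = 'your_file.pdf'  # path to the PDF to inspect

# extract_pages() yields one layout object per page
print(len(list(extract_pages(pdf_file))))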
```
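Building a list of every page layout just to count it keeps all of them in memory at once. Here is a minimal sketch of the same idea that consumes the generator one page at a time. The names count_items and count_pages_streaming are hypothetical helpers, not part of pdfminer.six. Note that extract_pages still runs layout analysis on every page, so this should save memory rather than time.

```python
def count_items(iterable):
    """Count the items of any iterable without storing them."""
    return sum(1 for _ in iterable)


def count_pages_streaming(pdf_path):
    """Count pages by consuming extract_pages() one page at a time."""
    # Imported here so pdfminer.six is only needed when this is called.
    from pdfminer.high_level import extract_pages

    return count_items(extract_pages(pdf_path))
```

Calling count_pages_streaming('your_file.pdf') should give the same number as the len(list(...)) version.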
说谎

2021-02-06 17:36
Using pdfminer, import the necessary modules.
```
from pdfminer.pdfparser import PDFParser
from pdfminer.pdfdocument import PDFDocument
from pdfminer.pdfpage import PDFPage
```
Create a PDF parser object associated with the file object.
```
fp = open('your_file.pdf', 'rb')
parser = PDFParser(fp)
```
Create a PDF document object that stores the document structure.
```
document = PDFDocument(parser)
```
Iterate over the pages yielded by PDFPage.create_pages(), incrementing a counter for each page.
```
num_pages = 0
for page in PDFPage.create_pages(document):
    num_pages += 1
print(num_pages)
```
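The steps above can be combined into one function. This is only a sketch: count_pages_by_iteration is a hypothetical name, and the with block is an addition that closes the file once counting is done.

```python
def count_pages_by_iteration(pdf_path):
    """Count pages by walking the page tree with PDFPage.create_pages()."""
    # Imported here so pdfminer.six is only needed when this is called.
    from pdfminer.pdfparser import PDFParser
    from pdfminer.pdfdocument import PDFDocument
    from pdfminer.pdfpage import PDFPage

    with open(pdf_path, 'rb') as fp:
        parser = PDFParser(fp)
        document = PDFDocument(parser)
        # create_pages() is a generator that reads from fp lazily,
        # so the counting has to finish before the file is closed.
        return sum(1 for _ in PDFPage.create_pages(document))
```

Usage would look like print(count_pages_by_iteration('your_file.pdf')).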
我寻月下人不归

2021-02-06 17:37
I hate to just leave a code snippet. For context here is a link to the current pdfminer.six repo where you might be able to learn a little more about the resolve1 method.

As you're working with pdfminer you might print and come across some PDFObjRef objects. Essentially you can use resolve1 to expand those objects (they're usually a dictionary).
```
from pdfminer.pdfparser import PDFParser
from pdfminer.pdfdocument import PDFDocument
from pdfminer.pdfpage import PDFPage
from pdfminer.pdfinterp import resolve1

file = open('some_file.pdf', 'rb')
parser = PDFParser(file)
document = PDFDocument(parser)

# This will give you the count of pages
print(resolve1(document.catalog['Pages'])['Count'])
```
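The Count entry is read from the file rather than computed, so a damaged or non-conforming PDF may lack it or store something that is not a usable integer. Here is a hedged sketch that validates the value and falls back to iterating over the pages. The names validated_count and count_pages_fast are hypothetical, and the fallback policy is an assumption rather than anything pdfminer.six prescribes. The sketch imports resolve1 from pdfminer.pdftypes, the module where it is defined.

```python
def validated_count(value):
    """Return value if it is a non-negative int, otherwise None."""
    # bool is a subclass of int, so it is excluded explicitly.
    if isinstance(value, int) and not isinstance(value, bool) and value >= 0:
        return value
    return None


def count_pages_fast(pdf_path):
    """Read the page count from the catalog, iterating only as a fallback."""
    # Imported here so pdfminer.six is only needed when this is called.
    from pdfminer.pdfparser import PDFParser
    from pdfminer.pdfdocument import PDFDocument
    from pdfminer.pdfpage import PDFPage
    from pdfminer.pdftypes import resolve1

    with open(pdf_path, 'rb') as fp:
        document = PDFDocument(PDFParser(fp))
        # The catalog entry is usually a PDFObjRef; resolve1 expands it.
        pages = resolve1(document.catalog.get('Pages'))
        raw = pages.get('Count') if isinstance(pages, dict) else None
        count = validated_count(resolve1(raw))
        if count is not None:
            return count
        # Fallback (assumption): walk the page tree and count the pages.
        return sum(1 for _ in PDFPage.create_pages(document))
```

On a well-formed file this avoids touching the individual pages at all, which is why the catalog lookup tends to be the quickest of the pdfminer approaches shown here.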
深忆病人

2021-02-06 17:42
I found pdfminer very slow at getting the total number of pages. I found this to be a cleaner and faster solution:

pip3 install PyPDF2
```
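# PdfFileReader and getNumPages() were removed in PyPDF2 3.0;
# current releases use PdfReader and len(reader.pages) instead.
from PyPDF2 import PdfReader

def get_pdf_page_count(path):
    """Return the number of pages in the PDF at `path`."""
    with open(path, 'rb') as fl:
        reader = PdfReader(fl)
        return len(reader.pages)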
```