How to extract the title of a PDF document from within a script for renaming?

前端未结

关注

 6  1661

长情又很酷 2021-02-01 21:34

I have thousands of PDF files in my computers which names are from a0001.pdf to a3621.pdf, and inside of each there is a title; e.g. \"aluminum carbona

6条回答

挽巷 (楼主)

2021-02-01 21:53

Installing the package

This cannot be solved with plain Python. You will need an external package such as pdfrw, which allows you to read PDF metadata. The installation is quite easy using the standard Python package manager pip.

On Windows, first make sure you have a recent version of pip using the shell command:

python -m pip install -U pip

On Linux:

pip install -U pip

On both platforms, install then the pdfrw package using

pip install pdfrw

The code

I combined the ansatzes of zeebonk and user2125722 to write something very compact and readable which is close to your original code:

import os
from pdfrw import PdfReader

path = r'C:\Users\YANN\Desktop'


def renameFileToPDFTitle(path, fileName):
    fullName = os.path.join(path, fileName)
    # Extract pdf title from pdf file
    newName = PdfReader(fullName).Info.Title
    # Remove surrounding brackets that some pdf titles have
    newName = newName.strip('()') + '.pdf'
    newFullName = os.path.join(path, newName)
    os.rename(fullName, newFullName)


for fileName in os.listdir(path):
    # Rename only pdf files
    fullName = os.path.join(path, fileName)
    if (not os.path.isfile(fullName) or fileName[-4:] != '.pdf'):
        continue
    renameFileToPDFTitle(path, fileName)

0 讨论(0)

查看其它6个回答