I have thousands of PDF files in my computers which names are from a0001.pdf
to a3621.pdf
, and inside of each there is a title; e.g. \"aluminum carbona
This cannot be solved with plain Python. You will need an external package such as pdfrw, which allows you to read PDF metadata. The installation is quite easy using the standard Python package manager pip
.
On Windows, first make sure you have a recent version of pip
using the shell command:
python -m pip install -U pip
On Linux:
pip install -U pip
On both platforms, install then the pdfrw
package using
pip install pdfrw
I combined the ansatzes of zeebonk and user2125722 to write something very compact and readable which is close to your original code:
import os
from pdfrw import PdfReader
path = r'C:\Users\YANN\Desktop'
def renameFileToPDFTitle(path, fileName):
fullName = os.path.join(path, fileName)
# Extract pdf title from pdf file
newName = PdfReader(fullName).Info.Title
# Remove surrounding brackets that some pdf titles have
newName = newName.strip('()') + '.pdf'
newFullName = os.path.join(path, newName)
os.rename(fullName, newFullName)
for fileName in os.listdir(path):
# Rename only pdf files
fullName = os.path.join(path, fileName)
if (not os.path.isfile(fullName) or fileName[-4:] != '.pdf'):
continue
renameFileToPDFTitle(path, fileName)