How to extract the title of a PDF document from within a script for renaming?

Asked by 长情又很酷 on 2021-02-01 21:34

I have thousands of PDF files on my computer, named a0001.pdf through a3621.pdf, and inside each one there is a title, e.g. "aluminum carbona

Answer by 时光说笑 (original poster), 2021-02-01 22:06
Building on Ciprian Tomoiagă's suggestion of using pdfrw, I've uploaded a script which also:
- renames files in sub-directories
- adds a command-line interface
- handles the case where the new file name already exists by appending a random string
- strips any character that is not alphanumeric from the new file name
- replaces non-ASCII characters (such as á è í ò ç...) with their ASCII equivalents (a e i o c) in the new file name
- lets you set the root directory and limit the length of the new file name from the command line
- shows a progress bar and, once the script has finished, some statistics
- does some error handling
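The file-name clean-up steps in the list above could look roughly like this. This is a minimal sketch, not code from pdf_renamefy: `sanitize_title` and `unique_path` are hypothetical helper names, and keeping the spaces between words (instead of stripping them along with the other non-alphanumeric characters) is an assumption.

```python
import os
import random
import string
import unicodedata


def sanitize_title(title: str, max_length: int = 120) -> str:
    """Turn a PDF title into a safe ASCII file-name stem (hypothetical helper)."""
    # NFKD splits accented letters into base letter + combining mark
    # ("á" -> "a" + U+0301); encoding to ASCII with "ignore" then drops the marks.
    # Letters with no decomposition (e.g. "ß", "ø") are dropped entirely.
    ascii_title = (
        unicodedata.normalize("NFKD", title).encode("ascii", "ignore").decode("ascii")
    )
    # Keep letters, digits and spaces (keeping spaces is an assumption),
    # then collapse runs of whitespace into single spaces.
    kept = "".join(ch for ch in ascii_title if ch.isalnum() or ch == " ")
    stem = " ".join(kept.split())
    # Truncate titles that are too long or were set incorrectly.
    return stem[:max_length].rstrip()


def unique_path(directory: str, stem: str, ext: str = ".pdf") -> str:
    """Return a path that does not exist yet, appending a random suffix on clashes."""
    candidate = os.path.join(directory, stem + ext)
    while os.path.exists(candidate):
        suffix = "".join(random.choices(string.ascii_lowercase + string.digits, k=6))
        candidate = os.path.join(directory, f"{stem}_{suffix}{ext}")
    return candidate
```

For example, `sanitize_title("Óxido de alumínio: síntese & análise!")` returns `"Oxido de aluminio sintese analise"`.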
Unfortunately, as TextGeek mentioned, not all files have title metadata, so some files won't be renamed.
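Because some files have no title, whatever reads the metadata needs a "no title" result so those files can be skipped. The answer relies on pdfrw for the actual parsing; the sketch below deliberately swaps in a naive standard-library regex instead, only to show where the title lives (a /Title entry in the document information dictionary) and the skip-if-missing behavior. It is an illustration under heavy assumptions, not a PDF parser: it only finds literal `(...)` strings in uncompressed data, ignores hex and UTF-16 strings, escaped parentheses and compressed object streams, decodes bytes as Latin-1 as a rough stand-in for PDFDocEncoding, and may pick up a bookmark's /Title instead of the document's. `naive_pdf_title` is a hypothetical name.

```python
import re
from typing import Optional

# Matches a literal-string entry such as "/Title (Aluminum carbonate)".
# Assumption: the Info dictionary is uncompressed and the title is a simple
# literal string; hex strings, UTF-16 titles, escaped or nested parentheses
# and object streams are NOT handled.
_TITLE_RE = re.compile(rb"/Title\s*\(([^()\\]*)\)")


def naive_pdf_title(data: bytes) -> Optional[str]:
    """Return the first literal /Title string found in raw PDF bytes, or None."""
    match = _TITLE_RE.search(data)
    if match is None:
        return None  # no title metadata: the file would be left unrenamed
    # Latin-1 is only an approximation of PDFDocEncoding.
    title = match.group(1).decode("latin-1").strip()
    return title or None  # treat an empty title the same as a missing one
```

For real files, a proper PDF library such as pdfrw, which the answer uses, is the safer route.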

Repository: https://github.com/favict/pdf_renamefy

Usage:

After downloading the files, install the dependencies by running pip:
```
$ pip install -r requirements.txt
```
Then run the script:
```
$ python -m renamefy <directory> <filename maximum length>
```
Here, directory is the full path of the directory in which to look for PDF files, and filename maximum length is the length at which the new file name is truncated if the title is too long or was set incorrectly in the file.

Both parameters are optional. If neither is provided, the directory defaults to the current directory and the filename maximum length defaults to 120 characters.
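To make the two optional arguments described above concrete, here is a minimal argparse sketch of a command line with the same shape. It is an assumption about how such an interface could be built, not the pdf_renamefy implementation; `parse_args`, `directory` and `max_length` are hypothetical names.

```python
import argparse
import os


def parse_args(argv=None):
    """Hypothetical CLI with an optional directory and an optional length limit."""
    parser = argparse.ArgumentParser(
        description="Rename PDF files after their title metadata."
    )
    # nargs="?" makes each positional optional; when omitted, the default is used.
    parser.add_argument(
        "directory",
        nargs="?",
        default=os.getcwd(),
        help="root directory to search for PDF files (default: current directory)",
    )
    parser.add_argument(
        "max_length",
        nargs="?",
        type=int,
        default=120,
        help="maximum length of the new file name (default: 120)",
    )
    return parser.parse_args(argv)
```

Calling `parse_args([])` yields the current directory and 120, while `parse_args(["C:\\Users\\John\\Downloads", "120"])` mirrors the example invocation. Because the arguments are positional, the length can only be given together with a directory.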

Example:
```
$ python -m renamefy C:\Users\John\Downloads 120
```
I used it on Windows, but it should work on Linux too.

Feel free to copy, fork and edit as you see fit.