I am wondering if anybody has a reliable way of determining whether a PDF document is actually a PDF document, and that it isn't corrupted.
I generate reports on my syste
If you just want to make sure the file is a PDF file, without checking that it is completely intact, you can read the first 5 bytes of the file; for a PDF file they will be exactly equal to the string "%PDF-".
This is how the file program on Linux identifies PDF files.
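The header check described above could look roughly like this in PHP. This is only a sketch: the function name looksLikePdf is made up for illustration, and it says nothing about whether the rest of the file is intact.

```php
<?php
// Hypothetical helper (the name is an assumption, not from any library):
// returns true when the file begins with the 5-byte signature "%PDF-".
function looksLikePdf(string $path): bool
{
    // Open in binary mode; suppress the warning if the file is unreadable.
    $handle = @fopen($path, 'rb');
    if ($handle === false) {
        return false;
    }

    // Read only the first 5 bytes, then release the handle.
    $header = fread($handle, 5);
    fclose($handle);

    return $header === '%PDF-';
}
```

Calling looksLikePdf('report.pdf') before handing the file to a heavier tool is a cheap way to reject files that are obviously not PDFs.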
But if you want to make absolutely sure there are no errors anywhere in the file, you can run a program that processes the entire file, and see if that program returns success.
On Linux you can use Ghostscript ("gs") to render the PDF document to any format.
Or you can install Acrobat Reader and run acroread as a command-line program to convert it to PostScript:
acroread -print -toPostScript [your_file.pdf]
To do either of these you will need to use the PHP system function. To check whether the program ran successfully, you need to pass a variable as the second parameter to system; it will receive the program's return status.
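As a rough sketch of that approach, the snippet below runs a command through system and inspects the return status. The helper names are hypothetical, and the Ghostscript options shown (rendering to the nullpage device so that no output file is written) are one possible invocation, not the only one. Note also that Ghostscript may repair some damaged files and still exit with status 0, so treat a success here as a hint rather than proof.

```php
<?php
// Hypothetical helper: run a shell command and report whether it exited with status 0.
function commandSucceeded(string $command): bool
{
    $status = -1;
    // system() stores the command's exit status in its second argument.
    // Output is discarded so nothing is echoed into the page.
    $lastLine = system($command . ' > /dev/null 2>&1', $status);

    return $lastLine !== false && $status === 0;
}

// Hypothetical wrapper: ask Ghostscript to process every page of the PDF.
// Assumes "gs" is installed and on the PATH.
function pdfPassesGhostscript(string $path): bool
{
    $command = 'gs -q -dNOPAUSE -dBATCH -sDEVICE=nullpage ' . escapeshellarg($path);

    return commandSucceeded($command);
}
```

escapeshellarg is used so that a file name containing spaces or shell metacharacters cannot alter the command.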
You can use pdfinfo. On CentOS, the installation command is:
yum install poppler-utils
... and then run the pdfinfo command. The PHP code is as follows:
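// pdfinfo exits with a non-zero status when it cannot open or parse the file.
exec("pdfinfo test.pdf", $output, $status);
if ($status !== 0) {
    echo "file is corrupted";
}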