I'm trying to read metadata attached to arbitrary PDFs: title, author, subject, and keywords.
Is there a PHP library, preferably open-source, that can read PDF metadata?
The Zend framework includes Zend_Pdf, which makes this really easy:
$pdf = Zend_Pdf::load($pdfPath);
echo $pdf->properties['Title'] . "\n";
echo $pdf->properties['Author'] . "\n";
Limitations: Works only on unencrypted files smaller than 16 MB.
PDF Parser does exactly what you want and it's pretty straightforward to use:
$parser = new \Smalot\PdfParser\Parser();
$pdf = $parser->parseFile('document.pdf');
$details = $pdf->getDetails(); // associative array of document metadata, not the page text
You can try it on the demo page.
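If you only care about the four fields from the question, something like the following might help. It is a rough sketch, not part of PDF Parser: `pickMetadata()` is a made-up helper, and it assumes the array from `getDetails()` is keyed by the usual Info dictionary names (`Title`, `Author`, `Subject`, `Keywords`) and that a value may occasionally be a list rather than a string.

```php
<?php
// Sketch only: $details is assumed to be an associative array keyed by
// Info dictionary names, such as the one returned by getDetails().
function pickMetadata(array $details): array
{
    $wanted = ['Title', 'Author', 'Subject', 'Keywords'];
    $picked = [];
    foreach ($wanted as $key) {
        $value = $details[$key] ?? null;
        // Flatten list values into one string; missing fields stay null.
        $picked[$key] = is_array($value) ? implode(', ', $value) : $value;
    }
    return $picked;
}
```

You would call it as `pickMetadata($pdf->getDetails())` and check each field for `null` before printing, since many PDFs leave some of these entries out.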
<?php
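$sourcefile = "file path"; // path to the PDF
$stringedPDF = file_get_contents($sourcefile);
// Matches the value after "Title ", including its surrounding ( ) or [ ] delimiters.
// This only works when the Info dictionary is stored uncompressed and unencrypted.
if (preg_match('/(?<=Title )\S(?:(?<=\().+?(?=\))|(?<=\[).+?(?=\]))./', $stringedPDF, $title)) {
    echo $title[0];
}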
Don't know about libraries, but a simple way to achieve the same result might be fopening the file and parsing everything that comes after the last "endstream".
Try opening a PDF in a text editor; a parser shouldn't take more than five lines.
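A rough sketch of that idea, in case it helps. The function name `readTrailerMetadata()` is made up, and the code assumes the Info dictionary sits uncompressed after the last "endstream" with values written as plain `(...)` literal strings. It will find nothing in PDFs that are encrypted, that keep the dictionary inside an object stream, or that store values as hex or UTF-16 strings.

```php
<?php
// Sketch only: naive scan of the bytes after the last "endstream".
function readTrailerMetadata(string $pdfBytes): array
{
    $pos = strrpos($pdfBytes, 'endstream');
    // Fall back to the whole file if the PDF contains no streams at all.
    $tail = ($pos === false) ? $pdfBytes : substr($pdfBytes, $pos);

    $meta = [];
    foreach (['Title', 'Author', 'Subject', 'Keywords'] as $key) {
        // Capture the text between the parentheses that follow /Key.
        // Backslash escapes such as \) are skipped over but not decoded,
        // and unescaped nested parentheses end the match early.
        $pattern = '#/' . $key . '\s*\(((?:\\\\.|[^\\\\)])*)\)#';
        if (preg_match($pattern, $tail, $m)) {
            $meta[$key] = $m[1];
        }
    }
    return $meta;
}
```

Usage would be along the lines of `readTrailerMetadata(file_get_contents('file.pdf'))`. Fields that are absent from the file are simply missing from the returned array.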
I was looking for the same thing today. And I came across a small PHP class over at http://de77.com/ that offers a quick and dirty solution. You can download the class directly. Output is UTF-8 encoded.
The creator says:
Here’s a PHP class I wrote which can be used to get title & author and a number of pages of any PDF file. It does not use any external application - just pure PHP.
// basic example
include 'PDFInfo.php';
$p = new PDFInfo;
$p->load('file.pdf');
echo $p->author;
echo $p->title;
echo $p->pages;
For me, it works! All thanks go solely to the creator of the class ... well, maybe just a little bit of thanks to me too for finding the class ;)
You may use PDFtk to extract the page count:
// Windows
$bin = realpath('C:\\pdftk\\bin\\pdftk.exe');
$cmd = "cmd /c {$bin} {$path} dump_data | grep NumberOfPages | sed 's/[^0-9]*//'";
// Unix
$cmd = "pdftk {$path} dump_data | grep NumberOfPages | sed 's/[^0-9]*//'";
If ImageMagick is available you may also use:
$cmd = "identify -format %n {$path}";
Execute in PHP via shell_exec():
$res = shell_exec($cmd);
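The same `dump_data` report also carries the title, author and other Info entries as `InfoKey:` / `InfoValue:` line pairs, so you could parse the whole thing in PHP instead of piping through grep and sed. A minimal sketch, assuming that line layout; `parsePdftkDump()` is a made-up name, not something PDFtk ships.

```php
<?php
// Sketch only: parses text shaped like pdftk's dump_data report.
function parsePdftkDump(string $dump): array
{
    $info = [];
    $pendingKey = null;
    foreach (preg_split('/\R/', $dump) as $line) {
        if (preg_match('/^InfoKey:\s*(.*)$/', $line, $m)) {
            // Remember the key until its InfoValue line arrives.
            $pendingKey = $m[1];
        } elseif ($pendingKey !== null && preg_match('/^InfoValue:\s*(.*)$/', $line, $m)) {
            $info[$pendingKey] = $m[1];
            $pendingKey = null;
        } elseif (preg_match('/^NumberOfPages:\s*(\d+)/', $line, $m)) {
            $info['NumberOfPages'] = (int) $m[1];
        }
    }
    return $info;
}

// Hypothetical usage; escapeshellarg() protects against spaces and shell
// metacharacters in the path:
// $info = parsePdftkDump((string) shell_exec('pdftk ' . escapeshellarg($path) . ' dump_data'));
```

Wrapping the path in `escapeshellarg()` is worth doing in the commands above too if the file name comes from user input.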