Reading PDF metadata in PHP

你的背包

I'm trying to read metadata attached to arbitrary PDFs: title, author, subject, and keywords.

Is there a PHP library, preferably open-source, that can read PDF metadata?

6 Answers

慢半拍i

2020-12-08 18:07
The Zend framework includes Zend_Pdf, which makes this really easy:
```
$pdf = Zend_Pdf::load($pdfPath);

echo $pdf->properties['Title'] . "\n";
echo $pdf->properties['Author'] . "\n";
```
Limitations: it only works on unencrypted files smaller than 16 MB.
耶瑟儿～

2020-12-08 18:13
PDF Parser does exactly what you want and it's pretty straightforward to use:
```
$parser = new \Smalot\PdfParser\Parser();
$pdf    = $parser->parseFile('document.pdf');
$details = $pdf->getDetails(); // metadata such as Title and Author
```
You can try it on the demo page.
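To get from the details array to the four fields the question asks about, something like the following might work. This is a minimal sketch: `pickPdfMetadata()` is a hypothetical helper, not part of PDF Parser, and it assumes the array is keyed by names such as `Title` and `Author`, that any key may be missing, and that a value may occasionally arrive as a list.

```php
<?php
// Hypothetical helper (not part of any library): pick the four fields from
// the question out of a metadata array, tolerating missing keys.
function pickPdfMetadata(array $details): array
{
    $wanted = ['Title', 'Author', 'Subject', 'Keywords'];
    $result = [];
    foreach ($wanted as $key) {
        $value = $details[$key] ?? null;
        // Assumption: a value may arrive as a list; flatten it to one string.
        if (is_array($value)) {
            $value = implode(', ', $value);
        }
        $result[$key] = ($value === null) ? null : trim((string) $value);
    }
    return $result;
}

// Stand-in for the array returned by $pdf->getDetails().
$details = ['Title' => ' Annual Report ', 'Author' => 'Jane Doe', 'Pages' => 12];
print_r(pickPdfMetadata($details));
```

Fields the PDF does not define come back as `null`, so the caller can tell "missing" apart from "empty".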

醉梦人生

2020-12-08 18:16

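```
<?php

$sourcefile = "file path"; // replace with the path to your PDF
$stringedPDF = file_get_contents($sourcefile, true);

// Matches the bracketed value after "Title ", e.g. "(My Document)";
// the surrounding delimiters are part of the match.
if (preg_match('/(?<=Title )\S(?:(?<=\().+?(?=\))|(?<=\[).+?(?=\]))./', $stringedPDF, $title)) {
    echo $title[0];
}
```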


孤城傲影

2020-12-08 18:21

I don't know of any libraries, but a simple way to achieve the same result might be to open the file with fopen() and parse everything that comes after the last "endstream".

Try opening a PDF in a text editor to see the layout. A parser shouldn't take more than five lines.
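A rough sketch of that idea in plain PHP is below. It is not a real PDF parser, and it rests on several assumptions: the file is unencrypted, the trailer contains an `/Info N G R` reference, the referenced object is stored uncompressed, and the values are literal strings in parentheses without nested unescaped parentheses or octal escapes. It scans the whole file rather than only the tail, because the Info object is not guaranteed to sit after the last stream. `readInfoEntries()` is a made-up name.

```php
<?php
// Sketch only: follow the trailer's /Info reference and read literal strings.
function readInfoEntries(string $rawPdf): array
{
    // Find "/Info N G R" references; the last one belongs to the newest trailer.
    if (!preg_match_all('~/Info\s+(\d+)\s+(\d+)\s+R~', $rawPdf, $refs, PREG_SET_ORDER)) {
        return [];
    }
    [, $num, $gen] = end($refs);

    // Grab the body of object "N G obj ... endobj" (first occurrence).
    $objPattern = '~(?<!\d)' . $num . '\s+' . $gen . '\s+obj(.*?)endobj~s';
    if (!preg_match($objPattern, $rawPdf, $obj)) {
        return [];
    }

    $info = [];
    foreach (['Title', 'Author', 'Subject', 'Keywords'] as $key) {
        // Match "/Key (text)" where text may contain backslash escapes.
        $pattern = '~/' . $key . '\s*\(((?:\\\\.|[^\\\\()])*)\)~s';
        if (preg_match($pattern, $obj[1], $m)) {
            // Undo the simple escapes \( \) and \\ only.
            $info[$key] = preg_replace('~\\\\([()\\\\])~', '$1', $m[1]);
        }
    }
    return $info;
}

// Tiny hand-written stand-in for real file contents.
$raw = "1 0 obj\n<< /Title (Annual Report) /Author (Jane Doe) >>\nendobj\n"
     . "trailer\n<< /Info 1 0 R >>\n%%EOF";
print_r(readInfoEntries($raw));
```

For a real file you would pass `file_get_contents($path)` instead of the sample string. Anything this sketch cannot handle (hex or UTF-16 strings, compressed object streams, encryption) simply yields a missing key.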

感情败类

2020-12-08 18:21
I was looking for the same thing today. And I came across a small PHP class over at http://de77.com/ that offers a quick and dirty solution. You can download the class directly. Output is UTF-8 encoded.

The creator says:

Here’s a PHP class I wrote which can be used to get title & author and a number of pages of any PDF file. It does not use any external application - just pure PHP.
```
// basic example
include 'PDFInfo.php';
$p = new PDFInfo;
$p->load('file.pdf');
echo $p->author;
echo $p->title;
echo $p->pages;
```
For me, it works! All thanks go solely to the creator of the class ... well, maybe just a little bit of thanks to me too for finding the class ;)

感动是毒

2020-12-08 18:22

You may use PDFtk to extract the page count:

```
// $path is the PDF file; pass it through escapeshellarg() first.

// Windows
$bin = realpath('C:\\pdftk\\bin\\pdftk.exe');
$cmd = "cmd /c {$bin} {$path} dump_data | grep NumberOfPages | sed 's/[^0-9]*//'";

// Unix
$cmd = "pdftk {$path} dump_data | grep NumberOfPages | sed 's/[^0-9]*//'";
```

If ImageMagick is available you may also use:

```
$cmd = "identify -format %n {$path}";
```

Execute in PHP via shell_exec():

```
$res = shell_exec($cmd);
```
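The same `dump_data` output also carries the document info as `InfoKey:` / `InfoValue:` line pairs, so it could answer the original question about title and author too. The sketch below parses that text in PHP instead of piping through grep and sed. `parseDumpData()` is a hypothetical helper, and the sample string is hand-written to mimic the output format.

```php
<?php
// Sketch: parse the text printed by `pdftk <file> dump_data`.
function parseDumpData(string $output): array
{
    $info = [];
    $key = null;
    foreach (preg_split('/\R/', $output) as $line) {
        if (preg_match('/^InfoKey:\s*(.*)$/', $line, $m)) {
            $key = $m[1];                 // remember the key for the next value
        } elseif ($key !== null && preg_match('/^InfoValue:\s*(.*)$/', $line, $m)) {
            $info[$key] = $m[1];
            $key = null;
        } elseif (preg_match('/^NumberOfPages:\s*(\d+)/', $line, $m)) {
            $info['NumberOfPages'] = (int) $m[1];
        }
    }
    return $info;
}

// To run against a real file (assumes pdftk is on the PATH):
//   $output = (string) shell_exec('pdftk ' . escapeshellarg($path) . ' dump_data');
// Sample text is used here so the snippet runs without pdftk installed.
$output = "InfoBegin\nInfoKey: Title\nInfoValue: Annual Report\nNumberOfPages: 12\n";
print_r(parseDumpData($output));
```

Doing the parsing in PHP also removes the dependency on grep and sed, which are usually not available on Windows.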
