Reading PDF metadata in PHP

你的背包 2020-12-08 17:55

I'm trying to read metadata attached to arbitrary PDFs: title, author, subject, and keywords.

Is there a PHP library, preferably open-source, that can read PDF metadata?

6 answers
  • 2020-12-08 18:07

    The Zend framework includes Zend_Pdf, which makes this really easy:

    $pdf = Zend_Pdf::load($pdfPath);
    
    echo $pdf->properties['Title'] . "\n";
    echo $pdf->properties['Author'] . "\n";
    

    Limitations: it works only on unencrypted files smaller than 16 MB.

  • 2020-12-08 18:13

    PDF Parser does exactly what you want and it's pretty straightforward to use:

    $parser  = new \Smalot\PdfParser\Parser();
    $pdf     = $parser->parseFile('document.pdf');
    // getDetails() returns the document metadata as an associative array.
    $details = $pdf->getDetails();
    

    You can try it on the demo page.
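
    As a minimal sketch of what to do with the array that getDetails() returns, the snippet below prints each entry on its own line. The helper name formatDetails() is hypothetical, and the sample array is a stand-in for real output; the only assumption is that some values may be arrays rather than strings, in which case they are joined.

```php
<?php
// Hypothetical helper: turn a metadata array into "Key: value" lines.
function formatDetails(array $details): string
{
    $lines = [];
    foreach ($details as $property => $value) {
        // Assumption: a value may itself be an array; join it into one string.
        if (is_array($value)) {
            $value = implode(', ', $value);
        }
        $lines[] = $property . ': ' . $value;
    }
    return implode("\n", $lines);
}

// Stand-in for the array returned by $pdf->getDetails().
$sample = ['Title' => 'Report', 'Author' => 'Jane', 'Keywords' => ['php', 'pdf']];
echo formatDetails($sample), "\n";
```

    With a real file, pass the result of $pdf->getDetails() instead of the sample array.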

  • 2020-12-08 18:16
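    <?php

        $sourcefile = "file path";
        // Read the raw bytes; the pattern only finds a Title stored as uncompressed text.
        $stringedPDF = file_get_contents($sourcefile, true);

        // Capture the value after "Title ", including its enclosing (...) or [...].
        if (preg_match('/(?<=Title )\S(?:(?<=\().+?(?=\))|(?<=\[).+?(?=\]))./', $stringedPDF, $title)) {
            echo $title[0];
        }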
    
  • 2020-12-08 18:21

    I don't know of any libraries, but a simple way to achieve the same result might be to fopen() the file and parse everything that comes after the last "endstream".

    Try opening a PDF in a text editor; a parser shouldn't take more than five lines.
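
    A rough sketch of that idea, under several assumptions: the Info dictionary sits uncompressed after the last "endstream", the file is not encrypted, and the values are literal strings in parentheses (hex strings and UTF-16 values are not handled). The function name readTrailingInfo() and the sample bytes are made up for illustration.

```php
<?php
// Hypothetical helper: scan the text after the last "endstream" for Info entries.
function readTrailingInfo(string $pdfBytes): array
{
    $pos  = strrpos($pdfBytes, 'endstream');
    $tail = $pos === false ? $pdfBytes : substr($pdfBytes, $pos);

    $info = [];
    foreach (['Title', 'Author', 'Subject', 'Keywords'] as $key) {
        // Matches "/Key (value)"; allows backslash escapes, not unescaped nested parentheses.
        if (preg_match('~/' . $key . '\s*\(((?:\\\\.|[^\\\\()])*)\)~s', $tail, $m)) {
            // Undo only the \( \) and \\ escapes; other escapes are left untouched.
            $info[$key] = preg_replace('/\\\\([()\\\\])/', '$1', $m[1]);
        }
    }
    return $info;
}

// Stand-in for file_get_contents('document.pdf').
$sample = "stream\n...\nendstream\nendobj\n5 0 obj\n"
        . "<< /Title (Annual \\(draft\\) report) /Author (Jane Doe) >>\nendobj\n";
print_r(readTrailingInfo($sample));
```

    Keys that are not found are simply absent from the returned array, so callers should check with isset() before reading them.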

  • 2020-12-08 18:21

    I was looking for the same thing today and came across a small PHP class at http://de77.com/ that offers a quick-and-dirty solution. You can download the class directly. Output is UTF-8 encoded.

    The creator says:

    Here’s a PHP class I wrote which can be used to get title & author and a number of pages of any PDF file. It does not use any external application - just pure PHP.

    // basic example
    include 'PDFInfo.php';
    $p = new PDFInfo;
    $p->load('file.pdf');
    echo $p->author;
    echo $p->title;
    echo $p->pages;
    

    For me, it works! All thanks go solely to the creator of the class ... well, maybe just a little bit of thanks to me too for finding the class ;)

  • 2020-12-08 18:22

    You may use PDFtk to extract the page count:

    // Windows
    $bin = realpath('C:\\pdftk\\bin\\pdftk.exe');
    $cmd = "cmd /c {$bin} {$path} dump_data | grep NumberOfPages | sed 's/[^0-9]*//'";
    
    // Unix
    $cmd = "pdftk {$path} dump_data | grep NumberOfPages | sed 's/[^0-9]*//'";
    

    If ImageMagick is available you may also use:

    $cmd = "identify -format %n {$path}";
    

    Execute in PHP via shell_exec():

    $res = shell_exec($cmd);
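
    The same dump_data output also carries the document's Info entries, so the grep/sed step can be replaced by parsing in PHP. The sketch below assumes the output uses "InfoKey: ", "InfoValue: " and "NumberOfPages: " lines; parseDumpData() is a hypothetical helper and the sample text stands in for real pdftk output. Since the path is interpolated into a shell command, wrapping it in escapeshellarg() is a sensible precaution.

```php
<?php
// Hypothetical helper: parse the text printed by "pdftk <file> dump_data".
function parseDumpData(string $output): array
{
    $result     = ['info' => [], 'pages' => null];
    $pendingKey = null;
    foreach (preg_split('/\R/', $output) as $line) {
        if (strpos($line, 'InfoKey: ') === 0) {
            // Remember the key until its matching InfoValue line arrives.
            $pendingKey = substr($line, strlen('InfoKey: '));
        } elseif (strpos($line, 'InfoValue: ') === 0 && $pendingKey !== null) {
            $result['info'][$pendingKey] = substr($line, strlen('InfoValue: '));
            $pendingKey = null;
        } elseif (strpos($line, 'NumberOfPages: ') === 0) {
            $result['pages'] = (int) substr($line, strlen('NumberOfPages: '));
        }
    }
    return $result;
}

// Sample text in the dump_data layout; with pdftk installed it would come from
// shell_exec('pdftk ' . escapeshellarg($path) . ' dump_data').
$sample = "InfoBegin\nInfoKey: Title\nInfoValue: Annual report\nNumberOfPages: 12\n";
print_r(parseDumpData($sample));
```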
    