How can I read a PDF file and put its contents into a string, using PHP?
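You could use something like pdftotext, which comes with the Xpdf package on Linux. The popen function can then be used to pipe the output of pdftotext into a string. Note that pdftotext writes to a .txt file by default, so pass "-" as the output file to send the text to stdout:
$mystring = "";
// The trailing "-" makes pdftotext write to stdout instead of blah.txt.
$fd = popen("/usr/bin/pdftotext blah.pdf -", "r");
if ($fd) {
    while (($myline = fgets($fd)) !== false) {
        $mystring .= $myline;
    }
    pclose($fd); // close the pipe once all output has been read
}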
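If the file name is not hard-coded, it should be escaped before it reaches the shell. Here is a minimal sketch of the same approach wrapped in a function. The names buildPdftotextCommand and pdfToString are made up for illustration, and it assumes pdftotext is installed and on the PATH:

```php
<?php
// Hypothetical helpers; assumes the pdftotext binary is available on the PATH.

// Build the shell command. escapeshellarg() quotes the path so spaces or
// shell metacharacters in the file name cannot break the command.
// The trailing "-" tells pdftotext to write the text to stdout.
function buildPdftotextCommand(string $pdfPath): string
{
    return 'pdftotext ' . escapeshellarg($pdfPath) . ' -';
}

// Run the command and return the extracted text, or null on failure.
function pdfToString(string $pdfPath): ?string
{
    $fd = popen(buildPdftotextCommand($pdfPath), 'r');
    if ($fd === false) {
        return null;
    }
    $text = stream_get_contents($fd); // read everything until the pipe closes
    $status = pclose($fd);            // non-zero means the command failed
    if ($text === false || $status !== 0) {
        return null;
    }
    return $text;
}
```

Usage would be something like $mystring = pdfToString('blah.pdf'); with a null check afterwards.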
Found this really nice class! Further, you can add functionality to fit your needs.
Probably these will help you to add functionality:
If it doesn't work, check whether you can select (highlight) the text when you open the file in Adobe Reader. If you can't, the text in your file is probably saved as geometric curves rather than as characters, so there is nothing to extract. Also check the encoding.
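Both of those checks can be roughed out in code once you have the extracted string. This is only a sketch: the function name diagnoseExtractedText and its return values are made up, and it assumes you expect UTF-8 output:

```php
<?php
// Hypothetical helper: classify an extracted string so the caller can tell
// "no text layer" apart from "unexpected encoding".
function diagnoseExtractedText(string $text): string
{
    // Strip whitespace, including the form feeds (\x0C) that some
    // extractors emit between pages.
    if (trim($text, " \t\n\r\0\x0B\x0C") === '') {
        return 'empty';    // probably scanned pages or text stored as curves
    }
    // With the /u modifier, preg_match() fails on invalid UTF-8,
    // so an empty pattern works as a validity check.
    if (preg_match('//u', $text) !== 1) {
        return 'not-utf8'; // may need converting from another encoding
    }
    return 'ok';
}
```

An 'empty' result for a PDF that clearly shows text on screen usually means you need OCR rather than a text extractor.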
Install Apache Tika on your server. Apache Tika supports more than just PDF files. Install guide: http://www.acquia.com/blog/use-apache-solr-search-files
The final code is then simple:
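$string = "";
$fd = popen("java -jar yourpathtotika/tika-app-1.3.jar -t yourpathtopdf/sample.pdf", "r");
if ($fd) { // popen returns false if the command could not be started
    // fgets() returns false at the end of the output, which ends the loop.
    while (($buffer = fgets($fd, 4096)) !== false) {
        $string .= $buffer;
    }
    pclose($fd);
}
echo $string;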
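One thing popen does not give you is the error output, so if Java or Tika fails you just get an empty string. Here is a rough sketch using proc_open to also collect stderr and the exit code. The function name runAndCapture is made up, and the paths in the commented usage are the same placeholders as above:

```php
<?php
// Hypothetical helper: run a command and return [stdout, stderr, exitCode].
function runAndCapture(string $command): array
{
    // Send stderr to a temp file so a chatty process cannot block on a
    // full pipe while we are still reading stdout.
    $errFile = tempnam(sys_get_temp_dir(), 'err');
    $spec = [
        1 => ['pipe', 'w'],           // child's stdout, read by us
        2 => ['file', $errFile, 'w'], // child's stderr, written to the file
    ];
    $proc = proc_open($command, $spec, $pipes);
    if (!is_resource($proc)) {
        unlink($errFile);
        return ['', 'could not start process', -1];
    }
    $out = stream_get_contents($pipes[1]);
    fclose($pipes[1]);
    $code = proc_close($proc);        // waits for the process to finish
    $err = (string) file_get_contents($errFile);
    unlink($errFile);
    return [$out === false ? '' : $out, $err, $code];
}

// Usage with the placeholder paths from the answer:
// $cmd = 'java -jar ' . escapeshellarg('yourpathtotika/tika-app-1.3.jar')
//      . ' -t ' . escapeshellarg('yourpathtopdf/sample.pdf');
// [$string, $errors, $code] = runAndCapture($cmd);
```

If $code is non-zero, $errors should tell you whether the problem is the Java install, the jar path, or the PDF itself.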
You can use the PHP class that is available here:
This is a public-domain PDF text extractor written entirely in pure PHP, meaning that you do not need to rely on external commands. It provides a simple interface to retrieve the text:
include('PdfToText.phpclass');
$pdf = new PdfToText('mysample.pdf');
echo "PDF contents are : " . $pdf->Text . "\n";
Source: https://stackoverflow.com/questions/4780697/converting-pdf-to-string