问题
My scenario:
- A PDF template with formfields: template.pdf
- An XFDF file that contains the data to be filled in: fieldData.xfdf
Now I need to have these to files combined & flattened. pdftk does the job easily within php:
exec("pdftk template.pdf fill_form fieldData.xfdf output flatFile.pdf flatten");
Unfortunately this does not work with full utf-8 support. For example: Cyrillic and greek letters get scrambled. I used Arial for this, with an unicode character set.
- How can I accomplish to flatten my unicode files?
- Is there any other pdf tool that offers unicode support?
- Does pdftk have an unicode switch that I am missing?
EDIT 1: As this question has not been solved for more then 9 month, I decided to start a bounty for it. In case there are options to sponsor a feature or a bugfix in pdftk, I'd be glad to donate.
EDIT 2: I am not working on this project anymore, so I cannot verify new answers. If anyone has a similar problem, I am glad if they can respond in my favour.
回答1:
Unfortunately, UTF-8 character encoding does not work neither with decimal nor hexadecimal references of non-ASCII characters in source .xfdf file. PDFTK v. 1.44.
回答2:
I found by using Jon's template but using the DomDocument the numeric encoding was handled for me and worked well. My slight variation is below:
$xml = new DOMDocument( '1.0', 'UTF-8' );
$rootNode = $xml->createElement( 'xfdf' );
$rootNode->setAttribute( 'xmlns', 'http://ns.adobe.com/xfdf/' );
$rootNode->setAttribute( 'xml:space', 'preserve' );
$xml->appendChild( $rootNode );
$fieldsNode = $xml->createElement( 'fields' );
$rootNode->appendChild( $fieldsNode );
foreach ( $fields as $field => $value )
{
$fieldNode = $xml->createElement( 'field' );
$fieldNode->setAttribute( 'name', $field );
$fieldsNode->appendChild( $fieldNode );
$valueNode = $xml->createElement( 'value' );
$valueNode->appendChild( $xml->createTextNode( $value ) );
$fieldNode->appendChild( $valueNode );
}
$xml->save( $file );
回答3:
You could try the trial version of http://www.adobe.com/products/livecycle/designer/ and see what PDF files it generates.
Another commercial software you could try is http://www.appligent.com/fdfmerge. See page 16 in http://146.145.110.1/docs/userguide/FDFMergeUserGuide.pdf for how it handles xFDF with UTF-8.
I also had a look at the FDF specification http://partners.adobe.com/public/developer/en/xml/xfdf_2.0.pdf On page 12 it states:
Although XFDF is encoded in UTF-8, double byte characters are encoded as character references when
exported from Acrobat.
For example, the Japanese double byte characters , , and are exported to XFDF using
three character references. Here is an example of double byte characters in a form field:
...
<fields>
<field name="Text1">
<value>Here are 3 UTF-8 double byte
characters: あいう
</value>
</field>
</fields> ...
I looked through pdftk-1.44-dist/java/com/lowagie/text/pdf/XfdfReader.java. It doesn't seem to do anything special with the input.
Maybe pdftk will do what you want, when you encode the weird characters as character references in your xFDF input.
回答4:
Using the pdftk 1.44 on a Win7 machine I encounter the same problems with xfdf-files whereas fdf works fine. I made a xfdf-file without any special characters (only ANSI) but pdftk crashed again. I mailed the developper. Unfortunately no answer until now.
回答5:
I made some progress on this. Starting with code from http://koivi.com/fill-pdf-form-fields/, I modified the value encoding to output numeric codes for any characters outside the ascii range.
Now with pitulski's special strings:
Poznań Śródmieście Ćwiartka Ósma
outputs Pozna ródmiecie wiartka Ósma
with some box shapes superimposed
ęóąśłżźćńĘÓĄŚŁŻŹĆŃ
outputs óÓ
with more box shapes. I think it may be that the box shapes are characters my server doesn't recognize.
I tried it with some French characters: ùûüÿ€’“”«»àâæçéèêëïôœÙÛÜŸÀÂÆÇÉÈÊËÏÎÔ
and they all came out OK, but some of them were overlapping.
--edit-- I just tried entering these manually into the form and got the same result minus the box shapes (using Evince). I then tried with a different form (created by someone else) - after entering ęóąśłżźćńĘÓĄŚŁŻŹĆŃ
, ółÓŁ
was displayed. It looks like it depends which characters are included in the document's embedded fonts.
/*
KOIVI HTML Form to FDF Parser for PHP (C) 2004 Justin Koivisto
Version 1.2.?
Last Modified: 2013/01/17 - Jon Hulka(jon dot hulka at gmail dot com)
- changed character encoding, all non-ascii characters get encoded as numeric character references
This library is free software; you can redistribute it and/or modify it
under the terms of the GNU Lesser General Public License as published by
the Free Software Foundation; either version 2.1 of the License, or (at
your option) any later version.
This library is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public
License for more details.
You should have received a copy of the GNU Lesser General Public License
along with this library; if not, write to the Free Software Foundation,
Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
Full license agreement notice can be found in the LICENSE file contained
within this distribution package.
Justin Koivisto
justin dot koivisto at gmail dot com
http://koivi.com
*/
/**
* createXFDF
*
* Tales values passed via associative array and generates XFDF file format
* with that data for the pdf address sullpiled.
*
* @param string $file The pdf file - url or file path accepted
* @param array $info data to use in key/value pairs no more than 2 dimensions
* @param string $enc default UTF-8, match server output: default_charset in php.ini
* @return string The XFDF data for acrobat reader to use in the pdf form file
*/
function createXFDF($file,$info,$enc='UTF-8'){
$data=
'<?xml version="1.0" encoding="'.$enc.'"?>
<xfdf xmlns="http://ns.adobe.com/xfdf/" xml:space="preserve">
<fields>';
foreach($info as $field => $val){
$data.='
<field name="'.$field.'">';
if(is_array($val)){
foreach($val as $opt)
//2013.01.17 - Jon Hulka - all non-ascii characters get character references
$data.='
<value>'.mb_encode_numericentity(htmlspecialchars($opt),array(0x0080, 0xffff, 0, 0xffff), 'UTF-8').'</value>';
// $data.='<value>'.htmlentities($opt,ENT_COMPAT,$enc).'</value>'."\n";
}else{
$data.='
<value>'.mb_encode_numericentity(htmlspecialchars($val),array(0x0080, 0xffff, 0, 0xffff), 'UTF-8').'</value>';
// $data.='<value>'.htmlentities($val,ENT_COMPAT,$enc).'</value>'."\n";
}
$data.='
</field>';
}
$data.='
</fields>
<ids original="'.md5($file).'" modified="'.time().'" />
<f href="'.$file.'" />
</xfdf>';
return $data;
}
回答6:
What PDFTK's version? I tried the same thing with Polish characters (utf-8).
Does not work for me.
pdftk.exe, libiconv2.dll from: http://www.pdflabs.com/docs/install-pdftk/
Windows 7, cmd, file.pdf + file.fdf -> new.pdf
pdftk file.pdf fill_form file.xfdf output new.pdf flatten
Unhandled Java Exception:
java.lang.NoClassDefFoundError: gnu.gcj.convert.Input_UTF8 not found in [file:.\, core:/]
at 0x005a3abe (Unknown Source)
at 0x005a3fb2 (Unknown Source)
at 0x006119f4 (Unknown Source)
at 0x00649ee4 (Unknown Source)
at 0x005b4c44 (Unknown Source)
at 0x005470a9 (Unknown Source)
at 0x00549c52 (Unknown Source)
at 0x0059d348 (Unknown Source)
at 0x007323c9 (Unknown Source)
at 0x0054715a (Unknown Source)
at 0x00562349 (Unknown Source)
But, with FDF file, with the same content, it worked properly. But the characters in new.PDF are bad.
pdftk file.pdf fill_form file.fdf output new.pdf flatten
---FDF---
%FDF-1.2
%âãÏÓ
1 0 obj<</FDF<</F(file.pdf)
/Fields[
<</T(Miejsce)/V(666 Poznań Śródmieście Ćwiartka Ósma)>>
<</T(Nr)/V(ęóąśłżźćńĘÓĄŚŁŻŹĆŃ)>>
]>>>>
endobj
trailer
<</Root 1 0 R>>
%%EOF
---XFDF---
<?xml version="1.0" encoding="UTF-8"?>
<xfdf xmlns="http://ns.adobe.com/xfdf/" xml:space="preserve">
<f href="file.pdf"/>
<fields>
<field name="Miejsce">
<value>666 Poznań Śródmieście Ćwiartka Ósma</value>
</field>
<field name="Nr">
<value>ęóąśłżźćńĘÓĄŚŁŻŹĆŃ</value>
</field>
</fields>
</xfdf>
---PDF---
Miejsce: 666 PoznaÅ— ÅıródmieÅłcie ăwiartka Ãfisma
Nr: ÄŽÃ³Ä–ÅłÅ‡Å¼ÅºÄ⁄Å—ÄŸÃfiÄ—ÅıņŻŹăÅ
回答7:
You can introduce utf-8 characters by giving their unicode code in octal with \ddd
回答8:
To solve this, I wrote PdfFormFillerUTF-8: http://sourceforge.net/projects/pdfformfiller2/
回答9:
There is a drop-in replacement for pdftk tool
Mcpdf: https://github.com/m-click/mcpdf
that solves unicode issues when filling forms. Works for me with CP1250 characters (Central Europe).
From project page:
the following command fills in form data from DATA.xfdf into FORM.pdf and writes the result to RESULT.pdf. It also flattens the document to prevent further editing:
java -jar mcpdf.jar FORM.pdf fill_form - output - flatten < DATA.xfdf > RESULT.pdf
This corresponds exactly to the usual PDFtk command:
pdftk FORM.pdf fill_form - output - flatten < DATA.xfdf > RESULT.pdf
Note that you need to have JRE installed.
回答10:
I have managed to make it work with pdftk by creating a xfdf file with utf-8 encoding.
it took several tried but what make it work as exepcted was to add 'need_appearances'
here is an example:
pdftk source.pdf fill_form data.xfdf output output.pdf need_appearances
回答11:
pdftk supports encoding in UTF-16BE. It's not that difficult to convert from UTF-8 to UTF-16BE.
See: Weird characters when filling PDF with PDFTk
来源:https://stackoverflow.com/questions/3973283/flatten-fdf-xfdf-forms-to-pdf-in-php-with-utf-8-characters