Flatten FDF / XFDF forms to PDF in PHP with utf-8 characters

老子叫甜甜 提交于 2019-11-28 10:59:06

Unfortunately, UTF-8 character encoding does not work neither with decimal nor hexadecimal references of non-ASCII characters in source .xfdf file. PDFTK v. 1.44.

I found by using Jon's template but using the DomDocument the numeric encoding was handled for me and worked well. My slight variation is below:

$xml = new DOMDocument( '1.0', 'UTF-8' );

$rootNode = $xml->createElement( 'xfdf' );
$rootNode->setAttribute( 'xmlns', 'http://ns.adobe.com/xfdf/' );
$rootNode->setAttribute( 'xml:space', 'preserve' );
$xml->appendChild( $rootNode );

$fieldsNode = $xml->createElement( 'fields' );
$rootNode->appendChild( $fieldsNode );

foreach ( $fields as $field => $value )
{
    $fieldNode = $xml->createElement( 'field' );
    $fieldNode->setAttribute( 'name', $field );
    $fieldsNode->appendChild( $fieldNode );

    $valueNode = $xml->createElement( 'value' );
    $valueNode->appendChild( $xml->createTextNode( $value ) );
    $fieldNode->appendChild( $valueNode );
}

$xml->save( $file );

You could try the trial version of http://www.adobe.com/products/livecycle/designer/ and see what PDF files it generates.

Another commercial software you could try is http://www.appligent.com/fdfmerge. See page 16 in http://146.145.110.1/docs/userguide/FDFMergeUserGuide.pdf for how it handles xFDF with UTF-8.

I also had a look at the FDF specification http://partners.adobe.com/public/developer/en/xml/xfdf_2.0.pdf On page 12 it states:

Although XFDF is encoded in UTF-8, double byte characters are encoded as character references when 
exported from Acrobat. 
For example, the Japanese double byte characters ,  , and  are exported to XFDF using 
three character references. Here is an example of double byte characters in a form field: 
  ...
<fields>  
  <field name="Text1"> 
     <value>Here are 3 UTF-8 double byte  
        characters: &#x3042;&#x3044;&#x3046;
</value>  
  </field>  
</fields> ... 

I looked through pdftk-1.44-dist/java/com/lowagie/text/pdf/XfdfReader.java. It doesn't seem to do anything special with the input.

Maybe pdftk will do what you want, when you encode the weird characters as character references in your xFDF input.

Using the pdftk 1.44 on a Win7 machine I encounter the same problems with xfdf-files whereas fdf works fine. I made a xfdf-file without any special characters (only ANSI) but pdftk crashed again. I mailed the developper. Unfortunately no answer until now.

I made some progress on this. Starting with code from http://koivi.com/fill-pdf-form-fields/, I modified the value encoding to output numeric codes for any characters outside the ascii range.

Now with pitulski's special strings:

Poznań Śródmieście Ćwiartka Ósma outputs Pozna ródmiecie wiartka Ósma with some box shapes superimposed

ęóąśłżźćńĘÓĄŚŁŻŹĆŃ outputs óÓ with more box shapes. I think it may be that the box shapes are characters my server doesn't recognize.

I tried it with some French characters: ùûüÿ€’“”«»àâæçéèêëïôœÙÛÜŸÀÂÆÇÉÈÊËÏÎÔ and they all came out OK, but some of them were overlapping.

--edit-- I just tried entering these manually into the form and got the same result minus the box shapes (using Evince). I then tried with a different form (created by someone else) - after entering ęóąśłżźćńĘÓĄŚŁŻŹĆŃ, ółÓŁ was displayed. It looks like it depends which characters are included in the document's embedded fonts.

/*
KOIVI HTML Form to FDF Parser for PHP (C) 2004 Justin Koivisto
Version 1.2.?
Last Modified: 2013/01/17 - Jon Hulka(jon dot hulka at gmail dot com)
  - changed character encoding, all non-ascii characters get encoded as numeric character references

    This library is free software; you can redistribute it and/or modify it
    under the terms of the GNU Lesser General Public License as published by
    the Free Software Foundation; either version 2.1 of the License, or (at
    your option) any later version.

    This library is distributed in the hope that it will be useful, but
    WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
    or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public
    License for more details.

    You should have received a copy of the GNU Lesser General Public License
    along with this library; if not, write to the Free Software Foundation,
    Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA 

    Full license agreement notice can be found in the LICENSE file contained
    within this distribution package.

    Justin Koivisto
    justin dot koivisto at gmail dot com
    http://koivi.com
*/

/**
 * createXFDF
 * 
 * Tales values passed via associative array and generates XFDF file format
 * with that data for the pdf address sullpiled.
 * 
 * @param string $file The pdf file - url or file path accepted
 * @param array $info data to use in key/value pairs no more than 2 dimensions
 * @param string $enc default UTF-8, match server output: default_charset in php.ini
 * @return string The XFDF data for acrobat reader to use in the pdf form file
 */
function createXFDF($file,$info,$enc='UTF-8'){
    $data=
'<?xml version="1.0" encoding="'.$enc.'"?>
<xfdf xmlns="http://ns.adobe.com/xfdf/" xml:space="preserve">
    <fields>';
    foreach($info as $field => $val){
        $data.='
        <field name="'.$field.'">';
        if(is_array($val)){
            foreach($val as $opt)
//2013.01.17 - Jon Hulka - all non-ascii characters get character references
            $data.='
            <value>'.mb_encode_numericentity(htmlspecialchars($opt),array(0x0080, 0xffff, 0, 0xffff), 'UTF-8').'</value>';
//                $data.='<value>'.htmlentities($opt,ENT_COMPAT,$enc).'</value>'."\n";
        }else{
            $data.='
            <value>'.mb_encode_numericentity(htmlspecialchars($val),array(0x0080, 0xffff, 0, 0xffff), 'UTF-8').'</value>';
//            $data.='<value>'.htmlentities($val,ENT_COMPAT,$enc).'</value>'."\n";
        }
        $data.='
        </field>';
    }
    $data.='
    </fields>
    <ids original="'.md5($file).'" modified="'.time().'" />
    <f href="'.$file.'" />
</xfdf>';
    return $data;
}

What PDFTK's version? I tried the same thing with Polish characters (utf-8).

Does not work for me.

pdftk.exe, libiconv2.dll from: http://www.pdflabs.com/docs/install-pdftk/

Windows 7, cmd, file.pdf + file.fdf -> new.pdf

pdftk file.pdf fill_form file.xfdf output new.pdf flatten

Unhandled Java Exception:
java.lang.NoClassDefFoundError: gnu.gcj.convert.Input_UTF8 not found in [file:.\, core:/]
   at 0x005a3abe (Unknown Source)
   at 0x005a3fb2 (Unknown Source)
   at 0x006119f4 (Unknown Source)
   at 0x00649ee4 (Unknown Source)
   at 0x005b4c44 (Unknown Source)
   at 0x005470a9 (Unknown Source)
   at 0x00549c52 (Unknown Source)
   at 0x0059d348 (Unknown Source)
   at 0x007323c9 (Unknown Source)
   at 0x0054715a (Unknown Source)
   at 0x00562349 (Unknown Source)

But, with FDF file, with the same content, it worked properly. But the characters in new.PDF are bad.

pdftk file.pdf fill_form file.fdf output new.pdf flatten

---FDF---

%FDF-1.2
%âãÏÓ
1 0 obj<</FDF<</F(file.pdf)
/Fields[
<</T(Miejsce)/V(666 Poznań Śródmieście Ćwiartka Ósma)>>
<</T(Nr)/V(ęóąśłżźćńĘÓĄŚŁŻŹĆŃ)>>
]>>>>
endobj
trailer
<</Root 1 0 R>>
%%EOF

---XFDF---

<?xml version="1.0" encoding="UTF-8"?>
<xfdf xmlns="http://ns.adobe.com/xfdf/" xml:space="preserve">
<f href="file.pdf"/>
<fields>
<field name="Miejsce">
<value>666 Poznań Śródmieście Ćwiartka Ósma</value>
</field>
<field name="Nr">
<value>ęóąśłżźćńĘÓĄŚŁŻŹĆŃ</value>
</field>
</fields>
</xfdf>

---PDF---

Miejsce: 666 PoznaÅ— ÅıródmieÅłcie ăwiartka Ãfisma
Nr: ÄŽÃ³Ä–ÅłÅ‡Å¼ÅºÄ⁄Å—ÄŸÃfiÄ—ÅıņŻŹăÅ

You can introduce utf-8 characters by giving their unicode code in octal with \ddd

To solve this, I wrote PdfFormFillerUTF-8: http://sourceforge.net/projects/pdfformfiller2/

There is a drop-in replacement for pdftk tool

Mcpdf: https://github.com/m-click/mcpdf

that solves unicode issues when filling forms. Works for me with CP1250 characters (Central Europe).

From project page:

the following command fills in form data from DATA.xfdf into FORM.pdf and writes the result to RESULT.pdf. It also flattens the document to prevent further editing:

java -jar mcpdf.jar FORM.pdf fill_form - output - flatten < DATA.xfdf > RESULT.pdf

This corresponds exactly to the usual PDFtk command:

pdftk FORM.pdf fill_form - output - flatten < DATA.xfdf > RESULT.pdf

Note that you need to have JRE installed.

I have managed to make it work with pdftk by creating a xfdf file with utf-8 encoding.

it took several tried but what make it work as exepcted was to add 'need_appearances'

here is an example:

pdftk source.pdf fill_form data.xfdf output output.pdf need_appearances
user2829228

pdftk supports encoding in UTF-16BE. It's not that difficult to convert from UTF-8 to UTF-16BE.

See: Weird characters when filling PDF with PDFTk

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!