When I create a PDF form (for instance using Acrobat) that contains text fields in AcroForm format (PDF dictionaries, no XFA), and I submit the data to a server, how can I s
I've just found the answer to my main question myself. I didn't find anything in ISO-32000-1 or the ISO-32000-2 draft, but studying the Acrobat JavaScript reference, I found the cCharset
parameter that is available for the submitForm()
method. That parameter defines:
The encoding for the values submitted. String values are utf-8, utf-16, Shift-JIS, BigFive, GBK, and UHC. If not passed, the current Acrobat behavior applies. For XML-based formats, utf-8 is used. For other formats, Acrobat tries to find the best host encoding for the values being submitted. XFDF submission ignores this value and always uses utf-8.
In other words: in my case GBK was used because it fits best to submit Chinese characters. However, one could force UTF-8 by using the submitForm()
JavaScript method using the appropriate value.
Based on this question, I have asked the ISO committee to fix this problem in ISO-32000-2. As a result, an extra possible entry was added to the table entitled Additional entries specific to a submit-form action in section 12.7.6.2:
CharSet: string
(Optional; inheritable) Possible values include: utf-8, utf-16, Shift-JIS, BigFive, GBK, or UHC.
Starting with PDF 2.0, this problem will no longer exist.
Update: my suggestion made ISO 32000-2 (aka PDF 2.0):
The CharSet key doesn't exist in ISO 32000-1; it was introduced in ISO 32000-2.