File upload fails when posting with Indy and the filename contains Greek characters

I am trying to implement a POST to a web service. I need to send a file whose type is variable (.docx, .pdf, .txt) along

1 Answer

[愿得一人]

2021-01-14 21:06
EncodeHeader() does have some known issues with Unicode strings:

EncodeHeader() needs to take codeunits into account when splitting data between adjacent encoded-words

Basically, a MIME encoded-word cannot be more than 75 characters long, so long text gets split across multiple encoded-words. But when encoding a Unicode string, a single Unicode character may be charset-encoded as 1 or more bytes, and EncodeHeader() does not yet avoid erroneously splitting the bytes of a multi-byte character across two separate encoded-words (which is illegal and explicitly forbidden by RFC 2047, part of the MIME specification).
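To make the splitting problem concrete, here is a minimal sketch (not Indy's actual code) of the check an encoder needs before cutting a UTF-8 byte buffer into encoded-words: a chunk boundary must never fall on a UTF-8 continuation byte (bit pattern `10xxxxxx`). The function name `SafeUtf8SplitPoint` and its `MaxLen` byte budget are hypothetical; the sample bytes are the UTF-8 encoding of the Greek letters 'στο', two bytes each.

```
program Utf8SafeSplit;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$IFNDEF FPC}{$APPTYPE CONSOLE}{$ENDIF}

uses
  SysUtils;

// Hypothetical helper: returns how many leading bytes of Data can go into
// one encoded-word without cutting a UTF-8 multi-byte character in half.
// MaxLen is the byte budget available for the chunk.
function SafeUtf8SplitPoint(const Data: TBytes; MaxLen: Integer): Integer;
begin
  if MaxLen >= Length(Data) then
  begin
    Result := Length(Data);
    Exit;
  end;
  Result := MaxLen;
  // If the byte that would start the next chunk is a continuation byte
  // (10xxxxxx), the split lands inside a character, so move it back.
  while (Result > 0) and ((Data[Result] and $C0) = $80) do
    Dec(Result);
end;

var
  Bytes: TBytes;
begin
  // UTF-8 for 'στο': σ = CF 83, τ = CF 84, ο = CE BF
  SetLength(Bytes, 6);
  Bytes[0] := $CF; Bytes[1] := $83;
  Bytes[2] := $CF; Bytes[3] := $84;
  Bytes[4] := $CE; Bytes[5] := $BF;
  WriteLn(SafeUtf8SplitPoint(Bytes, 3));  // 2: backs up to just before τ
  WriteLn(SafeUtf8SplitPoint(Bytes, 4));  // 4: already on a character boundary
  WriteLn(SafeUtf8SplitPoint(Bytes, 10)); // 6: everything fits
end.
```

A naive split at byte 3 would put the lead byte CF of τ in one encoded-word and its continuation byte 84 in the next, which is exactly what RFC 2047 forbids.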

However, that is not what is happening in your examples.

In your first example, 'Επιστολή εκπαιδευτικο.docx' is too long to be encoded as a single MIME word, so it gets split into the substrings 'Επιστολή εκπαιδευτικο.doc' and 'x', which are then encoded separately. This is legal in MIME for long text (though you might have expected Indy to split the text into 'Επιστολή' ' εκπαιδευτικο.doc' instead, or even 'Επιστολή' ' εκπαιδευτικο' '.doc'; that might be possible in a future release). When decoding, whitespace that separates adjacent MIME words is discarded, so the decoded words are concatenated directly, producing 'Επιστολή εκπαιδευτικο.docx' again. If the server is not doing that, its decoder is flawed (maybe it is decoding the value as 'Επιστολή εκπαιδευτικο.doc x' instead?).

In your second example, 'Επιστολή εκπαιδευτικ.docx' is short enough to be encoded as a single MIME word.

In your third example, 'Επιστολή εκπαιδευτικ .docx' gets split at the second space (not the first) into the substrings 'Επιστολή εκπαιδευτικ' and ' .docx', and only the first substring needs to be encoded. This is legal in MIME. When decoding, the decoded text is concatenated with the following unencoded text, and the whitespace between them is preserved, producing 'Επιστολή εκπαιδευτικ .docx' again. If the server is not doing that, its decoder is flawed (maybe it is decoding the value as 'Επιστολή εκπαιδευτικ.docx' instead?).
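The two whitespace rules from these examples (whitespace between two adjacent encoded-words is dropped; whitespace between an encoded-word and plain text is kept) can be sketched as follows. This is a hypothetical illustration, not Indy's or any server's decoder: it assumes each encoded-word's payload has already been decoded and the header has been unfolded, and the `THeaderToken` record, `MakeToken` and `JoinTokens` names are made up. ASCII placeholder text stands in for the Greek filenames so the source file's encoding does not matter.

```
program Rfc2047Whitespace;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$IFNDEF FPC}{$APPTYPE CONSOLE}{$ENDIF}

type
  // Hypothetical token: one piece of an unfolded header value.
  THeaderToken = record
    IsEncodedWord: Boolean; // True if it came from an =?charset?B|Q?...?= word
    LeadingSpace: string;   // whitespace that preceded it in the header
    Text: string;           // decoded text (plain text is kept as-is)
  end;

function MakeToken(IsEncoded: Boolean; const Space, S: string): THeaderToken;
begin
  Result.IsEncodedWord := IsEncoded;
  Result.LeadingSpace := Space;
  Result.Text := S;
end;

// Whitespace separating two adjacent encoded-words is ignored when
// decoding; any other whitespace is preserved.
function JoinTokens(const Tokens: array of THeaderToken): string;
var
  I: Integer;
begin
  Result := '';
  for I := 0 to High(Tokens) do
  begin
    if not ((I > 0) and Tokens[I - 1].IsEncodedWord and Tokens[I].IsEncodedWord) then
      Result := Result + Tokens[I].LeadingSpace;
    Result := Result + Tokens[I].Text;
  end;
end;

begin
  // Shape of the first example: two encoded-words split mid-filename.
  WriteLn(JoinTokens([MakeToken(True, '', 'letter educ.doc'),
                      MakeToken(True, ' ', 'x')]));      // letter educ.docx
  // Shape of the third example: an encoded-word followed by plain text.
  WriteLn(JoinTokens([MakeToken(True, '', 'letter educ'),
                      MakeToken(False, ' ', '.docx')])); // letter educ .docx
end.
```

A decoder that always keeps the separating space would produce 'letter educ.doc x' for the first case, and one that always drops it would produce 'letter educ.docx' for the second, which matches the two server-side failure modes suspected above.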

If you run these example filenames through Indy's MIME header encoder/decoder, they do decode properly:
```
// EncodeHeader() and DecodeHeader() are declared in Indy's IdCoderHeader unit;
// ShowMessage() comes from the (Vcl.)Dialogs unit.
var
  s: String;
begin
  s := EncodeHeader('Επιστολή εκπαιδευτικο.docx', '', 'B', 'UTF-8');
  ShowMessage(s); // '=?UTF-8?B?zpXPgM65z4PPhM6/zrvOriDOtc66z4DOsc65zrTOtc+Fz4TOuc66zr8uZG9j?='#13#10' =?UTF-8?B?eA==?='
  s := DecodeHeader(s);
  ShowMessage(s); // 'Επιστολή εκπαιδευτικο.docx'

  s := EncodeHeader('Επιστολή εκπαιδευτικ.docx', '', 'B', 'UTF-8');
  ShowMessage(s); // '=?UTF-8?B?zpXPgM65z4PPhM6/zrvOriDOtc66z4DOsc65zrTOtc+Fz4TOuc66LmRvY3g=?='
  s := DecodeHeader(s);
  ShowMessage(s); // 'Επιστολή εκπαιδευτικ.docx'

  s := EncodeHeader('Επιστολή εκπαιδευτικ .docx', '', 'B', 'UTF-8');
  ShowMessage(s); // '=?UTF-8?B?zpXPgM65z4PPhM6/zrvOriDOtc66z4DOsc65zrTOtc+Fz4TOuc66?= .docx'
  s := DecodeHeader(s);
  ShowMessage(s); // 'Επιστολή εκπαιδευτικ .docx'
end;
```
So the problem seems to be on the server side decoding, not on Indy's client side encoding.

That being said, if you are using a fairly recent version of Indy 10 (Nov 2011 or later), TIdFormDataField has a HeaderEncoding property, which defaults to 'B' (base64) in Unicode environments. You can try setting it to 'Q' (quoted-printable) instead, but the same splitting logic applies to 'Q' as well, so that may not work for you either:
```
with Params.AddFile('File', ceFileName.Text, '') do
begin
  ContentTransfer := '';
  HeaderEncoding := 'Q'; // <--- here
  HeaderCharSet := 'utf-8';
end;
```
Otherwise, a workaround might be to change the value to '8' (8-bit) instead, which effectively disables MIME encoding (but not charset encoding):
```
with Params.AddFile('File', ceFileName.Text, '') do
begin
  ContentTransfer := '';
  HeaderEncoding := '8'; // <--- here
  HeaderCharSet := 'utf-8';
end;
```
Just note that if the server is not expecting raw UTF-8 bytes for the filename, you might still run into problems (e.g., 'Επιστολή εκπαιδευτικο.docx' being interpreted as 'Î•Ï€Î¹ÏƒÏ„Î¿Î»Î® ÎµÎºÏ€Î±Î¹Î´ÎµÏ…Ï„Î¹ÎºÎ¿.docx').
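That kind of garbling is what happens when raw UTF-8 bytes are decoded as Windows-1252. The sketch below reproduces it for the first two letters, 'Επ', assuming a Unicode version of Delphi (2009 or later) on Windows, where `TEncoding.GetEncoding(1252)` is available. It is only an illustration, not part of the upload code; the letters are written as character codes so the source file's encoding does not matter.

```
program Utf8AsAnsiDemo;

{$APPTYPE CONSOLE}

uses
  SysUtils;

var
  Greek: string;
  Utf8Bytes: TBytes;
  Win1252: TEncoding;
  Misread: string;
  I: Integer;
begin
  Greek := #$0395#$03C0; // 'Επ'
  // UTF-8 encoding of 'Επ' is the byte sequence CE 95 CF 80
  Utf8Bytes := TEncoding.UTF8.GetBytes(Greek);
  // A receiver that assumes Windows-1252 maps each byte to one character.
  // GetEncoding() returns a new instance that the caller must free.
  Win1252 := TEncoding.GetEncoding(1252);
  try
    Misread := Win1252.GetString(Utf8Bytes);
  finally
    Win1252.Free;
  end;
  // Prints the UTF-16 code units 00CE 2022 00CF 20AC, i.e. 'Î•Ï€'
  for I := 1 to Length(Misread) do
    Write(IntToHex(Ord(Misread[I]), 4), ' ');
  WriteLn;
end.
```

Each two-byte Greek letter turns into two Latin-1/Windows-1252 characters, which is why the garbled filename above is roughly twice as long as the original.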