Question
I have a problem accessing websites that use the UTF-8 charset. For example, when I try to access this page (linked in the original post as "Click for example"), none of the UTF-8 characters are decoded correctly. This is my access routine:
var
  Web : TIdHTTP;
  Sito : String;
  hIOHand : TIdSSLIOHandlerSocketOpenSSL;
begin
  Url := TIdURI.URLEncode(Url);
  try
    Web := TIdHTTP.Create(nil);
    hIOHand := TIdSSLIOHandlerSocketOpenSSL.Create(nil);
    hIOHand.DefStringEncoding := IndyTextEncoding_UTF8;
    hIOHand.SSLOptions.SSLVersions := [sslvTLSv1,sslvTLSv1_1,sslvTLSv1_2,sslvSSLv2,sslvSSLv3,sslvSSLv23];
    Web.IOHandler := hIOHand;
    Web.Request.CharSet := 'utf-8';
    Web.Request.UserAgent := INET_USERAGENT; //Custom user agent string
    Web.RedirectMaximum := INET_REDIRECT_MAX; //Maximum redirects
    Web.HandleRedirects := INET_REDIRECT_MAX <> 0; //Handle redirects
    Web.ReadTimeOut := INET_TIMEOUT_SECS * 1000; //Read timeout msec
    try
      Sito := Web.Get(Url);
      Web.Disconnect;
    except
      on e : exception do
        Sito := 'ERR: ' +Url+#32+e.Message;
    end;
  finally
    Web.Free;
    hIOHand.Free;
  end;
I have tried every solution I could find, but the Sito variable always contains wrong characters. For example, the correct value of "name" is
"name": "Aire d'adhésion du Parc national du Mercantour",
but after the Get call I have
"name": "Aire d'adhÃ©sion du Parc national du Mercantour",
Do you have any idea where my error is? Thank you all!
Answer 1:
In Delphi 2009+, which includes XE6, string is a UTF-16 encoded UnicodeString.
You are using the overloaded version of TIdHTTP.Get() that returns a string. It decodes the sent text to UTF-16 using whatever charset is reported by the response. If the text is not decoding properly, it likely means the response is not reporting a correct charset. If the wrong charset is used, the text will not decode properly.
The URL in question is, in fact, sending a response Content-Type header that is set to application/json without specifying a charset at all. The default charset for application/json is UTF-8, but Indy does not know that, so it ends up using its own internal default instead, which is not UTF-8. That is why the text is not decoding properly when non-ASCII characters are present.
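To make the failure mode concrete, here is a minimal console sketch that does not use Indy at all. It encodes a word to UTF-8 bytes and then decodes those bytes twice, once with a single-byte charset and once with UTF-8. ISO-8859-1 (code page 28591) is only an assumed stand-in for "some non-UTF-8 default"; the helper DecodeAs and the program name are hypothetical.

```pascal
program MojibakeDemo;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

// Hypothetical helper: decode raw bytes using the given Windows code page.
function DecodeAs(const Raw: TBytes; CodePage: Integer): string;
var
  Enc: TEncoding;
begin
  // TEncoding.GetEncoding returns a new instance that the caller must free.
  Enc := TEncoding.GetEncoding(CodePage);
  try
    Result := Enc.GetString(Raw);
  finally
    Enc.Free;
  end;
end;

var
  Original, Wrong, Right: string;
  Raw: TBytes;
begin
  // U+00E9 is given as a character code so the source file encoding does not matter.
  Original := 'adh'#$00E9'sion';

  // In UTF-8 the accented letter takes two bytes, so 8 characters become 9 bytes.
  Raw := TEncoding.UTF8.GetBytes(Original);

  // Assumed wrong charset (ISO-8859-1): every byte turns into its own character.
  Wrong := DecodeAs(Raw, 28591);

  // Correct charset: the two bytes collapse back into one character.
  Right := TEncoding.UTF8.GetString(Raw);

  Writeln('Bytes received:         ', Length(Raw));
  Writeln('Chars with wrong charset: ', Length(Wrong));
  Writeln('Chars with UTF-8:         ', Length(Right));
  Writeln('UTF-8 round trip intact:  ', Right = Original);
end.
```

The wrongly decoded string is one character longer than the original because the two UTF-8 bytes of the accented letter are each mapped to a separate character. The same thing happens to the response body when it is decoded with a charset other than the one it was sent in.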
In which case, if you KNOW the charset will always be UTF-8, you have a few workarounds to choose from:

- you can set Indy's default charset to UTF-8 by setting the global GIdDefaultTextEncoding variable in the IdGlobal unit:

    GIdDefaultTextEncoding := encUTF8;

- you can use the TIdHTTP.OnHeadersAvailable event to change the TIdHTTP.Response.Charset property to 'utf-8' if it is blank or incorrect.

    Web.OnHeadersAvailable := CheckResponseCharset;

    ...

    procedure TMyClass.CheckResponseCharset(Sender: TObject; AHeaders: TIdHeaderList; var VContinue: Boolean);
    var
      Response: TIdHTTPResponse;
    begin
      Response := TIdHTTP(Sender).Response;
      if IsHeaderMediaType(Response.ContentType, 'application/json') and (Response.Charset = '') then
        Response.Charset := 'utf-8';
      VContinue := True;
    end;

- you can use the other overloaded version of TIdHTTP.Get() that fills an output TStream instead of returning a string. Using a TMemoryStream or TStringStream, you can decode the raw bytes yourself using UTF-8:

    MStrm := TMemoryStream.Create;
    try
      Web.Get(Url, MStrm);
      MStrm.Position := 0;
      // -1 means "read everything from the current position to the end"
      Sito := ReadStringFromStream(MStrm, -1, IndyTextEncoding_UTF8);
    finally
      MStrm.Free;
    end;

  or:

    SStrm := TStringStream.Create('', TEncoding.UTF8);
    try
      Web.Get(Url, SStrm);
      Sito := SStrm.DataString;
    finally
      SStrm.Free;
    end;
Source: https://stackoverflow.com/questions/52800270/delphi-indy-utf8