I'm using org.apache.commons.httpclient.HttpClient and need to set up the response encoding (for some reason the server returns an incorrect encoding in Content-Type).
I don't think there's a better answer using the HttpClient 3.x APIs.
The HTTP 1.1 spec says clearly that a client "must" respect the character set specified in the response header, and use ISO-8859-1 for text/* content if no character set is specified. The HttpClient APIs are designed on the assumption that the programmer wants to conform to the HTTP specs. Obviously, you need to break the rules in the spec so that you can talk to the non-compliant server. Nevertheless, this is not a use case that the API designers saw a need to support explicitly.
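To make the rule concrete, here is a minimal sketch of how a spec-following client picks the charset for a text/* body: use the charset parameter of Content-Type if present, otherwise fall back to ISO-8859-1. The charsetFor helper is hypothetical (it is not part of HttpClient) and only handles the simple charset=... parameter form. It shows why a wrong label from the server leads straight to wrongly decoded text.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

public class SpecCharset {
    // Hypothetical helper (not part of HttpClient): returns the charset a
    // spec-following HTTP/1.1 client would use for a text/* response body,
    // given the raw Content-Type header value (which may be null).
    static Charset charsetFor(String contentType) {
        if (contentType != null) {
            for (String param : contentType.split(";")) {
                String p = param.trim();
                if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                    String name = p.substring("charset=".length()).trim();
                    // Accept the quoted form too, e.g. charset="utf-8"
                    if (name.length() >= 2 && name.startsWith("\"") && name.endsWith("\"")) {
                        name = name.substring(1, name.length() - 1);
                    }
                    // Throws an IllegalArgumentException subtype for unknown names
                    return Charset.forName(name);
                }
            }
        }
        // No charset parameter: RFC 2616 default for text/* media types
        return StandardCharsets.ISO_8859_1;
    }

    public static void main(String[] args) {
        System.out.println(charsetFor("text/html; charset=UTF-8")); // UTF-8
        System.out.println(charsetFor("text/plain"));               // ISO-8859-1
    }
}
```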
If you were using HttpClient 4.x, you could write your own ResponseHandler that converts the response's HttpEntity into a String using a character set of your choosing, ignoring the response message's notional character set.
Disclaimer: I don't really know HttpClient; I've only read the API.
I would use the execute method that returns an HttpResponse, then call .getEntity().getContent(). This is a pure byte stream, so if you want to ignore the encoding reported by the server, you can simply wrap your own InputStreamReader around it.
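A minimal sketch of that wrapping, using only the JDK: the readAll helper (a hypothetical name) decodes a byte stream with a charset you choose rather than the one in Content-Type. A ByteArrayInputStream stands in for the stream you would get from response.getEntity().getContent(); the sketch assumes you know the charset the server really used.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ForcedCharsetReader {
    // Reads the whole stream as text in the given charset, ignoring whatever
    // charset the server declared. Closing the reader also closes the stream.
    static String readAll(InputStream body, Charset actualCharset) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new BufferedReader(new InputStreamReader(body, actualCharset))) {
            char[] buf = new char[4096];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for response.getEntity().getContent(): a UTF-8 encoded body
        InputStream in = new ByteArrayInputStream("Gr\u00fc\u00dfe".getBytes(StandardCharsets.UTF_8));
        String text = readAll(in, StandardCharsets.UTF_8);
        System.out.println(text.equals("Gr\u00fc\u00dfe")); // true
    }
}
```

Reading through a Reader this way decodes incrementally, so multi-byte characters split across buffer boundaries are still handled correctly by the InputStreamReader.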
Okay, it looks like I had the wrong version (obviously, there are too many HttpClient classes out there).
But the same approach works, just on other classes: HttpMethod has a getResponseBodyAsStream() method, around which you can wrap your own InputStreamReader. (Or, if the body is not too big, get the whole byte array at once with getResponseBody() and convert it to a String, as you wrote.)
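For the byte-array route, the conversion is just new String(bytes, charset). The sketch below assumes a server that really sends UTF-8 but labels it ISO-8859-1; a local byte array stands in for method.getResponseBody(), and decodeAs is a hypothetical helper name. It shows the garbled result of trusting the label next to the correct result of overriding it.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ByteArrayDecode {
    // Decodes an already-downloaded body with the charset the server really
    // used, instead of the one named in its (wrong) Content-Type header.
    static String decodeAs(byte[] body, Charset actualCharset) {
        return new String(body, actualCharset);
    }

    public static void main(String[] args) {
        // Stand-in for method.getResponseBody(): the server actually sent UTF-8...
        byte[] body = "caf\u00e9".getBytes(StandardCharsets.UTF_8);

        // ...but if it labelled the body ISO-8859-1, trusting the label garbles it
        String trusted = decodeAs(body, StandardCharsets.ISO_8859_1);
        // Overriding with the real charset restores the original text
        String forced = decodeAs(body, StandardCharsets.UTF_8);

        System.out.println(trusted.equals("caf\u00c3\u00a9")); // true
        System.out.println(forced.equals("caf\u00e9"));        // true
    }
}
```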
I think trying to change the response and letting the HttpClient analyze it is not the right way here.
I suggest sending a message to the server administrator/webmaster about the wrong charset, though.
Greetings folks,
Just in case someone finds this post while googling how to set HttpClient to write in UTF-8.
This line of code (from the Servlet API's HttpServletResponse, i.e. server-side code) should be handy...
response.setContentType("text/html; charset=UTF-8");
Best
A few notes:
The server serves the data, so it is up to the server to serve it in an appropriate format: the response encoding is set by the server, not the client. However, the client can suggest which format it would like via the Accept and Accept-Charset request headers:
Accept: text/plain
Accept-Charset: utf-8
In practice, though, HTTP servers usually do not convert between formats, so these headers are only a request.
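As an illustration of sending those two headers, here is a minimal sketch that uses the JDK's own java.net.http client (Java 11+) instead of Commons HttpClient, so it needs no extra dependency. It only builds the request; the example.com URL is a placeholder, and you would send the request with HttpClient.newHttpClient().send(...). The server remains free to ignore both hints.

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class AcceptCharsetRequest {
    // Builds (but does not send) a GET request that asks the server for
    // UTF-8 plain text. These headers are hints, not guarantees.
    static HttpRequest build(URI uri) {
        return HttpRequest.newBuilder(uri)
                .header("Accept", "text/plain")
                .header("Accept-Charset", "utf-8")
                .GET()
                .build();
    }

    public static void main(String[] args) {
        HttpRequest req = build(URI.create("http://example.com/")); // placeholder URL
        System.out.println(req.headers().firstValue("Accept-Charset").orElse("none")); // utf-8
    }
}
```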
If asking for a charset through these headers does not work, look at the server's configuration instead.
When a String is sent as raw bytes (and it always is, because bytes are what networks transmit), some encoding has always been applied. Since the server produces these raw bytes, the server defines the encoding. So you cannot take the raw bytes and use an encoding of your choice to create a String; you must use the encoding that was used when the String was converted to bytes.
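A small JDK-only sketch of that point: the same character turns into different bytes under different encodings, and only decoding with the encoding that produced the bytes gives the original text back. The roundTrip helper is a hypothetical name used just for this illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class EncodingRoundTrip {
    // Encodes text with one charset (the sender's side) and decodes the raw
    // bytes with another (the receiver's side).
    static String roundTrip(String text, Charset encodeWith, Charset decodeWith) {
        byte[] raw = text.getBytes(encodeWith); // what actually travels over the network
        return new String(raw, decodeWith);
    }

    public static void main(String[] args) {
        String s = "\u00e9"; // LATIN SMALL LETTER E WITH ACUTE
        // The same character becomes different bytes under different encodings
        System.out.println(Arrays.toString(s.getBytes(StandardCharsets.UTF_8)));      // [-61, -87]
        System.out.println(Arrays.toString(s.getBytes(StandardCharsets.ISO_8859_1))); // [-23]
        // Only decoding with the encoding that produced the bytes gives the text back
        System.out.println(roundTrip(s, StandardCharsets.UTF_8, StandardCharsets.UTF_8).equals(s));      // true
        System.out.println(roundTrip(s, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1).equals(s)); // false
    }
}
```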