I am having weird character encoding issues with a JSON array that is grabbed from a web page. The server is sending back this header:
Content-Type text/javascri
Extract the charset from the response content type field. You can use the following method to do this:
private static String extractCharsetFromContentType(String contentType) {
    if (TextUtils.isEmpty(contentType)) return null;
    // Capture the charset value up to the next whitespace, ';' or ','.
    Pattern p = Pattern.compile("charset=([^\\s;,]+)");
    Matcher m = p.matcher(contentType);
    // group(1) always exists once find() succeeds, so no try/catch is needed.
    return m.find() ? m.group(1) : null;
}
Then use the extracted charset to create the InputStreamReader:
String charsetName = extractCharsetFromContentType(connection.getContentType());
InputStreamReader inReader = (TextUtils.isEmpty(charsetName) ? new InputStreamReader(inputStream) :
new InputStreamReader(inputStream, charsetName));
BufferedReader reader = new BufferedReader(inReader);
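Outside Android, the same idea can be sketched with only the Java standard library. This is a minimal sketch rather than the code above: the class name ContentTypeCharset, the method charsetOf, the tolerance for quoted values, and the UTF-8 fallback are all assumptions made for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: the names and the UTF-8 fallback are illustrative assumptions.
final class ContentTypeCharset {
    // Matches charset=value, allowing optional double quotes around the value.
    private static final Pattern CHARSET =
            Pattern.compile("charset\\s*=\\s*\"?([^\\s;,\"]+)", Pattern.CASE_INSENSITIVE);

    // Returns the charset named in a Content-Type value, or UTF-8 if none is usable.
    static Charset charsetOf(String contentType) {
        if (contentType == null) return StandardCharsets.UTF_8;
        Matcher m = CHARSET.matcher(contentType);
        if (!m.find()) return StandardCharsets.UTF_8;
        try {
            return Charset.forName(m.group(1));
        } catch (IllegalArgumentException e) {
            // Covers both illegal and unsupported charset names.
            return StandardCharsets.UTF_8;
        }
    }
}
```

With this sketch, charsetOf("text/javascript; charset=ISO-8859-1") returns the ISO-8859-1 charset, and a value with no charset parameter falls back to UTF-8. The resulting Charset can be passed straight to the InputStreamReader constructor.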
The problem is simply that your convertStreamToString does not honor the encoding set in the HttpResponse. If you look inside EntityUtils.toString(entity, HTTP.UTF_8), you will see that EntityUtils first checks whether an encoding is set in the HttpResponse and, if there is one, uses it. It falls back to the encoding passed as the parameter (in this case HTTP.UTF_8) only if no encoding is set in the entity.
So although HTTP.UTF_8 is passed as the parameter, it never gets used, because it is not the encoding the response declares. Here is an update to your code that uses the helper method from EntityUtils:
HttpEntity entity = response.getEntity();
String charset = getContentCharSet(entity);
InputStream instream = entity.getContent();
String jsonText = convertStreamToString(instream,charset);
private static String getContentCharSet(final HttpEntity entity) throws ParseException {
if (entity == null) {
throw new IllegalArgumentException("HTTP entity may not be null");
}
String charset = null;
if (entity.getContentType() != null) {
HeaderElement values[] = entity.getContentType().getElements();
if (values.length > 0) {
NameValuePair param = values[0].getParameterByName("charset");
if (param != null) {
charset = param.getValue();
}
}
}
return TextUtils.isEmpty(charset) ? HTTP.UTF_8 : charset;
}
private static String convertStreamToString(InputStream is, String encoding) {
    /*
     * To convert the InputStream to a String we use the
     * BufferedReader.readLine() method. We iterate until the BufferedReader
     * returns null, which means there is no more data to read. Each line is
     * appended to a StringBuilder and returned as a String.
     */
    StringBuilder sb = new StringBuilder();
    try {
        // Create the reader inside the try block so it is always initialized before use.
        BufferedReader reader = new BufferedReader(new InputStreamReader(is, encoding));
        String line;
        while ((line = reader.readLine()) != null) {
            sb.append(line).append("\n");
        }
    } catch (IOException e) {
        // Also covers UnsupportedEncodingException, which is a subclass of IOException.
        e.printStackTrace();
    } finally {
        try {
            is.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
    return sb.toString();
}
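To see why the charset handed to InputStreamReader matters, here is a small stand-alone sketch using only the Java standard library. The class StreamDecoding and the methods readAll and roundTrip are hypothetical names introduced for illustration. Unlike the method above, it reads into a char buffer instead of line by line, so line endings arrive unchanged.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class; all names here are illustrative assumptions.
final class StreamDecoding {
    // Reads the whole stream as text in the given charset, keeping line endings as sent.
    static String readAll(InputStream in, Charset charset) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, charset)) {
            char[] buffer = new char[4096];
            int n;
            while ((n = reader.read(buffer)) != -1) {
                sb.append(buffer, 0, n);
            }
        }
        return sb.toString();
    }

    // Encodes text as UTF-8 bytes, then decodes those bytes with the given charset.
    static String roundTrip(String text, Charset decodeWith) throws IOException {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return readAll(new ByteArrayInputStream(utf8), decodeWith);
    }
}
```

Calling roundTrip("\u00e9", StandardCharsets.UTF_8) gives back the single character é, while roundTrip("\u00e9", StandardCharsets.ISO_8859_1) gives the two characters Ã©. That is the kind of garbling you get when the reader's charset does not match the bytes on the wire.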
@Arhimed's answer is the solution. But I cannot see anything obviously wrong with your convertStreamToString code.
My guesses are:
convertStreamToString is reading the character stream a line at a time, and reassembling it using a hard-wired '\n' as the end-of-line marker. If you are going to write that to an external file or application, you should probably be using a platform-specific end-of-line marker.
Arhimed's answer is correct. However, that can be done simply by providing an additional header in the HTTP request:
Accept-Charset: utf-8
No need to remove anything or use any other library.
For example,
GET / HTTP/1.1
Host: www.website.com
Connection: close
Accept: text/html
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.10 Safari/537.36
DNT: 1
Accept-Encoding: gzip, deflate, sdch
Accept-Language: en-US,en;q=0.8
Accept-Charset: utf-8
Most probably your request doesn't have any Accept-Charset header.
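As a rough sketch of setting that header from Java, the snippet below uses the standard HttpURLConnection as a stand-in for whatever HTTP client you actually use. The class AcceptCharsetRequest and the method prepare are hypothetical names. Note that a server is free to ignore Accept-Charset, so the response's declared charset should still be checked.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

// Hypothetical helper; HttpURLConnection is used here only as a stand-in client.
final class AcceptCharsetRequest {
    // Prepares (but does not send) a GET request that asks for a UTF-8 response.
    static HttpURLConnection prepare(String url) throws IOException {
        // openConnection() only creates the object; no network traffic happens yet.
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setRequestMethod("GET");
        conn.setRequestProperty("Accept-Charset", "utf-8");
        return conn;
    }
}
```

For example, prepare("http://www.website.com/") returns a connection whose Accept-Charset request property is utf-8. The request is sent only once you read the response, for instance with getInputStream().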
Try this:
if (entity != null) {
// A Simple JSON Response Read
// InputStream instream = entity.getContent();
// String jsonText = convertStreamToString(instream);
String jsonText = EntityUtils.toString(entity, HTTP.UTF_8);
// ... toast code here
}