Android Java UTF-8 HttpClient Problem

自闭症患者 2020-12-01 04:35

I am having weird character encoding issues with a JSON array that is grabbed from a web page. The server is sending back this header:

Content-Type text/javascri

5 answers
  • 2020-12-01 05:18

    Extract the charset from the response content type field. You can use the following method to do this:

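    // Needs android.text.TextUtils, java.util.regex.Matcher and java.util.regex.Pattern.
    private static String extractCharsetFromContentType(String contentType) {
        if (TextUtils.isEmpty(contentType)) return null;
    
        // Capture the value after "charset=", stopping at whitespace, ';', ',' or a quote.
        Pattern p = Pattern.compile("charset=\"?([^\\s;,\"]+)", Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher(contentType);
    
        // group(1) always exists after a successful find(), so no try/catch is needed.
        return m.find() ? m.group(1) : null;
    }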
    

    Then use the extracted charset to create the InputStreamReader:

    String charsetName = extractCharsetFromContentType(connection.getContentType());
    
    InputStreamReader inReader = TextUtils.isEmpty(charsetName)
            ? new InputStreamReader(inputStream)
            : new InputStreamReader(inputStream, charsetName);
    BufferedReader reader = new BufferedReader(inReader);
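
As a rough illustration of the same idea without Android classes, the sketch below resolves the header value to a java.nio.charset.Charset and falls back to a caller-supplied default when the header is missing or names a charset the runtime does not know. The class name ContentTypeCharset and the method resolve are hypothetical, introduced only for this example.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.StandardCharsets;
import java.nio.charset.UnsupportedCharsetException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of any library.
final class ContentTypeCharset {
    // Matches charset=utf-8, charset="utf-8", CHARSET=UTF-8 and similar forms.
    private static final Pattern CHARSET =
            Pattern.compile("charset\\s*=\\s*\"?([^\\s;,\"]+)", Pattern.CASE_INSENSITIVE);

    /** Returns the charset named in the header, or fallback if absent or unknown. */
    static Charset resolve(String contentType, Charset fallback) {
        if (contentType == null) return fallback;
        Matcher m = CHARSET.matcher(contentType);
        if (!m.find()) return fallback;
        try {
            return Charset.forName(m.group(1));
        } catch (IllegalCharsetNameException | UnsupportedCharsetException e) {
            return fallback; // the header named a charset this runtime cannot decode
        }
    }

    public static void main(String[] args) {
        System.out.println(resolve("text/javascript; charset=ISO-8859-1", StandardCharsets.UTF_8));
        System.out.println(resolve("text/javascript", StandardCharsets.UTF_8));
    }
}
```

Passing a Charset rather than a charset name to InputStreamReader also avoids the checked UnsupportedEncodingException at the call site.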
    
  • 2020-12-01 05:21

    Your convertStreamToString simply does not honor the encoding declared in the HttpResponse. If you look inside EntityUtils.toString(entity, HTTP.UTF_8), you will see that EntityUtils first checks whether the entity declares an encoding and, if it does, uses that one. It falls back to the encoding passed as the parameter (here HTTP.UTF_8) only when the entity declares none.

    So the HTTP.UTF_8 you pass in is never used, because it is not the encoding the response declares. Here is an update to your code, using a helper method adapted from EntityUtils:

    HttpEntity entity = response.getEntity();
    String charset = getContentCharSet(entity);
    InputStream instream = entity.getContent();
    String jsonText = convertStreamToString(instream, charset);
    
    private static String getContentCharSet(final HttpEntity entity) throws ParseException {
        if (entity == null) {
            throw new IllegalArgumentException("HTTP entity may not be null");
        }
        String charset = null;
        if (entity.getContentType() != null) {
            HeaderElement values[] = entity.getContentType().getElements();
            if (values.length > 0) {
                NameValuePair param = values[0].getParameterByName("charset");
                if (param != null) {
                    charset = param.getValue();
                }
            }
        }
        return TextUtils.isEmpty(charset) ? HTTP.UTF_8 : charset;
    }
    
    
    
    private static String convertStreamToString(InputStream is, String encoding) {
        /*
         * Read the stream line by line with BufferedReader.readLine() until it
         * returns null (end of stream), appending each line to a StringBuilder.
         * Note that readLine() strips the original line terminators, so every
         * line is rejoined with '\n'.
         */
        StringBuilder sb = new StringBuilder();
        try {
            // Constructing the reader inside the try keeps it definitely assigned;
            // UnsupportedEncodingException is a subclass of IOException.
            BufferedReader reader = new BufferedReader(new InputStreamReader(is, encoding));
            String line;
            while ((line = reader.readLine()) != null) {
                sb.append(line).append('\n');
            }
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            try {
                is.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
        return sb.toString();
    }
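
If the original line terminators matter, one alternative is to copy characters in fixed-size chunks instead of reading line by line. The following is a minimal sketch using only the JDK; StreamText and readAll are hypothetical names, and wrapping IOException in UncheckedIOException is a choice made here for brevity, not something the answer prescribes.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of any library.
final class StreamText {
    /** Decodes the whole stream with the given charset, keeping line endings as sent. */
    static String readAll(InputStream in, Charset charset) {
        StringBuilder sb = new StringBuilder();
        char[] buffer = new char[4096];
        // try-with-resources closes the reader and the underlying stream.
        try (Reader reader = new InputStreamReader(in, charset)) {
            int n;
            while ((n = reader.read(buffer)) != -1) {
                sb.append(buffer, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] body = "[\"caf\u00e9\"]\r\n".getBytes(StandardCharsets.UTF_8);
        // Decoding with the charset the bytes were written in round-trips the text.
        System.out.println(readAll(new ByteArrayInputStream(body), StandardCharsets.UTF_8));
        // Decoding the same bytes with the wrong charset produces garbled characters.
        System.out.println(readAll(new ByteArrayInputStream(body), StandardCharsets.ISO_8859_1));
    }
}
```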
    
  • 2020-12-01 05:22

    @Arhimed's answer is the solution. But I cannot see anything obviously wrong with your convertStreamToString code.

    My guesses are:

    1. The server is putting a UTF Byte Order Mark (BOM) at the start of the stream. The standard Java UTF-8 character decoder does not remove the BOM, so the chances are that it would end up in the resulting String. (However, the code for EntityUtils doesn't seem to do anything with BOMs either.)
    2. Your convertStreamToString reads the character stream a line at a time and reassembles it with a hard-wired '\n' as the end-of-line marker. If you are going to write the result to an external file or application, you should probably use a platform-specific end-of-line marker.
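
To check the first guess, a sketch like the one below decodes a body that begins with the UTF-8 BOM bytes (EF BB BF) and then drops the resulting U+FEFF character if it is present. BomStripper and stripLeadingBom are hypothetical names used only for this illustration.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of any library.
final class BomStripper {
    private static final char BOM = '\uFEFF';

    /** Returns text without a leading U+FEFF, or text unchanged if there is none. */
    static String stripLeadingBom(String text) {
        return (!text.isEmpty() && text.charAt(0) == BOM) ? text.substring(1) : text;
    }

    public static void main(String[] args) {
        // UTF-8 BOM followed by the two ASCII characters "[]".
        byte[] body = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, '[', ']'};
        String decoded = new String(body, StandardCharsets.UTF_8);
        // The decoder keeps the BOM as a character, so the lengths differ by one.
        System.out.println(decoded.length());
        System.out.println(stripLeadingBom(decoded).length());
    }
}
```

A JSON parser that rejects the input at the very first character is one symptom that would be consistent with a BOM surviving in the decoded string.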
  • 2020-12-01 05:36

    Arhimed's answer is correct. However, the same result can be achieved simply by sending an additional header in the HTTP request:

    Accept-Charset: utf-8
    

    No need to remove anything or use any other library.

    For example,

    GET / HTTP/1.1
    Host: www.website.com
    Connection: close
    Accept: text/html
    Upgrade-Insecure-Requests: 1
    User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.10 Safari/537.36
    DNT: 1
    Accept-Encoding: gzip, deflate, sdch
    Accept-Language: en-US,en;q=0.8
    Accept-Charset: utf-8
    

    Most probably your request doesn't have any Accept-Charset header.
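
For reference, this is roughly how the header could be added with the JDK's HttpURLConnection. It is a sketch only: AcceptCharsetRequest and prepare are hypothetical names, the URL is the placeholder host from the request above, and a server is free to ignore Accept-Charset, so the charset declared in the response's Content-Type is still worth checking.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;

// Hypothetical helper for illustration; not part of any library.
final class AcceptCharsetRequest {
    /** Creates (but does not open) a connection that asks for a UTF-8 response. */
    static HttpURLConnection prepare(String url) {
        try {
            // openConnection() only builds the object; no network I/O happens yet.
            HttpURLConnection conn =
                    (HttpURLConnection) URI.create(url).toURL().openConnection();
            conn.setRequestProperty("Accept-Charset", "utf-8");
            return conn;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        HttpURLConnection conn = prepare("http://www.website.com/");
        System.out.println(conn.getRequestProperty("Accept-Charset"));
    }
}
```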

  • 2020-12-01 05:37

    Try this:

    if (entity != null) {
        // A Simple JSON Response Read
        // InputStream instream = entity.getContent();
        // String jsonText = convertStreamToString(instream);
    
        String jsonText = EntityUtils.toString(entity, HTTP.UTF_8);
    
        // ... toast code here
    }
    