Java: How to determine the correct charset encoding of a stream

花落未央 2020-11-22 02:06

With reference to the following thread: Java App : Unable to read iso-8859-1 encoded file correctly

What is the best way to programmatically determine the correct cha

15 answers
  •  小蘑菇 (original poster)
    2020-11-22 02:35

    Check out ICU4J (http://site.icu-project.org/). It includes classes for detecting the charset of an input stream. It can be as simple as this:

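    // Requires the ICU4J library on the classpath.
    import com.ibm.icu.text.CharsetDetector;
    import com.ibm.icu.text.CharsetMatch;
    import java.io.BufferedInputStream;
    import java.nio.charset.UnsupportedCharsetException;

    // `input`, `reader` and `charset` are declared elsewhere by the caller.
    // setText needs a stream that supports mark/reset, hence the BufferedInputStream.
    BufferedInputStream bis = new BufferedInputStream(input);
    CharsetDetector cd = new CharsetDetector();
    cd.setText(bis);
    CharsetMatch cm = cd.detect();  // best match, or null if nothing matched

    if (cm != null) {
        reader = cm.getReader();    // Reader that decodes the stream with the detected charset
        charset = cm.getName();     // name of the detected charset
    } else {
        // The constructor requires a charset name; no charset was detected here.
        throw new UnsupportedCharsetException("undetected");
    }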
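
If adding ICU4J is not an option, a much cruder check is possible with the JDK alone. The sketch below is not what ICU4J does (its detector is statistical). It only tests whether the bytes decode without errors in each candidate charset, tried in order. The class name `StrictCharsetCheck`, its methods, and the default candidate list are hypothetical names chosen for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

/** Hypothetical helper (not part of ICU4J or the JDK). */
final class StrictCharsetCheck {

    // Assumed default order: strict charsets first, ISO-8859-1 last,
    // because ISO-8859-1 maps every byte value and therefore never fails.
    static final List<Charset> DEFAULT_CANDIDATES =
            Arrays.asList(StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);

    /** Returns true if the bytes decode in the given charset with no malformed or unmappable input. */
    static boolean decodesCleanly(byte[] data, Charset charset) {
        try {
            charset.newDecoder()
                   .onMalformedInput(CodingErrorAction.REPORT)
                   .onUnmappableCharacter(CodingErrorAction.REPORT)
                   .decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    /** Returns the first candidate that decodes the bytes cleanly, or empty if none does. */
    static Optional<Charset> firstClean(byte[] data, List<Charset> candidates) {
        for (Charset candidate : candidates) {
            if (decodesCleanly(data, candidate)) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }
}
```

A caller would read a prefix of the stream into a byte array and pass it to `StrictCharsetCheck.firstClean(bytes, StrictCharsetCheck.DEFAULT_CANDIDATES)`. A clean decode only shows that the bytes are valid in that charset, not that it is the one the author used, so this is at best a fallback to real detection.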
    
