How to detect illegal UTF-8 byte sequences and replace them in a Java InputStream?

独厮守ぢ 2021-02-02 15:22

The file in question is not under my control. Most of its byte sequences are valid UTF-8, and it is not ISO-8859-1 (or another encoding). I want to do my best to extract as much informat

3 answers
  •  走了就别回头了
    2021-02-02 15:43

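    The behaviour you want is already the default for InputStreamReader: when it is constructed with a Charset, malformed byte sequences are replaced with the replacement character U+FFFD instead of causing an exception. So there is no need to configure it yourself. This suffices: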

    final BufferedInputStream in = new BufferedInputStream(istream);
    final Reader inputReader = new InputStreamReader(in, StandardCharsets.UTF_8);
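
To check that behaviour, and to actually detect bad input rather than only replace it, something like the following sketch may help. The class and method names (LenientUtf8Demo, readLenient, isValidUtf8) are made up for illustration. The lenient path relies on the default InputStreamReader behaviour described above; the strict path assumes it is acceptable to hold the bytes in memory and uses a CharsetDecoder configured with CodingErrorAction.REPORT.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of the original answer.
class LenientUtf8Demo {

    // Reads the whole stream as UTF-8; malformed sequences become U+FFFD.
    static String readLenient(InputStream in) throws IOException {
        Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8);
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[1024];
        int n;
        while ((n = reader.read(buf)) != -1) {
            sb.append(buf, 0, n);
        }
        return sb.toString();
    }

    // Returns false if the bytes contain any malformed UTF-8 sequence.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        // 0xFF can never appear in valid UTF-8.
        byte[] data = {'a', (byte) 0xFF, 'b'};
        String text = readLenient(new ByteArrayInputStream(data));
        System.out.println(text.indexOf('\uFFFD')); // position of the replacement character
        System.out.println(isValidUtf8(data));
    }
}
```

Searching the decoded text for U+FFFD only tells you that something was replaced (or that the source genuinely contained U+FFFD); the strict decoder is the more reliable way to detect malformed input.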
    
