How to detect illegal UTF-8 byte sequences to replace them in a Java InputStream?

独厮守ぢ 2021-02-02 15:22

The file in question is not under my control. Most byte sequences are valid UTF-8; it is not ISO-8859-1 (or another encoding). I want to do my best to extract as much information…

  •  梦谈多话
    2021-02-02 15:36

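    java.nio.charset.CharsetDecoder does what you need. It decodes bytes into characters and lets you choose what happens on each kind of error via onMalformedInput(CodingErrorAction) and onUnmappableCharacter(CodingErrorAction): CodingErrorAction.REPORT signals the error so you can detect where it is, REPLACE substitutes the decoder's replacement string (U+FFFD by default), and IGNORE drops the offending bytes.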
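
    To both detect and replace illegal sequences, one option is to set the actions to REPORT and drive the decoder by hand: each error result tells you where the bad bytes start (the input buffer's position) and how many there are (CoderResult.length()). The following is a minimal sketch, assuming the whole input fits in a byte[]; the class name Utf8Scan and the method decodeAndReport are hypothetical, chosen for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class Utf8Scan {

    /**
     * Hypothetical helper: decodes UTF-8 bytes, replaces every malformed sequence
     * with U+FFFD and records the byte offset where each malformed sequence starts.
     */
    static String decodeAndReport(byte[] data, List<Integer> badOffsets) {
        // REPORT makes decode() stop and return an error result instead of
        // replacing silently, so we can see where each bad sequence is.
        CharsetDecoder dec = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer in = ByteBuffer.wrap(data);
        CharBuffer out = CharBuffer.allocate(1024);
        StringBuilder sb = new StringBuilder();

        while (true) {
            // endOfInput = true: everything is already in the buffer, so a
            // truncated sequence at the very end is reported as malformed too.
            CoderResult r = dec.decode(in, out, true);
            if (r.isOverflow()) {
                drain(out, sb);                          // output full: empty it, keep going
            } else if (r.isError()) {
                drain(out, sb);                          // keep earlier text in order
                badOffsets.add(in.position());           // bad bytes start here
                sb.append('\uFFFD');                     // our own replacement char
                in.position(in.position() + r.length()); // skip the bad bytes
            } else {
                break;                                   // underflow: all input consumed
            }
        }
        while (dec.flush(out).isOverflow()) {
            drain(out, sb);
        }
        drain(out, sb);
        return sb.toString();
    }

    /** Moves whatever the decoder wrote into the StringBuilder and resets the buffer. */
    private static void drain(CharBuffer out, StringBuilder sb) {
        out.flip();
        sb.append(out);
        out.clear();
    }

    public static void main(String[] args) {
        byte[] data = {'a', (byte) 0xFF, 'b', (byte) 0xC3, (byte) 0xA9}; // 0xFF is never valid UTF-8
        List<Integer> bad = new ArrayList<>();
        String text = decodeAndReport(data, bad);
        System.out.println(text.length() + " chars, bad offsets " + bad); // prints: 4 chars, bad offsets [1]
    }
}
```

    The recorded offsets can be logged or used to decide whether a file is too damaged to trust; if you only need replacement and not positions, CodingErrorAction.REPLACE does the same substitution without the manual loop.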

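    Note that a CharsetDecoder works on buffers (ByteBuffer in, CharBuffer out); it does not write to an OutputStream. The simplest way to apply it to a stream is to pass the configured decoder to new InputStreamReader(InputStream, CharsetDecoder), which gives you a Reader over the cleaned-up text. If you need bytes again, re-encode that text as UTF-8, either in memory or by writing it into a java.io.PipedOutputStream connected to a java.io.PipedInputStream (the writing side must run in its own thread), effectively creating a filtered InputStream.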
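
    A minimal sketch of the stream-based approach, assuming replacement with U+FFFD is acceptable and (for the byte-stream variant) that the content fits in memory. LenientUtf8, lenientReader, readAll and sanitized are hypothetical names; InputStreamReader(InputStream, CharsetDecoder) is the standard JDK constructor.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class LenientUtf8 {

    /** Wraps a byte stream in a Reader that replaces illegal UTF-8 with U+FFFD. */
    static Reader lenientReader(InputStream in) {
        CharsetDecoder dec = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
        return new BufferedReader(new InputStreamReader(in, dec));
    }

    /** Reads the whole stream as text; closing the Reader also closes the stream. */
    static String readAll(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader r = lenientReader(in)) {
            char[] buf = new char[8192];
            int n;
            while ((n = r.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    /**
     * Returns an InputStream containing only valid UTF-8. This in-memory version
     * holds the whole content at once; a piped variant with a writer thread
     * would avoid that for very large inputs.
     */
    static InputStream sanitized(InputStream in) throws IOException {
        return new ByteArrayInputStream(readAll(in).getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        byte[] raw = {'o', 'k', (byte) 0xFF, '!'};
        System.out.println(readAll(new ByteArrayInputStream(raw)).equals("ok\uFFFD!")); // prints: true
    }
}
```

    Wrapping in memory keeps the code single-threaded and simple; the PipedOutputStream route only pays off when the input is too large to buffer, at the cost of managing a second thread.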
