InputStreamReader buffering issue

感情败类 2021-02-05 21:18

I am reading data from a file that has, unfortunately, two types of character encoding.

There is a header and a body. The header is always in ASCII and defines the character set the body is encoded in.

6 answers
  • 2021-02-05 21:39

    I suggest re-reading the stream from the start with a new InputStreamReader. This assumes InputStream.mark is supported, so that reset() can take you back to the start.
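    A minimal sketch of the mark/reset idea, assuming a hypothetical header format (a single ASCII line naming the body's charset, terminated by '\n') and an arbitrary 64 KiB mark limit; the class name MarkResetHeaderReader is illustrative only. Because the header is ASCII, the number of header characters equals the number of header bytes, which is what lets the code skip exactly past the header after reset().

```java
import java.io.BufferedInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class MarkResetHeaderReader {
    // Assumed upper bound on bytes read before reset(): header plus the reader's read-ahead.
    static final int READ_LIMIT = 1 << 16;

    /** Parses the hypothetical one-line header and returns a Reader positioned at the body. */
    static Reader openBody(InputStream raw) throws IOException {
        // BufferedInputStream always supports mark/reset; use it when the raw stream doesn't.
        InputStream in = raw.markSupported() ? raw : new BufferedInputStream(raw);
        in.mark(READ_LIMIT);

        // Decode the header with an ASCII reader. It may buffer part of the body,
        // which is fine because we rewind afterwards. Don't close it: that would
        // close the underlying stream as well.
        Reader ascii = new InputStreamReader(in, StandardCharsets.US_ASCII);
        StringBuilder header = new StringBuilder();
        int c;
        while ((c = ascii.read()) != -1 && c != '\n') {
            header.append((char) c);
        }
        if (c == -1) {
            throw new EOFException("unterminated header");
        }
        // In US-ASCII one char is one byte, so this is the header's length in bytes.
        long headerBytes = header.length() + 1L; // + 1 for the '\n'

        // Rewind to the start of the stream, then skip exactly the header bytes.
        in.reset();
        long remaining = headerBytes;
        while (remaining > 0) {
            long skipped = in.skip(remaining);
            if (skipped <= 0) { // skip() may make no progress; fall back to read()
                if (in.read() == -1) {
                    throw new EOFException("stream ended inside the header");
                }
                skipped = 1;
            }
            remaining -= skipped;
        }
        Charset bodyCharset = Charset.forName(header.toString().trim());
        return new InputStreamReader(in, bodyCharset);
    }
}
```

    If the ASCII reader consumes more than READ_LIMIT bytes before reset(), BufferedInputStream may invalidate the mark and reset() then throws an IOException, so the limit has to cover the largest expected header plus whatever the reader reads ahead.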

  • 2021-02-05 21:46

    My first thought is to close the stream and reopen it, using InputStream#skip to skip past the header before giving the stream to the new InputStreamReader.

    If you really, really don't want to reopen the file, you could use file descriptors to get more than one stream to the file. However, streams sharing a descriptor also share its file position, so you may have to use channels to get independent positions within the file (you can't assume you can move back with reset(), since mark/reset may not be supported).
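    A sketch of the channel variant, assuming the same hypothetical one-line ASCII header naming the body's charset: a single FileChannel is opened, the header is decoded through Channels.newInputStream, and FileChannel.position(long) then moves the channel to the first body byte, no matter how far the ASCII reader read ahead. ChannelHeaderReader and readBody are illustrative names.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.channels.Channels;
import java.nio.channels.FileChannel;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class ChannelHeaderReader {
    /** Reads the body of the file at {@code path}, decoded with the charset named in its header. */
    static String readBody(Path path) throws IOException {
        try (FileChannel channel = FileChannel.open(path, StandardOpenOption.READ)) {
            // This reader may read ahead and advance the channel; we reposition below.
            Reader ascii = new InputStreamReader(Channels.newInputStream(channel), StandardCharsets.US_ASCII);
            StringBuilder header = new StringBuilder();
            int c;
            while ((c = ascii.read()) != -1 && c != '\n') {
                header.append((char) c);
            }
            if (c == -1) {
                throw new EOFException("unterminated header");
            }

            // One ASCII char == one byte, so the body starts right after the '\n'.
            channel.position(header.length() + 1L);
            Charset bodyCharset = Charset.forName(header.toString().trim());
            Reader body = new InputStreamReader(Channels.newInputStream(channel), bodyCharset);

            StringBuilder out = new StringBuilder();
            char[] buf = new char[4096];
            int n;
            while ((n = body.read(buf)) != -1) {
                out.append(buf, 0, n);
            }
            return out.toString();
        } // closing the channel also releases both channel-backed streams
    }
}
```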

  • 2021-02-05 21:48

    It's even easier:

    As you said, your header is always in ASCII, so read the header directly from the InputStream. Once you're done with it, create the Reader with the correct encoding and read the rest of the file from it:

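    import java.io.IOException;
    import java.io.InputStream;
    import java.io.InputStreamReader;
    import java.io.Reader;
    import java.nio.charset.Charset;

    class HeaderFirstReader {
        private Reader reader;
        private final InputStream stream;

        HeaderFirstReader(InputStream stream) {
            this.stream = stream;
        }

        public void read() throws IOException {
            // The header is ASCII (1 byte == 1 char), so read it byte by byte straight
            // from the InputStream; no Reader exists yet, so nothing reads ahead.
            StringBuilder header = new StringBuilder();
            int c;
            while ((c = stream.read()) != -1 && c != '\n') { // hypothetical: header ends at '\n'
                header.append((char) c);
            }
            // Hypothetical: the header is just the body's charset name, e.g. "UTF-16LE".
            Charset encoding = Charset.forName(header.toString().trim());
            reader = new InputStreamReader(stream, encoding);
            while ((c = reader.read()) != -1) {
                // Handle rest of file
            }
        }
    }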
    
  • 2021-02-05 21:50

    Here is the pseudocode:

    1. Use the InputStream directly; do not wrap a Reader around it.
    2. Read the bytes containing the header and store them in a ByteArrayOutputStream.
    3. Create a ByteArrayInputStream from that ByteArrayOutputStream and decode the header, this time wrapping the ByteArrayInputStream in a Reader with the ASCII charset.
    4. Compute the length of the non-ASCII body and read that number of bytes into another ByteArrayOutputStream.
    5. Create another ByteArrayInputStream from the second ByteArrayOutputStream and wrap it in a Reader using the charset from the header.
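    The steps above, sketched in Java. Step 4 needs the body length from somewhere, so this assumes a hypothetical header of the form "<charset> <body length in bytes>" terminated by '\n'; ByteArrayHeaderReader is an illustrative name. The whole body is held in memory, so this suits bodies of bounded size.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ByteArrayHeaderReader {
    static String readBody(InputStream in) throws IOException {
        // Steps 1-2: copy raw header bytes (no Reader on `in`) up to the '\n' terminator.
        ByteArrayOutputStream headerBytes = new ByteArrayOutputStream();
        int b;
        while ((b = in.read()) != -1 && b != '\n') {
            headerBytes.write(b);
        }
        if (b == -1) {
            throw new EOFException("unterminated header");
        }

        // Step 3: decode the header through an ASCII Reader over the copied bytes.
        Reader headerReader = new InputStreamReader(
                new ByteArrayInputStream(headerBytes.toByteArray()), StandardCharsets.US_ASCII);
        String header = new BufferedReader(headerReader).readLine();
        String[] fields = header == null ? new String[0] : header.trim().split("\\s+");
        if (fields.length < 2) {
            throw new IOException("malformed header: " + header);
        }
        Charset charset = Charset.forName(fields[0]);
        int bodyLength = Integer.parseInt(fields[1]);

        // Step 4: copy exactly bodyLength bytes into a second ByteArrayOutputStream.
        ByteArrayOutputStream bodyBytes = new ByteArrayOutputStream(bodyLength);
        byte[] buf = new byte[4096];
        int remaining = bodyLength;
        while (remaining > 0) {
            int got = in.read(buf, 0, Math.min(buf.length, remaining));
            if (got == -1) {
                throw new EOFException("body shorter than declared");
            }
            bodyBytes.write(buf, 0, got);
            remaining -= got;
        }

        // Step 5: decode the body with the charset named in the header.
        Reader bodyReader = new InputStreamReader(
                new ByteArrayInputStream(bodyBytes.toByteArray()), charset);
        StringBuilder out = new StringBuilder();
        char[] cbuf = new char[4096];
        int chars;
        while ((chars = bodyReader.read(cbuf)) != -1) {
            out.append(cbuf, 0, chars);
        }
        return out.toString();
    }
}
```

    Since no Reader ever wraps `in` directly, it is safe to pass a BufferedInputStream here to avoid one system call per header byte; anything after the declared body stays unread in the stream.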
  • 2021-02-05 22:00

    If you wrap the InputStream and limit every read to a single byte, that seems to disable the read-ahead buffering inside InputStreamReader.

    This way we don't have to rewrite the InputStreamReader logic.

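    import java.io.IOException;
    import java.io.InputStream;

    /** Hands out at most one byte per read so that InputStreamReader cannot read ahead. */
    public class OneByteReadInputStream extends InputStream
    {
        private final InputStream inputStream;
    
        public OneByteReadInputStream(InputStream inputStream)
        {
            this.inputStream = inputStream;
        }
    
        @Override
        public int read() throws IOException
        {
            return inputStream.read();
        }
    
        @Override
        public int read(byte[] b, int off, int len) throws IOException
        {
            if (len == 0)
            {
                return 0; // the InputStream contract requires 0 for an empty request
            }
            // Request a single byte, however large the caller's buffer is.
            return inputStream.read(b, off, 1);
        }

        // available() is deliberately not overridden, so it reports 0; that appears to be
        // what stops InputStreamReader from asking for more bytes once it has a character.
    }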
    

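    To construct the reader for the ASCII header:

    new InputStreamReader(new OneByteReadInputStream(inputStream), StandardCharsets.US_ASCII);

    Once the header has been parsed, the original inputStream should be positioned at the first body byte, so it can be wrapped in a new InputStreamReader with the body's charset.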
    
  • 2021-02-05 22:04

    Why don't you use 2 InputStreams? One for reading the header and another for the body.

    The second InputStream should skip the header bytes.
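    A sketch of the two-stream approach for a file on disk (the close-and-reopen suggestion above works the same way), assuming the same hypothetical one-line ASCII header naming the body's charset. The first stream's Reader may read ahead as much as it likes because that stream is closed afterwards; TwoStreamHeaderReader is an illustrative name.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class TwoStreamHeaderReader {
    static String readBody(Path path) throws IOException {
        String header;
        // First stream: header only. Any read-ahead is harmless; the stream is discarded.
        try (Reader ascii = new InputStreamReader(Files.newInputStream(path), StandardCharsets.US_ASCII)) {
            StringBuilder sb = new StringBuilder();
            int c;
            while ((c = ascii.read()) != -1 && c != '\n') {
                sb.append((char) c);
            }
            if (c == -1) {
                throw new EOFException("unterminated header");
            }
            header = sb.toString();
        }
        long headerBytes = header.length() + 1L; // ASCII: 1 char == 1 byte, plus the '\n'

        // Second stream: skip the header bytes, then decode the body.
        try (InputStream in = Files.newInputStream(path)) {
            long remaining = headerBytes;
            while (remaining > 0) {
                long skipped = in.skip(remaining);
                if (skipped <= 0) { // skip() may make no progress; fall back to read()
                    if (in.read() == -1) {
                        throw new EOFException("file ended inside the header");
                    }
                    skipped = 1;
                }
                remaining -= skipped;
            }
            Reader body = new InputStreamReader(in, Charset.forName(header.trim()));
            StringBuilder out = new StringBuilder();
            char[] buf = new char[4096];
            int n;
            while ((n = body.read(buf)) != -1) {
                out.append(buf, 0, n);
            }
            return out.toString();
        }
    }
}
```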
