Java NIO scan through ByteBuffer for certain bytes and word with sections

こ雲淡風輕ζ submitted on 2019-12-02 05:21:21

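Leaving the I/O aside, once you have content in the ByteBuffer it would be a lot simpler to convert it to a CharBuffer. Note that asCharBuffer() only gives a usable view when the bytes are already UTF-16, because it treats each pair of bytes as one char; for UTF-8 or other encodings, decode with Charset.decode(ByteBuffer) instead. CharBuffer implements CharSequence, which gives you regex matching and many String-style methods to use.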
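
As a rough illustration of the CharBuffer approach, here is a minimal sketch that assumes the buffer holds UTF-8 text. The class and method names (CharBufferScan, lines, indexOfWord) are made up for this example. It decodes with StandardCharsets.UTF_8.decode() and then relies on java.util.regex, which accepts any CharSequence.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library.
final class CharBufferScan {
    // \r\n is listed first so a Windows line ending counts as one terminator
    private static final Pattern LINE_BREAK = Pattern.compile("\\r\\n|\\r|\\n");

    /** Splits the buffer's remaining bytes (assumed UTF-8) into lines. */
    static List<String> lines(ByteBuffer utf8Bytes) {
        // decode a duplicate so the caller's position and limit stay untouched
        CharBuffer chars = StandardCharsets.UTF_8.decode(utf8Bytes.duplicate());
        return Arrays.asList(LINE_BREAK.split(chars));
    }

    /** Returns the char index of the first whole-word match, or -1 if absent. */
    static int indexOfWord(ByteBuffer utf8Bytes, String word) {
        CharBuffer chars = StandardCharsets.UTF_8.decode(utf8Bytes.duplicate());
        Matcher m = Pattern.compile("\\b" + Pattern.quote(word) + "\\b").matcher(chars);
        return m.find() ? m.start() : -1;
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.wrap(
                "alpha\r\nbeta word\ngamma".getBytes(StandardCharsets.UTF_8));
        System.out.println(lines(buf));
        System.out.println(indexOfWord(buf, "word"));
    }
}
```

The index returned by indexOfWord is a char offset in the decoded text, not a byte offset in the original buffer; the two differ as soon as the data contains multi-byte UTF-8 sequences.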

Here is the solution I ended up with. It reads the whole file into a ByteBuffer, scans it one byte at a time for line terminators, and then uses the bulk relative get function of ByteBuffer to copy out each line. I think I'm using the mark() functionality as it's intended, though I use an additional variable (pos) to remember where the scan stopped, since I can't find a method in ByteBuffer that returns the position of the mark itself. There is also explicit handling for \r, \n, or both in sequence. Keep in mind this code assumes UTF-8 encoded data. I hope this helps someone else.

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class Test {
    public static final Charset ENCODING = StandardCharsets.UTF_8;
    private static final byte LF = 0x0A; // '\n'
    private static final byte CR = 0x0D; // '\r'

    public Test() {
        // test.txt: any sequence of strings, each followed by a newline
        Path path = Paths.get("test.txt");

        try (FileChannel fc = FileChannel.open(path,
                StandardOpenOption.READ, StandardOpenOption.WRITE, StandardOpenOption.CREATE)) {

            if (fc.size() > 0) {
                // read the whole file into one buffer (only suitable for files under 2 GB)
                ByteBuffer buffer = ByteBuffer.allocate((int) fc.size());
                int n;
                do {
                    n = fc.read(buffer);
                } while (n != -1 && buffer.hasRemaining());
                buffer.flip();

                buffer.mark(); // the mark always sits at the start of the current line
                while (buffer.hasRemaining()) {
                    // get one byte at a time
                    byte b = buffer.get();
                    if (b != LF && b != CR) {
                        continue;
                    }
                    int newlineByteCount = 1;
                    // peek ahead: treat \r\n as a single two-byte terminator
                    if (b == CR && buffer.hasRemaining() && buffer.get(buffer.position()) == LF) {
                        buffer.get();
                        newlineByteCount++;
                    }

                    int pos = buffer.position();
                    // reset the buffer back to the mark() position
                    buffer.reset();
                    // create an array just the right length and get the bytes we just measured out
                    int length = pos - buffer.position() - newlineByteCount;
                    byte[] lineBytes = new byte[length];
                    buffer.get(lineBytes, 0, length);
                    System.out.println("LINE: " + new String(lineBytes, ENCODING));

                    // skip the terminator and mark the start of the next line
                    buffer.position(pos);
                    buffer.mark();
                }

                // print a final line that has no trailing newline
                buffer.reset();
                if (buffer.hasRemaining()) {
                    byte[] lastBytes = new byte[buffer.remaining()];
                    buffer.get(lastBytes);
                    System.out.println("LINE: " + new String(lastBytes, ENCODING));
                }
            }
        } catch (IOException ioe) {
            ioe.printStackTrace();
        }
    }

    public static void main(String args[]) { new Test(); }
}

I needed something similar but more general than splitting a single buffer. In my case, I have multiple buffers; in fact, my code is a modification of Spring's StringDecoder that can convert a Flux<DataBuffer> to a Flux<String>.

https://stackoverflow.com/a/48111196/839733
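
To give a feel for the multi-buffer case without pulling in Spring or Reactor, here is a plain-Java sketch. It is not the linked answer's code and not Spring's StringDecoder; ChunkedLineSplitter, feed and finish are hypothetical names, and the data is assumed to be UTF-8. The idea is to carry the bytes of an unfinished line from one ByteBuffer to the next and decode only when a terminator arrives.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits a sequence of ByteBuffer chunks into UTF-8 lines.
final class ChunkedLineSplitter {
    private static final byte LF = 0x0A; // '\n'
    private static final byte CR = 0x0D; // '\r'

    // bytes of the current, not yet terminated line, carried across chunks
    private final ByteArrayOutputStream pending = new ByteArrayOutputStream();
    // true when the previous byte was \r, so a following \n is part of the same terminator
    private boolean lastWasCr = false;

    /** Consumes one chunk and returns every line that this chunk completed. */
    List<String> feed(ByteBuffer chunk) {
        List<String> lines = new ArrayList<>();
        while (chunk.hasRemaining()) {
            byte b = chunk.get();
            if (b == LF && lastWasCr) {
                lastWasCr = false; // second half of \r\n; the line was already emitted
                continue;
            }
            lastWasCr = (b == CR);
            if (b == CR || b == LF) {
                lines.add(flush());
            } else {
                pending.write(b);
            }
        }
        return lines;
    }

    /** Call after the last chunk; returns the trailing unterminated line, or null if none. */
    String finish() {
        return pending.size() > 0 ? flush() : null;
    }

    // Decodes the buffered bytes as one line and clears the buffer.
    private String flush() {
        String line = new String(pending.toByteArray(), StandardCharsets.UTF_8);
        pending.reset();
        return line;
    }
}
```

Buffering raw bytes and decoding per line keeps multi-byte UTF-8 characters intact even when a chunk boundary falls in the middle of one, because the bytes 0x0A and 0x0D never occur inside a multi-byte UTF-8 sequence. In a reactive pipeline the same state machine would sit inside an operator that maps each incoming buffer to zero or more strings.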
