Filter (search and replace) array of bytes in an InputStream

清歌不尽 2020-11-30 05:49

I have an InputStream that takes an HTML file as its input. I need to get the bytes from the input stream.

I have a string: "XYZ". I'd like

6 answers
  •  有刺的猬
    2020-11-30 06:41

    I needed a solution to this, but found that the other answers here incurred too much memory and/or CPU overhead. In simple benchmarks, the solution below significantly outperformed them on both counts.

    This solution is especially memory-efficient: it keeps only the two token arrays and a few counters, so it incurs no measurable memory cost even with streams larger than a gigabyte.

    That said, it is not free in CPU terms. The processing overhead is probably acceptable for all but the most demanding or resource-sensitive scenarios, but it is real and should be weighed when deciding whether this solution suits a given context.

    In my case, the largest real-world file we process is about 6MB, for which we see about 170ms of added latency with 44 URL replacements. This is for a Zuul-based reverse proxy running on AWS ECS with a single CPU share (1024). For most files (under 100KB), the added latency is sub-millisecond. Under high concurrency (and thus CPU contention) the added latency could increase; however, we can currently process hundreds of files concurrently on a single node with no humanly noticeable latency impact.
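    For anyone wanting to check figures like these against their own payloads, here is a rough timing sketch. It is not the benchmark used for the numbers above; the class and method names (DrainTimer, drain, timeNanos) are made up for illustration, and both wrappers are identity functions so the file compiles on its own. Swap the candidate for a real stream filter to measure its added cost.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.function.UnaryOperator;

class DrainTimer {

    /** Reads the stream to EOF and returns the number of bytes seen. */
    static long drain(InputStream in) throws IOException {
        byte[] buf = new byte[8192];
        long total = 0;
        int n;
        while ((n = in.read(buf)) != -1) {
            total += n;
        }
        return total;
    }

    /** Times one full drain of the payload through the given wrapper, in nanoseconds. */
    static long timeNanos(byte[] payload, UnaryOperator<InputStream> wrapper) throws IOException {
        long start = System.nanoTime();
        try (InputStream in = wrapper.apply(new ByteArrayInputStream(payload))) {
            drain(in);
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) throws IOException {
        byte[] payload = new byte[6 * 1024 * 1024]; // about the size of the 6MB worst case
        UnaryOperator<InputStream> baseline = in -> in;
        // Assumption: replace with e.g. in -> new TokenReplacingStream(in, oldBytes, newBytes)
        UnaryOperator<InputStream> candidate = in -> in;
        // Repeat a few times so later rounds run on JIT-compiled code.
        for (int i = 0; i < 5; i++) {
            long base = timeNanos(payload, baseline);
            long cand = timeNanos(payload, candidate);
            System.out.printf("round %d: baseline=%dus candidate=%dus%n", i, base / 1000, cand / 1000);
        }
    }
}
```

    An in-memory payload leaves out network and disk effects, so treat the difference between the two columns as a lower bound on real-world added latency.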

    The solution we are using:

    import java.io.IOException;
    import java.io.InputStream;
    
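    /**
     * Wraps an InputStream and replaces each occurrence of oldBytes with newBytes
     * as the data is read, one byte at a time, without buffering the whole stream.
     * Limitation: after a failed partial match the replayed bytes are not scanned
     * again, so a token that starts inside them is missed (for example, token "ab"
     * in input "aab").
     */
    public class TokenReplacingStream extends InputStream {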
    
        private final InputStream source;
        private final byte[] oldBytes;
        private final byte[] newBytes;
        private int tokenMatchIndex = 0;
        private int bytesIndex = 0;
        private boolean unwinding;
        private int mismatch;
        private int numberOfTokensReplaced = 0;
    
        public TokenReplacingStream(InputStream source, byte[] oldBytes, byte[] newBytes) {
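            // Assertions are disabled by default at runtime, so validate explicitly.
            if (oldBytes.length == 0) {
                throw new IllegalArgumentException("oldBytes must not be empty");
            }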
            this.source = source;
            this.oldBytes = oldBytes;
            this.newBytes = newBytes;
        }
    
        @Override
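        public int read() throws IOException {
    
            if (unwinding) {
                // A partial match failed: replay the bytes that had matched so far.
                if (bytesIndex < tokenMatchIndex) {
                    // Mask to 0..255; a raw (signed) byte could be mistaken for EOF (-1).
                    return oldBytes[bytesIndex++] & 0xff;
                } else {
                    // Replay finished: emit the byte that broke the match (-1 at EOF).
                    bytesIndex = 0;
                    tokenMatchIndex = 0;
                    unwinding = false;
                    return mismatch;
                }
            } else if (tokenMatchIndex == oldBytes.length) {
                // Full match: emit the replacement bytes instead of the token.
                if (bytesIndex == newBytes.length) {
                    bytesIndex = 0;
                    tokenMatchIndex = 0;
                    numberOfTokensReplaced++;
                } else {
                    return newBytes[bytesIndex++] & 0xff;
                }
            }
    
            int b = source.read();
            // Compare as unsigned so bytes >= 0x80 can match; -1 (EOF) never matches.
            if (b == (oldBytes[tokenMatchIndex] & 0xff)) {
                tokenMatchIndex++;
            } else if (tokenMatchIndex > 0) {
                mismatch = b;
                unwinding = true;
            } else {
                return b;
            }
    
            // Mid-match or state just changed: recurse to produce the next output byte.
            return read();
    
        }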
    
        @Override
        public void close() throws IOException {
            source.close();
        }
    
        public int getNumberOfTokensReplaced() {
            return numberOfTokensReplaced;
        }
    
    }
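    As a usage illustration, here is a minimal sketch of pushing an HTML string through a stream filter and collecting the result. The class name FilterUsageSketch, the filter helper, and the "ABC" replacement are hypothetical. The filter is passed in as a function, and the demo uses an identity filter, so the file compiles without TokenReplacingStream on the classpath.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.function.UnaryOperator;

class FilterUsageSketch {

    /** Pushes the HTML through the given stream filter and returns the result as a string. */
    static String filter(String html, UnaryOperator<InputStream> streamFilter) throws IOException {
        InputStream raw = new ByteArrayInputStream(html.getBytes(StandardCharsets.UTF_8));
        // Buffer the source: a byte-at-a-time filter calls read() on it once per byte.
        try (InputStream in = streamFilter.apply(new BufferedInputStream(raw));
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            int b;
            while ((b = in.read()) != -1) {
                out.write(b);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        // Identity filter keeps this file self-contained. With the class above available, pass:
        //   src -> new TokenReplacingStream(src,
        //           "XYZ".getBytes(StandardCharsets.UTF_8),
        //           "ABC".getBytes(StandardCharsets.UTF_8))
        System.out.println(filter("<p>XYZ</p>", src -> src));
    }
}
```

    Buffering matters less for an in-memory source like this one, but for file or socket streams it is usually worth wrapping the source before handing it to a filter that reads one byte at a time.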
    
