How to Cache InputStream for Multiple Use

庸人自扰 2020-11-29 05:57

I have an InputStream of a file, and I use Apache POI components to read from it like this:

POIFSFileSystem fileSystem = new POIFSFileSystem(inputStream);


        
10 answers
  • 2020-11-29 06:22

    This is how I would implement it so that it can be used safely with any InputStream:

    • write your own InputStream wrapper that creates a temporary file to mirror the original stream's content
    • dump everything read from the original input stream into this temporary file
    • once the stream has been read completely, all of its data is mirrored in the temporary file
    • use InputStream.reset() to switch (re-initialize) the internal stream to a FileInputStream(mirrored_content_file)
    • from then on you can drop the reference to the original stream (so it can be garbage-collected)
    • add a new method release() that removes the temporary file and releases any open stream.
    • you can even call release() from finalize() to make sure the temporary file is released in case you forget to call release() (most of the time you should avoid finalize() and always call a method to release object resources; note that Object.finalize() has been deprecated since Java 9). See Why would you ever implement finalize()?
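    A minimal sketch of this idea, assuming Java 9+ (for readAllBytes() in the demo). To keep it short it copies the whole source into the temporary file eagerly with Files.copy instead of mirroring it lazily while it is being read, and it implements AutoCloseable so that try-with-resources calls close() (standing in for release()) rather than relying on finalize(). The class name TempFileStreamCache and the method newStream() are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper. Eager variant: copies the whole source into a temp file
// up front instead of mirroring it lazily while it is being read.
class TempFileStreamCache implements AutoCloseable {
    private final Path mirror;

    TempFileStreamCache(InputStream source) throws IOException {
        mirror = Files.createTempFile("stream-cache", ".tmp");
        try (InputStream in = source) {
            // REPLACE_EXISTING is needed because createTempFile already created the file.
            Files.copy(in, mirror, StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            Files.deleteIfExists(mirror); // don't leak the temp file on failure
            throw e;
        }
    }

    // Returns a fresh stream positioned at the start of the cached content.
    InputStream newStream() throws IOException {
        return Files.newInputStream(mirror);
    }

    // Plays the role of release(): deletes the temporary file.
    @Override
    public void close() throws IOException {
        Files.deleteIfExists(mirror);
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "sample".getBytes(StandardCharsets.UTF_8);
        try (TempFileStreamCache cache = new TempFileStreamCache(new ByteArrayInputStream(data))) {
            for (int pass = 1; pass <= 2; pass++) {
                try (InputStream in = cache.newStream()) {
                    System.out.println(pass + ": " + new String(in.readAllBytes(), StandardCharsets.UTF_8));
                }
            }
        }
    }
}
```

    Running main prints "1: sample" and then "2: sample". Each consumer (for example each POIFSFileSystem) can be handed its own cache.newStream().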
  • 2020-11-29 06:27

    This answer iterates on previous ones (1|2) based on BufferedInputStream. The main changes are that it allows unlimited reuse and that it takes care of closing the original source input stream to free up system resources. Your OS limits how many of those a process can hold, and you don't want the program to run out of file handles (that's also why you should always 'consume' responses, e.g. with Apache's EntityUtils.consumeQuietly()).

    EDIT: Updated the code to handle greedy consumers that use read(buffer, offset, length). In that case BufferedInputStream may try hard to look at the source, and this code protects against that.

    import java.io.BufferedInputStream;
    import java.io.IOException;
    import java.io.InputStream;

    public class CachingInputStream extends BufferedInputStream {
        public CachingInputStream(InputStream source) {
            super(new PostCloseProtection(source));
            super.mark(Integer.MAX_VALUE);
        }
    
        @Override
        public synchronized void close() throws IOException {
            if (!((PostCloseProtection) in).decoratedClosed) {
                in.close();
            }
            super.reset();
        }
    
        private static class PostCloseProtection extends InputStream {
            private volatile boolean decoratedClosed = false;
            private final InputStream source;
    
            public PostCloseProtection(InputStream source) {
                this.source = source;
            }
    
            @Override
            public int read() throws IOException {
                return decoratedClosed ? -1 : source.read();
            }
    
            @Override
            public int read(byte[] b) throws IOException {
                return decoratedClosed ? -1 : source.read(b);
            }
    
            @Override
            public int read(byte[] b, int off, int len) throws IOException {
                return decoratedClosed ? -1 : source.read(b, off, len);
            }
    
            @Override
            public long skip(long n) throws IOException {
                return decoratedClosed ? 0 : source.skip(n);
            }
    
            @Override
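            public int available() throws IOException {
                // Don't touch the source once it is closed: BufferedInputStream.read(byte[], int, int)
                // calls available() while looping, and a closed source may throw here.
                return decoratedClosed ? 0 : source.available();
            }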
    
            @Override
            public void close() throws IOException {
                decoratedClosed = true;
                source.close();
            }
    
            @Override
            public void mark(int readLimit) {
                source.mark(readLimit);
            }
    
            @Override
            public void reset() throws IOException {
                source.reset();
            }
    
            @Override
            public boolean markSupported() {
                return source.markSupported();
            }
        }
    }
    

    To reuse the stream, just close it first (if it hasn't been closed already): close() rewinds it to the start of the cached data.

    One limitation, though: if the stream is closed before the whole content of the original stream has been read, this decorator will hold incomplete data, so make sure the whole stream is read before closing it.

  • 2020-11-29 06:28

    You can decorate the InputStream passed to POIFSFileSystem with a version that responds to close() by calling reset():

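    import java.io.IOException;
    import java.io.InputStream;

    class ResetOnCloseInputStream extends InputStream {
    
        private final InputStream decorated;
    
        public ResetOnCloseInputStream(InputStream anInputStream) {
            if (!anInputStream.markSupported()) {
                throw new IllegalArgumentException("marking not supported");
            }
    
            // magic constant: BEWARE. After more than 16 MiB (1 << 24 bytes) have been
            // read, the mark may be invalidated and reset() may then throw.
            anInputStream.mark(1 << 24);
            decorated = anInputStream;
        }
    
        // close() rewinds instead of closing, so the stream can be read again.
        @Override
        public void close() throws IOException {
            decorated.reset();
        }
    
        @Override
        public int read() throws IOException {
            return decorated.read();
        }

        // Delegate bulk reads too; the inherited version falls back to read() byte by byte.
        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            return decorated.read(b, off, len);
        }

        @Override
        public int available() throws IOException {
            return decorated.available();
        }
    }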
    

    Test case:

    import java.io.ByteArrayInputStream;
    import java.io.IOException;
    import java.io.InputStream;

    public class ResetOnCloseTest {

        static void closeAfterInputStreamIsConsumed(InputStream is)
                throws IOException {
            int r;

            while ((r = is.read()) != -1) {
                System.out.println(r);
            }

            is.close();
            System.out.println("=========");
        }

        public static void main(String[] args) throws IOException {
            InputStream is = new ByteArrayInputStream("sample".getBytes());
            ResetOnCloseInputStream decoratedIs = new ResetOnCloseInputStream(is);
            closeAfterInputStreamIsConsumed(decoratedIs); // close() resets to the mark...
            closeAfterInputStreamIsConsumed(decoratedIs); // ...so the bytes can be read again
            closeAfterInputStreamIsConsumed(is);          // the raw stream was rewound as well
        }
    }
    

    EDIT 2

    You can also read the entire file into a byte[] (slurp mode) and then pass it to a ByteArrayInputStream.
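    A sketch of the slurp approach, assuming Java 9+ for InputStream.readAllBytes(); the names SlurpedStream and slurp are hypothetical. It reads the source once and returns a Supplier that creates a new ByteArrayInputStream over the same array on every call.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.function.Supplier;

class SlurpedStream {
    // Hypothetical helper: reads the whole stream into memory once and returns
    // a factory that creates an independent stream on every call.
    static Supplier<InputStream> slurp(InputStream source) throws IOException {
        byte[] bytes;
        try (InputStream in = source) {
            bytes = in.readAllBytes(); // Java 9+
        }
        // ByteArrayInputStream does not copy the array, so all streams share one buffer.
        return () -> new ByteArrayInputStream(bytes);
    }

    public static void main(String[] args) throws IOException {
        Supplier<InputStream> streams =
                slurp(new ByteArrayInputStream("sample".getBytes(StandardCharsets.UTF_8)));
        for (int pass = 1; pass <= 2; pass++) {
            try (InputStream in = streams.get()) {
                System.out.println(pass + ": " + new String(in.readAllBytes(), StandardCharsets.UTF_8));
            }
        }
    }
}
```

    With POI, each POIFSFileSystem would then get its own stream, e.g. new POIFSFileSystem(streams.get()). The trade-off is that the whole file has to fit in memory.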

  • 2020-11-29 06:34

    This works correctly:

    byte[] bytes = getBytes(inputStream);
    POIFSFileSystem fileSystem = new POIFSFileSystem(new ByteArrayInputStream(bytes));
    

    where getBytes looks like this:

    private static byte[] getBytes(InputStream is) throws IOException {
        byte[] buffer = new byte[8192];
        ByteArrayOutputStream baos = new ByteArrayOutputStream(2048);
        int n;
        // read() may return fewer bytes than requested; copy whatever arrived until EOF (-1).
        while ((n = is.read(buffer, 0, buffer.length)) != -1) {
            baos.write(buffer, 0, n);
        }
        return baos.toByteArray();
    }
    
  • I'm adding my solution here because it works for me. It is basically a combination of the top two answers :)

    private String convertStreamToString(InputStream is) {
        Writer w = new StringWriter();
        char[] buf = new char[1024];
        Reader r;
        // Requires is.markSupported(); reset() may fail if more than 16 MiB is read.
        is.mark(1 << 24);
        try {
            // Deliberately not closed: closing the reader would close 'is' as well.
            r = new BufferedReader(new InputStreamReader(is, StandardCharsets.UTF_8));
            int n;
            while ((n = r.read(buf)) != -1) {
                w.write(buf, 0, n);
            }
            // Rewind so the caller can read the same stream again.
            is.reset();
        } catch (IOException e) {
            // Logger.debug(Class, String, Throwable) is not a JDK API; presumably the author's own helper.
            Logger.debug(this.getClass(), "Cannot convert stream to string.", e);
        }
        return w.toString();
    }
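    The mark/reset technique above only works if is.markSupported() returns true. A minimal sketch, assuming Java 9+ for transferTo(), that wraps the stream in a BufferedInputStream when it cannot mark by itself and then reads and rewinds it. MarkResetDemo, markable, readAndRewind and READ_LIMIT are hypothetical names; PushbackInputStream is used in the demo only because it reports markSupported() == false.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.nio.charset.StandardCharsets;

class MarkResetDemo {
    // Hypothetical limit on how many bytes may be read before reset(); same 16 MiB as above.
    static final int READ_LIMIT = 1 << 24;

    // Wraps the stream in a BufferedInputStream only when it can't mark/reset by itself.
    static InputStream markable(InputStream in) {
        return in.markSupported() ? in : new BufferedInputStream(in);
    }

    // Reads everything, then rewinds so the same stream can be read again.
    static String readAndRewind(InputStream in) throws IOException {
        in.mark(READ_LIMIT);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        in.transferTo(out); // Java 9+
        in.reset();         // may throw IOException if more than READ_LIMIT bytes were read
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // PushbackInputStream reports markSupported() == false, so markable() wraps it.
        InputStream raw = new PushbackInputStream(
                new ByteArrayInputStream("sample".getBytes(StandardCharsets.UTF_8)));
        InputStream in = markable(raw);
        System.out.println(readAndRewind(in)); // first pass
        System.out.println(readAndRewind(in)); // second pass reads the same bytes again
    }
}
```

    Note that BufferedInputStream keeps everything read since the mark in memory, so this is effectively the slurp approach with a size cap.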
    
  • 2020-11-29 06:36

    If the file is not that big, read it into a byte[] array and give POI a ByteArrayInputStream created from that array.

    If the file is big, you shouldn't worry about it: just open the file again for each use, and the OS will do the caching for you as best it can.

    [EDIT] Use Apache commons-io to read the File into a byte array efficiently. Do not use int read(), since it reads the file byte by byte, which is very slow!

    If you want to do it yourself, use a File object to get the length, create the array, and then write a loop that reads bytes from the file. You must loop, since read(byte[], int offset, int len) can read fewer than len bytes (and usually does).
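    A sketch of the do-it-yourself variant: take the length from the File, allocate the array, and loop until it is full. ReadFileFully and readFile are hypothetical names. In practice, commons-io's FileUtils.readFileToByteArray(File) or the JDK's Files.readAllBytes(Path) (Java 7+) do the same job.

```java
import java.io.EOFException;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

class ReadFileFully {
    // Hypothetical helper: reads a whole file into an array sized from File.length().
    static byte[] readFile(File file) throws IOException {
        long length = file.length();
        if (length > Integer.MAX_VALUE) {
            throw new IOException("File too large for a byte[]: " + length);
        }
        byte[] data = new byte[(int) length];
        try (InputStream in = new FileInputStream(file)) {
            int offset = 0;
            // read() may return fewer bytes than requested, so keep going until the array is full.
            while (offset < data.length) {
                int n = in.read(data, offset, data.length - offset);
                if (n == -1) {
                    throw new EOFException("Got " + offset + " of " + data.length + " bytes");
                }
                offset += n;
            }
        }
        return data;
    }

    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("read-fully", ".bin");
        tmp.deleteOnExit();
        Files.write(tmp.toPath(), "sample".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(readFile(tmp), StandardCharsets.UTF_8));
    }
}
```

    The resulting array can then be wrapped in a ByteArrayInputStream as many times as needed.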
