Java: Memory efficient ByteArrayOutputStream

Asked by 一个人的身影, 2021-02-04 06:15

I've got a 40MB file on disk and I need to "map" it into memory as a byte array.

At first, I thought writing the file to a ByteArrayOutputStream would be the best option.

9 Answers
  • 2021-02-04 06:23

    I'm thinking I could just extend ByteArrayOutputStream and rewrite this method, so to return the original array directly. Is there any potential danger here, given the stream and the byte array won't be used more than once?

    You shouldn't change the specified behavior of the existing method, but it's perfectly fine to add a new method. Here's an implementation:

    /** Subclasses ByteArrayOutputStream to give access to the internal raw buffer. */
    public class ByteArrayOutputStream2 extends java.io.ByteArrayOutputStream {
        public ByteArrayOutputStream2() { super(); }
        public ByteArrayOutputStream2(int size) { super(size); }
    
        /** Returns the internal buffer of this ByteArrayOutputStream, without copying. */
        public synchronized byte[] buf() {
            return this.buf;
        }
    }
    

    An alternative but hackish way of getting the buffer from any ByteArrayOutputStream is to use the fact that its writeTo(OutputStream) method passes the buffer directly to the provided OutputStream:

    /**
     * Returns the internal raw buffer of a ByteArrayOutputStream, without copying.
     */
    public static byte[] getBuffer(ByteArrayOutputStream bout) {
        final byte[][] result = new byte[1][];
        try {
            bout.writeTo(new OutputStream() {
                @Override
                public void write(byte[] buf, int offset, int length) {
                    result[0] = buf;
                }
    
                @Override
                public void write(int b) {}
            });
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
        return result[0];
    }
    

    (That works, but I'm not sure if it's useful, given that subclassing ByteArrayOutputStream is simpler.)

    However, from the rest of your question it sounds like all you want is a plain byte[] of the complete contents of the file. As of Java 7, the simplest and fastest way to do that is to call Files.readAllBytes. In Java 6 and below, you can use DataInputStream.readFully, as in Peter Lawrey's answer. Either way, you will get an array that is allocated once at the correct size, without the repeated reallocation of ByteArrayOutputStream.
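    A minimal sketch of the Files.readAllBytes approach; the temp-file setup here is illustrative, and in practice you would pass the path of your 40 MB file:

    ```java
    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;

    public class ReadAllBytesDemo {
        public static void main(String[] args) throws IOException {
            // Illustrative stand-in for the real 40 MB file on disk.
            Path file = Files.createTempFile("demo", ".bin");
            Files.write(file, new byte[]{1, 2, 3, 4});

            // One allocation at the exact file size; no intermediate copies.
            byte[] data = Files.readAllBytes(file);
            System.out.println(data.length); // prints 4

            Files.delete(file);
        }
    }
    ```

    Files.readAllBytes queries the file size up front, so the array is sized correctly before a single byte is read.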

  • 2021-02-04 06:25

    ByteArrayOutputStream should be okay so long as you specify an appropriate size in the constructor. It will still create a copy when you call toByteArray, but that's only temporary. Do you really mind the memory briefly going up a lot?

    Alternatively, if you already know the size to start with you can just create a byte array and repeatedly read from a FileInputStream into that buffer until you've got all the data.
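    A sketch of that read-into-a-known-size-array loop, assuming the file length is known (here via Files.size). The loop is needed because a single read() call is not guaranteed to fill the whole buffer:

    ```java
    import java.io.EOFException;
    import java.io.FileInputStream;
    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;

    public class ReadLoopDemo {
        public static void main(String[] args) throws IOException {
            // Illustrative file; substitute your own.
            Path file = Files.createTempFile("demo", ".bin");
            Files.write(file, new byte[]{1, 2, 3, 4, 5});

            // Allocate once at the known size, then loop until full.
            byte[] data = new byte[(int) Files.size(file)];
            try (FileInputStream in = new FileInputStream(file.toFile())) {
                int pos = 0;
                while (pos < data.length) {
                    int n = in.read(data, pos, data.length - pos);
                    if (n == -1) throw new EOFException("file shorter than expected");
                    pos += n;
                }
            }
            System.out.println(data.length); // prints 5

            Files.delete(file);
        }
    }
    ```

    DataInputStream.readFully performs exactly this loop for you.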

  • 2021-02-04 06:26

    If you really want to map the file into memory, then a FileChannel is the appropriate mechanism.

    If all you want to do is read the file into a simple byte[] (and don't need changes to that array to be reflected back to the file), then simply reading into an appropriately-sized byte[] from a normal FileInputStream should suffice.

    Guava has Files.toByteArray() which does all that for you.
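    For the true memory-mapping case mentioned above, a minimal FileChannel sketch (the temp file is just a stand-in for your own file):

    ```java
    import java.io.IOException;
    import java.nio.MappedByteBuffer;
    import java.nio.channels.FileChannel;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.StandardOpenOption;

    public class MapFileDemo {
        public static void main(String[] args) throws IOException {
            Path file = Files.createTempFile("demo", ".bin");
            Files.write(file, new byte[]{7, 8, 9});

            // True memory mapping: the OS pages the file in on demand,
            // so no heap byte[] of the full file size is ever allocated.
            try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
                MappedByteBuffer map = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
                System.out.println(map.get(0)); // prints 7
            }

            Files.delete(file);
        }
    }
    ```

    Note that a MappedByteBuffer is not a byte[]; it is the right tool only if you can work through the ByteBuffer API.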

  • 2021-02-04 06:27

    For an explanation of the buffer growth behavior of ByteArrayOutputStream, please read this answer.

    In answer to your question, it is safe to extend ByteArrayOutputStream. In your situation, it is probably better to override the write methods so that the maximum additional allocation is limited to, say, 16MB. You should not override toByteArray to expose the protected buf[] member directly. A stream is not just a buffer: it is a buffer plus a position pointer and boundary protection, so it is dangerous to access and potentially manipulate the raw buffer from outside the class.
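    A hypothetical sketch of that capped-growth idea. The 16 MB limit and class name are illustrative; the subclass pre-grows the protected buf field itself so the superclass never applies its default doubling strategy (a full implementation would also override write(int)):

    ```java
    import java.io.ByteArrayOutputStream;
    import java.util.Arrays;

    /** Sketch: caps each reallocation increment instead of unbounded doubling. */
    public class CappedGrowthStream extends ByteArrayOutputStream {
        private static final int MAX_GROW = 16 * 1024 * 1024; // 16 MB cap (illustrative)

        public CappedGrowthStream(int size) { super(size); }

        @Override
        public synchronized void write(byte[] b, int off, int len) {
            int needed = count + len;
            if (needed > buf.length) {
                // Grow by at most MAX_GROW beyond the current size,
                // or to exactly `needed` if the cap is not enough.
                int target = Math.max(needed, Math.min(buf.length * 2, buf.length + MAX_GROW));
                buf = Arrays.copyOf(buf, target);
            }
            super.write(b, off, len); // capacity now sufficient; no internal realloc
        }

        public static void main(String[] args) {
            CappedGrowthStream out = new CappedGrowthStream(16);
            out.write(new byte[100], 0, 100);
            System.out.println(out.size()); // prints 100
        }
    }
    ```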

  • 2021-02-04 06:33

    MappedByteBuffer might be what you're looking for.

    I'm surprised it takes so much RAM to read a file in memory, though. Have you constructed the ByteArrayOutputStream with an appropriate capacity? If you haven't, the stream could allocate a new byte array when it's near the end of the 40 MB, meaning that you would, for example, have a full buffer of 39MB, and a new buffer of twice the size. Whereas if the stream has the appropriate capacity, there won't be any reallocation (faster), and no wasted memory.
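    To illustrate the pre-sizing point, a small sketch (the size is illustrative; for the question's case you would pass the 40 MB file length):

    ```java
    import java.io.ByteArrayOutputStream;

    public class PresizedDemo {
        public static void main(String[] args) {
            // Pre-sizing to the known length avoids all intermediate
            // reallocation; only toByteArray() makes one final copy.
            int knownSize = 1024;
            ByteArrayOutputStream out = new ByteArrayOutputStream(knownSize);
            out.write(new byte[knownSize], 0, knownSize);
            byte[] result = out.toByteArray();
            System.out.println(result.length); // prints 1024
        }
    }
    ```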

  • 2021-02-04 06:33

    Google Guava's ByteSource seems to be a good choice for buffering in memory. Unlike implementations such as ByteArrayOutputStream or ByteArrayList (from the Colt library), it does not merge the data into one huge byte array but stores each chunk separately. An example:

    List<ByteSource> result = new ArrayList<>();
    try (InputStream source = httpRequest.getInputStream()) {
        byte[] cbuf = new byte[CHUNK_SIZE];
        while (true) {
            int read = source.read(cbuf);
            if (read == -1) {
                break;
            } else {
                result.add(ByteSource.wrap(Arrays.copyOf(cbuf, read)));
            }
        }
    }
    ByteSource body = ByteSource.concat(result);
    

    The ByteSource can be read as an InputStream anytime later:

    InputStream data = body.openBufferedStream();
    