Java: Memory efficient ByteArrayOutputStream

Asked by 一个人的身影 on 2021-02-04 06:15 · 9 answers · 552 views

I've got a 40MB file on disk and I need to "map" it into memory as a byte array.

At first, I thought writing the file to a ByteArrayOutputStream would be the best way, but I find it takes about 160MB of heap space at some moment during the copy operation.

9 Answers
  • 2021-02-04 06:43

    ... but I find it takes about 160MB of heap space at some moment during the copy operation

    I find this extremely surprising ... to the extent that I have my doubts that you are measuring the heap usage correctly.

    Let's assume that your code is something like this:

    BufferedInputStream bis = new BufferedInputStream(
            new FileInputStream("somefile"));
    ByteArrayOutputStream baos = new ByteArrayOutputStream();  /* no hint !! */
    
    int b;
    while ((b = bis.read()) != -1) {
        baos.write((byte) b);
    }
    byte[] stuff = baos.toByteArray();
    

    Now the way that a ByteArrayOutputStream manages its buffer is to allocate an initial size and (at least) double the buffer when it fills up. Thus, in the worst case, baos might use up to an 80MB buffer to hold a 40MB file.
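    The doubling can be sketched as follows. This is a simplified model of Oracle's implementation (default 32-byte initial buffer, doubling on overflow when written byte by byte); the growth policy is an implementation detail, not guaranteed by the spec:

    ```java
    public class GrowthDemo {
        // Model of ByteArrayOutputStream's growth: the default buffer starts at
        // 32 bytes and (at least) doubles each time a byte-at-a-time copy overflows it.
        static long finalCapacity(long fileSize) {
            long cap = 32;
            while (cap < fileSize) {
                cap *= 2;
            }
            return cap;
        }

        public static void main(String[] args) {
            long file = 40L * 1024 * 1024;
            System.out.println("a " + (file >> 20) + " MB file ends up in a "
                    + (finalCapacity(file) >> 20) + " MB buffer");
        }
    }
    ```

    For exactly 40MB the final doubled buffer is 64MB; a file just over a power of two (say, 33MB) lands near the 2× worst case.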

    The final step allocates a new array of exactly baos.size() bytes to hold the buffer's contents. That's 40MB. So the peak amount of memory actually in use should be 120MB.

    So where are those extra 40MB being used? My guess is that they are not, and that you are actually reporting the total heap size, not the amount of memory occupied by reachable objects.
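    One rough way to sanity-check this (a sketch only: System.gc() is merely a hint to the JVM, so the numbers are approximate, not exact):

    ```java
    public class HeapCheck {
        // Approximate live heap in bytes. System.gc() is only a request,
        // so treat the result as a rough estimate of reachable-object memory.
        static long usedHeap() {
            Runtime rt = Runtime.getRuntime();
            System.gc();
            return rt.totalMemory() - rt.freeMemory();
        }

        public static void main(String[] args) {
            long before = usedHeap();
            byte[] stuff = new byte[40 * 1024 * 1024]; // stand-in for the copied file
            long after = usedHeap();
            System.out.println("delta ~ " + ((after - before) / (1024 * 1024)) + " MB");
            System.out.println("still holding " + stuff.length + " bytes"); // keep 'stuff' reachable
        }
    }
    ```

    Measuring `totalMemory()` alone reports the heap the JVM has reserved, which is typically much larger than what reachable objects occupy.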


    So what is the solution?

    1. You could use a memory mapped buffer.

    2. You could give a size hint when you allocate the ByteArrayOutputStream; e.g.

       ByteArrayOutputStream baos = new ByteArrayOutputStream((int) file.length());
      
    3. You could dispense with the ByteArrayOutputStream entirely and read directly into a byte array.

       byte[] buffer = new byte[(int) file.length()];
       FileInputStream fis = new FileInputStream(file);
       int nosRead = fis.read(buffer);
       /* check that nosRead == buffer.length and repeat if necessary */
      

    Options 2 and 3 should have a peak heap usage of about 40MB while reading a 40MB file; i.e. no wasted space. (Option 2 still pays for one transient copy when toByteArray() is called; option 1 keeps the data off the Java heap entirely.)
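    For option 1, a minimal sketch of memory mapping (the file path is whatever you were reading; the mapped pages live outside the Java heap and are faulted in on demand rather than copied up front):

    ```java
    import java.io.RandomAccessFile;
    import java.nio.MappedByteBuffer;
    import java.nio.channels.FileChannel;

    public class MapFile {
        // Maps the whole file read-only. The mapping stays valid after the
        // channel is closed, until the buffer itself is garbage-collected.
        static MappedByteBuffer map(String path) throws Exception {
            try (RandomAccessFile raf = new RandomAccessFile(path, "r");
                 FileChannel ch = raf.getChannel()) {
                return ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            }
        }

        public static void main(String[] args) throws Exception {
            MappedByteBuffer buf = map(args[0]);
            System.out.println("mapped " + buf.capacity()
                    + " bytes; first byte = " + buf.get(0));
        }
    }
    ```

    Note this only helps if you can work with the ByteBuffer API directly; if you must end up with a byte[], you are back to one full copy.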


    It would be helpful if you posted your code, and described your methodology for measuring memory usage.


    I'm thinking I could just extend ByteArrayOutputStream and override this method so as to return the original array directly. Is there any potential danger here, given the stream and the byte array won't be used more than once?

    The potential danger is that your assumptions are incorrect, or become incorrect due to someone else modifying your code unwittingly ...
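    For what it's worth, a sketch of that idea (the class name is made up; `buf` and `count` are protected fields of ByteArrayOutputStream, so a subclass can reach them, but the zero-copy path is only safe under exactly the assumptions above: the buffer is exactly full and nobody writes to the stream or the array afterwards):

    ```java
    import java.io.ByteArrayOutputStream;

    // Hypothetical subclass that skips the defensive copy in toByteArray()
    // when the internal buffer is exactly the right size.
    class DirectByteArrayOutputStream extends ByteArrayOutputStream {
        DirectByteArrayOutputStream(int size) {
            super(size); // size hint: pre-allocate so no doubling occurs
        }

        @Override
        public synchronized byte[] toByteArray() {
            if (count == buf.length) {
                return buf; // buffer is exactly full: hand it out without copying
            }
            return super.toByteArray(); // otherwise fall back to the usual copy
        }
    }
    ```

    With an exact size hint this brings the peak back down to one 40MB array, at the cost of aliasing the stream's internal state.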

  • 2021-02-04 06:46

    ... I came here with the same observation when reading a 1GB file: Oracle's ByteArrayOutputStream uses lazy memory management. A byte array is indexed by an int and is therefore limited to 2GB anyway. Without depending on third-party libraries, you might find this useful:

    static public byte[] getBinFileContent(String aFile)
    {
        try
        {
            final int bufLen = 32768;
            final long fs = new File(aFile).length();
            if (fs > Integer.MAX_VALUE)
            {
                System.err.println("file size out of range");
                return null;
            }
            final byte[] res = new byte[(int) fs];
            final byte[] buffer = new byte[bufLen];
            // try-with-resources closes the stream even if read() throws
            try (InputStream is = new FileInputStream(aFile))
            {
                int n;
                int pos = 0;
                while ((n = is.read(buffer)) > 0)
                {
                    System.arraycopy(buffer, 0, res, pos, n);
                    pos += n;
                }
            }
            return res;
        }
        catch (final IOException | OutOfMemoryError e)
        {
            e.printStackTrace();
            return null;
        }
    }
    
  • 2021-02-04 06:47

    If you have 40 MB of data I don't see any reason why it would take more than 40 MB to create a byte[]. I assume you are using a growing ByteArrayOutputStream which creates a byte[] copy when finished.

    You can try the old read-the-file-at-once approach.

    File file = ...;                                      // the 40MB file
    DataInputStream is = new DataInputStream(new FileInputStream(file));
    byte[] bytes = new byte[(int) file.length()];
    is.readFully(bytes);
    is.close();
    

    Using a MappedByteBuffer is more efficient and avoids a copy of the data (or using much heap at all) provided you can use the ByteBuffer directly; however, if you have to use a byte[], it's unlikely to help much.
