Java NIO MappedByteBuffer OutOfMemoryException

The bigger the file, the less you want it all in memory at once. Devise a way to process the file a buffer at a time, a line at a time, etc.

MappedByteBuffers are especially problematic, as there is no defined release of the mapped memory, so using more than one at a time is essentially bound to fail.
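
For example, here is a minimal sketch (my own illustration, not from the question) of the buffer-at-a-time approach using an ordinary read loop with a single reused buffer, which avoids mappings entirely; the class name and buffer size are arbitrary:

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class BufferedScan {
  // Read the file through one reused 64 KB buffer instead of mapping it.
  static final int BUF_SIZE = 64 * 1024;

  static void process(Path file) throws IOException {
    ByteBuffer buf = ByteBuffer.allocateDirect(BUF_SIZE);
    try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
      while (ch.read(buf) != -1) {
        buf.flip();               // switch to draining what was just filled
        while (buf.hasRemaining()) {
          byte b = buf.get();     // consume each byte here (e.g. feed it to a matcher)
        }
        buf.clear();              // reuse the same buffer for the next chunk
      }
    }
  }
}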

I can offer some working code. Whether this solves your problem or not is difficult to say. This hunts through a file for a pattern recognised by the Hunter.

See the excellent article Java tip: How to read files quickly for the original research (not mine).

import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

// 4 KB buffer size.
static final int SIZE = 4 * 1024;
static byte[] buffer = new byte[SIZE];

// Fastest because a FileInputStream has an associated channel.
private static void ScanDataFile(Hunter p, FileInputStream f) throws FileNotFoundException, IOException {
  // Use a mapped and buffered stream for best speed.
  // See: http://nadeausoftware.com/articles/2008/02/java_tip_how_read_files_quickly
  FileChannel ch = f.getChannel();
  long red = 0L;  // Bytes of the file processed so far ("red" as in "read").
  do {
    // Map at most Integer.MAX_VALUE bytes at a time - the limit of a single mapping.
    long read = Math.min(Integer.MAX_VALUE, ch.size() - red);
    MappedByteBuffer mb = ch.map(FileChannel.MapMode.READ_ONLY, red, read);
    int nGet;
    while (mb.hasRemaining() && p.ok()) {
      nGet = Math.min(mb.remaining(), SIZE);
      mb.get(buffer, 0, nGet);
      for (int i = 0; i < nGet && p.ok(); i++) {
        p.check(buffer[i]);
      }
    }
    red += read;
  } while (red < ch.size() && p.ok());
  // Finish off.
  p.close();
  ch.close();
  f.close();
}
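
The Hunter type is not shown above; a minimal, hypothetical interface that would satisfy the calls made by ScanDataFile might look like this (purely illustrative, not the original author's code):

interface Hunter {
  boolean ok();        // Keep going? False once the pattern has been found or the search abandoned.
  void check(byte b);  // Feed the next byte of the file to the matcher.
  void close();        // Finish off and release any resources.
}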

What I use is a List<ByteBuffer> where each ByteBuffer maps a block of the file of between 16 MB and 1 GB. I use powers of 2 to simplify the logic. I have used this to map in files up to 8 TB.
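
A minimal sketch of that idea, assuming a read-only mapping and a 1 GB power-of-2 chunk size (the class name and chunk size are mine, not from the answer):

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

// Maps a large file as a series of fixed-size, power-of-2 chunks.
class ChunkedMapping {
  static final int CHUNK_BITS = 30;                 // 1 GB per chunk, a power of 2
  static final long CHUNK_SIZE = 1L << CHUNK_BITS;

  final List<ByteBuffer> chunks = new ArrayList<>();

  ChunkedMapping(Path file) throws IOException {
    try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
      long size = ch.size();
      for (long pos = 0; pos < size; pos += CHUNK_SIZE) {
        long len = Math.min(CHUNK_SIZE, size - pos);
        // Mappings remain valid after the channel is closed.
        chunks.add(ch.map(FileChannel.MapMode.READ_ONLY, pos, len));
      }
    }
  }

  // The power-of-2 chunk size reduces locating a byte to a shift and a mask.
  byte get(long offset) {
    ByteBuffer chunk = chunks.get((int) (offset >>> CHUNK_BITS));
    return chunk.get((int) (offset & (CHUNK_SIZE - 1)));
  }
}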

A key limitation of memory-mapped files is that you are limited by your virtual address space. With a 32-bit JVM you won't be able to map in very much.

I wouldn't keep creating new memory mappings for a file, because these are never cleaned up. You can create lots of them, but there appears to be a limit of about 32K of them on some systems, no matter how small they are.

The main reason I find memory-mapped files useful is that they don't need to be flushed (provided you can assume the OS won't die). This lets you write data with low latency, without worrying about losing too much data if the application dies, and without giving up performance by having to write() or flush().
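
As a rough illustration of that pattern (the file name, region size, and record contents are made up), one can write through a READ_WRITE mapping and let the OS write the pages back in the background, calling force() only if durability against an OS crash is actually required:

import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class LowLatencyWriter {
  public static void main(String[] args) throws IOException {
    Path file = Path.of("journal.dat");          // illustrative file name
    long size = 64L * 1024 * 1024;               // pre-size the region being mapped
    try (FileChannel ch = FileChannel.open(file,
        StandardOpenOption.CREATE, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
      MappedByteBuffer mb = ch.map(FileChannel.MapMode.READ_WRITE, 0, size);
      for (int i = 0; i < 1_000; i++) {
        mb.putLong(System.nanoTime());           // the write lands in the page cache immediately
      }
      // No mb.force() here: dirty pages are flushed by the OS in the background,
      // so the data survives a JVM crash but not necessarily an OS crash.
    }
  }
}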

You don't use the FileChannel API to write the entire file at once. Instead, you send the file in parts. See example code in Martin Thompson's post comparing performance of Java IO techniques: Java Sequential IO Performance
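
A rough sketch of writing in parts with one reused buffer (my own illustration, not the code from Martin Thompson's post; the 256 KB chunk size is arbitrary):

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class ChunkedWrite {
  static void write(Path file, byte[] data) throws IOException {
    ByteBuffer buf = ByteBuffer.allocateDirect(256 * 1024);   // one reusable chunk buffer
    try (FileChannel ch = FileChannel.open(file,
        StandardOpenOption.CREATE, StandardOpenOption.WRITE,
        StandardOpenOption.TRUNCATE_EXISTING)) {
      int pos = 0;
      while (pos < data.length) {
        int n = Math.min(buf.capacity(), data.length - pos);
        buf.clear();
        buf.put(data, pos, n);
        buf.flip();
        while (buf.hasRemaining()) {
          ch.write(buf);                                      // a chunk may take several write() calls
        }
        pos += n;
      }
    }
  }
}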

In addition, there is not much documentation, because you are making a platform-dependent call. From the map() Javadoc:

Many of the details of memory-mapped files are inherently dependent upon the underlying operating system and are therefore unspecified.
