I have a big file; it's expected to be around 12 GB. I want to load it all into memory on a beefy 64-bit machine with 16 GB RAM, but I think Java does not support byte arra
Java array indices are of type int (4 bytes or 32 bits), so I'm afraid you're limited to 2^31 − 1, or 2147483647, slots in your array. I'd read the data into another data structure, like a 2D array.
Java doesn't currently support a single array with more than 2^31 − 1 elements.
I hope to see this feature in a future version of Java.
If necessary, you can load the data into an array of arrays, which will give you a maximum of Integer.MAX_VALUE squared bytes (about 4.6 × 10^18), more than even the beefiest machine could hold in memory.
Java arrays use integers for their indices. As a result, the maximum array size is Integer.MAX_VALUE.
(Unfortunately, I can't find any proof from Sun themselves about this, but there are plenty of discussions on their forums about it already.)
I think the best solution you could do in the meantime would be to make a 2D array, i.e.:
byte[][] data;
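To make the 2D array idea concrete, here is a rough sketch of reading a file into a byte[][] one chunk at a time. The class name ChunkedFileLoader, the load method and the chunkSize parameter are all made up for illustration; any chunk size up to roughly Integer.MAX_VALUE should work, and the JVM heap (-Xmx) still has to be large enough to hold the whole file.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of any library: loads a file into an array of byte arrays.
final class ChunkedFileLoader {

    static byte[][] load(Path path, int chunkSize) throws IOException {
        long size = Files.size(path);
        // Number of chunks, rounding up so the last partial chunk is included.
        int chunks = (int) ((size + chunkSize - 1) / chunkSize);
        byte[][] data = new byte[chunks][];

        try (InputStream in = Files.newInputStream(path)) {
            long remaining = size;
            for (int i = 0; i < chunks; i++) {
                int len = (int) Math.min(chunkSize, remaining);
                data[i] = new byte[len];
                // read() may return fewer bytes than requested, so loop until the chunk is full.
                int off = 0;
                while (off < len) {
                    int n = in.read(data[i], off, len - off);
                    if (n < 0) {
                        throw new EOFException("File ended before " + size + " bytes were read");
                    }
                    off += n;
                }
                remaining -= len;
            }
        }
        return data;
    }
}
```

With a chunk size of 1 << 30 (1 GiB), a 12 GB file would end up in about a dozen inner arrays, e.g. byte[][] data = ChunkedFileLoader.load(path, 1 << 30);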
As others have said, all Java arrays of all types are indexed by int, and so can be of max size 2^31 − 1, or 2147483647 elements (~2 billion). This is specified by the Java Language Specification, so switching to another operating system or Java Virtual Machine won't help.
If you wanted to write a class to overcome this, as suggested above, you could. It could use an array of arrays (for a lot of flexibility) or change element types (a long is 8 bytes, so a long[] can hold 8 times as much data as a byte[]).
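For what it's worth, a wrapper class along those lines might look something like the sketch below. BigByteArray and its chunkBits parameter are invented names, not anything from the JDK; the idea is just that a long index is split into a chunk number (high bits) and an offset within that chunk (low bits).

```java
// Hypothetical wrapper, not a JDK class: a byte "array" addressed by a long index.
final class BigByteArray {
    private final int chunkBits;   // log2 of the chunk size, e.g. 30 for 1 GiB chunks
    private final int chunkMask;   // chunk size minus one, used to extract the offset
    private final byte[][] chunks;
    private final long length;

    BigByteArray(long length, int chunkBits) {
        if (length < 0 || chunkBits < 1 || chunkBits > 30) {
            throw new IllegalArgumentException("length must be >= 0 and chunkBits in 1..30");
        }
        this.length = length;
        this.chunkBits = chunkBits;
        int chunkSize = 1 << chunkBits;
        this.chunkMask = chunkSize - 1;

        // Round up so the final, possibly shorter, chunk is allocated too.
        int n = (int) ((length + chunkMask) >>> chunkBits);
        chunks = new byte[n][];
        long remaining = length;
        for (int i = 0; i < n; i++) {
            chunks[i] = new byte[(int) Math.min(chunkSize, remaining)];
            remaining -= chunks[i].length;
        }
    }

    long length() {
        return length;
    }

    byte get(long index) {
        checkIndex(index);
        return chunks[(int) (index >>> chunkBits)][(int) (index & chunkMask)];
    }

    void set(long index, byte value) {
        checkIndex(index);
        chunks[(int) (index >>> chunkBits)][(int) (index & chunkMask)] = value;
    }

    private void checkIndex(long index) {
        if (index < 0 || index >= length) {
            throw new IndexOutOfBoundsException("index " + index + ", length " + length);
        }
    }
}
```

Usage would be along the lines of new BigByteArray(12L * 1024 * 1024 * 1024, 30), followed by get(long) and set(long, byte) calls. Using a power-of-two chunk size keeps the index arithmetic down to a shift and a mask.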
You might consider using FileChannel and MappedByteBuffer to memory-map the file:
FileChannel fCh = new RandomAccessFile(file, "rw").getChannel();
long size = fCh.size();
// A single mapping cannot be larger than Integer.MAX_VALUE bytes.
ByteBuffer map = fCh.map(FileChannel.MapMode.READ_WRITE, 0, size);
Edit:
OK, I'm an idiot: it looks like ByteBuffer only takes a 32-bit index as well, which is odd since the size parameter to FileChannel.map is a long... But if you decide to break up the file into multiple 2 GB chunks for loading, I'd still recommend memory-mapped I/O, as there can be pretty large performance benefits. You're basically moving all I/O responsibility to the OS kernel.
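As a rough illustration of the chunked approach, something like the following could map the file in pieces and hide the chunking behind a long index. ChunkedMappedFile, open and chunkSize are names I made up for this sketch, and it maps read-only rather than READ_WRITE to keep things simple.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper, not a JDK class: a read-only view of a file mapped in several chunks.
final class ChunkedMappedFile {
    private final MappedByteBuffer[] maps;
    private final long chunkSize;
    private final long size;

    private ChunkedMappedFile(MappedByteBuffer[] maps, long chunkSize, long size) {
        this.maps = maps;
        this.chunkSize = chunkSize;
        this.size = size;
    }

    static ChunkedMappedFile open(Path path, int chunkSize) throws IOException {
        // The mappings stay valid after the channel is closed.
        try (FileChannel ch = FileChannel.open(path, StandardOpenOption.READ)) {
            long size = ch.size();
            int n = (int) ((size + chunkSize - 1) / chunkSize);
            MappedByteBuffer[] maps = new MappedByteBuffer[n];
            for (int i = 0; i < n; i++) {
                long pos = (long) i * chunkSize;
                long len = Math.min(chunkSize, size - pos);
                // Each individual mapping stays within the int-sized ByteBuffer limit.
                maps[i] = ch.map(FileChannel.MapMode.READ_ONLY, pos, len);
            }
            return new ChunkedMappedFile(maps, chunkSize, size);
        }
    }

    long size() {
        return size;
    }

    byte get(long index) {
        // Absolute get: does not touch the buffer's position.
        return maps[(int) (index / chunkSize)].get((int) (index % chunkSize));
    }
}
```

For a 12 GB file, ChunkedMappedFile.open(path, 1 << 30) would create about a dozen 1 GiB mappings. The pages are only read from disk when they are first touched, so opening the file this way should be cheap compared with reading everything up front.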