I have a big file, it's expected to be around 12 GB. I want to load it all into memory on a beefy 64-bit machine with 16 GB RAM, but I think Java does not support byte arrays that large. Is there a way around this limit?
Don't limit yourself to Integer.MAX_VALUE.
Although this question was asked many years ago, I wanted to contribute a simple example using only Java SE, without any external libraries.
At first, let's say it's theoretically impossible but practically possible.
A fresh look: if an array is an object made of elements, what about an object that is an array of arrays?
Here's the example:
import java.lang.reflect.Array;
import java.util.ArrayList;
import java.util.List;

/**
 * @author Anosa
 */
public class BigArray<T> {

    private static final int ARRAY_LENGTH = 1_000_000;

    public final long length;
    private final List<T[]> arrays;

    public BigArray(long length, Class<T> clazz) {
        this.length = length;
        arrays = new ArrayList<>();
        setupInnerArrays(clazz);
    }

    @SuppressWarnings("unchecked")
    private void setupInnerArrays(Class<T> clazz) {
        long numberOfArrays = length / ARRAY_LENGTH;
        long remainder = length % ARRAY_LENGTH;
        /*
         * With Java 8 lambdas and streams this loop could be written as:
         * LongStream.range(0, numberOfArrays)
         *           .forEach(i -> arrays.add((T[]) Array.newInstance(clazz, ARRAY_LENGTH)));
         */
        for (long i = 0; i < numberOfArrays; i++) {
            arrays.add((T[]) Array.newInstance(clazz, ARRAY_LENGTH));
        }
        if (remainder > 0) {
            // the remainder is guaranteed to be less than ARRAY_LENGTH (an int),
            // so the cast is safe (:
            arrays.add((T[]) Array.newInstance(clazz, (int) remainder));
        }
    }

    public void put(T value, long index) {
        checkIndex(index);
        // use % rather than index - (indexOfArray * ARRAY_LENGTH): the latter
        // overflows int arithmetic once indexOfArray * ARRAY_LENGTH exceeds 2^31
        arrays.get((int) (index / ARRAY_LENGTH))[(int) (index % ARRAY_LENGTH)] = value;
    }

    public T get(long index) {
        checkIndex(index);
        return arrays.get((int) (index / ARRAY_LENGTH))[(int) (index % ARRAY_LENGTH)];
    }

    private void checkIndex(long index) {
        if (index >= length || index < 0) {
            throw new IndexOutOfBoundsException(
                    "out of the range of the array; your index must be in the range [0, " + length + ")");
        }
    }
}
And here's the test:
public static void main(String[] args) {
    long length = 60085147514L;
    BigArray<String> array = new BigArray<>(length, String.class);
    array.put("peace be upon you", 1);
    array.put("yes it works", 1755);
    String text = array.get(1755);
    System.out.println(text + " -- I am a string coming from a big array");
}
This code is limited only by Long.MAX_VALUE and by the Java heap, which you can raise as far as you want (I made it 3800 MB).
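A heap that size can be requested when launching the JVM; for example (the class name here is hypothetical):

java -Xmx3800m Main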
I hope this is useful and provides a simple answer.
I think the idea of memory-mapping the file (using the CPU's virtual memory hardware) is the right approach, except that MappedByteBuffer has the same 2 GB limitation as native arrays. This author claims to have solved the problem with a fairly simple alternative to MappedByteBuffer:
http://nyeggen.com/post/2014-05-18-memory-mapping-%3E2gb-of-data-in-java/
https://gist.github.com/bnyeggen/c679a5ea6a68503ed19f#file-mmapper-java
Unfortunately the JVM crashes when you read beyond 500 MB.
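For reference, here is a minimal sketch of the same chunked-mapping idea using only the standard library (this is not the linked MMapper code; read-only access is assumed, and the MappedBigFile name is hypothetical):

import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Maps a file larger than 2 GB as a series of MappedByteBuffer chunks. */
public class MappedBigFile {
    private static final long CHUNK_SIZE = 1L << 30; // 1 GiB per mapping

    private final MappedByteBuffer[] chunks;

    public MappedBigFile(Path path) throws IOException {
        try (FileChannel channel = FileChannel.open(path, StandardOpenOption.READ)) {
            long size = channel.size();
            int n = (int) ((size + CHUNK_SIZE - 1) / CHUNK_SIZE);
            chunks = new MappedByteBuffer[n];
            for (int i = 0; i < n; i++) {
                long pos = i * CHUNK_SIZE;
                long len = Math.min(CHUNK_SIZE, size - pos);
                // mappings stay valid after the channel is closed
                chunks[i] = channel.map(FileChannel.MapMode.READ_ONLY, pos, len);
            }
        }
    }

    public byte get(long index) {
        return chunks[(int) (index / CHUNK_SIZE)].get((int) (index % CHUNK_SIZE));
    }
}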
package com.deans.rtl.util;

import java.io.FileInputStream;
import java.io.IOException;

/**
 * @author william.deans@gmail.com
 *
 * Written to work with byte arrays requiring address space larger than 32 bits.
 */
public class ByteArray64 {

    private static final long CHUNK_SIZE = 1024 * 1024 * 1024; // 1 GiB

    private final long size;
    private byte[][] data;

    public ByteArray64(long size) {
        this.size = size;
        if (size == 0) {
            data = null;
        } else {
            int chunks = (int) (size / CHUNK_SIZE);
            int remainder = (int) (size - ((long) chunks) * CHUNK_SIZE);
            data = new byte[chunks + (remainder == 0 ? 0 : 1)][];
            for (int idx = chunks; --idx >= 0; ) {
                data[idx] = new byte[(int) CHUNK_SIZE];
            }
            if (remainder != 0) {
                data[chunks] = new byte[remainder];
            }
        }
    }

    public byte get(long index) {
        checkIndex(index);
        int chunk = (int) (index / CHUNK_SIZE);
        int offset = (int) (index - ((long) chunk) * CHUNK_SIZE);
        return data[chunk][offset];
    }

    public void set(long index, byte b) {
        checkIndex(index);
        int chunk = (int) (index / CHUNK_SIZE);
        int offset = (int) (index - ((long) chunk) * CHUNK_SIZE);
        data[chunk][offset] = b;
    }

    private void checkIndex(long index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("Error attempting to access data element "
                    + index + ". Array is " + size + " elements long.");
        }
    }

    /**
     * Simulates a single read which fills the entire array via several smaller reads.
     *
     * @param fileInputStream the stream to read from
     * @throws IOException if the stream ends before the array is full
     */
    public void read(FileInputStream fileInputStream) throws IOException {
        if (size == 0) {
            return;
        }
        for (int idx = 0; idx < data.length; idx++) {
            // read() may legitimately return fewer bytes than requested,
            // so keep reading until each chunk is full
            int filled = 0;
            while (filled < data[idx].length) {
                int n = fileInputStream.read(data[idx], filled, data[idx].length - filled);
                if (n < 0) {
                    throw new IOException("short read");
                }
                filled += n;
            }
        }
    }

    public long size() {
        return size;
    }
}
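A minimal usage sketch, loading a file like the 12 GB one from the question (the file name is hypothetical):

import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;

public class LoadBigFile {
    public static void main(String[] args) throws IOException {
        File file = new File("huge.bin"); // hypothetical ~12 GB input file
        ByteArray64 data = new ByteArray64(file.length());
        try (FileInputStream in = new FileInputStream(file)) {
            data.read(in); // fills the whole array via several chunked reads
        }
        System.out.println("loaded " + data.size() + " bytes; first byte = " + data.get(0));
    }
}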
No, arrays are indexed by ints (except some versions of JavaCard that use shorts). You will need to slice it up into smaller arrays, probably wrapping them in a type that gives you get(long), set(long, byte), etc. With sections of data that large, you might want to map the file using java.nio.
I suggest you define some "block" objects, each of which holds (say) 1 GB in an array, then make an array of those; see the sketch below.
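A minimal sketch of that suggestion, assuming 1 GiB blocks (the BlockStore name is hypothetical):

/** An array of 1 GiB byte[] blocks, addressed by a long index. */
public class BlockStore {
    private static final int BLOCK_SIZE = 1 << 30; // 1 GiB per block

    private final byte[][] blocks;

    public BlockStore(long size) {
        int n = (int) ((size + BLOCK_SIZE - 1L) / BLOCK_SIZE);
        blocks = new byte[n][];
        for (int i = 0; i < n; i++) {
            // the last block may be shorter than BLOCK_SIZE
            long remaining = size - (long) i * BLOCK_SIZE;
            blocks[i] = new byte[(int) Math.min(BLOCK_SIZE, remaining)];
        }
    }

    public byte get(long i) {
        return blocks[(int) (i / BLOCK_SIZE)][(int) (i % BLOCK_SIZE)];
    }

    public void set(long i, byte b) {
        blocks[(int) (i / BLOCK_SIZE)][(int) (i % BLOCK_SIZE)] = b;
    }
}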