I have three questions about three nested loops:
for (int x=0; x<400; x++)
{
    for (int y=0; y<300; y++)
    {
        for (int z=0; z<400; z++)
        {
            values[x][y][z] = 1;
        }
    }
}
Have you tried this:
Object[][][] store = new Object[400][300][400];
for (int x=0; x<400; x++)
{
    Object[][] matrix = store[x];
    for (int y=0; y<300; y++)
    {
        Object[] line = matrix[y];
        for (int z=0; z<400; z++)
        {
            // compute and store value
            line[z] = result; // 'result' stands in for whatever your computation produces
        }
    }
}
It might reduce cache thrashing: store[x] and matrix[y] are dereferenced once per outer iteration instead of on every single assignment.
Considering that the array is huge, the amount of memory used, and the indirections needed (a multidimensional array in Java is an array of references to arrays of references to arrays...), that does not seem slow to me at all. When you switch x and z you are probably thrashing the cache.
For comparison, you could store everything in a flat array. This would improve the storing speed, but retrieval by coordinates would become more complex and slower.
int[] arr = new int[400 * 300 * 400];
int k = 0;
for (int x=0; x<400; x++)
{
    for (int y=0; y<300; y++)
    {
        for (int z=0; z<400; z++)
        {
            // compute and store value
            arr[k++] = val; // 'val' is the value computed above
        }
    }
}
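If the index arithmetic bothers you, a thin wrapper can hide it while keeping the flat layout. A minimal sketch (the `Flat3D` name and API are mine, not from the question):

```java
// Flat-backed 3D int array: one contiguous allocation, no nested references.
public class Flat3D {
    private final int[] data;
    private final int dimY, dimZ;

    Flat3D(int dimX, int dimY, int dimZ) {
        this.data = new int[dimX * dimY * dimZ];
        this.dimY = dimY;
        this.dimZ = dimZ;
    }

    // Row-major index: x varies slowest, z fastest.
    private int index(int x, int y, int z) {
        return (x * dimY + y) * dimZ + z;
    }

    int get(int x, int y, int z) { return data[index(x, y, z)]; }

    void set(int x, int y, int z, int v) { data[index(x, y, z)] = v; }

    public static void main(String[] args) {
        Flat3D store = new Flat3D(400, 300, 400);
        store.set(10, 20, 30, 7);
        System.out.println(store.get(10, 20, 30)); // prints 7
    }
}
```

The index math is the same `z + y*400 + x*400*300` as above, just factored into one place so callers cannot get it wrong.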
1) Why is an array so relatively slow?
As others have pointed out, you are comparing apples to oranges. The triple array is slow because it needs to dereference three times (internally, at least - yes, "there are no pointers in Java"); a single integer variable, by contrast, involves no indirection at all...
2) And why does it get even slower when I put this in the inner loop:
values[z][y][x] = 1; // (notice x and z switched)
Because you have decreased cache locality. The fastest-changing index should be the last one, so that most memory accesses land next to each other, within the same cache lines, instead of forcing your processor to wait while blocks are fetched from main RAM.
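You can see the effect directly by timing both loop orders over the same array. A minimal sketch (class and method names are mine; timings will vary by machine):

```java
// Compares the natural x,y,z write order against the swapped z,y,x order.
public class LoopOrder {
    static final int NX = 400, NY = 300, NZ = 400;

    static long fill(int[][][] a, boolean swapped) {
        long start = System.nanoTime();
        for (int x = 0; x < NX; x++) {
            for (int y = 0; y < NY; y++) {
                for (int z = 0; z < NZ; z++) {
                    if (swapped) {
                        a[z][y][x] = 1; // inner loop changes the FIRST index: jumps between sub-arrays
                    } else {
                        a[x][y][z] = 1; // inner loop changes the LAST index: consecutive ints
                    }
                }
            }
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        int[][][] a = new int[NX][NY][NZ];
        System.out.println("x,y,z order: " + fill(a, false) / 1_000_000 + " ms");
        System.out.println("z,y,x order: " + fill(a, true) / 1_000_000 + " ms");
    }
}
```

Both variants write the same 48 million cells; only the traversal order differs, so any time gap is down to locality.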
3) Most importantly: Can I use a data structure that is as quick as the assignment of a single integer, but can store as much data as the 3D-array?
No. There is no such structure: the integer variable fits into a machine register (quicker even than the processor's cache) and can always be accessed faster than anything else you care to mention. Processor speeds are much, much faster than main-memory speeds. If your working set (the data you need to operate on) does not fit into registers or cache, you will have to pay a penalty to fetch it from RAM (or, even worse, from disk).
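For reference, the "single integer" baseline the question compares against looks roughly like this (my sketch, not the question's code); the variable can stay in a register for the whole loop, and the JIT may even elide the loop entirely:

```java
// Same triple loop, but writing to one local int instead of an array.
public class ScalarBaseline {
    static int run() {
        int value = 0;
        for (int x = 0; x < 400; x++) {
            for (int y = 0; y < 300; y++) {
                for (int z = 0; z < 400; z++) {
                    value = x + y + z; // no memory traffic: 'value' lives in a register
                }
            }
        }
        return value; // last value written: 399 + 299 + 399 = 1097
    }

    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        int last = run();
        System.out.println("Time: " + (System.currentTimeMillis() - start)
                + " ms, last value: " + last);
    }
}
```

That it finishes almost instantly says nothing about arrays being "slow"; it only shows that 48 million register writes (which the JIT may collapse to one) are cheap.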
This being said, Java does bounds checks on each array access, and does not seem to be too smart about optimizing them away. The following comparison may be of interest:
public static long test1(int[][][] array) {
    long start = System.currentTimeMillis();
    for ( int x = 0; x < 400; x++ ) {
        for ( int y = 0; y < 300; y++ ) {
            for ( int z = 0; z < 400; z++ ) {
                array[x][y][z] = x + y + z;
            }
        }
    }
    return System.currentTimeMillis() - start;
}

public static long test2(int[] array) {
    long start = System.currentTimeMillis();
    for ( int x = 0; x < 400; x++ ) {
        for ( int y = 0; y < 300; y++ ) {
            for ( int z = 0; z < 400; z++ ) {
                // flat index: x varies slowest, z fastest
                array[z + y*400 + x*400*300] = x + y + z;
            }
        }
    }
    return System.currentTimeMillis() - start;
}

public static void main(String[] args) {
    int[][][] a1 = new int[400][300][400];
    int[] a2 = new int[400*300*400];
    int n = 20;
    System.err.println("test1");
    for (int i=0; i<n; i++) {
        System.err.print(test1(a1) + "ms ");
    }
    System.err.println();
    System.err.println("test2");
    for (int i=0; i<n; i++) {
        System.err.print(test2(a2) + "ms ");
    }
    System.err.println();
}
The output, on my system, is
test1
164ms 177ms 148ms 149ms 148ms 147ms 150ms 151ms 152ms 154ms 151ms 150ms 148ms 148ms 150ms 148ms 150ms 148ms 148ms 149ms
test2
141ms 153ms 130ms 130ms 130ms 133ms 130ms 130ms 130ms 132ms 129ms 131ms 130ms 131ms 131ms 130ms 131ms 130ms 130ms 130ms
Therefore, there is some room for improvement... but I really don't think it is worth your while.
public static void main( String[] args ) {
    int[][][] storage = new int[ 400 ][ 300 ][ 400 ];
    long start = System.currentTimeMillis();
    for ( int x = 0; x < 400; x++ ) {
        for ( int y = 0; y < 300; y++ ) {
            for ( int z = 0; z < 400; z++ ) {
                storage[x][y][z] = 5;
            }
        }
    }
    long end = System.currentTimeMillis();
    System.out.println( "Time was: " + ( end - start ) / 1000.0 + " seconds." );
}
Ran with -Xmx1g
Time was: 0.188 seconds.
That seems pretty darn fast: that is 48 MILLION element writes in the innermost loop.
Home-rolling a silly little data structure..
public static void main( String[] args ) {
    StorerGuy[] storerGuys = new StorerGuy[ 400 ];
    long start = System.currentTimeMillis();
    for ( int x = 0; x < 400; x++ ) {
        for ( int y = 0; y < 300; y++ ) {
            for ( int z = 0; z < 400; z++ ) {
                // note: this overwrites storerGuys[x] on every y/z iteration,
                // so only the last object per x survives
                storerGuys[x] = new StorerGuy( x, y, z, 5 );
            }
        }
    }
    long end = System.currentTimeMillis();
    System.out.println( "Time was: " + ( end - start ) / 1000.0 + " seconds." );
}

public static class StorerGuy {
    public int x;
    public int y;
    public int z;
    public int value;

    StorerGuy( int x, int y, int z, int value ) {
        this.x = x;
        this.y = y;
        this.z = z;
        this.value = value;
    }
}
Time was: 0.925 seconds.
Which is faster than the 4 seconds you got in your mixed-up-order example.
I think multidimensional arrays are too much for this problem. You are better off with a more complex data structure, as it keeps all the stuff (x, y, z, value) in one memory location.
Java is an OO language. In most cases, you should use objects and not weird data structures like int[][][].
I would guess that this has a lot to do with caching, registers, and the principle of memory locality.
Java has to touch many more bytes of memory when storing into an array. With the single variable, it can just keep that value in the cache (or a register) and keep updating it.
The cache isn't large enough to hold the whole multidimensional array, so Java has to keep moving cache lines to and from main memory. Cache access times are far faster than memory access times.
I don't even see why you would run this test, though. If you need to store lots of data in a multidimensional array, a single variable isn't a substitute, even if it is faster.
Also, the reason the version with the switched indices is slower is that you are jumping around in memory a lot more (many more cache misses) than when you iterate in the natural order.