The TL;DR version, for those who don't want the background, is the following specific question:
Why doesn't Java have an implemen
If you want a fast implementation of a true multi-dimensional array, you could write a custom implementation like this. But you are right... it is not as crisp as the array notation. Still, a neat implementation could be quite friendly.
public class MyArray {
    private int rows = 0;
    private int cols = 0;
    String[] backingArray = null;

    public MyArray(int rows, int cols) {
        this.rows = rows;
        this.cols = cols;
        backingArray = new String[rows * cols];
    }

    // Row-major layout: element (row, col) lives at row * cols + col.
    public String get(int row, int col) {
        return backingArray[row * cols + col];
    }
    // ... setters and other stuff
}
Why is it not the default implementation?
The designers of Java probably had to decide how the default notation of the usual C array syntax would behave. They had a single array notation which could either implement arrays-of-arrays or true multi-dimensional arrays.
I think early Java designers were really concerned with Java being safe. A lot of decisions seem to have been made to make it difficult for the average programmer (or a good programmer on a bad day) to mess something up. With true multi-dimensional arrays, it is easier for users to waste large chunks of memory by allocating blocks where they are not useful.
Also, given Java's embedded systems roots, they probably found that it was more likely to find small pieces of memory to allocate than the large contiguous chunks of memory required for true multi-dimensional objects.
Of course, the flip side is that places where multi-dimensional arrays really make sense suffer. And you are forced to use a library and messy looking code to get your work done.
Why is it still not included in the language?
Even today, true multi-dimensional arrays are a risk from the point of view of possible memory wastage/misuse.
To me it looks like you sort of answered the question yourself:
... an incentive to write it as a flat array, even if that makes it unnatural and hard to read.
So write it as a flat array which is easy to read. With a trivial helper like
double get(int row, int col) {
    return data[rowLength * row + col];
}
and a similar setter and possibly a += equivalent, you can pretend you're working with a 2D array. It's really no big deal. You can't use the array notation and everything gets verbose and ugly. But that seems to be the Java way. It's exactly the same as with BigInteger or BigDecimal. You can't use brackets for accessing a Map either; that's a very similar case.
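To make the "trivial helper" idea concrete, here is a minimal sketch of such a wrapper. The class name FlatMatrix and the method names get, set and add are my own choices for illustration, not anything from the JDK; the explicit index check is also an addition, since a flat array would otherwise happily accept an out-of-range column that lands in a neighbouring row.

```java
// Hypothetical helper: the name FlatMatrix and its methods are illustrative only.
final class FlatMatrix {
    private final int rows;
    private final int cols;
    private final double[] data; // row-major: element (r, c) lives at r * cols + c

    FlatMatrix(int rows, int cols) {
        if (rows < 0 || cols < 0) {
            throw new IllegalArgumentException("negative dimension");
        }
        this.rows = rows;
        this.cols = cols;
        // multiplyExact throws ArithmeticException instead of silently overflowing
        this.data = new double[Math.multiplyExact(rows, cols)];
    }

    private int index(int row, int col) {
        // Without this check, a bad col would silently address another row.
        if (row < 0 || row >= rows || col < 0 || col >= cols) {
            throw new IndexOutOfBoundsException("(" + row + ", " + col + ")");
        }
        return row * cols + col;
    }

    double get(int row, int col) {
        return data[index(row, col)];
    }

    void set(int row, int col, double value) {
        data[index(row, col)] = value;
    }

    // The "+=" equivalent mentioned above.
    void add(int row, int col, double delta) {
        data[index(row, col)] += delta;
    }

    public static void main(String[] args) {
        FlatMatrix m = new FlatMatrix(2, 3);
        m.set(1, 2, 4.0);
        m.add(1, 2, 0.5);
        System.out.println(m.get(1, 2)); // prints 4.5
    }
}
```

The call sites are wordier than m[1][2] += 0.5, which is exactly the verbosity trade-off described above.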
Now the question is how important all those features are. Would more people be happy if they could write x += BigDecimal.valueOf("123456.654321") + 10;, or spouse["Paul"] = "Mary";, or use 2D arrays without the boilerplate, or what? All of this would be nice and you could go further, e.g., array slices. But there's no real problem. You have to choose between verbosity and inefficiency as in many other cases. IMHO, the effort spent on this feature can be better spent elsewhere. Your 2D arrays are a new best as....
Java actually has no 2D primitive arrays, ...
it's mostly a syntactic sugar, the underlying thing is array of objects.
double[][] a = new double[1][1];
Object[] b = a;
As arrays are reified, the current implementation needs hardly any support. Your implementation would open a can of worms:
java.lang.reflect.Array? Clone it for 2D arrays?
And what would ??? x = {new int[1], new int[2]}; be? An old-style 2D int[][]? What about interoperability?
I guess, it's all doable, but there are simpler and more important things missing from Java. Some people need 2D arrays all the time, but many can hardly remember when they used any array at all.
I am unable to reproduce the performance benefits you claim. Specifically, the test program:
import java.math.BigDecimal;
import java.util.Random;

public abstract class Benchmark {
    final String name;

    public Benchmark(String name) {
        this.name = name;
    }

    abstract int run(int iterations) throws Throwable;

    // Doubles the iteration count until one run takes at least a second,
    // then reports the average time per iteration in nanoseconds.
    private BigDecimal time() {
        try {
            int nextI = 1;
            int i;
            long duration;
            do {
                i = nextI;
                long start = System.nanoTime();
                run(i);
                duration = System.nanoTime() - start;
                nextI = (i << 1) | 1;
            } while (duration < 1000000000 && nextI > 0);
            return new BigDecimal((duration) * 1000 / i).movePointLeft(3);
        } catch (Throwable e) {
            throw new RuntimeException(e);
        }
    }

    @Override
    public String toString() {
        return name + "\t" + time() + " ns";
    }

    public static void main(String[] args) throws Exception {
        final int[] flat = new int[100*100*100];
        final int[][][] multi = new int[100][100][100];
        Random chaos = new Random();
        for (int i = 0; i < flat.length; i++) {
            flat[i] = chaos.nextInt();
        }
        for (int i=0; i<multi.length; i++)
            for (int j=0; j<multi[0].length; j++)
                for (int k=0; k<multi[0][0].length; k++)
                    multi[i][j][k] = chaos.nextInt();
        Benchmark[] marks = {
            new Benchmark("flat") {
                @Override
                int run(int iterations) throws Throwable {
                    long total = 0;
                    for (int j = 0; j < iterations; j++)
                        for (int i = 0; i < flat.length; i++)
                            total += flat[i];
                    return (int) total;
                }
            },
            new Benchmark("multi") {
                @Override
                int run(int iterations) throws Throwable {
                    long total = 0;
                    for (int iter = 0; iter < iterations; iter++)
                        for (int i=0; i<multi.length; i++)
                            for (int j=0; j<multi[0].length; j++)
                                for (int k=0; k<multi[0][0].length; k++)
                                    total+=multi[i][j][k];
                    return (int) total;
                }
            },
            new Benchmark("multi (idiomatic)") {
                @Override
                int run(int iterations) throws Throwable {
                    long total = 0;
                    for (int iter = 0; iter < iterations; iter++)
                        for (int[][] a : multi)
                            for (int[] b : a)
                                for (int c : b)
                                    total += c;
                    return (int) total;
                }
            }
        };
        for (Benchmark mark : marks) {
            System.out.println(mark);
        }
    }
}
run on my workstation with
java version "1.8.0_05"
Java(TM) SE Runtime Environment (build 1.8.0_05-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.5-b02, mixed mode)
prints
flat                 264360.217 ns
multi                270303.246 ns
multi (idiomatic)    266607.334 ns
That is, we observe a mere 2% difference between the one-dimensional and the multi-dimensional code you provided. This difference drops to 1% if we use idiomatic Java (specifically, an enhanced for loop) for traversal (probably because bounds checking is performed on the same array object the loop dereferences, enabling the just-in-time compiler to elide bounds checking more completely).
Performance therefore seems an inadequate justification for increasing the complexity of the language. Specifically, to support true multi dimensional arrays, the Java programming language would have to distinguish between arrays of arrays, and multidimensional arrays. Likewise, programmers would have to distinguish between them, and be aware of their differences. API designers would have to ponder whether to use an array of arrays, or a multidimensional array. The compiler, class file format, class file verifier, interpreter, and just in time compiler would have to be extended. This would be particularly difficult, because multidimensional arrays of different dimension counts would have an incompatible memory layout (because the size of their dimensions must be stored to enable bounds checking), and can therefore not be subtypes of each other. As a consequence, the methods of class java.util.Arrays would likely have to be duplicated for each dimension count, as would all otherwise polymorphic algorithms working with arrays.
To conclude, extending Java to support multidimensional arrays would offer negligible performance gain for most programs, but require non-trivial extensions to its type system, compiler and runtime environment. Introducing them would therefore have been at odds with the design goals of the Java programming language, specifically that it be simple.
but it seems that this is really not what one might have expected.
Why?
Consider that the form T[] means "array of type T", then just as we would expect int[] to mean "array of type int", we would expect int[][] to mean "array of type array of type int", because there's no less reason for having int[] as the T than int.
As such, considering that one can have arrays of any type, it follows just from the way [ and ] are used in declaring and initialising arrays (and for that matter, {, } and ,), that without some sort of special rule banning arrays of arrays, we get this sort of use "for free".
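A small sketch of what "for free" means in practice, using only standard reflection (Class.getComponentType). The class name ArrayOfArraysDemo and the helper componentTypeOf are made up for this example.

```java
// Illustrative only: ArrayOfArraysDemo and componentTypeOf are hypothetical names.
final class ArrayOfArraysDemo {
    // Returns the T of a T[]; for an int[][] that T is int[].
    static Class<?> componentTypeOf(Object array) {
        return array.getClass().getComponentType();
    }

    public static void main(String[] args) {
        int[][] grid = new int[2][3];

        // int[][] is literally "array of int[]".
        System.out.println(componentTypeOf(grid) == int[].class); // prints true

        // Each row is an independent int[] and can be swapped for one of another length.
        grid[1] = new int[] {7, 8, 9, 10};
        System.out.println(grid[0].length + " " + grid[1].length); // prints 3 4

        // Every int[] is an Object, so an int[][] is also an Object[].
        Object[] rows = grid;
        System.out.println(rows[1] instanceof int[]); // prints true
    }
}
```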
Now consider also that there are things we can do with jagged arrays that we can't do otherwise:
lookup[1] is the same array as lookup[5]. (This can allow for massive savings with some data-sets, e.g. many Unicode properties can be mapped for the full set of 1,112,064 code points in a small amount of memory because leaf arrays of properties can be repeated for ranges with matching patterns).
There are certainly cases where these sorts of multi-dimensional arrays are useful.
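To illustrate the aliasing trick, here is a rough sketch of a two-level lookup table whose slots all start out pointing at one shared leaf. Everything here (SharedLeafLookup, LEAF_SIZE, newTable, set, get) is invented for the example; real Unicode property tables are built differently, and this version naively copies the leaf on every write rather than tracking which leaves are already private.

```java
import java.util.Arrays;

// Hypothetical sketch of leaf sharing in an array of arrays.
final class SharedLeafLookup {
    static final int LEAF_SIZE = 256;

    // Builds a table in which every slot initially aliases one all-false leaf.
    static boolean[][] newTable(int leafCount) {
        boolean[][] table = new boolean[leafCount][]; // outer array only, rows start null
        boolean[] shared = new boolean[LEAF_SIZE];
        Arrays.fill(table, shared);                   // every slot refers to the same leaf
        return table;
    }

    // Sets one flag, first giving the affected slot its own copy of the leaf.
    static void set(boolean[][] table, int key) {
        int slot = key / LEAF_SIZE;
        table[slot] = table[slot].clone(); // copy on write, other slots are unaffected
        table[slot][key % LEAF_SIZE] = true;
    }

    static boolean get(boolean[][] table, int key) {
        return table[key / LEAF_SIZE][key % LEAF_SIZE];
    }

    public static void main(String[] args) {
        boolean[][] table = newTable(4);
        System.out.println(table[1] == table[3]);            // prints true: same array object
        set(table, 300);                                     // key 300 falls in slot 1
        System.out.println(get(table, 300));                 // prints true
        System.out.println(get(table, 300 + 2 * LEAF_SIZE)); // prints false: slot 3 untouched
        System.out.println(table[0] == table[3]);            // prints true: still shared
    }
}
```

With a true rectangular array there would be no row references to share, so all four leaves would have to be stored in full.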
Now, the default state of any feature is unspecified and unimplemented. Someone needs to decide to specify and implement a feature, or else it wouldn't exist.
As shown above, the array-of-array sort of multidimensional array will exist unless someone decides to introduce a special feature banning arrays of arrays. Since arrays of arrays are useful for the reasons above, that would be a strange decision to make.
Conversely, the sort of multidimensional array where an array has a defined rank that can be greater than 1 and so be used with a set of indices rather than a single index, does not follow naturally from what is already defined. Someone would need to:
Also users would have to learn this new feature.
So, it has to be worth it. Some things that would make it worth it would be:
In this case though:
Really, the question is not "Why doesn't Java have true multidimensional arrays?" but "Why should it?"
Of course, the points you made in favour of multidimensional arrays are valid, and some languages do have them for that reason, but the burden is nonetheless to argue a feature in, not argue it out.
(I hear a rumour that C# does something like this, although I also hear another rumour that the CLR implementation is so bad that it's not worth having... perhaps they're just rumours...)
Like many rumours, there's an element of truth here, but it is not the full truth.
.NET arrays can indeed have multiple ranks. This is not the only way in which it is more flexible than Java. Each rank can also have a lower-bound other than zero. As such, you could for example have an array that goes from -3 to 42 or a two dimensional array where one rank goes from -2 to 5 and another from 57 to 100, or whatever.
C# does not give complete access to all of this from its built-in syntax (you need to call Array.CreateInstance() for lower bounds other than zero), but it does allow you to use the syntax int[,] for a two-dimensional array of int, int[,,] for a three-dimensional array, and so on.
Now, the extra work involved in dealing with lower bounds other than zero adds a performance burden, and yet these cases are relatively uncommon. For that reason single-rank arrays with a lower-bound of 0 are treated as a special case with a more performant implementation. Indeed, they are internally a different sort of structure.
In .NET multi-dimensional arrays with lower bounds of zero are treated as multi-dimensional arrays whose lower bounds just happen to be zero (that is, as an example of the slower case) rather than the faster case being able to handle ranks greater than 1.
Of course, .NET could have had a fast-path case for zero-based multi-dimensional arrays, but then all the reasons for Java not having them would apply. On top of that, there is already one special case, and special cases suck; with two special cases they would suck more. (As it is, one can run into issues when trying to assign a value of one type to a variable of the other type.)
Not a single thing above shows clearly that Java couldn't possibly have had the sort of multi-dimensional array you talk of; it would have been a sensible enough decision, but the decision that was made was also sensible.
Since this question is to a great extent about performance, let me contribute a proper JMH-based benchmark. I have also changed some things to make your example both simpler and the performance edge more prominent.
In my case I compare a 1D array with a 2D-array, and use a very short inner dimension. This is the worst case for the cache.
I have tried with both long and int accumulator and saw no difference between them. I submit the version with int.
import static java.util.concurrent.TimeUnit.MILLISECONDS;

import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.*;

@OutputTimeUnit(TimeUnit.NANOSECONDS)
@BenchmarkMode(Mode.AverageTime)
@OperationsPerInvocation(Measure.X * Measure.Y)
@Warmup(iterations = 30, time = 100, timeUnit=MILLISECONDS)
@Measurement(iterations = 5, time = 1000, timeUnit=MILLISECONDS)
@State(Scope.Thread)
@Threads(1)
@Fork(1)
public class Measure
{
    static final int X = 100_000, Y = 10;
    private final int[] single = new int[X*Y];
    private final int[][] multi = new int[X][Y];

    @Setup public void setup() {
        final ThreadLocalRandom rnd = ThreadLocalRandom.current();
        for (int i=0; i<single.length; i++) single[i] = rnd.nextInt();
        for (int i=0; i<multi.length; i++)
            for (int j=0; j<multi[0].length; j++)
                multi[i][j] = rnd.nextInt();
    }

    @Benchmark public long sumSingle() { return sumSingle(single); }
    @Benchmark public long sumMulti() { return sumMulti(multi); }

    public static long sumSingle(int[] arr) {
        int total = 0;
        for (int i=0; i<arr.length; i++)
            total+=arr[i];
        return total;
    }

    public static long sumMulti(int[][] arr) {
        int total = 0;
        for (int i=0; i<arr.length; i++)
            for (int j=0; j<arr[0].length; j++)
                total+=arr[i][j];
        return total;
    }
}
The difference in performance is larger than what you have measured:
Benchmark Mode Samples Score Score error Units
o.s.Measure.sumMulti avgt 5 1,356 0,121 ns/op
o.s.Measure.sumSingle avgt 5 0,421 0,018 ns/op
That's a factor above three. (Note that the timing is reported per array element.)
I also note that there is no warmup involved: the first 100 ms are as fast as the rest. Apparently this is such a simple task that the interpreter already does all it takes to make it optimal.
Changing sumMulti's inner loop to
for (int j=0; j<arr[i].length; j++)
    total+=arr[i][j];
(note arr[i].length) resulted in a significant speedup, as predicted by maaartinus. Using arr[0].length makes it impossible to eliminate the index range check. Now the results are as follows:
Benchmark Mode Samples Score Error Units
o.s.Measure.sumMulti avgt 5 0,992 ± 0,066 ns/op
o.s.Measure.sumSingle avgt 5 0,424 ± 0,046 ns/op
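For reference, the same idea can be taken one step further by reading each row reference into a local once, so the loop bound and the indexed array are visibly the same object. This is only a sketch: RowHoisting is a made-up name, I have not benchmarked this variant, and whether the JIT actually drops the range check is JVM-dependent. Unlike the benchmark above it accumulates into a long, so it does not wrap around.

```java
// Hypothetical variant of sumMulti; not measured, shown only to clarify the idea.
final class RowHoisting {
    // Sums a possibly jagged int[][] while loading each row reference once.
    static long sum(int[][] arr) {
        long total = 0;
        for (int i = 0; i < arr.length; i++) {
            int[] row = arr[i];                    // one load of the row reference
            for (int j = 0; j < row.length; j++) { // bound comes from the array being indexed
                total += row[j];
            }
        }
        return total;
    }

    public static void main(String[] args) {
        int[][] jagged = {{1, 2, 3}, {4}, {}};
        System.out.println(sum(jagged)); // prints 10
    }
}
```

This is essentially what the enhanced for loop in the earlier "multi (idiomatic)" benchmark does.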
This should be a question to James Gosling, I suppose. The initial design of Java was about OOP and simplicity, not about speed.
If you have a better idea of how multidimensional arrays should work, there are several ways of bringing it to life:
UPD. Of course, you are not the first to question the design of Java arrays.
For instance, projects Sumatra and Panama would also benefit from true multidimensional arrays.
"Arrays 2.0" is John Rose's talk on this subject at JVM Language Summit 2012.