I see that you are running the following loop
for (int i = 0; i < 1024 * 1024; i++) {
    for (int x = 0; x < 1024; x++) {
        var += arr[x];
    }
}
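twice in the Java code, but only once in the C++ code.
The first pass may act as a warm-up (for the caches and, on the JVM, for the JIT compiler), which could explain why the Java code ends up executing faster than the C++ code. For a fair comparison, run the loop the same number of times in both programs.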
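To check whether warm-up is really what you are seeing, you could time each pass separately. The following is only a minimal sketch, not your actual benchmark: the class name `WarmupSketch`, the helper `sumLoop`, and filling the array with ones are my own assumptions for illustration.

```java
import java.util.Arrays;

public class WarmupSketch {
    // Same nested loop as above. The total is returned so that the
    // JIT cannot discard the work as dead code.
    static long sumLoop(int[] arr, int outer) {
        long total = 0;
        for (int i = 0; i < outer; i++) {
            for (int x = 0; x < arr.length; x++) {
                total += arr[x];
            }
        }
        return total;
    }

    public static void main(String[] args) {
        int[] arr = new int[1024];
        Arrays.fill(arr, 1);          // assumed test data
        int outer = 1024 * 1024;      // same outer bound as in the loop above

        // Time two identical passes; the first one also pays for any warm-up.
        for (int run = 1; run <= 2; run++) {
            long start = System.nanoTime();
            long total = sumLoop(arr, outer);
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            System.out.println("run " + run + ": total=" + total + ", " + elapsedMs + " ms");
        }
    }
}
```

If the second pass is noticeably faster than the first, warm-up is a plausible explanation, and the C++ program should be measured the same way (two passes, each timed) before comparing the numbers.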