I\'m creating associative arrays to process in a for loop but i\'m getting some strange results in index order. Please take a look at this example script:
#!
Why don't bash associative arrays maintain index order?
Because they are designed not to do this.
Why order of items is changing?
Bash associative array implementation uses a hash library and stores hashes of indexes. These hashes are stored in buckets with 128 default number of buckets. The hash is calculated with the function hash_string() using a simple multiplication and a bitwise XOR. The keys of the associative array are listed in the order buckets appear. Bucket number is calculated by a bitwise AND operation between the hash value of the key and the number of buckets decreased by 1.
I compiled bash commit 6c6454cb18d7cd30b3b26d5ba6479431e599f3ed and for me your script outputs:
$ ./test
o m e d
d1 e2 m3 o4
1d 3m 2e 4o
So I copied the hash_string()
function and written a small C program that would output the bucket number of the keys and compiled and executed:
#include <stdio.h>
#define FNV_OFFSET 2166136261
#define FNV_PRIME 16777619
unsigned int
hash_string (s)
const char *s;
{
register unsigned int i;
for (i = FNV_OFFSET; *s; s++)
{
i *= FNV_PRIME;
i ^= *s;
}
return i;
}
int main() {
const char *s[] = {
"o", "m", "e", "d",
"d1", "e2", "m3", "o4",
"1d", "3m", "2e", "4",
};
for (int i = 0; i < sizeof(s)/sizeof(*s); ++i) {
printf("%3s %3d\n",
s[i],
hash_string(s[i]) & (128 - 1));
}
}
The program outputs two columns, the key and the bucket number of the key (added extra empty lines):
o 112
m 114
e 122
d 123
d1 16
e2 60
m3 69
o4 100
1d 14
3m 41
2e 50
4o 94
The order of keys outputted is sorted using the order of buckets in the hash table they are into, so they are outputted in that order. This is why the order of items changed.
That said, you should not rely on this behaviour, as the output order of keys can change if the author of bash decides to change the hashing function or make any other change.
And how to bypass this behavior?
There is no way to bypass this. Bash arrays use hash table to store the hashes. The insertion order of keys is not stored anywhere.
Of course, you can bypass this behaviour by patching bash
to implement such functionality that you request.
That said, I would just use two arrays:
keys=(d1 e2 m3 o4)
elements=(1w45 2dfg 3df 4df)
declare -A test2
for ((i=0;i<${#keys[@]};++i)); do
test2[${keys[$i]}]="${elements[$i]}"
done
# or maybe something along:
declare -A test2=($(paste -zd <(printf "[%s]=\0" "${keys[@]}") <(printf "%q \0" "${elements[@]}"))
That way you can iterate over keys in the order you inserted them in a separate keys
array.
According to comments this can be done to bypass this behavior.
order=(d1 e2 m3 o4)
declare -A test2=(
[d1]=1w45
[e2]=2dfg
[m3]=3df
[o4]=4df
)
for key in ${order[@]}; { echo $key ${test2[$key]}; }
d1 1w45
e2 2dfg
m3 3df
o4 4df
Or that
declare -A test3=(
[order]="1d 2e 3m 4o"
[1d]=1w45
[2e]=2dfg
[3m]=3df
[4o]=4df
)
for key in ${test3[order]}; { echo $key ${test3[$key]}; }
1d 1w45
2e 2dfg
3m 3df
4o 4df
Is there a better way?
Update, according to accepted answer associative array isn't the right choice if you need a strict order in for loop, better use something like this:
key=(d1 e2 m3 o4 )
val=(1w45 2dfg 3df 4df)
for i in ${!key[@]}; {
echo ${key[$i]} ${val[$i]}
}
Or this
key_val=(
"d1 1w45"
"e2 2dfg"
"m3 3df"
"o4 4df")
for item in "${key_val[@]}"; {
sub=($item)
echo ${sub[0]} ${sub[1]}
}
Or that
keys=(d1 e2 m3 o4 )
d1=1w45 e2=2dfg m3=3df o4=4df
for key in ${keys[@]}; {
echo $key ${!key}
}
Why order of items is changing?
Because generally associative arrays don't naturally maintain insertion orders: tree-based ones use natural (sorted) ordering and hashmaps use wherever their hash function lands the keys (which can be randomised per-process or even per-map for security reasons).
The latter also explains why the order of items can even change as you add new items: not only can new items get inserted between existing ones, when the hashmap has to get resized the entire sequence will get "reshuffled" as the entries are rehashed and moved to their new position.
There are languages which either explicitly add ordering as a feature (generally using a doubly linked list), or use a naturally ordered hashmap, in which case insertion order is maintained, but you can't assume this property holds unless the language guarantees it. Which bash does not.