Why don't Bash associative arrays maintain index order?

佛祖请我去吃肉 2021-01-14 06:28

I'm creating associative arrays to process in a for loop, but I'm getting some strange results in the index order. Please take a look at this example script:

#!         


        
3 Answers
  • 2021-01-14 06:31

    Why don't bash associative arrays maintain index order?

    Because they are designed not to do this.

    Why is the order of items changing?

    Bash's associative array implementation uses a hash library and places each key in a bucket chosen by a hash of that key. By default there are 128 buckets. The hash is calculated by the function hash_string() using a simple multiplication and a bitwise XOR per character. The bucket number is the bitwise AND of the key's hash value and the number of buckets minus 1. The keys of the associative array are then listed in the order the buckets appear.

    I compiled bash commit 6c6454cb18d7cd30b3b26d5ba6479431e599f3ed and for me your script outputs:

    $ ./test 
    o m e d
    d1 e2 m3 o4
    1d 3m 2e 4o
    

    So I copied the hash_string() function, wrote a small C program that outputs the bucket number of each key, then compiled and ran it:

    #include <stdio.h>
    
    #define FNV_OFFSET 2166136261u
    #define FNV_PRIME 16777619u
    
    /* hash_string() as copied from bash, with the K&R parameter list modernized */
    unsigned int
    hash_string (const char *s)
    {
      unsigned int i;
    
      for (i = FNV_OFFSET; *s; s++)
        {
          i *= FNV_PRIME;
          i ^= *s;
        }
    
      return i;
    }
    
    int main(void) {
        const char *s[] = {
            "o", "m", "e", "d",
            "d1", "e2", "m3", "o4",
            "1d", "3m", "2e", "4o",
        };
        for (size_t i = 0; i < sizeof(s)/sizeof(*s); ++i) {
            /* bucket = hash & (number of buckets - 1) */
            printf("%3s %3u\n",
                s[i],
                hash_string(s[i]) & (128 - 1));
        }
        return 0;
    }
    

    The program outputs two columns, the key and the bucket number of the key (added extra empty lines):

      o 112
      m 114
      e 122
      d 123
    
     d1  16
     e2  60
     m3  69
     o4 100
    
     1d  14
     3m  41
     2e  50
     4o  94
    

    Keys are output in the order of the buckets they fall into in the hash table, not in the order they were inserted. Within each group above, the bucket numbers are ascending, which matches the order bash printed the keys. This is why the order of items changed.
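    The same calculation can be sketched directly in bash arithmetic, which makes it easy to predict the listing order for a set of keys. This is only an illustration under the assumptions stated in this answer (FNV-style hash, 128 buckets); `bucket_of` and `predict_order` are hypothetical helper names, keys are assumed to be ASCII, and keys that collide in one bucket are not handled the way bash would handle them.

```shell
#!/usr/bin/env bash
# Sketch: re-implement the hash from the C program in bash arithmetic and
# sort keys by bucket number. The 128-bucket count comes from the answer
# above and may differ between bash versions.

bucket_of() {              # hypothetical helper, not part of bash
    local s=$1 h=2166136261 c i
    for ((i = 0; i < ${#s}; i++)); do
        printf -v c '%d' "'${s:i:1}"             # character code (ASCII keys assumed)
        h=$(( (h * 16777619) & 0xFFFFFFFF ))     # emulate 32-bit unsigned overflow
        h=$(( h ^ c ))
    done
    echo $(( h & (128 - 1) ))
}

predict_order() {          # hypothetical helper: print keys in ascending bucket order
    local k
    for k in "$@"; do
        printf '%s %s\n' "$(bucket_of "$k")" "$k"
    done | sort -n | cut -d' ' -f2
}

predict_order d e m o
```

    For the keys d, e, m and o this prints them in the order o, m, e, d, the same order as in the first line of the script output above.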

    That said, you should not rely on this behaviour, as the output order of keys can change if the author of bash decides to change the hashing function or make any other change.

    And how to bypass this behavior?

    There is no way to bypass this within an associative array itself. Bash associative arrays use a hash table to store the keys, and the insertion order of keys is not stored anywhere.

    Of course, you could patch bash to implement the functionality you are asking for.

    That said, I would just use two arrays:

    keys=(d1 e2 m3 o4)
    elements=(1w45 2dfg 3df 4df)
    declare -A test2
    for ((i=0;i<${#keys[@]};++i)); do
        test2[${keys[$i]}]="${elements[$i]}"
    done
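    # or maybe something along these lines (eval is acceptable here only because printf %q quotes every key and value):
    eval "declare -A test2=( $(paste -d '\0' <(printf '[%q]=\n' "${keys[@]}") <(printf '%q\n' "${elements[@]}")) )"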
    

    That way you can iterate over keys in the order you inserted them in a separate keys array.
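    A minimal sketch of that idea, wrapping the two structures behind one setter so the key list cannot drift out of sync with the associative array. The names `map`, `order` and `map_set` are made up for this example and are not bash features.

```shell
#!/usr/bin/env bash
declare -A map=()      # key -> value
declare -a order=()    # keys in first-insertion order

map_set() {            # hypothetical helper
    local key=$1 value=$2
    # Record the key only the first time it is seen
    [[ -n ${map[$key]+set} ]] || order+=("$key")
    map[$key]=$value
}

map_set d1 1w45
map_set e2 2dfg
map_set m3 3df
map_set o4 4df
map_set d1 updated     # existing key: the value changes, the position does not

# Iterate in insertion order by walking the indexed array
for key in "${order[@]}"; do
    printf '%s %s\n' "$key" "${map[$key]}"
done
```

    The indexed array carries the order and the associative array carries the values, so lookups by key stay cheap while iteration stays predictable. Deleting a key would need a matching removal from `order`, which this sketch leaves out.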

  • 2021-01-14 06:35

    According to the comments, this can be done to work around the behavior:

    order=(d1 e2 m3 o4)
    declare -A test2=(
        [d1]=1w45
        [e2]=2dfg
        [m3]=3df
        [o4]=4df
    )
    for key in "${order[@]}"; do echo "$key ${test2[$key]}"; done
    
    d1 1w45
    e2 2dfg
    m3 3df
    o4 4df
    

    Or this, storing the order inside the array itself:

    declare -A test3=(
        [order]="1d 2e 3m 4o"
        [1d]=1w45
        [2e]=2dfg
        [3m]=3df
        [4o]=4df
    )
    # ${test3[order]} is unquoted on purpose so that it splits into the individual keys
    for key in ${test3[order]}; do echo "$key ${test3[$key]}"; done
    
    1d 1w45
    2e 2dfg
    3m 3df
    4o 4df
    

    Is there a better way?

    Update: according to the accepted answer, an associative array isn't the right choice if you need a strict order in a for loop. It is better to use something like this:

    key=(d1   e2   m3  o4 ) 
    val=(1w45 2dfg 3df 4df)
    
    for i in "${!key[@]}"; do
        echo "${key[$i]} ${val[$i]}"
    done
    

    Or this

    key_val=(
        "d1 1w45"
        "e2 2dfg"
        "m3 3df"
        "o4 4df")  
    
    for item in "${key_val[@]}"; do
        sub=($item)    # unquoted on purpose: split "key value" on whitespace
        echo "${sub[0]} ${sub[1]}"
    done
    

    Or this, using indirect expansion:

    keys=(d1      e2      m3     o4    )
          d1=1w45 e2=2dfg m3=3df o4=4df
    
    for key in "${keys[@]}"; do
        echo "$key ${!key}"    # ${!key} expands the variable named by $key
    done
    
  • 2021-01-14 06:50

    Why is the order of items changing?

    Because associative arrays generally don't maintain insertion order: tree-based ones use natural (sorted) ordering, and hashmaps use whatever order their hash function lands the keys in (which can be randomised per-process or even per-map for security reasons).

    The latter also explains why the order of items can even change as you add new items: not only can new items get inserted between existing ones, when the hashmap has to get resized the entire sequence will get "reshuffled" as the entries are rehashed and moved to their new position.
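    A small experiment makes this visible. The sketch below prints the key listing after every insertion; the array name `demo` and the chosen keys are arbitrary. No expected output is given because the listing depends on the bash version and its hash table, which is exactly the point.

```shell
#!/usr/bin/env bash
declare -A demo=()

# Add keys one at a time and show how bash lists them after each step.
# New keys are not necessarily appended at the end of the listing.
for key in d1 e2 m3 o4 1d 2e; do
    demo[$key]=x
    echo "after adding $key: ${!demo[*]}"
done
```

    If a newly added key shows up between keys that were already there, that is the "inserted between existing ones" effect described above.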

    There are languages which either explicitly add ordering as a feature (generally using a doubly linked list) or use a naturally ordered hashmap, in which case insertion order is maintained. But you can't assume this property holds unless the language guarantees it, which bash does not.
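    Since bash gives no ordering guarantee, one way to get a deterministic order is to sort the keys yourself before looping. Note that this is sorted order, not insertion order, so it only helps when the two coincide or when any stable order will do. A minimal sketch, assuming bash 4 or later (for `mapfile`) and keys without newlines; `sorted_keys` is a name chosen for this example:

```shell
#!/usr/bin/env bash
declare -A test2=([d1]=1w45 [e2]=2dfg [m3]=3df [o4]=4df)

# Sort the keys explicitly instead of relying on the order of ${!test2[@]}.
# LC_ALL=C gives plain byte-wise ordering regardless of the user's locale.
mapfile -t sorted_keys < <(printf '%s\n' "${!test2[@]}" | LC_ALL=C sort)

for key in "${sorted_keys[@]}"; do
    printf '%s %s\n' "$key" "${test2[$key]}"
done
```

    With these keys the loop visits d1, e2, m3 and o4 in that order, whatever order bash stores them in internally.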
