Strange, unexpected behavior (disappearing/changing values) when using Hash default value, e.g. Hash.new([])

情歌与酒 2020-11-21 11:31

Consider this code:

h = Hash.new(0)  # New hash pairs will by default have 0 as values
h[1] += 1  #=> 1; h is now {1=>1}
h[2] += 2  #=> 2; h is now {1=>1, 2=>2}
4 Answers
  • 2020-11-21 11:54

    First, note that this behavior applies to any default value that is subsequently mutated (e.g. hashes and strings), not just arrays.
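
    To illustrate that the issue is not specific to arrays, here is a minimal sketch using a mutable String as the default. The variable names are only for illustration, and "".dup is used simply to guarantee an unfrozen string:

```ruby
# A mutable String default is shared across all missing keys, just like an Array.
h = Hash.new("".dup)   # "".dup gives an unfrozen empty string

h[:a] << "x"   # mutates the shared default; nothing is stored in h
h[:b] << "y"   # mutates the same object again

shared = h[:c]        # any missing key now returns the accumulated string
stored_keys = h.keys  # the hash itself is still empty

puts shared.inspect       # "xy"
puts stored_keys.inspect  # []
```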

    TL;DR: Use Hash.new { |h, k| h[k] = [] } if you want the most idiomatic solution and don’t care why.


    What doesn’t work

    Why Hash.new([]) doesn’t work

    Let’s look more in-depth at why Hash.new([]) doesn’t work:

    h = Hash.new([])
    h[0] << 'a'  #=> ["a"]
    h[1] << 'b'  #=> ["a", "b"]
    h[1]         #=> ["a", "b"]
    
    h[0].object_id == h[1].object_id  #=> true
    h  #=> {}
    

    We can see that our default object is being reused and mutated: it was passed as the one and only default value, so the hash has no way of getting a fresh, new default value. But why are there no keys or values in the hash, despite h[1] still giving us a value? Here’s a hint:

    h[42]  #=> ["a", "b"]
    

    The array returned by each [] call is just the default value, which we’ve been mutating all this time, so it now contains our new values. Since << doesn’t assign to the hash (there can never be assignment in Ruby without an = present; see the caveat at the end of this answer), we’ve never put anything into our actual hash. Instead we have to use <<= (which is to << as += is to +):

    h[2] <<= 'c'  #=> ["a", "b", "c"]
    h             #=> {2=>["a", "b", "c"]}
    

    This is the same as:

    h[2] = (h[2] << 'c')
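
    One way to check this for yourself is to inspect the hash with Hash#key? and Hash#default. The sketch below (variable names are illustrative) also shows a consequence worth knowing: with Hash.new([]), <<= stores a reference to the shared default array itself, so later mutations of the default show up under the stored key too.

```ruby
h = Hash.new([])
h[0] << 'a'

no_key_yet  = h.key?(0)       # false: << never assigned anything to the hash
default_now = h.default.dup   # ["a"]: the default object itself was mutated

h[2] <<= 'c'                            # assigns the (mutated) default array to key 2
same_object = h[2].equal?(h.default)    # true: key 2 holds the default object itself

h[3] << 'd'                   # mutates the default again...
leaked = h[2]                 # ...so the value stored under key 2 changes as well

puts no_key_yet.inspect   # false
puts default_now.inspect  # ["a"]
puts same_object.inspect  # true
puts leaked.inspect       # ["a", "c", "d"]
```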
    

    Why Hash.new { [] } doesn’t work

    Using Hash.new { [] } solves the problem of reusing and mutating the original default value (as the block given is called each time, returning a new array), but not the assignment problem:

    h = Hash.new { [] }
    h[0] << 'a'   #=> ["a"]
    h[1] <<= 'b'  #=> ["b"]
    h             #=> {1=>["b"]}
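
    As a quick check of the claim that the block is called on every lookup of a missing key, the following sketch (variable names are illustrative) compares object identity across two reads and shows that an appended element is simply lost:

```ruby
h = Hash.new { [] }

first  = h[0]
second = h[0]
distinct = !first.equal?(second)  # true: each lookup ran the block and built a new array

h[0] << 'a'           # appends to a throwaway array
after_append = h[0]   # a fresh empty array again
still_empty = h.empty?

puts distinct.inspect      # true
puts after_append.inspect  # []
puts still_empty.inspect   # true
```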
    

    What does work

    The assignment way

    If we remember to always use <<=, then Hash.new { [] } is a viable solution, but it’s a bit odd and non-idiomatic (I’ve never seen <<= used in the wild). It’s also prone to subtle bugs if << is inadvertently used.

    The mutable way

    The documentation for Hash.new states (note the last sentence in particular):

    If a block is specified, it will be called with the hash object and the key, and should return the default value. It is the block’s responsibility to store the value in the hash if required.

    So we must store the default value in the hash from within the block if we wish to use << instead of <<=:

    h = Hash.new { |h, k| h[k] = [] }
    h[0] << 'a'  #=> ["a"]
    h[1] << 'b'  #=> ["b"]
    h            #=> {0=>["a"], 1=>["b"]}
    

    This effectively moves the assignment from our individual calls (which would use <<=) to the block passed to Hash.new, removing the burden of unexpected behavior when using <<.

    Note that there is one functional difference between this method and the others: this way assigns the default value upon reading (as the assignment always happens inside the block). For example:

    h1 = Hash.new { |h, k| h[k] = [] }
    h1[:x]
    h1  #=> {:x=>[]}
    
    h2 = Hash.new { [] }
    h2[:x]
    h2  #=> {}
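
    If assigning on read is a concern, one option is to query the hash with methods that do not trigger the default block. A minimal sketch, assuming you only need to test for presence or read with an explicit fallback (Hash#key? and Hash#fetch are standard methods; the variable names are illustrative):

```ruby
h = Hash.new { |hash, key| hash[key] = [] }

present     = h.key?(:x)        # false, and no key is created
fallback    = h.fetch(:x, nil)  # nil: fetch uses its own fallback, not the default block
size_before = h.size            # 0

h[:x]                           # a plain [] read runs the block and stores []
size_after = h.size             # 1

puts present.inspect      # false
puts fallback.inspect     # nil
puts size_before.inspect  # 0
puts size_after.inspect   # 1
```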
    

    The immutable way

    You may be wondering why Hash.new([]) doesn’t work while Hash.new(0) works just fine. The key is that Numerics in Ruby are immutable, so we naturally never end up mutating them in-place. If we treated our default value as immutable, we could use Hash.new([]) just fine too:

    h = Hash.new([].freeze)
    h[0] += ['a']  #=> ["a"]
    h[1] += ['b']  #=> ["b"]
    h[2]           #=> []
    h              #=> {0=>["a"], 1=>["b"]}
    

    However, note that ([].freeze + [].freeze).frozen? == false. So, if you want to ensure that the immutability is preserved throughout, then you must take care to re-freeze the new object.
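
    A minimal sketch of re-freezing, and of what the frozen default buys you: an accidental << on a missing key raises instead of silently mutating the shared default. This assumes Ruby 2.5 or later, where the error class is FrozenError (older versions raise RuntimeError); the variable names are illustrative.

```ruby
h = Hash.new([].freeze)

h[0] += ['a']
unfrozen_result = h[0].frozen?   # false: + returned a new, unfrozen array

h[0] = (h[0] + ['b']).freeze     # re-freeze explicitly to keep the value immutable
refrozen_result = h[0].frozen?   # true

error_raised = begin
  h[1] << 'x'                    # tries to mutate the frozen default
  false
rescue FrozenError               # assumption: Ruby 2.5+
  true
end

puts unfrozen_result.inspect  # false
puts refrozen_result.inspect  # true
puts error_raised.inspect     # true
puts h.inspect
```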


    Conclusion

    Of all the ways, I personally prefer “the immutable way”—immutability generally makes reasoning about things much simpler. It is, after all, the only method that has no possibility of hidden or subtle unexpected behavior. However, the most common and idiomatic way is “the mutable way”.

    As a final aside, this behavior of Hash default values is noted in Ruby Koans.


    Caveat to the claim that there can never be assignment in Ruby without an = present: this isn’t strictly true, since methods like instance_variable_set bypass it, but they must exist for metaprogramming because the l-value in = cannot be dynamic.

  • 2020-11-21 12:06

    When you write,

    h = Hash.new([])
    

    you pass a reference to a single default array that is shared by every key in the hash. Because of that, every lookup of a missing key returns the same array.

    If you want each key in the hash to refer to a separate array, you should use

    h = Hash.new{[]} 
    

    For more detail on how this works in Ruby, see: http://ruby-doc.org/core-2.2.0/Array.html#method-c-new
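
    The linked Array.new documentation describes the same distinction for arrays: a default object passed as an argument is shared, while a block is evaluated once per element. A minimal sketch of that analogy (variable names are illustrative):

```ruby
# Passing an object: every slot refers to the same array.
shared = Array.new(2, [])
shared[0] << 1

# Passing a block: each slot gets its own array.
fresh = Array.new(2) { [] }
fresh[0] << 1

puts shared.inspect  # [[1], [1]]
puts fresh.inspect   # [[1], []]
```

    Unlike Array.new with a block, the result of a Hash.new block is not stored in the hash unless the block (or the caller) assigns it.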

  • 2020-11-21 12:14

    The += operator, when applied to those hashes, works as expected.

    [1] pry(main)> foo = Hash.new( [] )
    => {}
    [2] pry(main)> foo[1]+=[1]
    => [1]
    [3] pry(main)> foo[2]+=[2]
    => [2]
    [4] pry(main)> foo
    => {1=>[1], 2=>[2]}
    [5] pry(main)> bar = Hash.new { [] }
    => {}
    [6] pry(main)> bar[1]+=[1]
    => [1]
    [7] pry(main)> bar[2]+=[2]
    => [2]
    [8] pry(main)> bar
    => {1=>[1], 2=>[2]}
    

    This is probably because foo[bar] += baz is syntactic sugar for foo[bar] = foo[bar] + baz. When foo[bar] on the right-hand side of = is evaluated, it returns the default value object, and the + operator does not change it; it returns a new array. The left-hand side is syntactic sugar for the []= method, which doesn't change the default value either.

    Note that this doesn't apply to foo[bar] <<= baz, as it is equivalent to foo[bar] = foo[bar] << baz, and << does change the default value.
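
    The two expansions can be written out by hand to see the difference. This is only a sketch of the rough equivalence described above, with illustrative variable names; Hash#default is used to inspect the shared default object.

```ruby
foo = Hash.new([])

foo[1] = foo[1] + [1]                  # roughly what foo[1] += [1] does
default_after_plus = foo.default.dup   # []: + built a new array, default untouched

foo[2] = foo[2] << 2                   # roughly what foo[2] <<= 2 does
default_after_shovel = foo.default.dup # [2]: << mutated the default in place

missing = foo[3]                       # a missing key now sees the mutated default

puts default_after_plus.inspect    # []
puts default_after_shovel.inspect  # [2]
puts missing.inspect               # [2]
puts foo.inspect
```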

    Also, I found no difference between Hash.new{[]} and Hash.new{|hash, key| hash[key]=[];} when using +=, at least on Ruby 2.1.2.

  • 2020-11-21 12:16

    You're specifying that the default value for the hash is a reference to that particular (initially empty) array.

    I think you want:

    h = Hash.new { |hash, key| hash[key] = [] }
    h[1] <<= 1
    h[2] <<= 2
    

    That sets the default value for each key to a new array.
