Array slicing in Ruby: explanation for illogical behaviour (taken from Rubykoans.com)

没有蜡笔的小新 2020-11-22 10:39

I was going through the exercises in Ruby Koans and I was struck by the following Ruby quirk that I found really unexplainable:

array = [:peanut, :butter, :a         


        
10 answers
  • 2020-11-22 10:49

    I agree that this seems like strange behavior, but even the official documentation on Array#slice demonstrates the same behavior as in your example, in the "special cases" below:

       a = [ "a", "b", "c", "d", "e" ]
       a[2] +  a[0] + a[1]    #=> "cab"
       a[6]                   #=> nil
       a[1, 2]                #=> [ "b", "c" ]
       a[1..3]                #=> [ "b", "c", "d" ]
       a[4..7]                #=> [ "e" ]
       a[6..10]               #=> nil
       a[-3, 3]               #=> [ "c", "d", "e" ]
       # special cases
       a[5]                   #=> nil
       a[5, 1]                #=> []
       a[5..10]               #=> []
    

    Unfortunately, even their description of Array#slice doesn't seem to offer any insight as to why it works this way:

    Element Reference—Returns the element at index, or returns a subarray starting at start and continuing for length elements, or returns a subarray specified by range. Negative indices count backward from the end of the array (-1 is the last element). Returns nil if the index (or starting index) are out of range.

  • 2020-11-22 10:53

    At least note that the behavior is consistent. From 5 on up everything acts the same; the weirdness only occurs at [4,N].

    Maybe this pattern helps, or maybe I'm just tired and it doesn't help at all.

    array[0,4] => [:peanut, :butter, :and, :jelly]
    array[1,3] => [:butter, :and, :jelly]
    array[2,2] => [:and, :jelly]
    array[3,1] => [:jelly]
    array[4,0] => []
    

    At [4,0], we catch the end of the array. I'd actually find it rather odd, as far as beauty in patterns go, if the last one returned nil. Because of a context like this, 4 is an acceptable option for the first parameter so that the empty array can be returned. Once we hit 5 and up, though, the method likely exits immediately by nature of being totally and completely out of bounds.

  • 2020-11-22 10:55

    I found the explanation by Gary Wright very helpful as well. http://www.ruby-forum.com/topic/1393096#990065

    Gary Wright's answer:

    http://www.ruby-doc.org/core/classes/Array.html

    The docs certainly could be more clear but the actual behavior is self-consistent and useful. Note: I'm assuming 1.9.X version of String.

    It helps to consider the numbering in the following way:

      -4  -3  -2  -1    <-- numbering for single argument indexing
       0   1   2   3
     +---+---+---+---+
     | a | b | c | d |
     +---+---+---+---+
     0   1   2   3   4  <-- numbering for two argument indexing or start of range
    -4  -3  -2  -1
    

    The common (and understandable) mistake is to assume that the semantics of the single-argument index are the same as the semantics of the first argument in the two-argument scenario (or range). They are not the same thing in practice, and the documentation doesn't reflect this. The error, though, is definitely in the documentation and not in the implementation:

    single argument: the index represents a single character position within the string. The result is either the single character string found at the index or nil because there is no character at the given index.

      s = ""
      s[0]    # nil because no character at that position
    
      s = "abcd"
      s[0]    # "a"
      s[-4]   # "a"
      s[-5]   # nil, no characters before the first one
    

    two integer arguments: the arguments identify a portion of the string to extract or to replace. In particular, zero-width portions of the string can also be identified so that text can be inserted before or after existing characters including at the front or end of the string. In this case, the first argument does not identify a character position but instead identifies the space between characters as shown in the diagram above. The second argument is the length, which can be 0.

    s = "abcd"   # each example below assumes s is reset to "abcd"
    
    s[0,0] = "X"      # insert text before 'a':           "Xabcd"
    s[4,0] = "Z"      # insert text after 'd':            "abcdZ"
    s[0,2] = "AB"     # replace first two characters:     "ABcd"
    s[-2,2] = "CD"    # replace last two characters:      "abCD"
    s[1..2] = "XX"    # replace middle two characters:    "aXXd"
    

    The behavior of a range is pretty interesting. The starting point is the same as the first argument when two arguments are provided (as described above) but the end point of the range can be the 'character position' as with single indexing or the "edge position" as with two integer arguments. The difference is determined by whether the double-dot range or triple-dot range is used:

    s = "abcd"
    s[1..1]           # "b"
    s[1..1] = "X"     # "aXcd"
    
    s[1...1]          # ""
    s[1...1] = "X"    # "aXbcd", the range specifies a zero-width portion of the string
    
    s[1..3]           # "bcd"
    s[1..3] = "X"     # "aX",  positions 1, 2, and 3 are replaced.
    
    s[1...3]          # "bc"
    s[1...3] = "X"    # "aXd", positions 1, 2, but not quite 3 are replaced.
    

    If you go back through these examples and insist on using the single-index semantics for the two-argument or range indexing examples, you'll just get confused. You've got to use the alternate numbering I show in the ASCII diagram to model the actual behavior.
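
    The two numberings in the diagram can be checked directly against String#[]. The following is only a minimal sketch, assuming a Ruby 1.9 or later String; the variable names are illustrative and not part of the original answer.

```ruby
s = "abcd"

# Single-argument indexing names a character position.
char_at_3 = s[3]         # "d"
char_at_4 = s[4]         # nil, there is no character at position 4

# Two-argument indexing names an edge between characters.
edge_4 = s[4, 0]         # "", edge 4 is the right-hand end of the string
edge_5 = s[5, 0]         # nil, a four-character string has no edge 5

# Negative starts count from the right-hand end.
neg_edge_4 = s[-4, 2]    # "ab"
neg_edge_5 = s[-5, 2]    # nil, there is nothing to the left of edge -4
```

    Position 4 is nil as a character index but valid as an edge, which is the same asymmetry the question ran into with arrays.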

  • 2020-11-22 11:01

    This has to do with the fact that slice returns an array. The relevant source documentation from Array#slice:

     *  call-seq:
     *     array[index]                -> obj      or nil
     *     array[start, length]        -> an_array or nil
     *     array[range]                -> an_array or nil
     *     array.slice(index)          -> obj      or nil
     *     array.slice(start, length)  -> an_array or nil
     *     array.slice(range)          -> an_array or nil
    

    This suggests to me that if the start you give is out of bounds, slice returns nil. In your example, array[4,0] asks for a start of 4, which is accepted, and for an array of zero elements, while array[5,0] asks for a start that is out of bounds, so it returns nil. This perhaps makes more sense if you remember that the slice method returns a new array and does not alter the original data structure.

    EDIT:

    After reviewing the comments I decided to edit this answer. Slice runs the following code when two arguments are passed:

    if (argc == 2) {
        if (SYMBOL_P(argv[0])) {
            rb_raise(rb_eTypeError, "Symbol as array index");
        }
        beg = NUM2LONG(argv[0]);
        len = NUM2LONG(argv[1]);
        if (beg < 0) {
            beg += RARRAY(ary)->len;
        }
        return rb_ary_subseq(ary, beg, len);
    }
    

    If you look in array.c, where the rb_ary_subseq function is defined, you see that it returns nil only when the start index is greater than the array's length, not when the two are equal:

    if (beg > RARRAY_LEN(ary)) return Qnil;
    

    This is what happens when 4 is passed in: 4 is not greater than the array's length of 4, so the nil return is not triggered, and the function goes on to return an empty array because the second argument is zero. If 5 is passed in, 5 is greater than the length, so it returns nil before the length argument is ever evaluated. The code in question is at line 944.
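
    The bounds check can be modelled in Ruby to see why 4 and 5 behave differently. This is only a rough sketch: subseq_model is a hypothetical helper, not the C implementation, and only the `beg > length` check comes from the source quoted above. The other guards are assumptions chosen to match what Array#[] returns.

```ruby
# Hypothetical Ruby model of the two-argument path; not the actual C code.
def subseq_model(ary, beg, len)
  beg += ary.length if beg < 0        # negative start counts from the end
  return nil if beg > ary.length      # the check quoted from array.c
  return nil if beg < 0 || len < 0    # assumption: other invalid inputs give nil
  ary.drop(beg).take(len)             # take stops at the end of the array
end

array = [:peanut, :butter, :and, :jelly]
at_edge   = subseq_model(array, 4, 0)   # [] because 4 > 4 is false
past_edge = subseq_model(array, 5, 0)   # nil because 5 > 4 is true
```

    The comparison is strict, so a start equal to the length passes the check and yields an empty array.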

    I believe this to be a bug, or at least unpredictable behavior that is not in keeping with the 'Principle of Least Surprise'. When I get a few minutes I will at least submit a failing test patch to ruby core.

  • 2020-11-22 11:04

    An explanation provided by Jim Weirich:

    One way to think about it is that index position 4 is at the very edge of the array. When asking for a slice, you return as much of the array as is left. So consider array[2,10], array[3,10] and array[4,10]: each returns the remaining bits of the end of the array, which is 2 elements, 1 element and 0 elements respectively. However, position 5 is clearly outside the array and not at the edge, so array[5,10] returns nil.
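
    The four slices mentioned above can be run as follows. This is a minimal sketch using the four-element array from the question; the variable names are only illustrative.

```ruby
array = [:peanut, :butter, :and, :jelly]

from_2 = array[2, 10]   # [:and, :jelly], two elements remain
from_3 = array[3, 10]   # [:jelly], one element remains
from_4 = array[4, 10]   # [], at the edge, nothing remains
from_5 = array[5, 10]   # nil, position 5 is outside the array
```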

  • 2020-11-22 11:07

    This makes sense when you consider that an array slice can be a valid lvalue, not just an rvalue:

    array = [:peanut, :butter, :and, :jelly]
    # replace 0 elements starting at index 4 (insert at end of array):
    array[4,0] = [:sandwich]
    # replace 0 elements starting at index 0 (insert at head of array):
    array[0,0] = [:make, :me, :a]
    # array is [:make, :me, :a, :peanut, :butter, :and, :jelly, :sandwich]
    
    # this is just like replacing existing elements:
    array[3, 4] = [:grilled, :cheese]
    # array is [:make, :me, :a, :grilled, :cheese, :sandwich]
    

    This wouldn't be possible if array[4,0] returned nil instead of []. However, array[5,0] returns nil because it's out of bounds (inserting after the 4th element of a 4-element array is meaningful, but inserting after the 5th element of a 4-element array is not).

    Read the slice syntax array[x,y] as "starting after x elements in array, select up to y elements". This is only meaningful if array has at least x elements.
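
    Reading array[x, 0] as "after x elements" gives a 4-element array five insertion points, 0 through 4. The sketch below tries each one on a fresh copy; the :X marker and the variable names are made up for illustration.

```ruby
array = [:peanut, :butter, :and, :jelly]

# Insert :X at every position from 0 through array.length, each on a fresh copy.
inserted = (0..array.length).map do |edge|
  copy = array.dup
  copy[edge, 0] = [:X]   # replace zero elements starting after `edge` elements
  copy
end

first = inserted.first   # [:X, :peanut, :butter, :and, :jelly]
last  = inserted.last    # [:peanut, :butter, :and, :jelly, :X]
```

    Each result has five elements, and :X lands at the index equal to the number of elements skipped.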
