Pythonic way to determine whether not null list entries are 'continuous'

渐次进展 2021-02-01 01:35

I'm looking for a way to easily determine if all not None items in a list occur in a single continuous slice. I'll use integers as examples of not None items.

11 Answers
  • 2021-02-01 02:10

    My first approach was to use variables to keep track ...

    ...this ends up with a highly nested and very difficult to follow series of if/else statements embedded in a for loop...

    No! Actually, you need only one variable. Viewing the problem as a finite state machine (FSM) turns your approach into quite a nice solution.

    Call the state p. Initially p is 0, and each element of the list moves us from one state to the next: 0 means no item has been seen yet, 1 means we are inside the run of items, 2 means the run has ended, and 3 means a second run has started (failure).

    [Figure: FSM state diagram]

    If every element of the list has been examined without reaching the failure state, the answer is True.

    One version encodes the transition table in a dict:

    def contiguous(s, _D={(0,0):0, (0,1):1, (1,0):2, (1,1):1, (2,0):2, (2,1):3}):
        p = 0
        for x in s:
            p = _D[p, int(x is not None)]
            if p >= 3: return False
        return True
    

    Another version uses an if statement:

    def contiguous(s):
        p = 0
        for x in s:
            if x is None and p == 1 or x is not None and (p == 0 or p == 2):
                p += 1
            if p >= 3: return False
        return True
    

    So my point is that using if and for is still Pythonic.

    update

    I found another way to encode the FSM. We can pack the transition table into a 12-bit integer.

    def contiguous(s):
        p = 0
        for x in s:
            p = (3684 >> (4*p + 2*(x is not None))) & 3
            if p >= 3: return False
        return True
    

    Here 3684, the magic number, can be obtained by:

        _D[p,a]     3  2  1  2  1  0
             p      2  2  1  1  0  0
             a      1  0  1  0  1  0
    bin(3684) = 0b 11 10 01 10 01 00 
    

    This is less readable than the other versions, but it is faster than the dict version since it avoids a dictionary lookup. The second version is just as fast, but this encoding idea generalizes to other problems.
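    As a sanity check on the table above, the constant can be rebuilt from the dict used in the first version. This is a minimal sketch; the names `TABLE`, `pack_table` and `contiguous_packed` are made up for illustration and are not part of the answer.

```python
# Transition table from the dict-based version:
# (state, is_not_none) -> next state
TABLE = {(0, 0): 0, (0, 1): 1, (1, 0): 2, (1, 1): 1, (2, 0): 2, (2, 1): 3}


def pack_table(table):
    """Pack each 2-bit next state at bit offset 4*p + 2*a."""
    magic = 0
    for (p, a), nxt in table.items():
        magic |= nxt << (4 * p + 2 * a)
    return magic


def contiguous_packed(s, magic):
    """Same loop as the packed version, with the constant passed in."""
    p = 0
    for x in s:
        p = (magic >> (4 * p + 2 * (x is not None))) & 3
        if p >= 3:
            return False
    return True


print(pack_table(TABLE))  # 3684
```

    The offsets 4*p + 2*a leave two unused bits per state, which is why the binary form above has only six 2-bit fields.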

  • 2021-02-01 02:11

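    Here's a way using just numpy:

    import numpy as np

    a = np.array([1, 2, 3, np.nan, 4, 5, np.nan, 6, 7])
    
    # NaN is the only value not equal to itself, so a != a marks the NaNs.
    # flatnonzero returns their indices as a 1-D array, e.g. [3, 6]
    # (argwhere(...).squeeze() yields a 0-d array for a single NaN,
    # which np.diff rejects)
    aa = np.flatnonzero(a != a)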
    
    # use a diff on your array , if the nans
    # are continuous, the diff will always be 1
    # if not, diff will be > 1 , and using any() will return True
    any(np.diff(aa) > 1) 
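
    Note that the snippet above reports whether the NaN entries have a gap between them, while the question asks about the entries that are present. A sketch of the same diff idea applied to the non-NaN positions, assuming missing values are encoded as NaN in a float array (the function name `contiguous_non_nan` is hypothetical):

```python
import numpy as np


def contiguous_non_nan(values):
    """True if the non-NaN entries occupy consecutive positions."""
    a = np.asarray(values, dtype=float)
    # Indices of the entries that are present (not NaN)
    idx = np.flatnonzero(~np.isnan(a))
    # Consecutive indices differ by exactly 1; an empty diff counts as True
    return bool(np.all(np.diff(idx) == 1))


print(contiguous_non_nan([np.nan, 1, 2, 3, np.nan]))  # True
print(contiguous_non_nan([1, 2, np.nan, 3]))          # False
```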
    
  • 2021-02-01 02:13
    def contiguous(seq):
        seq = iter(seq)
        all(x is None for x in seq)        # Burn through any Nones at the beginning
        any(x is None for x in seq)        # and the first group
        return all(x is None for x in seq) # everything else (if any) should be None.
    

    Here are a couple of examples. You can use next(seq) to get the next item from an iterator. After each step, I'll put a mark pointing to the item the iterator will yield next.

    example1:

    seq = iter([None, 1, 2, 3, None])        #  [None, 1, 2, 3, None]
                                             # next^
    all(x is None for x in seq)            
                                             #        next^
    any(x is None for x in seq)            
                                             #                    next^ (off the end)
    return all(x is None for x in seq)       # all returns True for the empty sequence
    

    example2:

    seq = iter([1, 2, None, 3, None, None])  #    [1, 2, None, 3, None, None]
                                             # next^
    all(x is None for x in seq)            
                                             #    next^
    any(x is None for x in seq)            
                                             #             next^  
    return all(x is None for x in seq)       # all returns False when 3 is encountered
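
    The two walkthroughs can be reproduced by listing whatever is left in the iterator after the first two steps. A small sketch (the helper `leftover` is hypothetical, not part of the answer):

```python
def leftover(values):
    """Return the items still unconsumed after the all() and any() steps."""
    seq = iter(values)
    all(x is None for x in seq)  # stops after consuming the first non-None item
    any(x is None for x in seq)  # stops after consuming the first None that follows
    return list(seq)             # what the final all() would have to check


print(leftover([None, 1, 2, 3, None]))        # []
print(leftover([1, 2, None, 3, None, None]))  # [3, None, None]
```

    The second result contains 3, which is why the final all() returns False in example2.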
    
  • 2021-02-01 02:15

    The natural way to consume sequence elements is to use dropwhile:

    from itertools import dropwhile
    def continuous(seq):
        return all(x is None for x in dropwhile(lambda x: x is not None,
                                                dropwhile(lambda x: x is None, seq)))
    

    We can express this without nested function calls:

    from itertools import dropwhile
    def continuous(seq):
        core = dropwhile(lambda x: x is None, seq)
        remainder = dropwhile(lambda x: x is not None, core)
        return all(x is None for x in remainder)
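
    A quick check of the dropwhile version on a few inputs, as a sketch. One behaviour worth being aware of: a list with no non-None items at all is accepted, because nothing remains for all() to reject.

```python
from itertools import dropwhile


def continuous(seq):
    core = dropwhile(lambda x: x is None, seq)            # skip leading Nones
    remainder = dropwhile(lambda x: x is not None, core)  # skip the first run
    return all(x is None for x in remainder)              # only Nones may remain


print(continuous([None, 1, 2, 3, None]))  # True
print(continuous([1, None, 2]))           # False
print(continuous([None, None]))           # True (no non-None items at all)
print(continuous([]))                     # True
```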
    
  • 2021-02-01 02:16

    Good ol' itertools.groupby to the rescue:

    from itertools import groupby
    
    def contiguous(seq):
        return sum(1 for k,g in groupby(seq, lambda x: x is not None) if k) == 1
    

    gives

    >>> contiguous([1,2,3,None,None])
    True
    >>> contiguous([None, 1,2,3,None])
    True
    >>> contiguous([None, None, 1,2,3])
    True
    >>> contiguous([None, 1, None, 2,3])
    False
    >>> contiguous([None, None, 1, None, 2,3])
    False
    >>> contiguous([None, 1, None, 2, None, 3])
    False
    >>> contiguous([1, 2, None, 3, None, None])
    False
    

    [edit]

    Since there seems to be some discussion in the comments, I'll explain why I like this approach better than some of the others.

    We're trying to find out whether there is one contiguous group of non-None objects, and

    sum(1 for k,g in groupby(seq, lambda x: x is not None) if k)
    

    counts the number of groups of contiguous non-None objects, using the function in the stdlib which is designed for collecting contiguous groups. As soon as we see groupby, we think "contiguous groups", and vice-versa. In that sense, it's self-documenting. This is basically the definition of my goal.

    IMHO the only weakness is that it doesn't short-circuit, and that could be fixed, but after thinking about it some I still prefer this as it uses a primitive I like -- "count the number of contiguous non-None groups" -- which I prefer to simply "tell me whether or not there is more than one contiguous non-None group as soon as you can".
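
    The short-circuiting fix mentioned above could look something like the following sketch: stop pulling groups from groupby as soon as a second non-None group shows up. The name `contiguous_lazy` is made up here, and islice is just one way to cap the count.

```python
from itertools import chain, groupby, islice, repeat


def contiguous_lazy(seq):
    # Keys of the non-None groups, produced lazily
    non_none_groups = (k for k, _ in groupby(seq, lambda x: x is not None) if k)
    # Take at most two of them; a second group already settles the answer
    return len(list(islice(non_none_groups, 2))) == 1


print(contiguous_lazy([None, 1, 2, 3, None]))  # True
print(contiguous_lazy([None, 1, None, 2, 3]))  # False
# Returns without exhausting the endless tail of Nones
print(contiguous_lazy(chain([1, None, 2], repeat(None))))  # False
```

    Like the counting version, this returns False when there are no non-None items at all, since the number of groups is then zero rather than one.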

    Many of the approaches to implement the last one rely on clever observations about the problem, like "if there's only one contiguous group of not-None objects, then if we scan until we find the first not-None object, and then scan through objects until we find the first non-None group if one exists, then whether anything's left is None gives us our answer." (Or something like that, which is part of my issue: I have to think about it.) To me that feels like using "implementation details" about the problem to solve it, and focuses on properties of the problem we can use to solve it, rather than simply specifying the problem to Python and letting Python do the work.

    I'm a bear of very little brain, as the saying has it, and I like to avoid having to be clever, as in my experience it's a route littered with FAIL.

    As always, everyone's mileage may vary, of course, and probably in proportion to their cleverness.
