Most Pythonic Way to Split an Array by Repeating Elements

星月不相逢 2021-02-13 09:51

I have a list of items that I want to split based on a delimiter. I want all delimiters to be removed and the list to be split when a delimiter occurs twice. F

11 Answers
  • 2021-02-13 10:04

    Here's a clean solution using zip and generators:

    # Sample input (assumed: the same list the other answers use; the snippet expects a list named l)
    l = ['a', 'b', 'X', 'X', 'c', 'd', 'X', 'X', 'f', 'X', 'g']
    
    # 1. Define a traditional sequence split function.
    # If you only need it for lists, you can use indexing to make it shorter.
    def split(it, x):
        to_yield = []
        for y in it:
            if x == y:
                yield to_yield
                to_yield = []
            else:
                to_yield.append(y)
        if to_yield:
            yield to_yield
    
    # 2. Zip the sequence with its tail.
    # You could use itertools.chain to avoid creating unnecessary lists.
    zipped = zip(l, l[1:] + [''])
    
    # 3. Drop every ('X', not 'X') pair, i.e. any 'X' that is not followed by another 'X',
    # and keep only the first element of each remaining pair.
    # You can use a list comprehension instead of a generator expression.
    filtered = (x for x, y in zipped if not (x == 'X' and y != 'X'))
    
    # 4. Split the result using the traditional split.
    result = list(split(filtered, 'X'))
    

    This way split() is more reusable.

    It's surprising that Python doesn't have one built in.

    Edit:

    You can easily adapt it to longer split sequences by repeating steps 2-3 and zipping filtered with l[i:] for 0 < i <= n.
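
    The comment in step 2 mentions itertools.chain as a way to avoid building an extra list. A minimal sketch of that variant follows, for the doubled-delimiter case only. The wrapper name split_on_double and the sentinel object are assumptions made for illustration, not part of the answer, and seq is assumed to be a list (it is iterated twice).

```python
from itertools import chain, islice

def split(it, x):
    # Traditional split: start a new chunk at every occurrence of x.
    chunk = []
    for y in it:
        if y == x:
            yield chunk
            chunk = []
        else:
            chunk.append(y)
    if chunk:
        yield chunk

def split_on_double(seq, delim):
    # Hypothetical wrapper around steps 2-4 of the answer.
    sentinel = object()  # never equal to delim, stands in for "no successor"
    successors = chain(islice(seq, 1, None), [sentinel])
    # Keep a delimiter only when the next item is a delimiter too.
    filtered = (x for x, y in zip(seq, successors)
                if not (x == delim and y != delim))
    return list(split(filtered, delim))

l = ['a', 'b', 'X', 'X', 'c', 'd', 'X', 'X', 'f', 'X', 'g']
print(split_on_double(l, 'X'))  # [['a', 'b'], ['c', 'd'], ['f', 'g']]
```

    islice and chain walk the original list lazily, so no shifted copy such as l[1:] + [''] is created.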

  • 2021-02-13 10:09
    import re
    # lst is the input list; list(...) is needed on Python 3, where map returns an iterator
    list(map(list, re.sub(r'(?<=[a-z])X(?=[a-z])', '', ''.join(lst)).split('XX')))
    

    This does a list -> string -> list conversion and assumes that the non-delimiter items are all single lowercase letters.
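
    To make that assumption concrete, here is a small sketch that wraps the same expression in a function (the name split_via_regex is made up for this example) and runs it on one input that satisfies the assumption and one that does not.

```python
import re

def split_via_regex(lst):
    # Join the items, drop any single 'X' between two lowercase letters,
    # then split on the doubled delimiter and turn each piece back into a list.
    joined = ''.join(lst)
    cleaned = re.sub(r'(?<=[a-z])X(?=[a-z])', '', joined)
    return [list(part) for part in cleaned.split('XX')]

print(split_via_regex(['a', 'b', 'X', 'X', 'c', 'd', 'X', 'X', 'f', 'X', 'g']))
# [['a', 'b'], ['c', 'd'], ['f', 'g']]

# An uppercase neighbour defeats the lookbehind, so the lone 'X' survives.
print(split_via_regex(['A', 'X', 'b']))
# [['A', 'X', 'b']]
```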

  • 2021-02-13 10:10

    I don't think there's going to be a nice, elegant solution to this (I'd love to be proven wrong of course) so I would suggest something straightforward:

    def nSplit(lst, delim, count=2):
        output = [[]]
        delimCount = 0
        for item in lst:
            if item == delim:
                delimCount += 1
            elif delimCount >= count:
                output.append([item])
                delimCount = 0
            else:
                output[-1].append(item)
                delimCount = 0
        return output
    

     

    >>> nSplit(['a', 'b', 'X', 'X', 'c', 'd', 'X', 'X', 'f', 'X', 'g'], 'X', 2)
    [['a', 'b'], ['c', 'd'], ['f', 'g']]
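
    A few edge cases are worth checking before relying on this. The sketch below repeats nSplit unchanged and runs it on inputs chosen for illustration: leading delimiters, trailing delimiters, and a run longer than count.

```python
def nSplit(lst, delim, count=2):
    output = [[]]
    delimCount = 0
    for item in lst:
        if item == delim:
            delimCount += 1
        elif delimCount >= count:
            # Enough delimiters were seen: start a new sub-list.
            output.append([item])
            delimCount = 0
        else:
            output[-1].append(item)
            delimCount = 0
    return output

# Leading delimiters leave an empty first sub-list.
print(nSplit(['X', 'X', 'a', 'b'], 'X'))            # [[], ['a', 'b']]
# Trailing delimiters are silently dropped.
print(nSplit(['a', 'b', 'X', 'X'], 'X'))            # [['a', 'b']]
# A run of four delimiters still produces a single split.
print(nSplit(['a', 'X', 'X', 'X', 'X', 'b'], 'X'))  # [['a'], ['b']]
```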
    
  • 2021-02-13 10:10

    Use a generator function to maintain state of your iterator through the list, and the count of the number of separator chars seen so far:

    l = ['a', 'b', 'X', 'X', 'c', 'd', 'X', 'X', 'f', 'X', 'g'] 
    
    def splitOn(ll, x, n):
        cur = []
        splitcount = 0
        for c in ll:
            if c == x:
                splitcount += 1
                if splitcount == n:
                    yield cur
                    cur = []
                    splitcount = 0
            else:
                cur.append(c)
                splitcount = 0
        yield cur
    
    print(list(splitOn(l, 'X', 2)))
    print(list(splitOn(l, 'X', 1)))
    print(list(splitOn(l, 'X', 3)))
    
    l += ['X','X']
    print(list(splitOn(l, 'X', 2)))
    print(list(splitOn(l, 'X', 1)))
    print(list(splitOn(l, 'X', 3)))
    

    prints:

    [['a', 'b'], ['c', 'd'], ['f', 'g']]
    [['a', 'b'], [], ['c', 'd'], [], ['f'], ['g']]
    [['a', 'b', 'c', 'd', 'f', 'g']]
    [['a', 'b'], ['c', 'd'], ['f', 'g'], []]
    [['a', 'b'], [], ['c', 'd'], [], ['f'], ['g'], [], []]
    [['a', 'b', 'c', 'd', 'f', 'g']]
    

    EDIT: I'm also a big fan of groupby, so here's my go at it:

    from itertools import groupby
    def splitOn(ll, x, n):
        cur = []
        for isdelim,grp in groupby(ll, key=lambda c:c==x):
            if isdelim:
                nn = sum(1 for c in grp)
                while nn >= n:
                    yield cur
                    cur = []
                    nn -= n
            else:
                cur.extend(grp)
        yield cur
    

    This is not too different from my earlier answer; it just lets groupby take care of iterating over the input list, creating groups of delimiter-matching and non-delimiter-matching items. The non-matching items are added to the current element, while the groups of matching items do the work of starting new elements. For long lists, this is probably a bit more efficient, since groupby does all its work in C and still iterates over the list only once.
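
    To see what the loop in the groupby version receives, this short sketch materialises the groups for the sample list. The variable name runs is chosen here for illustration only.

```python
from itertools import groupby

l = ['a', 'b', 'X', 'X', 'c', 'd', 'X', 'X', 'f', 'X', 'g']

# Each entry is (is_delimiter, items in that consecutive run).
runs = [(isdelim, list(grp)) for isdelim, grp in groupby(l, key=lambda c: c == 'X')]

for isdelim, items in runs:
    print(isdelim, items)
# False ['a', 'b']
# True ['X', 'X']
# False ['c', 'd']
# True ['X', 'X']
# False ['f']
# True ['X']
# False ['g']
```

    With n = 2, only the delimiter runs of length two or more trigger a yield, which is why 'f' and 'g' end up in the same sub-list.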

  • 2021-02-13 10:18
    a = ['a', 'b', 'X', 'X', 'c', 'd', 'X', 'X', 'f', 'X', 'g']
    b = [[c for c in chunk if c != 'X'] for chunk in "".join(a).split('X' * 2)]
    

    This gives:

    [['a', 'b'], ['c', 'd'], ['f', 'g']]

    Here, 2 is the number of consecutive delimiters that trigger a split. There is most likely a better way to do this.
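
    As a rough sketch, the same idea can be wrapped in a function with the delimiter and the repeat count as parameters. The name split_joined is hypothetical. Because the items are joined into one string, this only behaves well when every item is a single-character string, as the second call shows.

```python
def split_joined(items, delim, n):
    # Join everything into one string and split on n consecutive delimiters.
    parts = ''.join(items).split(delim * n)
    # Remove any leftover single delimiters from each part.
    return [[c for c in part if c != delim] for part in parts]

print(split_joined(['a', 'b', 'X', 'X', 'c', 'd', 'X', 'X', 'f', 'X', 'g'], 'X', 2))
# [['a', 'b'], ['c', 'd'], ['f', 'g']]

# A multi-character item is broken into its characters.
print(split_joined(['ab', 'X', 'X', 'c'], 'X', 2))
# [['a', 'b'], ['c']]
```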

  • 2021-02-13 10:19

    Too clever by half, and only offered because the obvious right way to do it seems so brute-force and ugly:

    import itertools
    
    class joiner(object):
      def __init__(self, N, data = (), gluing = False):
        self.data = data
        self.N = N
        self.gluing = gluing
      def __add__(self, to_glue):
        # Process an item from itertools.groupby, by either
        # appending the data to the last item, starting a new item,
        # or changing the 'gluing' state according to the number of
        # consecutive delimiters that were found.
        N = self.N
        data = self.data
        item = list(to_glue[1])
        # A chunk of delimiters;
        # return a copy of self with the appropriate gluing state.
        if to_glue[0]: return joiner(N, data, len(item) < N)
        # Otherwise, handle the gluing appropriately, and reset gluing state.
        a, b = (data[:-1], data[-1] if data else []) if self.gluing else (data, [])
        return joiner(N, a + (b + item,))
    
    def split_on_multiple(data, delimiter, N):
      # Split the list into alternating groups of delimiters and non-delimiters,
      # then use the joiner to join non-delimiter groups when the intervening
      # delimiter group is short.
      return sum(itertools.groupby(data, delimiter.__eq__), joiner(N)).data
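
    For reference, here is a self-contained sketch of how this might be run. The class is restated in a slightly unpacked form that should behave the same as the version above; the sample data is the list used in the other answers. Note that the result is a tuple of lists rather than a list of lists.

```python
import itertools

class joiner(object):
    """Accumulator that sum() folds over the (is_delimiter, group) pairs."""
    def __init__(self, N, data=(), gluing=False):
        self.data = data      # tuple of chunks built so far
        self.N = N            # delimiter run length that forces a split
        self.gluing = gluing  # True if the next chunk extends the last one

    def __add__(self, to_glue):
        is_delim, group = to_glue
        item = list(group)
        if is_delim:
            # A run shorter than N glues its neighbours together.
            return joiner(self.N, self.data, len(item) < self.N)
        if self.gluing and self.data:
            head, last = self.data[:-1], self.data[-1]
        else:
            head, last = self.data, []
        return joiner(self.N, head + (last + item,))

def split_on_multiple(data, delimiter, N):
    # sum() starts from joiner(N) and adds each groupby pair to it.
    return sum(itertools.groupby(data, delimiter.__eq__), joiner(N)).data

data = ['a', 'b', 'X', 'X', 'c', 'd', 'X', 'X', 'f', 'X', 'g']
print(split_on_multiple(data, 'X', 2))  # (['a', 'b'], ['c', 'd'], ['f', 'g'])
print(split_on_multiple(data, 'X', 3))  # (['a', 'b', 'c', 'd', 'f', 'g'],)
```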
    