Most Pythonic Way to Split an Array by Repeating Elements

星月不相逢 2021-02-13 09:51

I have a list of items that I want to split based on a delimiter. I want all delimiters to be removed and the list to be split when a delimiter occurs twice. F

11 Answers
  • 2021-02-13 10:19

    Here's a way to do it with itertools.groupby():

    import itertools
    
    class MultiDelimiterKeyCallable(object):
        def __init__(self, delimiter, num_wanted=1):
            self.delimiter = delimiter
            self.num_wanted = num_wanted
    
            self.num_found = 0
    
        def __call__(self, value):
            if value == self.delimiter:
                self.num_found += 1
                if self.num_found >= self.num_wanted:
                    self.num_found = 0
                    return True
            else:
                self.num_found = 0
    
    def split_multi_delimiter(items, delimiter, num_wanted):
        keyfunc = MultiDelimiterKeyCallable(delimiter, num_wanted)
    
        return (list(item
                     for item in group
                     if item != delimiter)
                for key, group in itertools.groupby(items, keyfunc)
                if not key)
    
    items = ['a', 'b', 'X', 'X', 'c', 'd', 'X', 'X', 'f', 'X', 'g']
    
    print(list(split_multi_delimiter(items, "X", 2)))
    

    I must say that cobbal's solution is much simpler for the same results.
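
    To see why the groupby() approach works, it can help to look at the keys the callable hands to groupby(). Below is a minimal sketch, assuming Python 3. `run_end_key` is a hypothetical closure-based stand-in for the class above (not part of the original answer) that returns an explicit False instead of None.

```python
import itertools

def run_end_key(delimiter, num_wanted):
    """Hypothetical closure equivalent of MultiDelimiterKeyCallable."""
    num_found = 0

    def key(value):
        nonlocal num_found
        if value == delimiter:
            num_found += 1
            if num_found >= num_wanted:
                num_found = 0
                return True   # this delimiter completes a run
        else:
            num_found = 0
        return False          # ordinary item, or a run that is still too short

    return key

items = ['a', 'b', 'X', 'X', 'c', 'd', 'X', 'X', 'f', 'X', 'g']

# Only the second 'X' of each pair is keyed True, so groupby() breaks there.
key = run_end_key('X', 2)
keys = [key(item) for item in items]
print(keys)

# Keep the groups keyed False and drop the leftover delimiters inside them.
groups = [[item for item in group if item != 'X']
          for is_split, group in itertools.groupby(items, run_end_key('X', 2))
          if not is_split]
print(groups)
```

    Each call to `run_end_key` creates a fresh counter, which is why the sketch builds a second key function for the groupby() call.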

  • 2021-02-13 10:20

    Here's another way of doing this:

    def split_multi_delimiter(items, delimiter, num_wanted):
        def remove_delimiter(objs):
            return [obj for obj in objs if obj != delimiter]
    
        ranges = [(index, index+num_wanted) for index in range(len(items))
                  if items[index:index+num_wanted] == [delimiter] * num_wanted]
    
        last_end = 0
        for range_start, range_end in ranges:
            yield remove_delimiter(items[last_end:range_start])
            last_end = range_end
    
        yield remove_delimiter(items[last_end:])
    
    items = ['a', 'b', 'X', 'X', 'c', 'd', 'X', 'X', 'f', 'X', 'g']
    print(list(split_multi_delimiter(items, "X", 2)))
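
    One thing to watch with the slice-comparison approach: the matching windows can overlap when the delimiter repeats more often than num_wanted. For ['a', 'X', 'X', 'X', 'b'] the version above finds the ranges (1, 3) and (2, 4) and yields an extra empty list. Here is a sketch that scans left to right and jumps past each match. It assumes non-overlapping matches are what you want (the question does not say how longer runs should be handled), and `split_non_overlapping` is a hypothetical name.

```python
def split_non_overlapping(items, delimiter, num_wanted):
    """Yield sublists split at non-overlapping runs of num_wanted delimiters.

    Assumes num_wanted >= 1.
    """
    marker = [delimiter] * num_wanted
    current = []
    index = 0
    while index < len(items):
        if items[index:index + num_wanted] == marker:
            yield current
            current = []
            index += num_wanted          # jump past the whole run
        else:
            if items[index] != delimiter:
                current.append(items[index])
            # a delimiter outside a full run is simply dropped
            index += 1
    yield current

items = ['a', 'b', 'X', 'X', 'c', 'd', 'X', 'X', 'f', 'X', 'g']
result = list(split_non_overlapping(items, 'X', 2))
print(result)

triple = list(split_non_overlapping(['a', 'X', 'X', 'X', 'b'], 'X', 2))
print(triple)
```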
    
  • 2021-02-13 10:21

    Regex, I choose you!

    import re
    
    def split_multiple(delimiter, input):
        pattern = ''.join(map(lambda x: ',' if x == delimiter else ' ', input))
        filtered = filter(lambda x: x != delimiter, input)
        result = []
        for k in map(len, re.split(';', ''.join(re.split(',',
            ';'.join(re.split(',{2,}', pattern)))))):
            result.append([])
            for n in range(k):
                result[-1].append(next(filtered))
        return result
    
    print(split_multiple('X',
        ['a', 'b', 'X', 'X', 'c', 'd', 'X', 'X', 'f', 'X', 'g']))
    

    Oh, you said Python, not Perl.
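
    The nested re.split() calls are easier to follow when unrolled. The sketch below, assuming Python 3, walks through the intermediate strings for the same input. It swaps the split-and-join pairs for re.sub() and str.replace(), which give the same strings here; the variable names are mine, not the original author's.

```python
import re

items = ['a', 'b', 'X', 'X', 'c', 'd', 'X', 'X', 'f', 'X', 'g']

# One character per item: ',' marks a delimiter, ' ' marks anything else.
pattern = ''.join(',' if item == 'X' else ' ' for item in items)

# Runs of two or more delimiters become ';' (a split point) ...
marked = re.sub(',{2,}', ';', pattern)
# ... and any remaining lone delimiters are dropped.
marked = marked.replace(',', '')

# The length of each piece is the size of the corresponding sublist.
sizes = [len(piece) for piece in marked.split(';')]

# Hand out the non-delimiter items in order, size by size.
values = iter([item for item in items if item != 'X'])
result = [[next(values) for _ in range(size)] for size in sizes]
print(result)
```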

  • 2021-02-13 10:25

    Very ugly, but I wanted to see if I could pull this off as a one-liner, and I thought I would share. I beg you not to actually use this solution for anything of any importance, though. The ('X', 2) at the end is the delimiter and the number of times it should be repeated.

    from functools import reduce  # needed on Python 3, where reduce is not a builtin

    (lambda delim, count: list(map(lambda x: list(filter(lambda y: y != delim, x)), reduce(lambda x, y: (x[-1].append(y) if y != delim or x[-1][-count+1:] != [y]*(count-1) else x.append([])) or x, ['a', 'b', 'X', 'X', 'c', 'd', 'X', 'X', 'f', 'X', 'g'], [[]]))))('X', 2)
    

    EDIT

    Here's a breakdown. I also eliminated some redundant code that was far more obvious when written out like this. (changed above also)

    from functools import reduce  # needed on Python 3, where reduce is not a builtin

    # Wrap everything in a lambda form to avoid repeating values
    (lambda delim, count:
        # Filter the delimiters out of every sublist after construction
        list(map(lambda x: list(filter(lambda y: y != delim, x)), reduce(
            lambda x, y: (
                # Add the value to the current sublist...
                x[-1].append(y) if
                    # ...unless it is the delimiter that completes a run
                    # of the specified number of delimiters
                    y != delim or x[-1][-count+1:] != [y]*(count-1) else

                    # in which case start a new sublist
                    x.append([])) or x,
            ['a', 'b', 'X', 'X', 'c', 'd', 'X', 'X', 'f', 'X', 'g'], [[]])
        ))
    )('X', 2)
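
    For comparison, the same logic reads more naturally as an explicit loop. This is only a sketch, assuming Python 3 and count >= 2; `split_on_repeats` is an illustrative name, not something from the answer above.

```python
def split_on_repeats(items, delim, count):
    """Loop version of the reduce() expression; assumes count >= 2."""
    sublists = [[]]
    for item in items:
        current = sublists[-1]
        # A new sublist starts when item is the delimiter and the current
        # sublist already ends with count - 1 delimiters.
        if item == delim and current[-count + 1:] == [delim] * (count - 1):
            sublists.append([])
        else:
            current.append(item)
    # Strip the leftover delimiters from every sublist.
    return [[item for item in sub if item != delim] for sub in sublists]

items = ['a', 'b', 'X', 'X', 'c', 'd', 'X', 'X', 'f', 'X', 'g']
print(split_on_repeats(items, 'X', 2))
```

    The condition is the negation of the one in the lambda, so the append and the new-sublist branches trade places.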
    
  • 2021-02-13 10:26
    In [6]: input = ['a', 'b', 'X', 'X', 'cc', 'XX', 'd', 'X', 'ee', 'X', 'X', 'f']
    
    In [7]: [s.strip('_').split('_') for s in '_'.join(input).split('X_X')]
    Out[7]: [['a', 'b'], ['cc', 'XX', 'd', 'X', 'ee'], ['f']]
    

    This assumes you can use a reserved character such as _ that is not found in the input. Note that, as the output shows, a lone X is kept in its sublist rather than removed.
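
    If you go this route, it may be worth checking the reserved-character assumption instead of trusting it. Here is a minimal sketch, assuming Python 3 and a list of strings; `split_on_double` and its parameters are hypothetical names wrapped around the expression from the session above.

```python
def split_on_double(items, delimiter='X', sep='_'):
    """Join/split trick from the session above, with the assumption checked."""
    # The trick only works if the separator never occurs inside an item.
    if any(sep in item for item in items):
        raise ValueError('separator %r occurs in the input' % sep)
    marker = delimiter + sep + delimiter
    return [chunk.strip(sep).split(sep)
            for chunk in sep.join(items).split(marker)]

data = ['a', 'b', 'X', 'X', 'cc', 'XX', 'd', 'X', 'ee', 'X', 'X', 'f']
print(split_on_double(data))
```

    The check costs one extra pass over the input but turns a silently wrong result into an error.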
