Most Pythonic Way to Split an Array by Repeating Elements


I have a list of items that I want to split based on a delimiter. I want all delimiters to be removed and the list to be split when a delimiter occurs twice. F


11 Answers

孤独总比滥情好

2021-02-13 10:19

Here's a way to do it with itertools.groupby():

import itertools

class MultiDelimiterKeyCallable(object):
    def __init__(self, delimiter, num_wanted=1):
        self.delimiter = delimiter
        self.num_wanted = num_wanted

        self.num_found = 0

    def __call__(self, value):
        if value == self.delimiter:
            self.num_found += 1
            if self.num_found >= self.num_wanted:
                self.num_found = 0
                return True
        else:
            self.num_found = 0

def split_multi_delimiter(items, delimiter, num_wanted):
    keyfunc = MultiDelimiterKeyCallable(delimiter, num_wanted)

    return (list(item
                 for item in group
                 if item != delimiter)
            for key, group in itertools.groupby(items, keyfunc)
            if not key)

items = ['a', 'b', 'X', 'X', 'c', 'd', 'X', 'X', 'f', 'X', 'g']

print(list(split_multi_delimiter(items, "X", 2)))

I must say that cobbal's solution is much simpler for the same results.
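To see why the groupby() approach works, it can help to look at the key computed for each item. The sketch below is only an illustration: RunKey is a hypothetical stand-in for the key callable above, with an explicit False return added so the keys print cleanly.

```python
import itertools


class RunKey(object):
    """Hypothetical stand-in for the stateful key callable above."""

    def __init__(self, delimiter, num_wanted):
        self.delimiter = delimiter
        self.num_wanted = num_wanted
        self.num_found = 0

    def __call__(self, value):
        if value == self.delimiter:
            self.num_found += 1
            if self.num_found >= self.num_wanted:
                # This delimiter completes a run: flag it and reset.
                self.num_found = 0
                return True
        else:
            self.num_found = 0
        return False


items = ['a', 'b', 'X', 'X', 'c', 'd', 'X', 'X', 'f', 'X', 'g']

# groupby() starts a new group every time the key changes.
groups = [(key, list(group))
          for key, group in itertools.groupby(items, RunKey('X', 2))]

for key, group in groups:
    print(key, group)
```

Only the delimiter that completes a run gets a True key, so it lands in a group of its own. Dropping the True groups and filtering the leftover delimiters out of the remaining groups gives the split.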


暗喜

2021-02-13 10:20

Here's another way of doing this:

def split_multi_delimiter(items, delimiter, num_wanted):
    def remove_delimiter(objs):
        return [obj for obj in objs if obj != delimiter]

    ranges = [(index, index+num_wanted) for index in range(len(items))
              if items[index:index+num_wanted] == [delimiter] * num_wanted]

    last_end = 0
    for range_start, range_end in ranges:
        yield remove_delimiter(items[last_end:range_start])
        last_end = range_end

    yield remove_delimiter(items[last_end:])

items = ['a', 'b', 'X', 'X', 'c', 'd', 'X', 'X', 'f', 'X', 'g']
print(list(split_multi_delimiter(items, "X", 2)))


灰色年华

2021-02-13 10:21

Regex, I choose you!

import re

def split_multiple(delimiter, input):
    pattern = ''.join(map(lambda x: ',' if x == delimiter else ' ', input))
    # iter() makes this work whether filter() returns a list or an iterator
    filtered = iter(filter(lambda x: x != delimiter, input))
    result = []
    for k in map(len, re.split(';', ''.join(re.split(',',
        ';'.join(re.split(',{2,}', pattern)))))):
        result.append([])
        for n in range(k):
            result[-1].append(next(filtered))
    return result

print(split_multiple('X',
    ['a', 'b', 'X', 'X', 'c', 'd', 'X', 'X', 'f', 'X', 'g']))

Oh, you said Python, not Perl.


后悔当初

2021-02-13 10:25

Very ugly, but I wanted to see if I could pull this off as a one-liner, and I thought I would share. I beg you not to actually use this solution for anything of importance, though. The ('X', 2) at the end is the delimiter and the number of times it must repeat to trigger a split.

(lambda delim, count: map(lambda x:filter(lambda y:y != delim, x), reduce(lambda x, y: (x[-1].append(y) if y != delim or x[-1][-count+1:] != [y]*(count-1) else x.append([])) or x, ['a', 'b', 'X', 'X', 'c', 'd', 'X', 'X', 'f', 'X', 'g'], [[]])))('X', 2)

EDIT

Here's a breakdown. I also eliminated some redundant code that was far more obvious when written out like this. (changed above also)

# Wrap everything in a lambda form to avoid repeating values
(lambda delim, count:
    # Filter the delimiters out of every sublist after construction
    map(lambda x: filter(lambda y: y != delim, x), reduce(
        lambda x, y: (
            # Add the value to the current sublist
            x[-1].append(y) if
                # unless it completes a run of the
                # specified number of delimiters
                y != delim or x[-1][-count+1:] != [y]*(count-1) else

                # in which case start a new sublist
                x.append([])
            # append() returns None, so "or x" hands the accumulator back
            ) or x,
        ['a', 'b', 'X', 'X', 'c', 'd', 'X', 'X', 'f', 'X', 'g'], [[]])
    )
)('X', 2)
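The one-liner relies on Python 2, where reduce is a builtin and map/filter return lists. For anyone who wants to run the same idea on Python 3, here is a rough sketch with the steps spelled out. The names split_on_run and step are made up for this illustration, and it assumes count is at least 2.

```python
from functools import reduce  # reduce is not a builtin in Python 3


def split_on_run(items, delim, count):
    """Hypothetical helper: the same reduce() idea, written out.

    Assumes count >= 2.
    """

    def step(acc, item):
        # A delimiter that completes a run of `count` starts a new
        # sublist; anything else is appended to the current sublist.
        completes_run = (item == delim and
                         acc[-1][-(count - 1):] == [delim] * (count - 1))
        if completes_run:
            acc.append([])
        else:
            acc[-1].append(item)
        return acc

    sublists = reduce(step, items, [[]])
    # Drop the leftover delimiters from every sublist.
    return [[item for item in sub if item != delim] for sub in sublists]


items = ['a', 'b', 'X', 'X', 'c', 'd', 'X', 'X', 'f', 'X', 'g']
print(split_on_run(items, 'X', 2))
```

The accumulator is still a list of sublists, as in the lambda version; the named step function just replaces the conditional-expression trick.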


忘了有多久

2021-02-13 10:26

In [6]: input = ['a', 'b', 'X', 'X', 'cc', 'XX', 'd', 'X', 'ee', 'X', 'X', 'f']

In [7]: [s.strip('_').split('_') for s in '_'.join(input).split('X_X')]
Out[7]: [['a', 'b'], ['cc', 'XX', 'd', 'X', 'ee'], ['f']]

This assumes you can use a reserved character such as _ which is not found in the input.
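One thing to watch with splitting on 'X_X': it can also match across item boundaries, e.g. ['aX', 'Xb'] joins to 'aX_Xb' and gets split. A possible variation, sketched here under the same reserved-character assumption, brackets every item with the separator so only two whole delimiter items can match. The name split_on_double is made up for this example, and it additionally assumes all items are non-empty strings.

```python
def split_on_double(items, delim='X', sep='_'):
    """Hypothetical helper; assumes non-empty string items without `sep`."""
    if any(not item or sep in item for item in items):
        raise ValueError("items must be non-empty and must not contain sep")
    # Bracket every item with separators so the doubled delimiter can only
    # match two whole items, never the tail of one and the head of the next.
    joined = ''.join(sep + item + sep for item in items)
    marker = (sep + delim + sep) * 2
    return [[item for item in part.split(sep) if item]
            for part in joined.split(marker)]


data = ['a', 'b', 'X', 'X', 'cc', 'XX', 'd', 'X', 'ee', 'X', 'X', 'f']
print(split_on_double(data))
```

As in the session above, a single delimiter is kept in the output rather than removed.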

