Most Pythonic Way to Split an Array by Repeating Elements


I have a list of items that I want to split based on a delimiter. I want all delimiters to be removed and the list to be split when a delimiter occurs twice. F


11 Answers

渐次进展

2021-02-13 10:04

Here's a clean solution using zip and generators:

# 1. Define a conventional sequence-splitting function.
# If you only need it for lists, you can use indexing to make it shorter.
def split(it, x):
    to_yield = []
    for y in it:
        if x == y:
            yield to_yield
            to_yield = []
        else:
            to_yield.append(y)
    if to_yield:
        yield to_yield

# 2. Zip the sequence with its tail (l is the input list).
# You could use itertools.chain to avoid creating unnecessary lists.
l = ['a', 'b', 'X', 'X', 'c', 'd', 'X', 'X', 'f', 'X', 'g']
zipped = zip(l, l[1:] + [''])

# 3. Drop the ('X', not 'X') pairs and keep only the first element of each remaining pair.
# You can use a list comprehension instead of a generator expression.
filtered = (x for x, y in zipped if not (x == 'X' and y != 'X'))

# 4. Split the result with the conventional split.
result = list(split(filtered, 'X'))

This way split() is more reusable.

It's surprising Python doesn't have one built in.

Edit:

You can easily adapt it to longer delimiter runs by repeating steps 2-3 and zipping filtered with l[i:] for 0 < i <= n.
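
As a rough sketch, steps 1-4 can be wrapped into a single function for the double-delimiter case. The name `split_on_double` and the sentinel object are illustrative assumptions, not part of the answer above; `itertools.chain` and `itertools.islice` stand in for the `l[1:] + ['']` copy, as the comment in step 2 suggests. It assumes `seq` is a list or another sequence that can be iterated twice.

```python
from itertools import chain, islice


def split(it, x):
    """Split an iterable on every occurrence of x, dropping x itself."""
    chunk = []
    for y in it:
        if x == y:
            yield chunk
            chunk = []
        else:
            chunk.append(y)
    if chunk:
        yield chunk


def split_on_double(seq, delim):
    """Hypothetical wrapper: split seq where delim occurs twice in a row."""
    sentinel = object()  # never equal to delim; pads the tail by one item
    tail = chain(islice(seq, 1, None), [sentinel])
    # Drop each delimiter that is followed by a non-delimiter, so a run of
    # two delimiters collapses to one and a lone delimiter disappears.
    filtered = (x for x, y in zip(seq, tail) if not (x == delim and y != delim))
    return list(split(filtered, delim))


l = ['a', 'b', 'X', 'X', 'c', 'd', 'X', 'X', 'f', 'X', 'g']
print(split_on_double(l, 'X'))  # [['a', 'b'], ['c', 'd'], ['f', 'g']]
```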


别那么骄傲

2021-02-13 10:09
```
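import re
lst = ['a', 'b', 'X', 'X', 'c', 'd', 'X', 'X', 'f', 'X', 'g']
# list() is needed on Python 3, where map() returns a lazy iterator
result = list(map(list, re.sub(r'(?<=[a-z])X(?=[a-z])', '', ''.join(lst)).split('XX')))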
```
This does a list -> string -> list conversion and assumes that the non-delimiter items are all single lowercase letters.
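
One possible way to relax the lowercase assumption is to match a lone delimiter by what it is not next to, rather than by what surrounds it. The sketch below is not the answer's code: `split_via_string` is a hypothetical name, and it still assumes every item and the delimiter are single characters and that delimiters never occur more than twice in a row.

```python
import re


def split_via_string(items, delim='X'):
    """Hypothetical helper: split on doubled delim via a string round trip."""
    d = re.escape(delim)
    # Remove a delimiter only when it is neither preceded nor followed
    # by another delimiter, i.e. when it stands alone.
    text = re.sub(rf'(?<!{d}){d}(?!{d})', '', ''.join(items))
    return [list(chunk) for chunk in text.split(delim * 2)]


lst = ['a', 'b', 'X', 'X', 'c', 'd', 'X', 'X', 'f', 'X', 'g']
print(split_via_string(lst))  # [['a', 'b'], ['c', 'd'], ['f', 'g']]
```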

迷失自我

2021-02-13 10:10

I don't think there's going to be a nice, elegant solution to this (I'd love to be proven wrong, of course), so I would suggest something straightforward:

def nSplit(lst, delim, count=2):
    output = [[]]
    delimCount = 0
    for item in lst:
        if item == delim:
            delimCount += 1
        elif delimCount >= count:
            output.append([item])
            delimCount = 0
        else:
            output[-1].append(item)
            delimCount = 0
    return output

>>> nSplit(['a', 'b', 'X', 'X', 'c', 'd', 'X', 'X', 'f', 'X', 'g'], 'X', 2)
[['a', 'b'], ['c', 'd'], ['f', 'g']]


刺人心

2021-02-13 10:10

Use a generator function to keep track of your position in the list and of the number of consecutive separators seen so far:

l = ['a', 'b', 'X', 'X', 'c', 'd', 'X', 'X', 'f', 'X', 'g'] 

def splitOn(ll, x, n):
    cur = []
    splitcount = 0
    for c in ll:
        if c == x:
            splitcount += 1
            if splitcount == n:
                yield cur
                cur = []
                splitcount = 0
        else:
            cur.append(c)
            splitcount = 0
    yield cur

print(list(splitOn(l, 'X', 2)))
print(list(splitOn(l, 'X', 1)))
print(list(splitOn(l, 'X', 3)))

l += ['X', 'X']
print(list(splitOn(l, 'X', 2)))
print(list(splitOn(l, 'X', 1)))
print(list(splitOn(l, 'X', 3)))

prints:

[['a', 'b'], ['c', 'd'], ['f', 'g']]
[['a', 'b'], [], ['c', 'd'], [], ['f'], ['g']]
[['a', 'b', 'c', 'd', 'f', 'g']]
[['a', 'b'], ['c', 'd'], ['f', 'g'], []]
[['a', 'b'], [], ['c', 'd'], [], ['f'], ['g'], [], []]
[['a', 'b', 'c', 'd', 'f', 'g']]

EDIT: I'm also a big fan of groupby, so here's my go at it:

from itertools import groupby
def splitOn(ll, x, n):
    cur = []
    for isdelim,grp in groupby(ll, key=lambda c:c==x):
        if isdelim:
            nn = sum(1 for c in grp)
            while nn >= n:
                yield cur
                cur = []
                nn -= n
        else:
            cur.extend(grp)
    yield cur

This is not too different from my earlier answer; it just lets groupby take care of iterating over the input list, creating alternating groups of delimiter and non-delimiter items. The non-delimiter items are appended to the current element, and the delimiter groups do the work of starting new elements. For long lists this is probably a bit more efficient, since groupby itself is implemented in C in CPython (the key lambda still runs in Python), and it still iterates over the list only once.
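
To make the grouping concrete, here is a small sketch of what groupby produces for the sample list with the same key function. The variable name `groups` is only for illustration.

```python
from itertools import groupby

l = ['a', 'b', 'X', 'X', 'c', 'd', 'X', 'X', 'f', 'X', 'g']

# Materialize each group so it can be printed; groupby itself is lazy.
groups = [(isdelim, list(grp)) for isdelim, grp in groupby(l, key=lambda c: c == 'X')]

for isdelim, items in groups:
    print(isdelim, items)
# False ['a', 'b']
# True ['X', 'X']
# False ['c', 'd']
# True ['X', 'X']
# False ['f']
# True ['X']
# False ['g']
```

The length of each `True` group is the count that the generator compares against n; the single `'X'` between `'f'` and `'g'` is too short to trigger a split, so both end up in the same element.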


广开言路

2021-02-13 10:18
```
a = ['a', 'b', 'X', 'X', 'c', 'd', 'X', 'X', 'f', 'X', 'g']
n = 2  # number of consecutive delimiters that triggers a split
b = [[c for c in q if c != 'X'] for q in "".join(a).split('X' * n)]
```
This gives:

[['a', 'b'], ['c', 'd'], ['f', 'g']]

where the 2 is the number of consecutive delimiters that triggers a split. There is most likely a better way to do this.

名媛妹妹

2021-02-13 10:19

Too clever by half, and only offered because the obvious right way to do it seems so brute-force and ugly:

import itertools

class joiner(object):
  def __init__(self, N, data = (), gluing = False):
    self.data = data
    self.N = N
    self.gluing = gluing
  def __add__(self, to_glue):
    # Process an item from itertools.groupby, by either
    # appending the data to the last item, starting a new item,
    # or changing the 'gluing' state according to the number of
    # consecutive delimiters that were found.
    N = self.N
    data = self.data
    item = list(to_glue[1])
    # A chunk of delimiters;
    # return a copy of self with the appropriate gluing state.
    if to_glue[0]: return joiner(N, data, len(item) < N)
    # Otherwise, handle the gluing appropriately, and reset gluing state.
    a, b = (data[:-1], data[-1] if data else []) if self.gluing else (data, [])
    return joiner(N, a + (b + item,))

def split_on_multiple(data, delimiter, N):
  # Split the list into alternating groups of delimiters and non-delimiters,
  # then use the joiner to join non-delimiter groups when the intervening
  # delimiter group is short.
  return sum(itertools.groupby(data, delimiter.__eq__), joiner(N)).data
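
The `sum(..., joiner(N))` call is in effect a fold over the groups, with `__add__` as the step function. As a sketch of the same idea without the helper class, the fold could be written with `functools.reduce`. This is not the author's code: `split_on_multiple_reduce` and `step` are hypothetical names, and the state is an assumed `(chunks, gluing)` tuple.

```python
from functools import reduce
from itertools import groupby


def split_on_multiple_reduce(data, delimiter, n):
    """Hypothetical reduce-based variant of the joiner approach."""
    def step(state, group):
        chunks, gluing = state
        is_delim, items = group[0], list(group[1])
        if is_delim:
            # A short run of delimiters means the next chunk is glued
            # onto the previous one instead of starting a new one.
            return chunks, len(items) < n
        if gluing and chunks:
            return chunks[:-1] + (chunks[-1] + items,), False
        return chunks + (items,), False

    chunks, _ = reduce(step, groupby(data, lambda c: c == delimiter), ((), False))
    return list(chunks)


l = ['a', 'b', 'X', 'X', 'c', 'd', 'X', 'X', 'f', 'X', 'g']
print(split_on_multiple_reduce(l, 'X', 2))  # [['a', 'b'], ['c', 'd'], ['f', 'g']]
```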
