How to remove every occurrence of sub-list from list

走了就别回头了 2021-01-07 16:16

I have two lists:

big_list = [2, 1, 2, 3, 1, 2, 4]
sub_list = [1, 2]

I want to remove all sub_list occurrences in big_list.

The result should be [2, 3, 4].

13 answers
  • 2021-01-07 16:27

    How about this:

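    def remove_sublist(lst, sub):
        # Note: there is no backtracking, so a sub with a repeated prefix
        # (e.g. [1, 1, 2]) can be missed in inputs such as [1, 1, 1, 2].
        max_ind_sub = len(sub) - 1
        out = []
        i = 0     # index of the next element of sub to match
        tmp = []  # elements of the current partial match
    
        for x in lst:
            if x == sub[i]:
                tmp.append(x)
                if i < max_ind_sub:  # partial match
                    i += 1
                else:  # found complete match
                    i = 0
                    tmp = []
            else:
                if tmp:  # failed partial match: keep what was held back
                    i = 0
                    out += tmp
                    tmp = []
                if x == sub[0]:  # start of a new partial match
                    i += 1
                    tmp = [x]
                else:
                    out.append(x)
    
        out += tmp  # keep a partial match left over at the end of lst
        return out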
    

    Performance:

    lst = [2, 1, 2, 3, 1, 2, 4]
    sub = [1, 2]
    %timeit remove_sublist(lst, sub)  # solution of Mad Physicist
    %timeit remove_sublist_new(lst, sub)  # the function above, under a different name
    >>> 2.63 µs ± 112 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
    >>> 1.77 µs ± 13.7 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
    

    Update

    My first solution had a bug. I was able to fix it (the code above is updated), but the method is now considerably more complicated. On my local machine it is still faster than the solution from Mad Physicist.
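Since the first version had a bug, it may help to check any candidate against a deliberately simple reference on random inputs. The following is a minimal sketch and not part of the original answer: `reference_remove` and `check_against_reference` are hypothetical names, the candidate is assumed to take `(lst, sub)` in that order, and the reference assumes the left-to-right, non-overlapping semantics used in the examples on this page.

```python
import random


def reference_remove(lst, sub):
    """Slow but simple reference: scan left to right, skip each match."""
    n = len(sub)
    if n == 0:
        return list(lst)
    out = []
    i = 0
    while i < len(lst):
        if lst[i:i + n] == sub:
            i += n  # skip the whole match
        else:
            out.append(lst[i])
            i += 1
    return out


def check_against_reference(candidate, trials=1000, seed=0):
    """Return the first (lst, sub) on which candidate disagrees, else None."""
    rng = random.Random(seed)
    for _ in range(trials):
        lst = [rng.randint(1, 3) for _ in range(rng.randint(0, 8))]
        sub = [rng.randint(1, 3) for _ in range(rng.randint(1, 3))]
        # pass copies so a mutating candidate cannot disturb the comparison
        if list(candidate(list(lst), list(sub))) != reference_remove(lst, sub):
            return lst, sub
    return None


print(reference_remove([2, 1, 2, 3, 1, 2, 4], [1, 2]))  # [2, 3, 4]
print(check_against_reference(reference_remove))        # None
```

A returned `(lst, sub)` pair is a small failing input that can be stepped through by hand.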

  • 2021-01-07 16:27

    More readable than any of the above, and with no additional memory footprint:

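    def remove_sublist(sublist, mainlist):
    
        cursor = 0  # number of leading sublist items matched so far
    
        for b in mainlist:
            if b == sublist[cursor]:
                cursor += 1
                if cursor == len(sublist):  # complete match: drop it
                    cursor = 0
                continue
            # mismatch: emit the partial match that was being held back
            for i in range(cursor):
                yield sublist[i]
            if cursor and b == sublist[0]:  # b may start a new match
                cursor = 1
            else:
                cursor = 0
                yield b
    
        # emit a partial match left over at the end of the input
        for i in range(cursor):
            yield sublist[i]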
    

    To use it as a one-liner, as you would a library function:

    [x for x in remove_sublist([1, 2], [2, 1, 2, 3, 1, 2, 4])]
    
  • 2021-01-07 16:31

    A somewhat different approach, written for Python 2.x:

    from more_itertools import locate, windowed
    big_list = [1, 2, 1, 2, 1]
    sub_list = [1, 2, 1]
    
    """
    Collect the starting indexes of every occurrence of sub_list in big_list;
    these mark the elements to remove.
    """
    
    i = list(locate(windowed(big_list, len(sub_list)), pred=lambda x: x==tuple(sub_list)))
    
    """ 
    Here i is [0, 2]. The match starting at index 2 overlaps the first match:
    its leading 1 is the trailing 1 of the match starting at index 0. The code
    below skips such overlapping matches.
    PS: no overlap occurs for
    big_list = [2, 1, 2, 3, 1, 2, 4]
    sub_list = [1, 2]
    where i is [1, 4].
    """
    
    # The further code.
    to_pop = []
    for ele in i:
        # skip a match that overlaps indexes already marked for removal
        if to_pop and ele <= to_pop[-1]:
            continue
        to_pop.extend(range(ele, ele+len(sub_list)))
    
    # Voila! to_pop consists of all the indexes to be removed from big_list.
    
    # Wiping out the elements!
    for index in sorted(to_pop, reverse=True):
        del big_list[index]
    

    Note that you need to delete them in reverse order so that you don't throw off the subsequent indexes.

    In Python 3, the signature of locate() differs: current more_itertools releases add a window_size parameter, though the call above still works.
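For readers without more_itertools, the same index-based idea can be sketched with the standard library only. This is an illustration under assumptions, not the answer's code: `find_starts` and `remove_matches_in_place` are hypothetical names, and overlapping matches are resolved by keeping the leftmost one.

```python
def find_starts(big_list, sub_list):
    """Start indexes of non-overlapping matches, scanning left to right."""
    n = len(sub_list)
    starts = []
    i = 0
    while n and i <= len(big_list) - n:
        if big_list[i:i + n] == sub_list:
            starts.append(i)
            i += n  # jump past the match so overlaps are ignored
        else:
            i += 1
    return starts


def remove_matches_in_place(big_list, sub_list):
    """Delete each match, last one first, so earlier indexes stay valid."""
    n = len(sub_list)
    for start in reversed(find_starts(big_list, sub_list)):
        del big_list[start:start + n]
    return big_list


data = [1, 2, 1, 2, 1]
print(find_starts(data, [1, 2, 1]))              # [0]
print(remove_matches_in_place(data, [1, 2, 1]))  # [2, 1]
```

Deleting whole slices from the last match backwards means each deletion only shifts elements that have already been handled.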

  • 2021-01-07 16:31

    What you are trying to achieve can be done by converting both lists to strings, removing the sub-string, and converting the result back to integers. Because the digits are joined without a separator, this only works when every element is a single-digit, non-negative integer.

    In a single line you can do it like this:

    list(map(int, "".join(map(str, big_list)).replace("".join(map(str, sub_list)), "")))
    

    Input

    big_list = [1, 2, 1, 2, 1]
    sub_list = [1, 2, 1]
    

    Output

    [2, 1]

    Input

    big_list = [2, 1, 2, 3, 1, 2, 4]
    sub_list = [1, 2]
    

    Output

    [2, 3, 4]
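Joining digits without a separator means [12] and [1, 2] produce the same string. One possible way around that, sketched here as an assumption rather than as part of the answer, is to wrap each integer in delimiters before joining. The name `remove_via_string` is hypothetical, and the sketch handles integers only.

```python
import re


def remove_via_string(big_list, sub_list):
    """Remove sub_list from big_list by string replacement (integers only)."""
    if not sub_list:
        return list(big_list)

    def encode(items):
        # wrap every integer so token boundaries survive the join
        return "".join("<%d>" % x for x in items)

    cleaned = encode(big_list).replace(encode(sub_list), "")
    return [int(tok) for tok in re.findall(r"<(-?\d+)>", cleaned)]


print(remove_via_string([2, 1, 2, 3, 1, 2, 4], [1, 2]))  # [2, 3, 4]
print(remove_via_string([12, 1, 2], [1, 2]))             # [12]
```

As with `str.replace` in general, matches are removed in a single left-to-right pass, so elements that become adjacent after a removal are not matched again.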

  • 2021-01-07 16:37

    Update: The more_itertools library has released more_itertools.replace, a tool that solves this particular problem (see Option 3).

    First, here are some other options that work on generic iterables (lists, strings, iterators, etc.):

    Code

    Option 1 - without libraries:

    def remove(iterable, subsequence):
        """Yield non-subsequence items; sans libraries."""
        seq = tuple(iterable)
        subsequence = tuple(subsequence)
        n = len(subsequence)
        skip = 0
    
        for i, x in enumerate(seq):
            slice_ = seq[i:i+n]
            if not skip and (slice_ == subsequence):
                skip = n
            if skip:
                skip -= 1
                continue
            yield x   
    

    Option 2 - with more_itertools

    import more_itertools as mit
    
    
    def remove(iterable, subsequence):
        """Yield non-subsequence items."""
        iterable = tuple(iterable)
        subsequence = tuple(subsequence)
        n = len(subsequence)
        indices = set(mit.locate(mit.windowed(iterable, n), pred=lambda x: x == subsequence))
    
        it_ = enumerate(iterable)
        for i, x in it_:
            if i in indices:
                mit.consume(it_, n-1)
            else:
                yield x
    

    Demo

    list(remove(big_list, sub_list))
    # [2, 3, 4]
    
    list(remove([1, 2, 1, 2], sub_list))
    # []
    
    list(remove([1, "a", int, 3, float, "a", int, 5], ["a", int]))
    # [1, 3, float, 5]
    
    list(remove("11111", "111"))
    # ['1', '1']
    
    list(remove(iter("11111"), iter("111")))
    # ['1', '1']
    

    Option 3 - with more_itertools.replace:

    Demo

    pred = lambda *args: args == tuple(sub_list)
    list(mit.replace(big_list, pred=pred, substitutes=[], window_size=2))
    # [2, 3, 4]
    
    pred=lambda *args: args == tuple(sub_list)
    list(mit.replace([1, 2, 1, 2], pred=pred, substitutes=[], window_size=2))
    # []
    
    pred=lambda *args: args == tuple(["a", int])
    list(mit.replace([1, "a", int, 3, float, "a", int, 5], pred=pred, substitutes=[], window_size=2))
    # [1, 3, float, 5]
    
    pred=lambda *args: args == tuple("111")
    list(mit.replace("11111", pred=pred, substitutes=[], window_size=3))
    # ['1', '1']
    
    pred=lambda *args: args == tuple(iter("111"))
    list(mit.replace(iter("11111"), pred=pred, substitutes=[], window_size=3))
    # ['1', '1']
    

    Details

    In all of these examples, we scan the main sequence with a sliding window the size of the sub-sequence. Items that fall outside a matching window are yielded; items inside a matching window are skipped.

    Option 1 - without libraries

    Iterate an enumerated sequence and evaluate slices of size n (the length of the sub-sequence). If the upcoming slice equals the sub-sequence and we are not already skipping, set skip to n and iterate past those items. Otherwise, yield the item. skip tracks how many more items to pass over, e.g. sub_list is of size n=2, so two items are skipped per match.

    Note, you can convert this option to work with sequences alone by removing the first two tuple assignments and replacing the iterable parameter with seq, e.g. def remove(seq, subsequence):.
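The sequence-only variant described in that note might look like the sketch below. The name `remove_seq` is hypothetical, and the sketch assumes both arguments have the same sequence type (two lists, two strings, two tuples), since a list slice never compares equal to a tuple.

```python
def remove_seq(seq, subsequence):
    """Yield items of seq that are not part of a matching slice."""
    n = len(subsequence)
    skip = 0  # how many more items of the current match to pass over

    for i, x in enumerate(seq):
        if not skip and seq[i:i + n] == subsequence:
            skip = n
        if skip:
            skip -= 1
            continue
        yield x


print(list(remove_seq([2, 1, 2, 3, 1, 2, 4], [1, 2])))  # [2, 3, 4]
print(list(remove_seq("11111", "111")))                 # ['1', '1']
```

Dropping the tuple conversions avoids copying the input, at the cost of no longer accepting one-shot iterators.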

    Option 2 - with more_itertools

    Indices are located for every matching sub-sequence in an iterable. While iterating an enumerated iterator, if an index is found in indices, the remaining sub-sequence is skipped by consuming the next n-1 elements from the iterator. Otherwise, an item is yielded.

    Install this library via > pip install more_itertools.

    Option 3 - with more_itertools.replace:

    This tool replaces a sub-sequence of items defined in a predicate with substitute values. To remove items, we substitute an empty container, e.g. substitutes=[]. The length of replaced items is specified by the window_size parameter (this value is equal to the length of the sub-sequence).

  • 2021-01-07 16:39

    Try del and slicing. The worst time complexity is O(N^2).

    sub_list=['a', int]
    big_list=[1, 'a', int, 3, float, 'a', int, 5]
    i=0
    while i < len(big_list):
        if big_list[i:i+len(sub_list)]==sub_list:
            del big_list[i:i+len(sub_list)]
        else:
            i+=1
    
    print(big_list)
    

    result:

    [1, 3, <class 'float'>, 5]
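The loop above modifies big_list in place and never terminates if sub_list is empty, because an empty slice always matches and nothing is deleted. A minimal sketch of the same technique wrapped in a function that works on a copy and rejects an empty sub_list follows; the name `remove_sublist_copy` is hypothetical.

```python
def remove_sublist_copy(big_list, sub_list):
    """Return a new list with every occurrence of sub_list removed."""
    sub_list = list(sub_list)
    if not sub_list:
        raise ValueError("sub_list must not be empty")
    result = list(big_list)  # work on a copy; the caller's list is untouched
    n = len(sub_list)
    i = 0
    while i < len(result):
        if result[i:i + n] == sub_list:
            del result[i:i + n]  # stay at i: later items have shifted down
        else:
            i += 1
    return result


original = [1, 'a', int, 3, float, 'a', int, 5]
print(remove_sublist_copy(original, ['a', int]))  # [1, 3, <class 'float'>, 5]
print(len(original))                              # 8
```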
    