Removing an element from a list based on a predicate

伪装坚强ぢ 2021-01-18 12:36

I want to remove every element from a list that contains 'X' or 'N'. I have to apply this to a large genome. Here is an example:

9 answers
  •  无人及你
    2021-01-18 13:01

    If you're dealing with extremely large lists, you want to use methods that don't involve traversing the entire list any more than you absolutely need to.

    Your best bet is likely to be a predicate applied through a lazy filter. In Python 2 that is itertools.ifilter; in Python 3 the built-in filter is already lazy. The predicate must be true for the elements you want to keep, e.g.:

    new_seq = filter(lambda x: 'X' not in x and 'N' not in x, seq)
    

    This defers testing each element until you iterate over the result. Note that you can filter a filtered sequence just as you can the original sequence:

    new_seq1 = filter(some_other_predicate, new_seq)
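
For readers on Python 3, where itertools.ifilter no longer exists, a minimal sketch of the same lazy approach might look like the following. It assumes the goal is to drop entries containing 'X' or 'N'. The helper names has_ambiguous_base and clean_reads are hypothetical, not part of the original answer.

```python
from itertools import filterfalse


def has_ambiguous_base(read):
    """Return True if the read contains 'X' or 'N'."""
    return 'X' in read or 'N' in read


def clean_reads(reads):
    """Lazily yield only the reads that contain neither 'X' nor 'N'."""
    # filterfalse keeps the items for which the predicate is False.
    return filterfalse(has_ambiguous_base, reads)


seq = ['AAT', 'XAC', 'ANT', 'TTA']

kept = clean_reads(seq)  # no element has been tested yet

# Lazy filters compose: stack a second predicate on top of the first.
kept_starting_with_t = filter(lambda read: read.startswith('T'), kept)

# The single pass over seq happens only here, when the result is consumed.
result = list(kept_starting_with_t)
```

Because both stages are iterators, the input is traversed once and no intermediate list is built, which is the property that matters for a genome-sized input.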
    

    Edit:

    Also, a little testing suggests that memoizing already-found entries in a set gives enough of an improvement over plain substring tests to be worth doing, and that a regular expression is noticeably slower and probably not the way to go:

    >>> import re, timeit
    >>> seq = ['AAT','XAC','ANT','TTA']
    >>> p = re.compile('[X|N]')  # '|' is a literal inside [...]; '[XN]' is the intended class
    >>> timeit.timeit('[x for x in seq if not p.search(x)]', 'from __main__ import p, seq')
    3.4722548536196314
    >>> timeit.timeit('[x for x in seq if "X" not in x and "N" not in x]', 'from __main__ import seq')
    1.0560532134670666
    >>> s = set(('XAC', 'ANT'))
    >>> timeit.timeit('[x for x in seq if x not in s]', 'from __main__ import s, seq')
    0.87923730529996647
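
The set-based timing above uses a set that was built by hand. One way the memoization could be wired into the filtering itself is sketched below, on the assumption that the same entries recur in the input. The function name drop_ambiguous is hypothetical, and whether this pays off depends on how often entries repeat.

```python
def drop_ambiguous(reads):
    """Return (kept, bad): reads without 'X'/'N', and the set of rejected reads."""
    bad = set()   # entries already known to contain 'X' or 'N'
    kept = []
    for read in reads:
        # Hash lookup: a repeated bad entry is rejected without rescanning it.
        if read in bad:
            continue
        # First time this entry is seen: fall back to the substring test.
        if 'X' in read or 'N' in read:
            bad.add(read)
            continue
        kept.append(read)
    return kept, bad


kept, bad = drop_ambiguous(['AAT', 'XAC', 'ANT', 'TTA', 'XAC'])
```

Here the second 'XAC' is skipped by the set lookup alone. If entries are mostly unique, the set adds memory without saving much work, so the plain substring test may be the better choice.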
    
