Removing an element from a list based on a predicate

伪装坚强ぢ 2021-01-18 12:36

I want to remove every element from a list that contains 'X' or 'N'. I have to apply this to a large genome. Here is an example:

9 answers
  •  感情败类
    2021-01-18 13:14

    It is (asymptotically) faster to use a regular expression than to search the same string once per character: with a regular expression, the sequence is read at most once (instead of twice when the letters are not found, as in gnibbler's original answer, for instance). With gnibbler's memoization, the regular expression approach reads:

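    import re

    # Bound search method: returns a match object if s contains 'X' or 'N', else None.
    remove = re.compile('[XN]').search
    
    codon = ['AAT', 'XAC', 'ANT', 'TTA']

    def pred(s, memo={}):
        # The mutable default is deliberate: it persists across calls and acts as a cache.
        if s not in memo:
            memo[s] = not remove(s)
        return memo[s]
    
    # filter() returns a lazy iterator in Python 3, so wrap it in list() to print it.
    print(list(filter(pred, codon)))  # ['AAT', 'TTA']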
    

    This should be (asymptotically) faster than using the "in s" or the "set" checks (i.e., the code above should be faster for long enough strings s).
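
    The claim above is asymptotic, so which approach wins on real data is worth measuring: the "in" check runs in C in CPython, and constant factors can dominate. Here is a minimal timing sketch, assuming Python 3. The names pred_regex and pred_in and the synthetic sequences are made up for illustration, and no timings are quoted because they depend on the machine and the data.

```python
import re
import timeit

has_xn = re.compile('[XN]').search

def pred_regex(s):
    # Single pass over s with a character class.
    return not has_xn(s)

def pred_in(s):
    # One scan of s per letter; both scans run to the end when neither letter is present.
    return not any(c in s for c in 'XN')

# Hypothetical test data: a long clean sequence and the same sequence ending in 'N'.
clean = 'ACGT' * 25_000
dirty = clean + 'N'

for name, pred in [('regex', pred_regex), ('in', pred_in)]:
    # Both predicates must agree before their timings are comparable.
    assert pred(clean) and not pred(dirty)
    elapsed = timeit.timeit(lambda: pred(clean), number=50)
    print(f'{name}: {elapsed:.4f} s for 50 calls')
```

    Timing the clean sequence exercises the worst case for both predicates, since neither can stop early.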

    I originally thought that gnibbler's answer could be written in a faster and more compact way with dict.setdefault():

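    codon = ['AAT', 'XAC', 'ANT', 'TTA']

    def pred(s, memo={}):
        # memo is a shared mutable default that acts as a cache across calls.
        return memo.setdefault(s, not any(y in s for y in "XN"))
    
    # filter() returns a lazy iterator in Python 3, so wrap it in list() to print it.
    print(list(filter(pred, codon)))  # ['AAT', 'TTA']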
    

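    However, as gnibbler noted, the default value passed to setdefault is always evaluated, because Python evaluates arguments before making the call (even though, in principle, it would only be needed when the dictionary key is not found). The check therefore runs on every call, and the memoization saves nothing.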
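
    The eager evaluation is easy to observe by counting how often the check runs. This is a small sketch, not part of the original answers: contains_xn, pred_setdefault, pred_checked, and the calls list are hypothetical names introduced for the demonstration.

```python
calls = []  # one entry is appended each time the check actually runs

def contains_xn(s):
    """Return True if s contains 'X' or 'N', recording the call."""
    calls.append(s)
    return any(c in s for c in 'XN')

def pred_setdefault(s, memo={}):
    # The second argument is evaluated before setdefault is called, hit or miss.
    return memo.setdefault(s, not contains_xn(s))

def pred_checked(s, memo={}):
    # The check only runs when s is not cached yet.
    if s not in memo:
        memo[s] = not contains_xn(s)
    return memo[s]

for _ in range(3):
    pred_setdefault('AAT')
setdefault_calls = len(calls)

calls.clear()
for _ in range(3):
    pred_checked('AAT')
checked_calls = len(calls)

print(setdefault_calls, checked_calls)  # 3 1
```

    With setdefault the check runs on all three calls. The explicit "not in memo" test runs it once.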
