Is str.replace(..).replace(..) ad nauseam a standard idiom in Python?

迷失自我 2020-12-20 11:19

For instance, say I wanted a function to escape a string for use in HTML (as in Django's escape filter):

    def escape(string):
        """
        Retu         


        
9 answers
  •  有刺的猬
    2020-12-20 11:23

    In accordance with bebraw's suggestion, here is what I ended up using (in a separate module, of course):

    import re
    
    class Subs(object):
        """
        A container holding strings to be searched for and replaced in
        replace_multi().
    
        Holds little relation to the sandwich.
        """
        def __init__(self, needles_and_replacements):
            """
            Returns a new instance of the Subs class, given a dictionary holding 
            the keys to be searched for and the values to be used as replacements.
            """
            self.lookup = needles_and_replacements
            self.regex = re.compile('|'.join(map(re.escape,
                                                 needles_and_replacements)))
    
    def replace_multi(string, subs):
        """
        Replaces the given items in string efficiently in a single pass.
    
        "string" should be the string to be searched.
        "subs" can be either:
            A.) a dictionary containing as its keys the items to be
                searched for and as its values the items to be replaced.
            or B.) a pre-compiled instance of the Subs class from this module
                   (which may have slightly better performance if this is
                    called often).
        """
        if not isinstance(subs, Subs): # Assume dictionary if not our class.
            subs = Subs(subs)
        lookup = subs.lookup
        return subs.regex.sub(lambda match: lookup[match.group(0)], string)
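
    To make the "single pass" point concrete, here is a minimal sketch (not part of the code above) comparing a one-pass regex replacement with chained str.replace calls on a mapping whose replacements overlap with its search keys. The swap dictionary and the sample sentence are made up for illustration, and the longest-first sort is an extra precaution that the Subs class does not take.

```python
import re

def replace_multi(string, replacements):
    """Replace every key of `replacements` found in `string` in one pass."""
    # Assumption, not in the Subs class: try longer keys first, because regex
    # alternation picks the first alternative that matches, not the longest.
    keys = sorted(replacements, key=len, reverse=True)
    pattern = re.compile('|'.join(map(re.escape, keys)))
    return pattern.sub(lambda match: replacements[match.group(0)], string)

# Hypothetical mapping in which each replacement is also a search key.
swap = {'cat': 'dog', 'dog': 'cat'}

# Single pass: every match is replaced exactly once.
single_pass = replace_multi('cat chases dog', swap)

# Chained passes: the second replace also rewrites the output of the first.
chained = 'cat chases dog'.replace('cat', 'dog').replace('dog', 'cat')

print(single_pass)  # dog chases cat
print(chained)      # cat chases cat
```

    For HTML escaping the only overlap of this kind is the ampersand, which is why the example below handles it separately.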
    

    Example usage:

    def escape(string):
        """
        Returns the given string with ampersands, quotes and angle 
        brackets encoded.
        """
        # Note that ampersands must be escaped first; the rest can be escaped in 
        # any order.
        escape.subs = Subs({'<': '&lt;', '>': '&gt;', "'": '&#39;', '"': '&quot;'})
        return replace_multi(string.replace('&', '&amp;'), escape.subs)
    

    Much better :). Thanks for the help.
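
    The comment about escaping ampersands first can be checked with a small sketch. The two helper names and the sample string are hypothetical and only cover two of the five characters, to keep the comparison short.

```python
def escape_amp_first(string):
    # Ampersand is handled before any entity has been introduced.
    return string.replace('&', '&amp;').replace('<', '&lt;')

def escape_amp_last(string):
    # Wrong order: the '&' inside the freshly written '&lt;' gets escaped again.
    return string.replace('<', '&lt;').replace('&', '&amp;')

sample = 'a < b & c'
good = escape_amp_first(sample)
bad = escape_amp_last(sample)

print(good)  # a &lt; b &amp; c
print(bad)   # a &amp;lt; b &amp; c
```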

    Edit

    Never mind, Mike Graham was right. I benchmarked it, and the single-pass regex replacement turned out to be much slower than the chained str.replace() calls.

    Code:

    from urllib2 import urlopen
    import timeit
    
    def escape1(string):
        """
        Returns the given string with ampersands, quotes and angle
        brackets encoded.
        """
        return string.replace('&', '&amp;').replace('<', '&lt;').replace('>', '&gt;').replace("'", '&#39;').replace('"', '&quot;')
    
    def escape2(string):
        """
        Returns the given string with ampersands, quotes and angle
        brackets encoded.
        """
        # Note that ampersands must be escaped first; the rest can be escaped in
        # any order.
        escape2.subs = Subs({'<': '&lt;', '>': '&gt;', "'": '&#39;', '"': '&quot;'})
        return replace_multi(string.replace('&', '&amp;'), escape2.subs)
    
    # An example test on the stackoverflow homepage.
    request = urlopen('http://stackoverflow.com')
    test_string = request.read()
    request.close()
    
    test1 = timeit.Timer('escape1(test_string)',
                         setup='from __main__ import escape1, test_string')
    test2 = timeit.Timer('escape2(test_string)',
                         setup='from __main__ import escape2, test_string')
    print 'multi-pass:', test1.timeit(2000)
    print 'single-pass:', test2.timeit(2000)
    

    Output:

    multi-pass: 15.9897229671
    single-pass: 66.5422530174
    

    So much for that.
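
    For anyone who wants to repeat the comparison on Python 3 without a network fetch, a rough sketch follows. It is not the benchmark that produced the numbers above: the function names and the in-memory test string are assumptions, and the regex is compiled once at module level, whereas escape2() above rebuilds its Subs object on every call. No timings are quoted here because they depend on the input and the interpreter.

```python
import re
import timeit

_ESCAPES = {'<': '&lt;', '>': '&gt;', "'": '&#39;', '"': '&quot;'}
# Compiled once at import time rather than inside the function.
_ESCAPE_RE = re.compile('|'.join(map(re.escape, _ESCAPES)))

def escape_chained(string):
    """Multi-pass version: one str.replace() call per character."""
    return (string.replace('&', '&amp;').replace('<', '&lt;')
                  .replace('>', '&gt;').replace("'", '&#39;')
                  .replace('"', '&quot;'))

def escape_regex(string):
    """Single-pass version for everything except the ampersand."""
    return _ESCAPE_RE.sub(lambda m: _ESCAPES[m.group(0)],
                          string.replace('&', '&amp;'))

# Hypothetical sample input; substitute any HTML text you care about.
test_string = '<a href="x.html?a=1&b=2">it\'s a link</a>\n' * 500

# Both versions must agree before the timings mean anything.
assert escape_chained(test_string) == escape_regex(test_string)

if __name__ == '__main__':
    print('multi-pass:',
          timeit.timeit(lambda: escape_chained(test_string), number=100))
    print('single-pass:',
          timeit.timeit(lambda: escape_regex(test_string), number=100))
```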
