Why have re.match()?

温柔的废话 2021-01-18 00:09

I know this topic has already been discussed multiple times here on StackOverflow, but I'm looking for a better answer.

While I appreciate the differences, I was no

1 answer
  •  说谎 (original poster)
    2021-01-18 00:41

    As you already know, re.match tests the pattern only at the start of the string, whereas re.search tests the whole string until it finds a match.
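
    To make that difference concrete, here is a minimal sketch. The sample string "say toto" is made up for illustration and is not part of the test that follows; it only shows where each call looks for the pattern.

```python
import re

# Hypothetical sample string: 'toto' appears, but not at the start.
s = "say toto"

# re.match only tries the pattern at index 0, so it fails here.
match_result = re.match(r"toto", s)

# re.search scans forward and finds the pattern at index 4.
search_result = re.search(r"toto", s)

# Anchoring the search with ^ makes it fail too, just like re.match.
anchored_result = re.search(r"^toto", s)

print(match_result)          # None
print(search_result.span())  # (4, 8)
print(anchored_result)       # None
```

    On a string like this one, re.match('toto', s) and re.search('^toto', s) give the same answer, which is what prompts the question below.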

    So, is there a difference between re.match('toto', s) and re.search('^toto', s), and if so, what is it?

    Let's run a little test:

    #!/usr/bin/env python3
    
    import time
    import re
    
    # The same literal pattern: unanchored for match(), anchored for search().
    p1 = re.compile(r'toto')
    p2 = re.compile(r'^toto')
    
    ssize = 1000
    
    # s1 starts with 'toto' (pattern succeeds); s2 starts with 'titi' (pattern fails).
    s1 = 'toto abcdefghijklmnopqrstuvwxyz012356789'*ssize
    s2 = 'titi abcdefghijklmnopqrstuvwxyz012356789'*ssize
    
    nb = 1000  # number of repetitions per measurement
    
    # Case 1: the pattern succeeds.
    i = 0
    t0 = time.time()
    while i < nb:
        p1.match(s1)
        i += 1
    t1 = time.time()
    
    i = 0
    t2 = time.time()
    while i < nb:
        p2.search(s1)
        i += 1
    t3 = time.time()
    
    print("\nsucceed\nmatch:")
    print(t1-t0)
    print("search:")
    print(t3-t2)
    
    
    # Case 2: the pattern fails.
    i = 0
    t0 = time.time()
    while i < nb:
        p1.match(s2)
        i += 1
    t1 = time.time()
    
    i = 0
    t2 = time.time()
    while i < nb:
        p2.search(s2)
        i += 1
    t3 = time.time()
    
    print("\nfail\nmatch:")
    print(t1-t0)
    print("search:")
    print(t3-t2)
    

    Both approaches are tested against a string that matches and a string that doesn't.

    Results (in seconds):

    succeed
    match:
    0.000469207763672
    search:
    0.000494003295898
    
    fail
    match:
    0.000430107116699
    search:
    0.46605682373
    

    What can we conclude from these results?

    1) Performance is similar when the pattern succeeds.

    2) Performance is completely different when the pattern fails. This is the most important point: it means that re.search keeps testing every position of the string even though the pattern is anchored, whereas re.match stops immediately.

    If you increase the size of the failing test string, you will see that re.match takes no more time, whereas the time taken by re.search depends on the string size.
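
    One way to check this yourself is sketched below. It reuses the failing string from the test above at several sizes; the helper name time_failing and the sizes chosen are assumptions made for this illustration, and timeit is swapped in for the manual time.time() loops. The timings it prints depend on your machine and Python version, so treat them as indicative only.

```python
import re
import timeit

P_MATCH = re.compile(r"toto")
P_SEARCH = re.compile(r"^toto")


def time_failing(ssize, nb=100):
    """Return (match_seconds, search_seconds) for a string not starting with 'toto'.

    Hypothetical helper for illustration: ssize scales the string length,
    nb is the number of repetitions per measurement.
    """
    s = "titi abcdefghijklmnopqrstuvwxyz012356789" * ssize
    t_match = timeit.timeit(lambda: P_MATCH.match(s), number=nb)
    t_search = timeit.timeit(lambda: P_SEARCH.search(s), number=nb)
    return t_match, t_search


# Grow the failing string and compare how each timing changes.
for ssize in (100, 1000, 10000):
    t_match, t_search = time_failing(ssize)
    print("ssize=%6d  match=%.6fs  search=%.6fs" % (ssize, t_match, t_search))
```

    If the search column grows with ssize while the match column stays flat, your interpreter shows the same behaviour as the results above.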
