Python Regular Expression; replacing a portion of match

时光说笑 2020-12-22 00:04

How would I limit the match/replacement to the leading zeros in e004_n07? However, if either term contains all zeros, then I need to retain one zero in the term (see example below).

4 Answers
  • 2020-12-22 00:40

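    There's no need to use re.sub if the replacement is this simple; just use str.replace. Note that it removes every 0 in the string, not only the leading ones, so it fits this sample but not terms with inner or trailing zeros: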

    s = 'e004_n07'
    s.replace('0', '') # => 'e4_n7'
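
    To see where plain str.replace stops being enough, the sketch below runs it on two extra inputs. Both e100_n10 and e000_n00 are made-up samples, not taken from the question. Every 0 is stripped, including trailing ones, and all-zero terms end up with no digits at all.

```python
# Hypothetical extra samples, not from the original question.
samples = ['e004_n07', 'e100_n10', 'e000_n00']

# str.replace removes every '0', wherever it occurs in the string.
stripped = [s.replace('0', '') for s in samples]
print(stripped)  # ['e4_n7', 'e1_n1', 'e_n']
```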
    
  • 2020-12-22 01:02

    If your requirement is that you MUST use regex, then below is your regex pattern:

    >>> import re
    >>> s = 'e004_n07'
    >>> line = re.sub(r"0", "", s)
    >>> line
    'e4_n7'
    

    However, it is better not to use a regex when there is a simpler, more efficient way to perform the same operation, e.g. the str.replace method:

    >>> line = s.replace('0', '')
    >>> line
    'e4_n7'
    
  • 2020-12-22 01:06

    Edit: Don't let anybody talk you out of validating the format of the fixed data. If that's what you need, don't settle for something overly simple.

    Not very pretty, but in a situation where the format seems fixed, you can just
    spell out all the permutations, blindly capture the good parts,
    leave out the zeros, then substitute it all back.

    Find ([a-z])(?:([1-9][0-9][0-9])|0([1-9][0-9])|00([1-9]))(_[a-z])(?:([1-9][0-9])|0([1-9]))

    Replace $1$2$3$4$5$6$7

    Expanded

     ( [a-z] )                     # (1)
     (?:
          ( [1-9] [0-9] [0-9] )         # (2)
       |  
          0
          ( [1-9] [0-9] )               # (3)
       |  
          00
          ( [1-9] )                     # (4)
     )
     ( _ [a-z] )                   # (5)
     (?:
          ( [1-9] [0-9] )               # (6)
       |  
          0
          ( [1-9] )                     # (7)
     )
    

    Output

     **  Grp 0 -  ( pos 0 , len 8 ) 
    e004_n07  
     **  Grp 1 -  ( pos 0 , len 1 ) 
    e  
     **  Grp 2 -  NULL 
     **  Grp 3 -  NULL 
     **  Grp 4 -  ( pos 3 , len 1 ) 
    4  
     **  Grp 5 -  ( pos 4 , len 2 ) 
    _n  
     **  Grp 6 -  NULL 
     **  Grp 7 -  ( pos 7 , len 1 ) 
    7  
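
    The Find/Replace pair above is written in $n notation. A rough Python equivalent, assuming Python 3.5 or later (where unmatched groups in a replacement template expand to an empty string), could look like the sketch below. The names FIND and REPLACE are only illustrative, and e000_n00 is an extra sample added here to show that this pattern does not match all-zero terms and leaves them unchanged.

```python
import re

# Same pattern as the "Find" line above, split over two lines for readability.
FIND = (r'([a-z])(?:([1-9][0-9][0-9])|0([1-9][0-9])|00([1-9]))'
        r'(_[a-z])(?:([1-9][0-9])|0([1-9]))')

# Python writes backreferences as \1..\7 rather than $1..$7.
REPLACE = r'\1\2\3\4\5\6\7'

result = re.sub(FIND, REPLACE, 'e004_n07')
print(result)  # e4_n7

# An all-zero term matches none of the alternatives, so nothing is replaced.
unchanged = re.sub(FIND, REPLACE, 'e000_n00')
print(unchanged)  # e000_n00
```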
    
  • 2020-12-22 01:07

    If you want to only remove zeros after letters, you may use:

    ([a-zA-Z])0+
    

    Replace with the \1 backreference.

    The ([a-zA-Z]) will capture a letter and 0+ will match 1 or more zeros.

    Python demo:

    import re
    s = 'e004_n07'
    res = re.sub(r'([a-zA-Z])0+', r'\1', s)
    print(res)
    

    Note that re.sub finds and replaces all non-overlapping matches (it performs a global search and replace). If there is no match, the string is returned as is, without modifications, so there is no need for an additional re.match/re.search check.
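
    A small sketch of that behaviour, using re.subn (which also returns the number of replacements made). The inputs e14_n7 and e000_n00 are assumptions added for illustration: the first has nothing to replace, and the second shows that this pattern strips all-zero terms down to the bare letters.

```python
import re

pattern = r'([a-zA-Z])0+'

# Both terms are handled in a single call.
both = re.subn(pattern, r'\1', 'e004_n07')
print(both)  # ('e4_n7', 2)

# No match: the input comes back unchanged, with a count of 0.
no_match = re.subn(pattern, r'\1', 'e14_n7')
print(no_match)  # ('e14_n7', 0)

# All-zero terms lose every digit with this pattern.
all_zero = re.sub(pattern, r'\1', 'e000_n00')
print(all_zero)  # e_n
```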

    UPDATE

    To keep one zero when a number consists only of zeros, you may use:

    import re
    s = ['e004_n07','e000_n00']
    res = [re.sub(r'(?<=[a-zA-Z])0+(\d*)', lambda m: m.group(1) if m.group(1) else '0', x) for x in s]
    print(res)
    


    Here, r'(?<=[a-zA-Z])0+(\d*)' regex matches one or more zeros (0+) that are after an ASCII letter ((?<=[a-zA-Z])) and then any other digits (0 or more) are captured into Group 1 with (\d*). Then, in the replacement, we check if Group 1 is empty, and if it is empty, we insert 0 (there are only zeros), else, we insert Group 1 contents (the remaining digits after the first leading zeros).
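
    As a possible alternative that avoids the lambda (this variant is not from the answer above, only a sketch), a lookahead can require that at least one digit remains after the removed zeros. The function names and the extra samples e100_n10 and e0_n0 are hypothetical. The code compares both approaches on the same inputs.

```python
import re

def strip_with_lambda(term):
    """The approach described above: capture the remaining digits, fall back to '0'."""
    return re.sub(r'(?<=[a-zA-Z])0+(\d*)',
                  lambda m: m.group(1) if m.group(1) else '0',
                  term)

def strip_with_lookahead(term):
    """Alternative sketch: drop zeros after a letter only while a digit still follows."""
    return re.sub(r'(?<=[a-zA-Z])0+(?=\d)', '', term)

# The first two samples come from the answer; the last two are made up.
samples = ['e004_n07', 'e000_n00', 'e100_n10', 'e0_n0']

via_lambda = [strip_with_lambda(s) for s in samples]
via_lookahead = [strip_with_lookahead(s) for s in samples]
print(via_lambda)     # ['e4_n7', 'e0_n0', 'e100_n10', 'e0_n0']
print(via_lookahead)  # ['e4_n7', 'e0_n0', 'e100_n10', 'e0_n0']
```

    In the lookahead version the 0+ backtracks by one character when the term is all zeros, so the final zero is left in place without any replacement callback.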
