How would I limit the match/replacement to the leading zeros in e004_n07? However, if either term contains all zeros, then I need to retain one zero in the term (see example below).
There's no need to use re.sub if your replacement is so simple; simply use str.replace:
s = 'e004_n07'
s.replace('0', '') # => 'e4_n7'
If your requirement is that you MUST use regex, then below is your regex pattern:
>>> import re
>>> s = 'e004_n07'
>>> line = re.sub(r"0", "", s)
>>> line
'e4_n7'
However, it is recommended not to use regex when there is another efficient way to perform the same operation, i.e. using the replace function:
>>> line = s.replace('0', '')
>>> line
'e4_n7'
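One caveat worth checking before adopting either variant above: both remove every zero in the string, not only the leading ones. A minimal sketch, assuming a hypothetical input e100_n20 (not from the question) whose zeros are not leading, and comparing against the letter-anchored pattern ([a-zA-Z])0+ given in a later answer:

```python
import re

s = "e100_n20"  # hypothetical input: the zeros here are NOT leading zeros

# Both "remove every 0" variants also strip the trailing zeros.
stripped_all = s.replace("0", "")
stripped_all_re = re.sub(r"0", "", s)

# Anchoring the zeros to the preceding letter leaves this input untouched.
leading_only = re.sub(r"([a-zA-Z])0+", r"\1", s)

print(stripped_all, stripped_all_re, leading_only)
```

If the data can never contain a non-leading zero, the plain replace is enough; otherwise the zeros need to be anchored in some way.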
edit: Don't let anybody talk you out of validating the format of the fixed data. If that's what you need, don't settle for something overly simple.
Not very pretty, but in a situation that seems fixed, you can just set out all the permutations, then blindly capture the good parts, leave out the zeros, then substitute it all back.
Find ([a-z])(?:([1-9][0-9][0-9])|0([1-9][0-9])|00([1-9]))(_[a-z])(?:([1-9][0-9])|0([1-9]))
Replace $1$2$3$4$5$6$7
Expanded
( [a-z] ) # (1)
(?:
( [1-9] [0-9] [0-9] ) # (2)
|
0
( [1-9] [0-9] ) # (3)
|
00
( [1-9] ) # (4)
)
( _ [a-z] ) # (5)
(?:
( [1-9] [0-9] ) # (6)
|
0
( [1-9] ) # (7)
)
Output
** Grp 0 - ( pos 0 , len 8 )
e004_n07
** Grp 1 - ( pos 0 , len 1 )
e
** Grp 2 - NULL
** Grp 3 - NULL
** Grp 4 - ( pos 3 , len 1 )
4
** Grp 5 - ( pos 4 , len 2 )
_n
** Grp 6 - NULL
** Grp 7 - ( pos 7 , len 1 )
7
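In Python, the same find/replace could be sketched roughly as follows. The replacement string $1$2$3$4$5$6$7 is expressed here as a function that joins all groups and treats the ones that did not participate (NULL above) as empty strings; the helper name strip_fixed is made up for illustration.

```python
import re

# Same pattern as above: letter, 3-digit field, "_" + letter, 2-digit field.
PATTERN = re.compile(
    r"([a-z])(?:([1-9][0-9][0-9])|0([1-9][0-9])|00([1-9]))"
    r"(_[a-z])(?:([1-9][0-9])|0([1-9]))"
)

def strip_fixed(s):
    """Hypothetical helper: rebuild the match from its captured groups only."""
    # Groups that did not participate are None, so map them to "".
    return PATTERN.sub(lambda m: "".join(g or "" for g in m.groups()), s)

print(strip_fixed("e004_n07"))
print(strip_fixed("e120_n30"))
print(strip_fixed("e000_n00"))
```

Note that an all-zero field such as e000_n00 does not match this pattern at all, so that string comes back unchanged rather than being reduced to a single zero.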
If you want to only remove zeros after letters, you may use:
([a-zA-Z])0+
Replace with the \1 backreference. See the regex demo.
The ([a-zA-Z]) will capture a letter and 0+ will match 1 or more zeros.
Python demo:
import re
s = 'e004_n07'
res = re.sub(r'([a-zA-Z])0+', r'\1', s)
print(res)
Note that re.sub will find and replace all non-overlapping matches (it performs a global search and replace). If there is no match, the string is returned as is, without modifications. So, there is no need to use an additional re.match/re.search.
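A small sketch of that behavior, using re.subn (the standard-library variant of re.sub that also returns the number of replacements made) so the no-match case is visible; the variable names are only illustrative:

```python
import re

pattern = r"([a-zA-Z])0+"

# Two non-overlapping matches ("e00" and "n0") are replaced in one call.
changed, n_changed = re.subn(pattern, r"\1", "e004_n07")

# No match: the input comes back unmodified and the count is 0.
unchanged, n_unchanged = re.subn(pattern, r"\1", "e4_n7")

print(changed, n_changed)
print(unchanged, n_unchanged)
```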
UPDATE
To keep 1 zero if the numbers only contain zeros, you may use:
import re
s = ['e004_n07','e000_n00']
res = [re.sub(r'(?<=[a-zA-Z])0+(\d*)', lambda m: m.group(1) if m.group(1) else '0', x) for x in s]
print(res)
See the Python demo
Here, the r'(?<=[a-zA-Z])0+(\d*)' regex matches one or more zeros (0+) that come after an ASCII letter ((?<=[a-zA-Z])), and then any other digits (0 or more) are captured into Group 1 with (\d*). Then, in the replacement, we check if Group 1 is empty: if it is (there are only zeros), we insert 0; otherwise, we insert the Group 1 contents (the remaining digits after the leading zeros).
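The same logic can be written with a named replacement function, which may be easier to read and test than the inline lambda. This is only a sketch: the names keep_one_zero and strip_leading_zeros are made up here, and the extra inputs are hypothetical edge cases that are not in the question.

```python
import re

LEADING_ZEROS = re.compile(r"(?<=[a-zA-Z])0+(\d*)")

def keep_one_zero(m):
    """Return the digits after the leading zeros, or '0' if nothing is left."""
    rest = m.group(1)
    return rest if rest else "0"

def strip_leading_zeros(s):
    return LEADING_ZEROS.sub(keep_one_zero, s)

for sample in ["e004_n07", "e000_n00", "e100_n10", "e010_n00"]:
    print(sample, "->", strip_leading_zeros(sample))
```

Because the zeros must directly follow a letter, a term like e100 has no leading zeros to match and is left as is.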