Using a regular expression to replace upper case repeated letters in python with a single lowercase letter

穿精又带淫゛_ submitted on 2019-11-26 20:48:45

Question


I am trying to replace any uppercase letter that appears twice in a row in a string with a single lowercase instance of that letter. The following regular expression matches the repeated uppercase letters, but I am unsure how to make the replacement letter lowercase.

import re
s = 'start TT end'
re.sub(r'([A-Z]){2}', r"\1", s)
# returns 'start T end'

How can I make the "\1" lower case? Should I not be using a regular expression to do this?


Answer 1:


Pass a function as the repl argument. The MatchObject is passed to this function and .group(1) gives the first parenthesized subgroup:

import re
s = 'start TT end'
callback = lambda pat: pat.group(1).lower()
re.sub(r'([A-Z]){2}', callback, s)

EDIT
And yes, you should use ([A-Z])\1 instead of ([A-Z]){2} in order to not match e.g. AZ. (See @bobince's answer.)

import re
s = 'start TT end'
re.sub(r'([A-Z])\1', lambda pat: pat.group(1).lower(), s) # Inline

Gives:

'start t end'
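
To see why the backreference matters, here is a minimal sketch that runs both patterns on a string containing two different capitals. The sample string and the helper name `lower_first_group` are illustrative assumptions, not part of the original answer.

```python
import re

def lower_first_group(match):
    # Return the captured letter in lower case.
    return match.group(1).lower()

s = 'start TT AZ end'

# {2} repeats the character class, so any two capitals match;
# group(1) holds the last repetition ('Z' for 'AZ').
loose = re.sub(r'([A-Z]){2}', lower_first_group, s)

# \1 is a backreference, so the second letter must equal the first.
strict = re.sub(r'([A-Z])\1', lower_first_group, s)

print(loose)   # start t z end
print(strict)  # start t AZ end
```

With `{2}`, the pair `AZ` is silently rewritten to `z`; with `\1` it is left alone.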



Answer 2:


You can't change case in a replacement string. You would need a replacement function:

>>> import re
>>> def replacement(match):
...     return match.group(1).lower()
... 
>>> re.sub(r'([A-Z])\1', replacement, 'start TT end')
'start t end'



Answer 3:


You can do it with a regular expression; just pass a function as the replacement, as the docs say. The problem is your pattern.

As it is, your pattern matches runs of any two capital letters. I'll leave the actual pattern to you, but it starts with AA|BB|CC|.
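
One way to finish that alternation without typing all 26 pairs is to generate it. This is only a sketch of the idea; the names `doubled`, `pattern`, and `lower_single` are made up for illustration.

```python
import re
import string

# Build 'AA|BB|...|ZZ' from the uppercase ASCII letters.
doubled = '|'.join(c * 2 for c in string.ascii_uppercase)
pattern = re.compile(doubled)

def lower_single(match):
    # match.group(0) is the doubled letter, e.g. 'TT'; keep one, lowercased.
    return match.group(0)[0].lower()

result = pattern.sub(lower_single, 'start TT AZ end')
print(result)  # start t AZ end
```

The backreference form `([A-Z])\1` expresses the same constraint more compactly.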




Answer 4:


The 'repl' parameter that identifies the replacement can be either a string (as you have it here) or a function. This will do what you wish:

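import re

def toLowercase(matchobj):
    # Return the captured letter in lower case.
    return matchobj.group(1).lower()

s = 'start TT end'
# \1 requires the second letter to equal the first; {2} would also match e.g. 'AZ'.
re.sub(r'([A-Z])\1', toLowercase, s)
# returns 'start t end'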



Answer 5:


Try this:

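import re

def tol(m):
    # m.group(0) is the whole matched run; keep its first letter, lowercased.
    return m.group(0)[0].lower()

s = 'start TTT AAA end'
re.sub(r'([A-Z]){2,}', tol, s)

Note that this pattern matches any run of two or more uppercase letters, not only repeats of the same letter, and it doesn't replace single uppercase letters. If you want that too, use r'([A-Z]){1,}'.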
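
If only runs of the same letter should collapse, a backreference with a quantifier is one option. The sketch below compares it with the pair-wise pattern; the sample string and the helper name `first_lower` are assumptions for illustration.

```python
import re

def first_lower(match):
    # group(1) is the captured letter that the run repeats.
    return match.group(1).lower()

s = 'start TTT AAAA end'

# Exactly two identical capitals per match: longer runs are handled pair by pair.
pairs = re.sub(r'([A-Z])\1', first_lower, s)

# Two or more identical capitals per match: the whole run collapses.
runs = re.sub(r'([A-Z])\1+', first_lower, s)

print(pairs)  # start tT aa end
print(runs)   # start t a end
```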




Answer 6:


WARNING! This answer does not use re as requested. Proceed at your own risk!

I don't know how likely corner cases are, but this is how plain Python handles it with my naive approach.

import string
s = 'start TT end AAA BBBBBBB'
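for c in string.ascii_uppercase:  # string.uppercase exists only in Python 2
    # Replace each doubled letter with one lowercase letter.
    s = s.replace(c + c, c.lower())
print(s)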
""" Output:
start t end aA bbbB
"""


Source: https://stackoverflow.com/questions/4145451/using-a-regular-expression-to-replace-upper-case-repeated-letters-in-python-with
