Question:
I am using re.sub to force a "bad" string into a "valid" one with a regex, and I am struggling to write a pattern that parses the string and removes the bad parts. Specifically, I want the result to be all alphabetical, with a single space allowed between words. Anything that breaks this rule should be replaced with ''. That includes extra spaces, so a run of several spaces should end up as one. Any help would be appreciated!
import re
list_of_strings = ["3He2l2lo Wo45rld!", "Hello World- -number two-", "Hello World number .. three"]
for s in list_of_strings:
    print(re.sub(r'[^A-Za-z]+([^\s][A-Za-z])*', '', s))
I would like the output to be:
Hello World
Hello World number two
Hello World number three
Answer 1:
Try whether the following works. It matches each run of characters to remove (non-letters and whitespace alike) and substitutes it with a single space if the run contains at least one space, or with an empty string otherwise.
import re
list_of_strings = ["3He2l2lo Wo45rld!", "Hello World- -number two-", "Hello World number .. three"]
for s in list_of_strings:
    print(re.sub(r'((?:[^A-Za-z\s]|\s)+)', lambda x: ' ' if ' ' in x.group(0) else '', s))
It yields:
Hello World
Hello World number two
Hello World number three
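A note on the pattern above: a character that is not a letter is either whitespace or a non-letter, non-whitespace character, so `(?:[^A-Za-z\s]|\s)+` matches exactly the same runs as `[^A-Za-z]+`. The callable-replacement idea can therefore be written more compactly. This is a minimal sketch assuming ASCII letters only; the helper name `clean` is hypothetical, it treats any whitespace (not just a literal space) as a word separator, and the trailing `.strip()` is an addition that removes the space a leading or trailing run would otherwise leave behind.

```python
import re

# Any run of non-letters (digits, punctuation, whitespace) is a single match.
NON_LETTERS = re.compile(r'[^A-Za-z]+')

def clean(s):
    # Hypothetical helper. A run that contains whitespace separated two words,
    # so it becomes one space; any other run is dropped, which rejoins letters
    # split by digits or punctuation ("He2l2lo" -> "Hello").
    # strip() removes a space produced by a run at either end of the string.
    return NON_LETTERS.sub(
        lambda m: ' ' if any(c.isspace() for c in m.group(0)) else '',
        s,
    ).strip()

list_of_strings = ["3He2l2lo Wo45rld!", "Hello World- -number two-",
                   "Hello World number .. three"]
for s in list_of_strings:
    print(clean(s))
```

For the question's three inputs this prints the same three lines as the answer's version.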
Answer 2:
I would prefer two passes to keep each regex simple: the first pass removes non-alphabetic characters, and the second collapses multiple spaces into one.
import re
for s in list_of_strings:
    pass1 = re.sub(r'[^A-Za-z\s]', '', s)  # remove non-alpha, keep whitespace
    pass2 = re.sub(r'\s+', ' ', pass1)     # collapse runs of whitespace to 1 space
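A self-contained sketch of the two-pass approach, run on the question's inputs. The function name `two_pass_clean` is hypothetical, and the final `.strip()` is an addition not in the answer: without it, input that begins or ends with whitespace (for example `" Hello"`) would keep one leading or trailing space after pass 2.

```python
import re

def two_pass_clean(s):
    # Pass 1: delete every character that is neither an ASCII letter nor whitespace.
    letters_and_spaces = re.sub(r'[^A-Za-z\s]', '', s)
    # Pass 2: collapse each run of whitespace into a single space.
    collapsed = re.sub(r'\s+', ' ', letters_and_spaces)
    # Added step (not in the answer): trim a leftover leading/trailing space.
    return collapsed.strip()

list_of_strings = ["3He2l2lo Wo45rld!", "Hello World- -number two-",
                   "Hello World number .. three"]
for s in list_of_strings:
    print(two_pass_clean(s))
```

This prints the three lines the question asks for. Splitting the work this way means neither pattern has to know about the other rule, at the cost of scanning the string twice.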
Source: https://stackoverflow.com/questions/18752018/python-regex-for-words-single-space