Python Regex for Words & single space

Question

I am using re.sub to forcibly convert a "bad" string into a "valid" string via regex, but I am struggling to write a pattern that parses a string and removes the bad parts. Specifically, I would like to force a string to be all alphabetical, and allow a single space between words. Anything that violates this rule, including runs of multiple spaces, I would like to substitute with ''. Any help would be appreciated!

import re
list_of_strings = ["3He2l2lo Wo45rld!", "Hello World- -number two-", "Hello    World    number .. three"]
for str in list_of_strings:
    print(re.sub(r'[^A-Za-z]+([^\s][A-Za-z])*', '' , str))

I would like the output to be:

Hello World

Hello World number two

Hello World number three

Answer 1:

Try the following. It matches every run of characters to remove (non-letters and whitespace alike), but replaces a run with a single space only when the run contains at least one space; otherwise the run is removed entirely.

import re
list_of_strings = ["3He2l2lo Wo45rld!", "Hello World- -number two-", "Hello    World    number .. three"]
for str in list_of_strings:
    print(re.sub(r'((?:[^A-Za-z\s]|\s)+)', lambda x: ' ' if ' ' in x.group(0) else '' , str))

It yields:

Hello World
Hello World number two
Hello World number three

Answer 2:

I would prefer two passes to simplify the regex. The first pass removes non-alphabetic characters, and the second collapses runs of whitespace.

pass1 = re.sub(r'[^A-Za-z\s]', '', str)  # remove non-alpha characters (whitespace is kept)
pass2 = re.sub(r'\s+', ' ', pass1)       # collapse each run of whitespace to a single space
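
To make the two passes easier to try out, here is a minimal sketch that wraps them in a function and runs it on the strings from the question. The function name clean and the final strip() call are my own additions, not part of the answer; strip() is only there in case the input starts or ends with characters that get removed.

```python
import re

def clean(text):
    """Keep only ASCII letters, with single spaces between words (hypothetical helper)."""
    # Pass 1: drop everything that is neither an ASCII letter nor whitespace.
    letters_and_spaces = re.sub(r'[^A-Za-z\s]', '', text)
    # Pass 2: collapse each run of whitespace into a single space.
    collapsed = re.sub(r'\s+', ' ', letters_and_spaces)
    # Assumption: leading/trailing spaces are unwanted, so trim them.
    return collapsed.strip()

list_of_strings = [
    "3He2l2lo Wo45rld!",
    "Hello World- -number two-",
    "Hello    World    number .. three",
]
for s in list_of_strings:
    print(clean(s))
```

For the three sample strings this should print the output requested in the question. Note that a separator with no whitespace in it (for example "two-three") is simply deleted, so the neighbouring words are joined.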

Source: https://stackoverflow.com/questions/18752018/python-regex-for-words-single-space

Tags

python

regex

string