Python Regex for Words & single space

不想你离开。 Posted on 2021-02-19 22:21:07

Question


I am using re.sub to forcibly convert a "bad" string into a "valid" string via regex. I am struggling to write a regex that will parse a string and remove the bad parts. Specifically, I would like to force a string to be all alphabetical, with a single space allowed between words. Anything that breaks this rule, including runs of multiple spaces, should be replaced with ''. Any help would be appreciated!

import re
list_of_strings = ["3He2l2lo Wo45rld!", "Hello World- -number two-", "Hello    World    number .. three"]
for str in list_of_strings:
    print(re.sub(r'[^A-Za-z]+([^\s][A-Za-z])*', '', str))

I would like the output to be:

Hello World

Hello World number two

Hello World number three


Answer 1:


Try the following. It matches every run of characters to remove (non-letters and whitespace together), and replaces the run with a single space if it contains at least one space, or with nothing otherwise.

import re
list_of_strings = ["3He2l2lo Wo45rld!", "Hello World- -number two-", "Hello    World    number .. three"]
for str in list_of_strings:
    print(re.sub(r'((?:[^A-Za-z\s]|\s)+)', lambda x: ' ' if ' ' in x.group(0) else '' , str))

It yields:

Hello World
Hello World number two
Hello World number three
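
As a minimal sketch of the same idea: "a non-letter that is not whitespace, or whitespace" is just "any non-letter", so the pattern can be shortened to [^A-Za-z]+. The helper name clean_single_pass and the constant NON_LETTERS are hypothetical, introduced here for illustration only.

```python
import re

# Hypothetical names; the pattern is a shortened equivalent of the one in the answer.
NON_LETTERS = re.compile(r'[^A-Za-z]+')

def clean_single_pass(text):
    # Replace each run of non-letters with one space if the run contains a space,
    # otherwise drop the run entirely.
    return NON_LETTERS.sub(lambda m: ' ' if ' ' in m.group(0) else '', text)

for s in ["3He2l2lo Wo45rld!", "Hello World- -number two-", "Hello    World    number .. three"]:
    print(clean_single_pass(s))
```

Because the check looks for a literal space, a run made up only of tabs or newlines is dropped rather than turned into a space, which is the same behavior as the original lambda.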



Answer 2:


I would prefer two passes to keep each regex simple. The first pass removes every character that is neither a letter nor whitespace; the second collapses each run of whitespace to a single space.

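# str is the loop variable from the question's code
pass1 = re.sub(r'[^A-Za-z\s]', '', str)   # remove everything except letters and whitespace
pass2 = re.sub(r'\s+', ' ', pass1)        # collapse each whitespace run to one space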
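
The two passes could be wrapped in a small function, sketched below. The name normalize_words is hypothetical, and the final strip() is an addition not in the answer: it assumes leading and trailing spaces are also unwanted, since the two passes alone would leave one space at either end of an input such as " -Hello".

```python
import re

def normalize_words(text):
    # Pass 1: drop every character that is neither a letter nor whitespace.
    letters_only = re.sub(r'[^A-Za-z\s]', '', text)
    # Pass 2: collapse each run of whitespace to a single space.
    single_spaced = re.sub(r'\s+', ' ', letters_only)
    # Assumption: surrounding whitespace should be trimmed as well.
    return single_spaced.strip()

for s in ["3He2l2lo Wo45rld!", "Hello World- -number two-", "Hello    World    number .. three"]:
    print(normalize_words(s))
```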


Source: https://stackoverflow.com/questions/18752018/python-regex-for-words-single-space
