python: padding punctuation with white spaces (keeping punctuation)

Submitted by ♀尐吖头ヾ on 2020-06-08 03:35:01

Question


What is an efficient way to pad punctuation with whitespace?

input:

s = 'bla. bla? bla.bla! bla...'

desired output:

 s = 'bla . bla ? bla . bla ! bla . . .'

Comments:

  1. I don't care how many whitespaces are there between tokens. (but they'll need to be collapsed eventually)
  2. I don't want to pad all punctuation. Say I'm interested only in .,!?().

Answer 1:


You can use a regular expression to match the punctuation characters you are interested in and surround them with spaces, then use a second step to collapse multiple spaces anywhere in the string:

import re
s = 'bla. bla? bla.bla! bla...'
s = re.sub(r'([.,!?()])', r' \1 ', s)  # pad each punctuation mark with spaces
s = re.sub(r'\s{2,}', ' ', s).strip()  # collapse whitespace runs, trim the ends
print(s)

Result:

bla . bla ? bla . bla ! bla . . .
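
The two steps above can be wrapped in a small helper. This is only a minimal sketch: the function name pad_punctuation and its chars parameter are hypothetical and not part of the original answer. It uses re.escape so that characters with a special meaning inside a character class are handled safely, and it strips the result so that no leading or trailing space remains.

```python
import re

def pad_punctuation(s, chars=".,!?()"):
    """Surround each character in `chars` with spaces, then collapse whitespace."""
    # Build a character class from the chosen punctuation (hypothetical parameter).
    pattern = "([" + re.escape(chars) + "])"
    padded = re.sub(pattern, r" \1 ", s)
    # Collapse every run of whitespace into one space and trim both ends.
    return re.sub(r"\s+", " ", padded).strip()

print(pad_punctuation("bla. bla? bla.bla! bla..."))
```

Punctuation that is not listed in chars, such as a semicolon, is left untouched.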



Answer 2:


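If you use Python 3, you can build a translation table with str.maketrans() and apply it with str.translate(). Note that this pads every character in string.punctuation; to pad only .,!?() iterate over that string instead.

import string
text = 'bla. bla? bla.bla! bla...'
# Map every punctuation character to itself surrounded by spaces.
text = text.translate(str.maketrans({key: " {0} ".format(key) for key in string.punctuation}))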
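
Since the question only asks for .,!?(), the same idea can be restricted to those characters. The following is a sketch under that assumption; the names PUNCT, TABLE and pad_with_translate are hypothetical. The extra spaces are collapsed with split() and join() rather than a regular expression.

```python
# Characters to pad; taken from the question, adjust as needed.
PUNCT = ".,!?()"

# Map each character to itself surrounded by spaces.
TABLE = str.maketrans({ch: " " + ch + " " for ch in PUNCT})

def pad_with_translate(text):
    padded = text.translate(TABLE)
    # str.split() with no arguments splits on any run of whitespace,
    # so joining with a single space collapses the extra spaces.
    return " ".join(padded.split())

print(pad_with_translate("bla. bla? bla.bla! bla..."))
```

Building the table once and reusing it avoids recreating it for every string.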



Answer 3:


This will add exactly one space if one is not present, and will not ruin existing spaces or other white-space characters:

import re
s = re.sub(r'(?<! )(?=[.,!?()])|(?<=[.,!?()])(?! )', ' ', s)

This works by finding a zero-width position between a punctuation character and a non-space character, and inserting a space there.
Note that it does add a space at the beginning or end of the string when the string starts or ends with one of these characters; this can easily be avoided by changing the look-arounds (?<! ) and (?! ) to (?<=[^ ]) and (?=[^ ]).

See it in action: http://ideone.com/BRx7w
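
The difference between the two look-around variants can be seen on the sample string from the question. This is an illustrative sketch; the variable names with_edges and no_edges are hypothetical. repr() is used so that a trailing space is visible in the output.

```python
import re

s = "bla. bla? bla.bla! bla..."

# Original pattern: also matches at the very start or end of the string.
with_edges = re.sub(r"(?<! )(?=[.,!?()])|(?<=[.,!?()])(?! )", " ", s)

# Variant with (?<=[^ ]) and (?=[^ ]): an actual non-space character is
# required, so no space is added at the string boundaries.
no_edges = re.sub(r"(?<=[^ ])(?=[.,!?()])|(?<=[.,!?()])(?=[^ ])", " ", s)

print(repr(with_edges))
print(repr(no_edges))
```

Both variants leave the existing single spaces in the input unchanged; they differ only in whether a space is appended after the final period.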



Source: https://stackoverflow.com/questions/3645931/python-padding-punctuation-with-white-spaces-keeping-punctuation
