Question
Why doesn't \g<0> work with unicode regex?
When I use \g<0> to insert a space before and after the group with a normal string regex, it works:
>>> punct = """,.:;!@#$%^&*(){}{}|\/?><"'"""
>>> rx = re.compile('[%s]' % re.escape(punct))
>>> text = '''"anständig"'''
>>> rx.sub(r" \g<0> ",text)
' " anst\xc3\xa4ndig " '
>>> print rx.sub(r" \g<0> ",text)
" anständig "
but with unicode regex, the space isn't added:
>>> punct = u""",–−—’‘‚”“‟„!£"%$'&)(+*-€/.±°´·¸;:=<?>@§#¡•[˚]»_^`≤…\«¿¨{}|"""
>>> rx = re.compile("["+"".join(punct)+"]", re.UNICODE)
>>> text = """„anständig“"""
>>> rx.sub(ur" \g<0> ", text)
'\xe2\x80\x9eanst\xc3\xa4ndig\xe2\x80\x9c'
>>> print rx.sub(ur" \g<0> ", text)
„anständig“
1. How do I get \g to work in a unicode regex?
2. If (1) is not possible, how do I get the unicode regex to insert a space before and after a character in punct?
Answer 1:
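I think you have two errors. First, you are not escaping punct with re.escape as you did in the first example, and it contains characters like [ and ] that need to be escaped: the unescaped ] closes the character class early, so the pattern no longer means "any one of these characters". Second, the text variable is not unicode; it is a byte string, while the pattern is unicode. Example that works: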
>>> punct = re.escape(u""",–−—’‘‚”“‟„!£"%$'&)(+*-€/.±°´·¸;:=<?>@§#¡•[˚]»_^`≤…\«¿¨{}|""")
>>> rx = re.compile("["+"".join(punct)+"]", re.UNICODE)
>>> text = u"""„anständig“"""
>>> print rx.sub(ur" \g<0> ", text)
„ anständig “
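
For readers on Python 3, the same two fixes can be sketched as below. This is a minimal illustration rather than the original poster's code: the shortened punct string and the names broken, fixed, raw, and decoded are assumptions made for the example. In Python 3, str patterns are Unicode-aware by default, so neither the u prefix nor re.UNICODE is needed.

```python
import re

# Hypothetical, shortened punctuation set (an assumption for illustration;
# it is not the full list from the question).
punct = '„“”"\'!?.,;:[]'
text = '„anständig“'  # Python 3 str literals are already Unicode

# Unescaped: the "]" inside punct closes the character class early, so the
# pattern means "one class character followed by a literal ]".
broken = re.compile('[' + punct + ']')
unchanged = broken.sub(r' \g<0> ', text)

# Escaped: re.escape makes "[" and "]" literal members of the class.
fixed = re.compile('[' + re.escape(punct) + ']')

# If the input arrives as UTF-8 bytes, decode it before matching.
raw = text.encode('utf-8')
decoded = raw.decode('utf-8')
spaced = fixed.sub(r' \g<0> ', decoded)

print(repr(unchanged))
print(repr(spaced))
```

The first result comes back untouched because nothing in the text is followed by a literal ], while the second has a space on each side of every matched punctuation character.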
Source: https://stackoverflow.com/questions/19427548/unicode-re-sub-doesnt-work-with-g0-group-0