Regex to match all unicode quotation marks

Question

Is there a simple regular expression to match all unicode quotes? Or does one have to hand-code it like this:

quotes = ur"[\"'\u2018\u2019\u201c\u201d]"

Thank you for reading.

Brian

Answer 1:

Python's re module doesn't support Unicode properties, so you can't use the Pi and Pf properties; your hand-coded solution is about as good as it gets.

You might also want to consider the "false quotation marks" that are sadly being used: the acute and grave accents (´ and `), \u00B4 and \u0060.

Then there are guillemets (« » ‹ ›); do you want those, too? Use \u00AB\u00BB\u2039\u203A for those.

Also, your command has a little bug: because you're using a raw string, the backslash in \" is kept and ends up in the quotes string. Use a triple-quoted raw string instead, so the double quote needs no escaping.

>>> quotes = ur"[\"'\u2018\u2019\u201c\u201d\u0060\u00b4]"
>>> "\\" in quotes
True
>>> quotes
u'[\\"\'\u2018\u2019\u201c\u201d`\xb4]'
>>> quotes = ur"""["'\u2018\u2019\u201c\u201d\u0060\u00b4]"""
>>> "\\" in quotes
False
>>> quotes
u'["\'\u2018\u2019\u201c\u201d`\xb4]'
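The session above is Python 2. As a rough sketch of the same hand-coded class for Python 3 (where the ur prefix is no longer accepted), an ordinary string literal with \u escapes should work. The names QUOTES and strip_quotes are illustrative only, and the guillemets mentioned earlier are included here as an assumption about what you want to match.

```python
import re

# Hand-coded character class: ASCII quotes, curly quotes,
# the accent "false quotes" (` and ´), and guillemets.
# In a normal (non-raw) string, \" is just a double quote, so no
# backslash ends up inside the character class.
QUOTES = re.compile(
    "[\"'\u2018\u2019\u201c\u201d\u0060\u00b4\u00ab\u00bb\u2039\u203a]"
)


def strip_quotes(text):
    """Remove every character matched by QUOTES (hypothetical helper)."""
    return QUOTES.sub("", text)
```

For instance, strip_quotes can be applied to text that mixes several quote styles to check which characters the class covers.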

Answer 2:

Quotation marks will often have the Unicode category Pi (punctuation, initial quote) or Pf (punctuation, final quote). You'll have to handle the "neutral" quotation marks ' and " manually.
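Since re has no property escapes, one possible workaround is to build the character class from the standard unicodedata module. This is a minimal Python 3 sketch, not a definitive solution; build_quote_pattern and QUOTE_RE are made-up names for illustration.

```python
import re
import sys
import unicodedata


def build_quote_pattern():
    """Build a character class from categories Pi and Pf plus ASCII quotes."""
    # Scan every code point and keep those categorized as initial (Pi)
    # or final (Pf) quote punctuation.
    chars = [
        chr(cp)
        for cp in range(sys.maxunicode + 1)
        if unicodedata.category(chr(cp)) in ("Pi", "Pf")
    ]
    # The "neutral" quotes ' and " are not Pi/Pf, so add them manually.
    chars.extend(['"', "'"])
    # Escape each character so it is safe inside [...].
    return re.compile("[" + "".join(re.escape(c) for c in chars) + "]")


# Built once at import time; the scan over all code points is slow-ish.
QUOTE_RE = build_quote_pattern()
```

Note that some quotation marks fall outside these two categories: for example, the low double quote „ (U+201E) has category Ps, so it would also have to be added by hand if you need it.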

Source: https://stackoverflow.com/questions/3128890/regex-to-match-all-unicode-quotation-marks

Tags

regex

unicode

quotes

character-properties
