What is the regular expression for a Spanish word?

Submitted by 痴心易碎 on 2019-12-22 05:39:17

Question


In most regular expression flavors, \w matches A..Z, a..z, 0..9, and _, and \b is defined as a word boundary.

How can I write a regular expression that matches all valid Spanish words, including ones with characters such as á, í, ó, é, and ñ?

I'm using .NET.
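A direct way to answer the question is to list the letters explicitly. The sketch below uses Python rather than .NET, and the letter list and the `spanish_words` helper are illustrative assumptions. It matches runs of Spanish letters; it does not check whether a run is an actual dictionary word. For .NET specifically, `\w` is already Unicode-aware by default (letters, nonspacing marks, decimal digits and connector punctuation) unless `RegexOptions.ECMAScript` is set, so the ASCII-only premise may not apply there.

```python
import re

# Assumed letter list: ASCII letters in both cases plus the extra letters
# Spanish uses (accented vowels, ü as in "pingüino", and ñ).
SPANISH_LETTERS = "a-zA-ZáéíóúüñÁÉÍÓÚÜÑ"

# One or more Spanish letters in a row. Unlike \w, digits and "_" are excluded.
SPANISH_WORD = re.compile(rf"[{SPANISH_LETTERS}]+")


def spanish_words(text):
    """Hypothetical helper: return every maximal run of Spanish letters."""
    return SPANISH_WORD.findall(text)


print(spanish_words("El pingüino comió piña, ¿no?"))
# ['El', 'pingüino', 'comió', 'piña', 'no']
```

Listing the letters explicitly keeps the pattern portable to engines whose `\w` is ASCII-only, at the cost of having to maintain the list by hand.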


Answer 1:


Use a Spanish locale and make your regex locale-sensitive.




Answer 2:


Your regex system should have something equivalent to Python's re.L (aka re.LOCALE) to make a regex locale-dependent, so that what counts as a word character, and therefore where word boundaries fall, changes with the locale. Or are you asking how to work around a regex system that doesn't support locales at all?
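Since this answer points at Python's `re`, it may help to note how the flags behave in Python 3 (a sketch, not tied to the asker's .NET setup): `str` patterns are Unicode-aware by default, `re.ASCII` opts back into the ASCII definition of `\w` and `\b`, and since Python 3.6 `re.LOCALE` is accepted only for `bytes` patterns. The sample sentence is made up for illustration.

```python
import re

text = "¿Dónde está el niño?"

# Default for str patterns: \w and \b use Unicode, so á, ó, ñ are word characters.
unicode_words = re.findall(r"\b\w+\b", text)

# re.ASCII restricts \w and \b to [A-Za-z0-9_], splitting words at accents.
ascii_words = re.findall(r"\b\w+\b", text, re.ASCII)

# re.LOCALE combined with a str pattern raises ValueError (Python 3.6+).
try:
    re.compile(r"\w+", re.LOCALE)
    locale_with_str_ok = True
except ValueError:
    locale_with_str_ok = False

print(unicode_words)       # ['Dónde', 'está', 'el', 'niño']
print(ascii_words)         # ['D', 'nde', 'est', 'el', 'ni', 'o']
print(locale_with_str_ok)  # False
```

In other words, with Python 3 text you usually get Spanish-friendly `\w` for free, and locale mode is only relevant when matching raw bytes.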




Answer 3:


This depends heavily on the language (and regex engine) you're using.

In Perl, \w matches word characters from any language or alphabet (provided the string holds decoded Unicode characters rather than raw bytes), so something like /\b(\w+)\b/ would probably match Spanish words as well as English or Russian ones.
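Perl isn't used for the examples here, but Python's `re` treats `\w` similarly for `str` patterns, so the same pattern can illustrate the point. The sketch also shows a caveat: `\w` includes digits and `_`, so a letters-only variant, `[^\W\d_]` (word characters that are neither digits nor underscores), may be closer to "words". The sample text is made up.

```python
import re

text = "niño boy мальчик x_1"

# Python analogue of Perl's /\b(\w+)\b/: for str patterns \w is Unicode-aware,
# so accented Latin letters and Cyrillic letters all count as word characters.
words = re.findall(r"\b(\w+)\b", text)

# Letters only: exclude non-word characters (\W), digits (\d) and "_".
letters_only = re.findall(r"[^\W\d_]+", text)

print(words)         # ['niño', 'boy', 'мальчик', 'x_1']
print(letters_only)  # ['niño', 'boy', 'мальчик', 'x']
```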

In languages using PCRE, \w (and therefore \b) does NOT match non-ASCII characters by default, unless Unicode property support is enabled (the PCRE_UCP option, or (*UCP) at the start of the pattern). You will probably need to build your own set. I suggest something like [\wáéíóúñ] (all word characters, plus the accented characters you want), and the PCRE library has to be built with Unicode (UTF-8) support before this will even work.
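One pitfall with the custom-set approach is that `\b` still uses the engine's ASCII idea of a word character, so it can place a boundary in the middle of a word. The sketch below uses Python with `re.ASCII` as a stand-in for an engine whose `\w` is ASCII-only (an approximation, not PCRE itself). It replaces `\b` with lookarounds built from the extended set and normalizes input to NFC, so a decomposed "o" + combining acute becomes the single code point "ó". The `SPANISH_EXTRA` list and `find_words` helper are assumptions for illustration.

```python
import re
import unicodedata

# Assumed list of extra letters beyond ASCII; extend as needed.
SPANISH_EXTRA = "áéíóúüñÁÉÍÓÚÜÑ"
WORD_CHAR = rf"[\w{SPANISH_EXTRA}]"

# Naive version: the set matches "ó", but \b still treats "ó" as a non-word
# character because re.ASCII limits \w and \b to [A-Za-z0-9_].
NAIVE = re.compile(rf"\b{WORD_CHAR}+\b", re.ASCII)

# Lookarounds compute the boundary from the extended set instead of \b.
SPANISH_WORD = re.compile(
    rf"(?<!{WORD_CHAR}){WORD_CHAR}+(?!{WORD_CHAR})", re.ASCII
)


def find_words(text):
    # NFC composes "o" + U+0301 into "ó" so the character set can match it.
    return SPANISH_WORD.findall(unicodedata.normalize("NFC", text))


print(NAIVE.findall("comió"))          # ['comi']
print(find_words("comió"))             # ['comió']
print(find_words("comio\u0301 piña"))  # ['comió', 'piña']
```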

If you're using something else, good luck. Some regex engines don't even support Unicode.



Source: https://stackoverflow.com/questions/896374/what-is-the-regular-expression-for-a-spanish-word
