.NET regular expression to match any kind of letter from any language

Question

Which regular expression can I use to match (allow) any kind of letter from any language?

I need to match any letter including any diacritics (e.g. á, ü, ñ, etc.) and exclude any kind of symbol (math symbols, currency signs, dingbats, box-drawing characters, etc.) and punctuation characters.

I'm using ASP.NET MVC 2 with .NET 4. I've tried this annotation in my view model:

[RegularExpression(@"\p{L}*", ...

and this one:

[RegularExpression(@"\p{L}\p{M}*", ...

but client-side validation does not work.

UPDATE: Thank you for all your answers. Your suggestions work, but only in .NET; the problem is that the same regex is also used for client-side validation in JavaScript (sorry if this was not clear enough). I had to go with:

[^0-9_\|°¬!#\$%/\()\?¡¿+{}[]:.\,;@ª^*<>=&]*

which is very ugly and does not cover all scenarios but is the closest thing to what I need.

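Answer 1:

One thing to watch out for is the client-side regex. Validation uses the JavaScript regex engine on the client side and the .NET regex engine on the server side. JavaScript does not understand Unicode category escapes such as \p{L} in a plain pattern, so this scenario will not work there.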

Answer 2:

Ignore your grammar teacher and use double-negatives:

[^\W\d_]

Remember that \w matches any letter, digit, or underscore, so exclude the last two as above. You might read it as “not not-a-word-character, not a digit, and not an underscore”, which leaves only letters. Apply De Morgan's theorem and it makes more sense: “a word character, but neither a digit nor an underscore.”
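To make the double-negative class concrete, here is a minimal sketch that anchors it and tries a few inputs. The class and method names (LetterClassDemo, IsLettersOnly) are made up for illustration. One caveat worth keeping in mind: in .NET, \w also covers non-spacing marks and connector punctuation, so this class is slightly broader than \p{L} alone.

```csharp
using System;
using System.Text.RegularExpressions;

static class LetterClassDemo
{
    // Anchored with ^ and $ so the whole input must come from the class.
    // [^\W\d_] = "word character, but not a digit and not an underscore".
    static readonly Regex LettersOnly = new Regex(@"^[^\W\d_]+$");

    // Hypothetical helper name, not part of any framework API.
    public static bool IsLettersOnly(string input)
    {
        return LettersOnly.IsMatch(input);
    }

    static void Main()
    {
        // "\u00D1and\u00FA" is "Ñandú", written with escapes to avoid
        // source-file encoding issues.
        Console.WriteLine(IsLettersOnly("\u00D1and\u00FA")); // True
        Console.WriteLine(IsLettersOnly("abc123"));          // False (digits)
        Console.WriteLine(IsLettersOnly("snake_case"));      // False (underscore)
    }
}
```

This only demonstrates the .NET side; JavaScript's \w is ASCII-only, so the same pattern behaves differently in client-side validation.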

Answer 3:

You can use Char.IsLetter:

Indicates whether the specified Unicode character is categorized as a Unicode letter.

With .Net 4.0:

string onlyLetters = String.Concat(str.Where(Char.IsLetter));

On 3.5 String.Concat only accepts an array, so you should also call ToArray.
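As a sketch of the same idea that should compile on both 3.5 and 4.0, the filtered characters can be passed to the string(char[]) constructor instead of String.Concat. The names LetterFilterDemo, KeepLetters and IsAllLetters are hypothetical, chosen only for this example.

```csharp
using System;
using System.Linq;

static class LetterFilterDemo
{
    // Drops every character that Char.IsLetter does not classify as a letter.
    public static string KeepLetters(string input)
    {
        // ToArray() yields a char[], which the string constructor accepts.
        return new string(input.Where(char.IsLetter).ToArray());
    }

    // Validation variant: true only if every character is a letter.
    public static bool IsAllLetters(string input)
    {
        return input.All(char.IsLetter);
    }

    static void Main()
    {
        // "\u00F1" is "ñ".
        Console.WriteLine(KeepLetters("a1-\u00F1?") == "a\u00F1"); // True
        Console.WriteLine(IsAllLetters("abc"));                    // True
        Console.WriteLine(IsAllLetters("ab1"));                    // False
    }
}
```

Note that Char.IsLetter looks at one UTF-16 code unit at a time, so combining marks (and letters encoded as surrogate pairs) are not counted as letters by this approach.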

Answer 4:

Your problem is more likely due to the fact that the pattern is not anchored: an unanchored regex matches any input that contains a matching substring, so a single letter somewhere in the input is enough.

By adding ^ as a prefix and $ as a suffix, the whole string has to comply with your regex. So this should probably work:

^\p{L}*$

RegexBuddy explains:

^ Assert position at the beginning of the string
\p{L} A character with the Unicode property 'letter' (any kind of letter from any kind of language)
* Between zero and unlimited times, as many as possible (greedy)
$ Assert position at the end of the string
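The effect of the anchors can be checked with a small sketch like the following, which calls Regex.IsMatch directly rather than going through the validation attribute. AnchorDemo and its method names are assumptions made for the example.

```csharp
using System;
using System.Text.RegularExpressions;

static class AnchorDemo
{
    // Without anchors, \p{L}* can match the empty string at position 0,
    // so IsMatch succeeds for every input.
    public static bool Unanchored(string input)
    {
        return Regex.IsMatch(input, @"\p{L}*");
    }

    // With ^ and $, the entire input must consist of letters.
    public static bool Anchored(string input)
    {
        return Regex.IsMatch(input, @"^\p{L}*$");
    }

    static void Main()
    {
        Console.WriteLine(Unanchored("123")); // True (empty match)
        Console.WriteLine(Anchored("123"));   // False
        Console.WriteLine(Anchored("abc"));   // True
        Console.WriteLine(Anchored(""));      // True, because * allows zero letters
    }
}
```

If an empty value should be rejected as well, replacing * with + is the usual adjustment.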

Answer 5:

\p{L}* should match "any kind of letter from any language". It should work; I used it in an i18n-proof uppercase/lowercase recognition regex in .NET.

Answer 6:

I’ve just had to validate a URL and I chose this regular expression in .NET.

^[(\p{L})?(\p{M})?-]*$

Begin and end with a character of any language (optionally either letters or marks) and allow hyphens.
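One detail to be aware of: inside a character class, parentheses and ? are literal characters rather than grouping or optional markers, so the pattern above also accepts (, ) and ?. The sketch below compares it with a simplified class that may be closer to the stated intent; SlugDemo and the Simplified pattern are assumptions for illustration, not the answer author's code.

```csharp
using System;
using System.Text.RegularExpressions;

static class SlugDemo
{
    // Pattern exactly as given in the answer.
    const string Original = @"^[(\p{L})?(\p{M})?-]*$";

    // Assumed intent: only letters, combining marks, and hyphens.
    const string Simplified = @"^[\p{L}\p{M}-]*$";

    public static bool MatchesOriginal(string s)
    {
        return Regex.IsMatch(s, Original);
    }

    public static bool MatchesSimplified(string s)
    {
        return Regex.IsMatch(s, Simplified);
    }

    static void Main()
    {
        // "caf\u00E9-ma\u00F1ana" is "café-mañana".
        Console.WriteLine(MatchesOriginal("caf\u00E9-ma\u00F1ana"));   // True
        Console.WriteLine(MatchesSimplified("caf\u00E9-ma\u00F1ana")); // True

        // "e\u0301" is "e" followed by a combining acute accent (\p{M}).
        Console.WriteLine(MatchesSimplified("e\u0301"));               // True

        // Literal parentheses and a question mark slip through the original.
        Console.WriteLine(MatchesOriginal("a(b)?"));                   // True
        Console.WriteLine(MatchesSimplified("a(b)?"));                 // False
    }
}
```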

Answer 7:

\w - matches any alphanumeric character (including numbers)

In my tests it has matched:

and hasn't matched:

;
,
\
:

If you know exactly what you want to exclude (a short list, say), you can do the following:

[^;,\`.]

which matches, exactly once, any character that isn't:

;
,
\
`
.

Hope it helps!

Answer 8:

Set the regex option to non-greedy (lazy).

/\p{L}/u

Source: https://stackoverflow.com/questions/2949861/net-regular-expression-to-match-any-kind-of-letter-from-any-language

Tags

.net

regex

unicode

asp.net-mvc-2

data-annotations