I need to validate that a field is not empty. It should allow English and foreign-language characters (UTF-8), but not special characters. I'm not good at R
It would have been nice if I could say "Just do /^\w+$/.test(word)", but...
See this answer for the current state of Unicode support (or rather, the lack of it) in JavaScript regular expressions.
You can either use the library he suggests, which might be slow, or enlist the help of the server for this (which might be slower).
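To make the problem concrete, here is a small sketch of what happens when you try the \w approach on non-English input. The sample words are my own picks for illustration; the point is only that \w covers the ASCII set [A-Za-z0-9_] and nothing else.

```javascript
// \w matches only [A-Za-z0-9_], so any non-ASCII letter fails the test.
const asciiWord = /^\w+$/;

// Illustrative sample words (not from the question).
const samples = ["hello", "héllo", "привет", "日本語"];

const results = samples.map((word) => asciiWord.test(word));
console.log(results); // [ true, false, false, false ]
```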
If you want to support a wide range of languages, you'll have to work by excluding only the characters you don't want, since specifying all of the ranges you do want will be difficult.
You'll need to look at the list of Unicode blocks and/or the character database to identify the blocks you want to exclude (for instance, U+0000 through U+001F). This Wikipedia article may also help.
Then use a regular expression with character classes to look for what you want to exclude.
For example, this will check for the U+0000 through U+001F and the U+007F characters (obviously you'll be excluding more than just these):
if (/[\u0000-\u001F\u007F]/.test(theString)) {
    // Contains at least one invalid character
}
The [] identify a "character class" (list and/or range of characters to look for). That particular one says look for \u0000 through \u001F (inclusive) as well as \u007F.
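Putting the exclusion idea together with the "not empty" requirement from the question, a minimal sketch might look like the following. The function name isValidField and the particular ranges in FORBIDDEN (control characters plus ASCII punctuation) are assumptions on my part, not a definitive list; you would extend or trim them to match whatever "special characters" means for your field.

```javascript
// Hypothetical exclusion list (an assumption, adjust to your needs):
//   \u0000-\u001F  C0 control characters
//   \u007F-\u009F  DEL and C1 control characters
//   \u0021-\u002F, \u003A-\u0040, \u005B-\u0060, \u007B-\u007E  ASCII punctuation
const FORBIDDEN =
    /[\u0000-\u001F\u007F-\u009F\u0021-\u002F\u003A-\u0040\u005B-\u0060\u007B-\u007E]/;

// Returns true when the value is non-empty (ignoring surrounding whitespace)
// and contains none of the excluded characters.
function isValidField(value) {
    if (value.trim().length === 0) {
        return false; // empty or whitespace-only
    }
    return !FORBIDDEN.test(value);
}
```

Because this works by exclusion, letters from any script pass through untouched, and so do digits and spaces. If you don't want those either, add their ranges to the class.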
In engines that support Unicode property escapes (ES2018 and later, with the u flag), you can test for a Unicode letter like this:
str.match(/\p{L}/u)
Or for the existence of a non-letter like this:
str.match(/[^\p{L}]/u)
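To validate the whole field rather than just detect a single letter, something along these lines should work. This is only a sketch: the name isLettersOnly is made up, and allowing combining marks (\p{M}) and whitespace alongside letters is my assumption about what the field should accept.

```javascript
// Allowed characters: letters, combining marks, and whitespace.
// The \p{M} and \s parts are assumptions; drop them for a strict letters-only check.
const ONLY_LETTERS = /^[\p{L}\p{M}\s]+$/u;
const HAS_LETTER = /\p{L}/u;

// Returns true when the value has at least one letter and nothing
// outside the allowed set, so empty and whitespace-only input is rejected.
function isLettersOnly(value) {
    return HAS_LETTER.test(value) && ONLY_LETTERS.test(value);
}
```

Including \p{M} matters for text where an accent is stored as a separate combining character (for example "e" followed by U+0301), which would otherwise be rejected.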