I've been searching for a way to deal with this in JavaScript.
PHP has a library that handles Unicode characters through what it calls Unicode character properties. Bas
JavaScript's only Unicode feature is that it lets you match a single character with \uDDDD (four hex digits), so if you need \P{L}, no luck.
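To make that limitation concrete, here is a minimal sketch of what plain \u escapes can do. The ranges in `someLetters` are hand-picked for illustration (Basic Latin, Latin-1 letters and the Cyrillic block) and are only a rough stand-in, not an equivalent of \p{L}; the variable names are made up for this example.

```javascript
// Match one specific character by its code unit: \u00E9 is "é".
const eAcute = /\u00E9/;

// Rough, hand-written approximation of "letters" for a few ranges only.
// Anything outside these ranges (e.g. CJK) is rejected, which is exactly
// why this approach does not scale to a real \p{L}.
const someLetters = /^[A-Za-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u00FF\u0400-\u04FF]+$/;

console.log(eAcute.test('caf\u00E9'));    // true
console.log(someLetters.test('Русский')); // true
console.log(someLetters.test('日本語'));   // false: range not listed above
```

Every script you care about has to be listed by hand this way, which is what a Unicode-aware library saves you from.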
However, Steven Levithan, co-author of the excellent Regular Expressions Cookbook (together with regex guru Jan Goyvaerts), has an alternative library called XRegExp that has many more features, including the ones you are looking for. You can test it both in RegexBuddy (a standalone app by Jan) and in RegexPal.
Quoting from the doc:
XRegExp supports matching Unicode categories, scripts, blocks, and other properties via addon scripts. Such tokens are matched using \p{…}, \P{…}, and \p{^…}.
See XRegExp Unicode add-ons.
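As a rough sketch of how the \p{…} and \P{…} tokens quoted above might be used: the helper `makeLetterMatchers` below is hypothetical (it is not part of XRegExp), and it takes the `XRegExp` function as an argument so the sketch makes no assumption about how the library is loaded. It does assume the Unicode add-ons are available in whatever build you pass in.

```javascript
// Hypothetical helper, not part of XRegExp: builds two letter tests.
// `XRegExp` is expected to be the library's main function, loaded
// together with its Unicode add-ons.
function makeLetterMatchers(XRegExp) {
  // Backslashes are doubled because the patterns are string literals.
  const onlyLetters = XRegExp('^\\p{L}+$'); // whole string is letters
  const hasNonLetter = XRegExp('\\P{L}');   // any single non-letter

  return {
    isAllLetters: (s) => onlyLetters.test(s),
    containsNonLetter: (s) => hasNonLetter.test(s),
  };
}
```

With the npm package this could be called as `makeLetterMatchers(require('xregexp'))`; in a browser you would pass the global `XRegExp` instead.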