JavaScript + Unicode regexes

星月不相逢 2020-11-21 05:11

How can I use Unicode-aware regular expressions in JavaScript?

For example, there should be something akin to \w that can match any code-point in Lette

11 answers
  •  梦如初夏
    2020-11-21 05:34

    Situation for ES 6

    The upcoming ECMAScript language specification, edition 6, includes Unicode-aware regular expressions. Support must be enabled with the u modifier on the regex. See Unicode-aware regular expressions in ES6.
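    To illustrate what the u modifier changes, here is a minimal sketch. It assumes an engine that already supports the flag; the sample character and variable names are only for illustration.

```javascript
// U+1D306 lies outside the Basic Multilingual Plane, so JavaScript stores it
// as a surrogate pair (two UTF-16 code units).
const tetragram = "\uD834\uDF06";

// Without the u flag, "." matches exactly one code unit, so the pair does not fit.
const withoutFlag = /^.$/.test(tetragram);

// With the u flag, "." matches one whole code point.
const withFlag = /^.$/u.test(tetragram);

// The u flag also enables \u{...} code point escapes.
const escapeMatches = /^\u{1D306}$/u.test(tetragram);

console.log(withoutFlag, withFlag, escapeMatches); // false true true
```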

    Until ES 6 is finished and widely adopted among browser vendors, you're still on your own, though. Update: There is now a transpiler named regexpu that translates ES6 Unicode regular expressions into equivalent ES5. It can be used as part of your build process. Try it out online.
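    To give a rough idea of what such a translation amounts to, here is a hand-written ES5 pattern for "any single code point". This is a simplified sketch and not regexpu's actual output: it ignores line terminators and rejects lone surrogates, and the variable name is made up.

```javascript
// Hand-written ES5 approximation of the ES6 pattern /^.$/u:
// either a full surrogate pair, or one code unit that is not a surrogate.
var anyCodePoint = /^(?:[\uD800-\uDBFF][\uDC00-\uDFFF]|[^\uD800-\uDFFF])$/;

console.log(anyCodePoint.test("\uD834\uDF06")); // true  (one astral code point)
console.log(anyCodePoint.test("a"));            // true  (one BMP code point)
console.log(anyCodePoint.test("ab"));           // false (two code points)
```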

    Situation for ES 5 and below

    Even though JavaScript operates on Unicode strings, it does not implement Unicode-aware character classes and has no concept of POSIX character classes or Unicode blocks/sub-ranges.
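    A quick way to see the limitation: \w only covers [A-Za-z0-9_], so anything outside ASCII falls through. The sample strings below are just illustrations.

```javascript
// \w is ASCII-only, so accented Latin and Cyrillic letters do not match it.
const asciiWord = /^\w+$/.test("resume");
const accentedWord = /^\w+$/.test("r\u00E9sum\u00E9"); // "résumé"
const cyrillicWord = /^\w+$/.test("\u043F\u0440\u0438\u0432\u0435\u0442"); // "привет"

// Replacing \w+ runs shows where the gaps are: each é is left untouched.
const masked = "r\u00E9sum\u00E9".replace(/\w+/g, "X");

console.log(asciiWord, accentedWord, cyrillicWord); // true false false
console.log(masked); // XéXé
```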

    • Issues with Unicode in JavaScript regular expressions

    • Check your expectations here: Javascript RegExp Unicode Character Class tester (Edit: the original page is down, the Internet Archive still has a copy.)

    • Flagrant Badassery has an article on JavaScript, Regex, and Unicode that sheds some light on the matter.

    • Also read Regex and Unicode here on SO. You will probably have to build your own "punctuation character class".

    • Check out the Regular Expression: Match Unicode Block Range builder, which lets you build a JavaScript regular expression that matches characters that fall in any number of specified Unicode blocks.

      I just did it for the "General Punctuation" and "Supplemental Punctuation" sub-ranges, and the result is as simple and straightforward as I would have expected:

       [\u2000-\u206F\u2E00-\u2E7F]
      
    • There is also XRegExp, a project that brings Unicode support to JavaScript by offering an alternative regex engine with extended capabilities.

    • And of course, required reading: mathiasbynens.be - JavaScript has a Unicode problem.
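    As a sketch of how a hand-built block class like the punctuation one above might be used in ES5-compatible code: the helper name stripBlockPunctuation is hypothetical, and note that the two blocks do not include ASCII punctuation such as "," or ".".

```javascript
// Character class from the answer: General Punctuation (U+2000-U+206F)
// and Supplemental Punctuation (U+2E00-U+2E7F).
var blockPunctuation = /[\u2000-\u206F\u2E00-\u2E7F]/g;

// Hypothetical helper: remove every character that falls in those two blocks.
function stripBlockPunctuation(text) {
  return text.replace(blockPunctuation, "");
}

// Curly quotes (U+201C, U+201D), a dash (U+2014) and an ellipsis (U+2026) are removed.
var cleaned = stripBlockPunctuation("\u201CHello\u201D \u2014 world\u2026");
console.log(JSON.stringify(cleaned)); // "Hello  world"

// ASCII punctuation lives in the Basic Latin block and is left alone.
console.log(stripBlockPunctuation("a, b.")); // a, b.
```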
