Concrete JavaScript Regex for Accented Characters (Diacritics)


I've looked on Stack Overflow (replacing characters.. eh, how JavaScript doesn't follow the Unicode standard concerning RegExp, etc.) and haven't really found a concrete

9 Answers

面向向阳花

2020-11-22 17:43
How about this?
```
/^[a-zA-ZÀ-ÖØ-öø-ÿ]+$/
```
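A quick check of what this range-based pattern accepts, as a minimal sketch. The helper name `isLatin1Word` and the sample strings are illustrative assumptions, not part of the answer.

```javascript
// Range-based pattern from the answer: ASCII letters plus the accented
// letters of the Latin-1 Supplement block (U+00C0 to U+00FF), skipping × and ÷.
const latin1Word = /^[a-zA-ZÀ-ÖØ-öø-ÿ]+$/;

// Hypothetical helper name, used only for this illustration.
function isLatin1Word(input) {
  return latin1Word.test(input);
}

console.log(isLatin1Word('Éléonore')); // true: É and é fall inside À-ÿ
console.log(isLatin1Word('Šimon'));    // false: Š (U+0160) lies outside Latin-1
console.log(isLatin1Word('a×b'));      // false: × (U+00D7) is excluded
```

The second call shows the main limitation: anything beyond U+00FF (Czech, Polish, Turkish letters and so on) is rejected.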

广开言路

2020-11-22 17:49
```
/^[\pL\pM\p{Zs}.-]+$/u
```
Explanation:
- \pL - matches any kind of letter from any language
- \pM - matches a character intended to be combined with another character (e.g. accents, umlauts, enclosing boxes, etc.)
- \p{Zs} - matches a whitespace character that is invisible, but does take up space
- u - enables Unicode mode (in PCRE/PHP, pattern and subject strings are treated as UTF-8)
Unlike the other proposed regexes (such as [A-Za-zÀ-ÖØ-öø-ÿ]), this works with all language-specific characters; e.g. Šš is matched by this rule but not by the others on this page.

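Unfortunately, JavaScript does not natively support the short \pL form, and engines older than ES2018 do not support Unicode property escapes at all (ES2018 and later accept the braced form, e.g. \p{L}, together with the u flag). For those cases, you can use xregexp, e.g.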
```
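const XRegExp = require('xregexp');

// Accepts exactly two words separated by a single space, where each word
// consists of letters (\pL), combining marks (\pM), or hyphens.
const isInputRealHumanName = (input: string): boolean => {
  return XRegExp('^[\\pL\\pM-]+ [\\pL\\pM-]+$', 'u').test(input);
};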
```
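For engines that implement ES2018 Unicode property escapes (current browsers and Node.js releases should), a library-free version of the pattern above can be sketched as follows. Note the braced form `\p{L}`: the short `\pL` is a syntax error under the native `u` flag. The function name `isUnicodeName` and the sample inputs are assumptions for illustration.

```javascript
// Native Unicode property escapes (ES2018); they require the `u` flag.
// \p{L} = any letter, \p{M} = combining marks, \p{Zs} = space separators.
const unicodeName = /^[\p{L}\p{M}\p{Zs}.-]+$/u;

// Hypothetical helper name, used only for this illustration.
function isUnicodeName(input) {
  return unicodeName.test(input);
}

console.log(isUnicodeName('Šárka Nováková')); // true: letters outside Latin-1 are fine
console.log(isUnicodeName('Jose\u0301'));      // true: e followed by a combining acute (U+0301)
console.log(isUnicodeName('R2-D2'));          // false: digits are not letters
```

Including `\p{M}` matters for decomposed input, where an accent arrives as a separate combining character rather than as part of a precomposed letter.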

故里飘歌

2020-11-22 17:53

An easier way to accept accented Latin letters is one of these:

[A-zÀ-ú] // accepts lowercase and uppercase letters, including accented ones up to ú (also matches [ \ ] ^ _ ` × ÷)
[A-zÀ-ÿ] // as above, extended through ÿ (adds û ü ý þ ÿ; still matches [ \ ] ^ _ ` × ÷)
[A-Za-zÀ-ÿ] // as above but not including [ \ ] ^ _ `
[A-Za-zÀ-ÖØ-öø-ÿ] // as above but not including × ÷ either

See https://unicode-table.com/en/ for characters listed in numeric order.
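To see why the narrower ranges are worth the extra typing, here is a small sketch that tests three of these classes against the non-letter symbols that sit inside the wider ranges. The names `classes`, `strays`, and `acceptedStrays` are illustrative.

```javascript
// Each entry pairs a character class from the list with a regex that
// tests a single character against it.
const classes = {
  'A-zÀ-ÿ': /^[A-zÀ-ÿ]$/,
  'A-Za-zÀ-ÿ': /^[A-Za-zÀ-ÿ]$/,
  'A-Za-zÀ-ÖØ-öø-ÿ': /^[A-Za-zÀ-ÖØ-öø-ÿ]$/,
};

// Non-letter symbols located between Z and a in ASCII, plus × and ÷
// from the Latin-1 Supplement block.
const strays = ['[', '\\', ']', '^', '_', '`', '×', '÷'];

// Returns the stray symbols that a given class still accepts.
function acceptedStrays(regex) {
  return strays.filter((ch) => regex.test(ch));
}

for (const [name, regex] of Object.entries(classes)) {
  console.log(`[${name}] accepts:`, acceptedStrays(regex).join(' ') || '(none)');
}
```

Only the last class rejects every stray symbol, which is why it is the one usually recommended when the input is limited to Latin-1 letters.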
