Concrete JavaScript Regex for Accented Characters (Diacritics)

庸人自扰 2020-11-22 17:22

I've looked on Stack Overflow (replacing characters.. eh, how JavaScript doesn't follow the Unicode standard concerning RegExp, etc.) and haven't really found a concrete

9 Answers
  • 2020-11-22 17:43

    How about this?

    /^[a-zA-ZÀ-ÖØ-öø-ÿ]+$/
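
A quick way to see what this pattern accepts and rejects is sketched below. The helper name `isLatin1Word` and the sample strings are made up for illustration; they are not part of the answer.

```javascript
// Hypothetical helper: true when the whole string consists of basic Latin
// letters plus the accented letters in the range À-ÿ (without × and ÷).
const latin1Letters = /^[a-zA-ZÀ-ÖØ-öø-ÿ]+$/;

function isLatin1Word(input) {
  return latin1Letters.test(input);
}

console.log(isLatin1Word('José'));    // true
console.log(isLatin1Word('Šárka'));   // false: Š (U+0160) lies outside À-ÿ
console.log(isLatin1Word('Ana×Bea')); // false: × is excluded by the À-Ö / Ø-ö split
```

Note that this only works for precomposed characters; a string stored in decomposed form (base letter plus combining accent) would not match.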
    
  • 2020-11-22 17:49
    /^[\pL\pM\p{Zs}.-]+$/u
    

    Explanation:

    • \pL - matches any kind of letter from any language
    • \pM - matches a character intended to be combined with another character (e.g. accents, umlauts, enclosing boxes, etc.)
    • \p{Zs} - matches a whitespace character that is invisible, but does take up space
    • u - enables Unicode mode (in PCRE, pattern and subject strings are treated as UTF-8)

    Unlike the other proposed regexes (such as [A-Za-zÀ-ÖØ-öø-ÿ]), this works with all language-specific characters; e.g. Šš is matched by this rule but not by the others on this page.

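    Unfortunately, JavaScript engines that predate ES2018 do not support these classes natively, and even with ES2018 property escapes the \pL shorthand must be written \p{L}. However, you can use xregexp, e.g.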

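    // Requires the xregexp package (npm install xregexp).
    const XRegExp = require('xregexp');
    
    // True when the input is exactly two space-separated words made of
    // letters, combining marks, or hyphens.
    const isInputRealHumanName = (input: string): boolean => {
      return XRegExp('^[\\pL\\pM-]+ [\\pL\\pM-]+$', 'u').test(input);
    };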
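
On a runtime that implements ES2018 Unicode property escapes (an assumption about the environment, not something the answer relies on), the same check can be sketched without a library. The function name `isInputRealHumanNameNative` is hypothetical, chosen to mirror `isInputRealHumanName` above.

```javascript
// Assumes ES2018 Unicode property escapes are available.
// \p{L} = any letter, \p{M} = combining mark; the `u` flag is required for \p{...}.
const humanName = /^[\p{L}\p{M}-]+ [\p{L}\p{M}-]+$/u;

// Hypothetical native counterpart of isInputRealHumanName.
function isInputRealHumanNameNative(input) {
  return humanName.test(input);
}

console.log(isInputRealHumanNameNative('Šárka Nováková'));  // true
console.log(isInputRealHumanNameNative('Jean-Luc Picard')); // true
console.log(isInputRealHumanNameNative('R2D2 Droid'));      // false: digits are not letters
```

Because `\p{M}` is included, names stored in decomposed form (base letter followed by a combining accent) also pass.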
    
    
  • 2020-11-22 17:53

    An easier way to accept the accented letters in the range À-ÿ is this:

    [A-zÀ-ú] // accepts lowercase and uppercase letters up to ú (also matches [ \ ] ^ _ ` × ÷)
    [A-zÀ-ÿ] // as above, extended to ÿ, which adds û ü ý þ ÿ (still matches [ \ ] ^ _ ` × ÷)
    [A-Za-zÀ-ÿ] // as above but not including [ \ ] ^ _ `
    [A-Za-zÀ-ÖØ-öø-ÿ] // as above but also not including × ÷
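
The difference between these four classes can be checked with a small script. This is only an illustrative sketch; the `classes` table, the `samples` array, and the `acceptedBy` helper are hypothetical names.

```javascript
// The four character classes from the list above, anchored to test a single character.
const classes = {
  'A-zÀ-ú': /^[A-zÀ-ú]$/,
  'A-zÀ-ÿ': /^[A-zÀ-ÿ]$/,
  'A-Za-zÀ-ÿ': /^[A-Za-zÀ-ÿ]$/,
  'A-Za-zÀ-ÖØ-öø-ÿ': /^[A-Za-zÀ-ÖØ-öø-ÿ]$/,
};

// Sample characters: a punctuation mark that sits between Z and a,
// a maths sign that sits between Ö and Ø, and two accented letters.
const samples = ['^', '×', 'ü', 'é'];

// Hypothetical helper: list the sample characters a class accepts.
function acceptedBy(regex) {
  return samples.filter((ch) => regex.test(ch)).join(' ');
}

for (const [name, regex] of Object.entries(classes)) {
  console.log(`[${name}] accepts: ${acceptedBy(regex)}`);
}
```

Of the four, only the last class rejects both `^` and `×` while still accepting `ü` and `é`.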
    

    See https://unicode-table.com/en/ for characters listed in numeric order.
