Regular expression for all printable characters in JavaScript

时光说笑 2020-11-27 20:33

Looking for a regular expression that validates all printable characters. The regex needs to be used in JavaScript only. I have gone through this post but it mostly talk

4 Answers
  • 2020-11-27 20:42

    Looks like JavaScript has changed to some degree since this question was posted?

    I'm using this one:

    var regex = /^[\u0020-\u007e\u00a0-\u00ff]*$/;
    console.log( regex.test("!\"#$%&'()*+,-./:;<=>?@[] ^_`{|}~")); //should output "true" 
    console.log( regex.test("Iñtërnâtiônàlizætiøn")); //should output "true"
    console.log( regex.test("☃")); //should output "false", U+2603 is outside both ranges
  • 2020-11-27 20:48

    For non-Unicode text, use the regex pattern ^[^\x00-\x1F\x80-\x9F]+$
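
A minimal sketch of that pattern as a JavaScript regex literal. The helper name `hasNoC0C1Controls` is hypothetical and only there for illustration. As written, the pattern lets DEL (`\x7F`) through and accepts any character above `\x9F`, so it behaves as a control-character filter rather than a strict printable check.

```javascript
// Hypothetical helper: true when the string is non-empty and contains no
// C0 (0x00-0x1F) or C1 (0x80-0x9F) control characters.
const noControlsRx = /^[^\x00-\x1F\x80-\x9F]+$/;

function hasNoC0C1Controls(text) {
  return noControlsRx.test(text);
}

console.log(hasNoC0C1Controls("Hello, world!")); // true
console.log(hasNoC0C1Controls("tab\there"));     // false, \t is 0x09
console.log(hasNoC0C1Controls(""));              // false, + requires at least one character
console.log(hasNoC0C1Controls("\x7F"));          // true, DEL is in neither excluded range
```

If DEL should be rejected as well, widening the second range to `\x7F-\x9F` would be one way to do it.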


    If you want to work with Unicode, first read "Javascript + Unicode regexes".

    I would then suggest using the regex pattern ^[^\p{Cc}\p{Cf}\p{Zl}\p{Zp}]*$

    • \p{Cc} or \p{Control}: an ASCII control character (0x00..0x1F and 0x7F) or a Latin-1 control character (0x80..0x9F).
    • \p{Cf} or \p{Format}: invisible formatting indicator.
    • \p{Zl} or \p{Line_Separator}: line separator character U+2028.
    • \p{Zp} or \p{Paragraph_Separator}: paragraph separator character U+2029.

    For more information see http://www.regular-expressions.info/unicode.html
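
A hedged sketch of how the suggested pattern can be written natively, assuming an engine that supports ES2018 Unicode property escapes (the `u` flag is required for `\p{...}`). The helper name `isPrintableUnicode` is hypothetical.

```javascript
// Assumes ES2018 Unicode property escapes; without the `u` flag, \p is not a property escape.
const printableRx = /^[^\p{Cc}\p{Cf}\p{Zl}\p{Zp}]*$/u;

// Hypothetical helper name, for illustration only.
function isPrintableUnicode(text) {
  return printableRx.test(text);
}

console.log(isPrintableUnicode("Iñtërnâtiônàlizætiøn")); // true
console.log(isPrintableUnicode("snow ☃"));               // true
console.log(isPrintableUnicode("a\u200Cb"));             // false, U+200C is a format character (Cf)
console.log(isPrintableUnicode("a\u2028b"));             // false, U+2028 is the line separator (Zl)
console.log(isPrintableUnicode("a\nb"));                 // false, \n is a control character (Cc)
```

Because the quantifier is `*`, the empty string also passes; switch to `+` if at least one character is required.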

  • 2020-11-27 21:02

    To validate that a string consists only of printable ASCII characters, use a simple regex like

    /^[ -~]+$/
    

    It matches

    • ^ - the start of string anchor
    • [ -~]+ - one or more (due to the + quantifier) characters within the range from space to tilde in the ASCII table (0x20 through 0x7E)
    • $ - end of string anchor

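    For Unicode printable chars, use the \PC Unicode category class from XRegExp, as has already been mentioned. It matches any char outside the "Other" category C, which covers control, format, unassigned, private-use, and surrogate code points: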

    ^\PC+$
    

    See regex demos:

    // ASCII only
    var ascii_print_rx = /^[ -~]+$/;
    console.log(ascii_print_rx.test("It's all right.")); // true
    console.log(ascii_print_rx.test('\f ')); // false, \f is an ASCII form feed char
    console.log(ascii_print_rx.test("demásiado tarde")); // false, no Unicode printable char support
    // Unicode support
    console.log(XRegExp.test('demásiado tarde', XRegExp("^\\PC+$"))); // true
    console.log(XRegExp.test('\u200C ', XRegExp("^\\PC+$"))); // false, \u200C is a Unicode zero-width non-joiner
    console.log(XRegExp.test('\f ', XRegExp("^\\PC+$"))); // false, \f is an ASCII form feed char
    <script src="http://cdnjs.cloudflare.com/ajax/libs/xregexp/3.1.1/xregexp-all.min.js"></script>

  • 2020-11-27 21:06

    If you want to match all printable characters in the Unicode character set (as indicated by your comment on Aug 21), you're going to have a hard time doing this yourself. JavaScript's native regexes have abysmal Unicode support. But you can use XRegExp with the regex ^\P{C}*$.
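
For comparison, a minimal sketch of the same `^\P{C}*$` pattern without XRegExp. It assumes a newer engine with ES2018 Unicode property escapes and the `u` flag, which the answer above does not assume. The helper name `isOutsideCategoryC` is hypothetical.

```javascript
// Assumes ES2018 Unicode property escapes; \P{C} means "not in general category Other".
const notOtherRx = /^\P{C}*$/u;

// Hypothetical helper name, for illustration only.
function isOutsideCategoryC(text) {
  return notOtherRx.test(text);
}

console.log(isOutsideCategoryC("demásiado tarde")); // true
console.log(isOutsideCategoryC("\f "));             // false, form feed is a control character (Cc)
console.log(isOutsideCategoryC("\u200C "));         // false, U+200C is a format character (Cf)
console.log(isOutsideCategoryC("\uE000"));          // false, U+E000 is a private-use code point (Co)
```

Which code points count as unassigned depends on the Unicode version the engine ships with, so results for very recent characters may differ between runtimes.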

    If you only want to match those few ASCII letters you mentioned in the edit to your post from Aug 22, then the regex is trivial:

    /^[a-z0-9!"#$%&'()*+,.\/:;<=>?@\[\] ^_`{|}~-]*$/i
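
A small sketch that checks this explicit list against the full printable ASCII range (0x20 through 0x7E). The variable names are hypothetical. Since the question's edit is not shown here, whether any omission is intentional is unknown.

```javascript
const listedRx = /^[a-z0-9!"#$%&'()*+,.\/:;<=>?@\[\] ^_`{|}~-]*$/i;

// Collect every printable ASCII character (0x20..0x7E) that the class rejects.
const missing = [];
for (let code = 0x20; code <= 0x7e; code++) {
  const ch = String.fromCharCode(code);
  if (!listedRx.test(ch)) missing.push(ch);
}

console.log(missing); // logs an array containing only the backslash
```

So the explicit class and `[ -~]` differ only on the backslash; adding `\\` to the class would make the two equivalent for ASCII input.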
    