Regex for all PRINTABLE characters

囚心锁ツ 2020-12-11 01:09

Is there a special regex statement like \w that denotes all printable characters? I'd like to validate that a string only contains a character that can be printed--i.e. do

6 answers
  • 2020-12-11 01:13

    There is a POSIX character class designation [:print:] that should match printable characters, and [:cntrl:] for control characters. Note that these match codes throughout the ASCII table, so they might not be suitable for matching other encodings.

    Failing that, the expression [\x00-\x1f] will match the ASCII control characters, although again, these could be printable in other encodings.
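
    As a rough sketch of the range-based fallback in JavaScript (which has no POSIX classes such as [:print:]), the helper below simply tests for [\x00-\x1f]. The name hasAsciiControl is made up for this example. Note that the range leaves out DEL (0x7F), which is also commonly counted as an ASCII control character.

    ```javascript
    // Hypothetical helper: true if the string contains any character in the
    // ASCII control range 0x00-0x1F (DEL, 0x7F, is not part of this range).
    function hasAsciiControl(s) {
      return /[\x00-\x1f]/.test(s);
    }

    console.log(hasAsciiControl("plain text")); // false
    console.log(hasAsciiControl("tab\there"));  // true
    ```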

  • If your regex flavor supports Unicode properties, this is probably the best way:

    \P{Cc}
    

    That matches any character that's not a control character, whether it be ASCII -- [\x00-\x1F\x7F] -- or Latin1 -- [\x80-\x9F] (also known as the C1 control characters).

    The problem with POSIX classes like [:print:] or \p{Print} is that they can match different things depending on the regex flavor and, possibly, the locale settings of the underlying platform. In Java, they're strictly ASCII-oriented. That means \p{Print} matches only the ASCII printing characters -- [\x20-\x7E] -- while \P{Cntrl} (note the capital 'P') matches everything that's not an ASCII control character -- [^\x00-\x1F\x7F]. That is, it matches any ASCII character that isn't a control character, or any non-ASCII character--including C1 control characters.
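
    For illustration, JavaScript (ES2018 and later) accepts Unicode property escapes when the u flag is set, so the \P{Cc} idea can be sketched as below. The function name hasNoControlChars is invented for the example.

    ```javascript
    // Hypothetical helper: true if no character in the string belongs to the
    // Unicode general category Cc (control). The `u` flag is required for \P{...}.
    function hasNoControlChars(s) {
      return /^\P{Cc}*$/u.test(s);
    }

    console.log(hasNoControlChars("café au lait")); // true
    console.log(hasNoControlChars("line\nbreak"));  // false (U+000A is Cc)
    console.log(hasNoControlChars("c1\u0085"));     // false (U+0085 is a C1 control)
    ```

    Unlike the ASCII-only classes, this accepts non-ASCII text while still rejecting both the C0 and C1 control ranges.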

  • 2020-12-11 01:20

    It depends wildly on what regex package you are using. This is one of those situations about which some wag said that the great thing about standards is that there are so many to choose from.

    If you happen to be using C, the isprint(3) function/macro is your friend.

  • 2020-12-11 01:20

    Adding on to @Alan-Moore, \P{Cc} is actually an example of a negative Unicode category or Unicode block (ref: Character Classes in Regular Expressions). \P{name} matches any character that does not belong to the named Unicode general category or named block. See the referenced link for more examples of the named blocks supported in .NET.

  • 2020-12-11 01:21

    Very late to the party, but this regexp works: /[ -~]/.

    How? It matches every character in the range from space (ASCII decimal 32) to tilde (ASCII decimal 126), which covers all printable ASCII characters.

    If you want to strip everything outside that range (control characters as well as non-ASCII characters), you could use something like:

    $someString.replace(/[^ -~]/g, '');
    

    NOTE: this is not valid .NET code, but an example of regexp usage for those who stumble upon this via search engines later.
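
    Since the question is about validating rather than stripping, a minimal sketch of the same range used as a whole-string check might look like this (plain JavaScript; isPrintableAscii is a name invented here). Anchoring with ^ and $ is what makes it reject a string that contains even one character outside the range.

    ```javascript
    // Hypothetical helper: true only if every character is in the printable
    // ASCII range, space (0x20) through tilde (0x7E). An empty string passes.
    function isPrintableAscii(s) {
      return /^[ -~]*$/.test(s);
    }

    console.log(isPrintableAscii("Hello, world!")); // true
    console.log(isPrintableAscii("tab\there"));     // false
    console.log(isPrintableAscii("café"));          // false (é is outside ASCII)
    ```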

  • 2020-12-11 01:35

    In Java, the \p{Print} option specifies the printable character class.
