What literal characters should be escaped in a regex?

予麋鹿 2020-11-30 01:47

I just wrote a regex for use with the PHP function preg_match that contains the following part:

[\\w-.]

To match any word char

5 answers
  • 2020-11-30 02:28
    [\w.-]
    
    • The . usually means any character, but between [] it has no special meaning.
    • A - between [] indicates a range unless it is escaped or is the first or last character between the [].
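The two points above can be checked with a short sketch. The variable names and test strings are illustrative only and are not from the question:

```php
<?php
// Inside a character class the dot is literal: it matches "." but not "x".
$dotOnly = '/^[.]$/';
var_dump(preg_match($dotOnly, '.')); // int(1)
var_dump(preg_match($dotOnly, 'x')); // int(0)

// A hyphen placed last in the class is literal too,
// so [\w.-] accepts word characters, "." and "-".
$class = '/^[\w.-]+$/';
var_dump(preg_match($class, 'file-name.txt')); // int(1)
var_dump(preg_match($class, 'file name'));     // int(0), the space is not in the class
```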
  • 2020-11-30 02:37

    While some characters do need to be escaped in a regex, you are asking not about regexes in general but about character classes, where the dash is the special one.

    Instead of escaping it, you could put it at the end of the class: [\w.-]

  • 2020-11-30 02:42

    The full stop loses its meta meaning in the character class.

    The - has special meaning in the character class. If it isn't placed at the start or at the end of the square brackets, it must be escaped. Otherwise it denotes a character range (A-Z).

    You triggered another special case, however. [\w-.] works because \w does not denote a single character, so PCRE cannot possibly create a character range from it. \w is a possibly non-contiguous class of symbols, so it has no single end character that could serve as the start of a range running up to the full stop. Also, the full stop . precedes the first ASCII character that \w can match (the digit 0), so no range could be constructed anyway. That is why - worked without escaping for you.
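Because this behaviour depends on the regex engine, it may be worth probing rather than assuming. The sketch below tests the ambiguous form on the local build and compares it with two unambiguous spellings. As far as I know, newer PCRE2-based PHP builds may reject a hyphen placed between \w and another character with an "invalid range" compile error, so the first result is deliberately not asserted:

```php
<?php
// Probe how the local PCRE build handles a hyphen right after \w.
// The @ suppresses the compile warning; preg_match() returns false
// when the pattern fails to compile.
$ambiguous = '/^[\w-.]+$/';
$result = @preg_match($ambiguous, 'a-b.c');
if ($result === false) {
    echo "This build rejects [\\w-.]\n";
} else {
    echo "This build treats the hyphen in [\\w-.] as a literal\n";
}

// The unambiguous spellings: hyphen last, or hyphen escaped.
$portable = ['/^[\w.-]+$/', '/^[\w\-.]+$/'];
foreach ($portable as $pattern) {
    var_dump(preg_match($pattern, 'a-b.c')); // int(1)
}
```

Writing the hyphen last or escaping it avoids relying on the special case at all.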

  • 2020-11-30 02:43

    In many regex implementations, the following rules apply:

    Meta characters inside a character class are:

    • ^ (negation)
    • - (range)
    • ] (end of the class)
    • \ (escape char)

    So these should all be escaped. There are some corner cases though:

    • - needs no escaping if placed at the very start or end of the class ([abc-] or [-abc]). In quite a few regex implementations, it also needs no escaping when placed directly after a range ([a-c-abc]) or a shorthand character class ([\w-abc]). This is what you observed.
    • ^ needs no escaping when it's not at the start of the class: [^a] means any char except a, while [a^] matches either a or ^, which is equivalent to [\^a].
    • ] needs no escaping if it comes directly after the opening [ (or after [^): []] matches the char ]. JavaScript is a notable exception, where [] is an empty class.
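A small table-driven check of the start, end, caret and bracket cases with PHP's preg_match might look like this. The cases array is made up for illustration, and the engine-dependent "hyphen after a range or shorthand class" case is left out on purpose:

```php
<?php
// Each entry: pattern, subject, whether a match is expected.
$cases = [
    ['/^[abc-]$/', '-', true],  // hyphen last: literal
    ['/^[-abc]$/', '-', true],  // hyphen first: literal
    ['/^[a^]$/',   '^', true],  // caret not first: literal
    ['/^[\^a]$/',  '^', true],  // escaped caret: same meaning
    ['/^[^a]$/',   'a', false], // caret first: negation
    ['/^[]]$/',    ']', true],  // bracket right after "[": literal in PCRE
];

$results = [];
foreach ($cases as [$pattern, $subject, $expected]) {
    $matched = preg_match($pattern, $subject) === 1;
    $results[] = ($matched === $expected);
    printf("%-12s %-3s %s\n", $pattern, $subject, $matched ? 'match' : 'no match');
}
```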
  • 2020-11-30 02:49

    If you are using PHP and you need to escape special regex characters, just use preg_quote.

    An example from php.net:

    <?php
    // In this example, preg_quote($word) is used to keep the
    // asterisks from having special meaning to the regular
    // expression.
    
    $textbody = "This book is *very* difficult to find.";
    $word = "*very*";
    $textbody = preg_replace ("/" . preg_quote($word, '/') . "/",
                              "<i>" . $word . "</i>",
                              $textbody);
    ?>
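The same function can also help when a character class is built from arbitrary characters, which is closer to the original question. This is only a sketch: the $allowed string is a hypothetical input, and it relies on preg_quote escaping -, ., ^ and ], as the PHP manual lists for current versions:

```php
<?php
// Hypothetical input: characters to allow, some of them class metacharacters.
$allowed = '-.^]';

// preg_quote() backslash-escapes each of them; the second argument
// additionally escapes the pattern delimiter.
$class = '[' . preg_quote($allowed, '/') . ']';
$pattern = '/^' . $class . '+$/';

var_dump(preg_match($pattern, '^.-]')); // int(1)
var_dump(preg_match($pattern, 'abc'));  // int(0)
```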
    