Do I always need to escape metacharacters in a string that is not a “literal”?

耶瑟儿~ 2021-01-12 13:53

It seems that having a string that contains the characters { or } is rejected during regex processing. I can understand that these are reserved cha

4 answers
  • 2021-01-12 14:09

    Use Pattern.quote(String):

    public static String quote(String s)
    

    Returns a literal pattern String for the specified String.

    This method produces a String that can be used to create a Pattern that would match the string s as if it were a literal pattern.

    Metacharacters or escape sequences in the input sequence will be given no special meaning.

    Parameters:
        s - The string to be literalized
    Returns:
        A literal string replacement
    Since:
        1.5
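A minimal sketch of how Pattern.quote can be used with a string containing { and }. The class name QuoteDemo and the helper containsLiteral are made up for illustration; only Pattern.quote, Pattern.compile and Matcher.find come from the JDK.

```java
import java.util.regex.Pattern;

class QuoteDemo {
    // Hypothetical helper: true if `input` contains `needle` as plain text,
    // even when `needle` is full of regex metacharacters.
    static boolean containsLiteral(String input, String needle) {
        return Pattern.compile(Pattern.quote(needle)).matcher(input).find();
    }

    public static void main(String[] args) {
        String needle = "{id}";
        // quote() wraps the string in \Q...\E so it is matched literally
        System.out.println(Pattern.quote(needle));                          // \Q{id}\E
        System.out.println(containsLiteral("/users/{id}/posts", needle));   // true
        // The dot is literal here, so "a.c" is not found in "abc"
        System.out.println(containsLiteral("abc", "a.c"));                  // false
    }
}
```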

  • 2021-01-12 14:14

    You can use

    java.util.regex.Pattern.quote(java.lang.String)
    

    to escape meta characters used by regular expressions.

  • 2021-01-12 14:23

    TL;DR

    • if you need regex syntax use replaceAll or replaceFirst,
    • if you want your target/replacement pair to be treated as literals use replace (it also replaces all occurrences of your target).

    Many people are confused by the unfortunate naming of the replacement methods in the String class, which are:

    • replaceAll(String, String)
    • replaceFirst(String, String)
    • replace(CharSequence, CharSequence)
    • replace(char, char)

    Since the replaceAll method explicitly claims that it replaces all possible targets, people assume that the replace method doesn't guarantee such behaviour, since its name lacks the All suffix.
    But this assumption is wrong.

    The main difference between these methods is shown in this table:

    ╔═════════════════════╦═══════════════════════════════════════════════════════════════════╗
    ║                     ║                             replaced targets                      ║
    ║                     ╠════════════════════════════════════╦══════════════════════════════╣
    ║                     ║           ALL found                ║      ONLY FIRST found        ║
    ╠══════╦══════════════╬════════════════════════════════════╬══════════════════════════════╣
    ║      ║   supported  ║ replaceAll(String, String)         ║ replaceFirst(String, String) ║
    ║regex ╠══════════════╬════════════════════════════════════╬══════════════════════════════╣
    ║syntax║      not     ║ replace(CharSequence, CharSequence)║              \/              ║
    ║      ║   supported  ║ replace(char, char)                ║              /\              ║
    ╚══════╩══════════════╩════════════════════════════════════╩══════════════════════════════╝
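The table has no built-in method for the "no regex syntax, only first" cell. One way to fill that gap, sketched here under the assumption that escaping both arguments is acceptable, is to combine replaceFirst with Pattern.quote and Matcher.quoteReplacement. The class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LiteralReplaceFirst {
    // Hypothetical helper: replaces only the first occurrence of `target`,
    // treating both `target` and `replacement` as plain text.
    static String replaceFirstLiteral(String text, String target, String replacement) {
        return text.replaceFirst(Pattern.quote(target),
                                 Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        // "{x}" would be an invalid regex and "$1" would be a group reference
        // if they were not escaped.
        System.out.println(replaceFirstLiteral("{x} + {x}", "{x}", "$1"));  // $1 + {x}
    }
}
```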
    

    So if you don't need regex syntax, use a method that doesn't expect it and instead treats both target and replacement as literals.

    So instead of replaceAll(regex, replacement)

    use replace(literal, replacement).
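A short sketch of the difference, assuming a standard JDK where a pattern such as {name} is rejected with a PatternSyntaxException. The class and method names are made up for the example.

```java
import java.util.regex.PatternSyntaxException;

class ReplaceVsReplaceAll {
    // replace() treats "{name}" as plain text and replaces every occurrence.
    static String fillLiteral(String template, String value) {
        return template.replace("{name}", value);
    }

    // Reports whether replaceAll() refuses to compile `regex`.
    static boolean replaceAllRejects(String regex) {
        try {
            "x".replaceAll(regex, "");
            return false;
        } catch (PatternSyntaxException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(fillLiteral("Hello {name}, bye {name}", "Bob")); // Hello Bob, bye Bob
        System.out.println(replaceAllRejects("{name}"));                    // true
    }
}
```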


    As you can see, there are two overloaded versions of replace. Both should work for you, since neither supports regex syntax. The main difference between them is that:

    • replace(char target, char replacement) simply creates a new string and fills it with either the character from the original string or the replacement character, depending on whether the original character equals target

    • replace(CharSequence target, CharSequence replacement) is essentially equivalent to replaceAll(Pattern.quote(target.toString()), Matcher.quoteReplacement(replacement.toString())). In other words, it behaves like replaceAll (older JDK versions really did route it through the regex engine), but the regex metacharacters in target and replacement are escaped for us automatically
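The equivalence described above can be checked with a small sketch. The class and method names are hypothetical; the point is only that both calls give the same result for non-empty targets that contain metacharacters.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReplaceEquivalence {
    // Literal replacement through String.replace
    static String viaReplace(String text, String target, String replacement) {
        return text.replace(target, replacement);
    }

    // The same replacement through replaceAll, with both arguments escaped by hand
    static String viaReplaceAll(String text, String target, String replacement) {
        return text.replaceAll(Pattern.quote(target),
                               Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        // "." is a regex wildcard and "$" starts a group reference,
        // yet both calls treat them as ordinary characters.
        System.out.println(viaReplace("a.b.c", ".", "$"));     // a$b$c
        System.out.println(viaReplaceAll("a.b.c", ".", "$"));  // a$b$c
    }
}
```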

  • 2021-01-12 14:29

    You don't need any extra code, just the \Q and \E constructs, as documented in Java's Pattern class.

    For example, in the following code:

    String foobar = "crazyPassword=f()ob@r{}+";
    Pattern regex = Pattern.compile("\\Q" + foobar + "\\E");
    

    the pattern compiles, and the special characters in foobar are not interpreted as regex metacharacters.

    The only case this doesn't handle is an input that itself contains a literal \E, because that ends the quoted section early. If you need to solve that problem too, just let me know in a comment and I'll edit to add that.
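For the \E case, one option is to let Pattern.quote build the \Q...\E wrapper, since it is documented to give the whole input no special meaning. This is a sketch rather than the answerer's own solution, and the helper names are made up.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class QuoteWithBackslashE {
    // True if `regex` compiles and matches the whole of `input`.
    static boolean matchesWhole(String regex, String input) {
        try {
            return Pattern.matches(regex, input);
        } catch (PatternSyntaxException e) {
            return false; // the pattern did not even compile
        }
    }

    // Manual wrapping, as in the snippet above
    static boolean naiveMatchesItself(String s) {
        return matchesWhole("\\Q" + s + "\\E", s);
    }

    // Wrapping done by Pattern.quote
    static boolean quotedMatchesItself(String s) {
        return matchesWhole(Pattern.quote(s), s);
    }

    public static void main(String[] args) {
        System.out.println(naiveMatchesItself("f()ob@r{}+")); // true
        // The embedded \E closes the quoted section too early
        System.out.println(naiveMatchesItself("a\\Eb"));      // false
        System.out.println(quotedMatchesItself("a\\Eb"));     // true
    }
}
```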
