Java Regex doesn't work with special chars

一个人的身影 2021-01-28 17:11

I have a problem with my parser. I want to read an image link on a website, and this normally works fine. But today I got a link that contains special chars and the usual regex di

4 answers
  • 2021-01-28 17:39

    Your regex should look like this:

    String regex = "<img .*src=\"(.*?)\" .*>";
    
  • 2021-01-28 17:43

    By default, the . character matches any character except line terminators. Therefore, your pattern won't match if there are newlines inside the img tag.

    Use Pattern.compile(..., Pattern.DOTALL) or prepend your pattern with (?s).

    In dotall mode, the expression . matches any character, including a line terminator. By default this expression does not match line terminators.

    http://docs.oracle.com/javase/1.5.0/docs/api/java/util/regex/Pattern.html#DOTALL
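
    To make the difference concrete, here is a minimal sketch that runs the pattern from the first answer with and without Pattern.DOTALL. The class name DotallDemo, the helper firstSrc, and the sample HTML are made up for illustration and are not from the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DotallDemo {
    // Pattern taken from the first answer in this thread.
    static final String REGEX = "<img .*src=\"(.*?)\" .*>";

    // Hypothetical helper: returns the first captured src value, or null if nothing matches.
    static String firstSrc(String html, int flags) {
        Matcher m = Pattern.compile(REGEX, flags).matcher(html);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Assumed sample input: an img tag whose attributes are split across two lines.
        String html = "<img class=\"photo\"\n     src=\"http://example.com/a.png\" alt=\"\">";

        // Default mode: '.' stops at the newline, so the tag is not matched.
        System.out.println(firstSrc(html, 0));

        // DOTALL mode: '.' also matches the newline, so the src is captured.
        System.out.println(firstSrc(html, Pattern.DOTALL));

        // Prepending (?s) to the pattern is equivalent to passing Pattern.DOTALL.
        Matcher m = Pattern.compile("(?s)" + REGEX).matcher(html);
        System.out.println(m.find() ? m.group(1) : null);
    }
}
```

    Note that with DOTALL the greedy .* can run across several tags at once, so on a full page a lazy quantifier is usually the safer choice.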

  • 2021-01-28 17:47

    This is probably caused by a newline within the tag. The . character won't match it.

    Did you consider not using regex to parse HTML? Using regex for HTML parsing is notoriously fragile. Please consider using a parsing library such as JSoup for this.

  • 2021-01-28 17:55

    You should actually use <img\\s.*?\\bsrc=["'](.*?)["'].*?> with the (?s) modifier.
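
    As a rough sketch of how this could be used, the class below collects every src value from a string. It assumes the dots in the pattern are meant to be unescaped (an escaped \\. in a Java string would only match a literal dot), and it writes the double quote inside the character class as \" so the pattern is a valid Java string literal. The class name ImgSrcExtractor, the method extractSrcs, and the sample HTML are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ImgSrcExtractor {
    // (?s) enables DOTALL so '.' also matches line terminators.
    // Lazy quantifiers (.*?) keep each match as short as possible.
    // Assumption: the dots are unescaped, unlike the pattern as posted.
    static final Pattern IMG_SRC =
            Pattern.compile("(?s)<img\\s.*?\\bsrc=[\"'](.*?)[\"'].*?>");

    // Hypothetical helper: returns the src value of every img tag found in html.
    static List<String> extractSrcs(String html) {
        List<String> result = new ArrayList<>();
        Matcher m = IMG_SRC.matcher(html);
        while (m.find()) {
            result.add(m.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        // Assumed sample input: one multi-line tag with single quotes and
        // special characters in the URL, one single-line tag with double quotes.
        String html = "<p><img\n  alt='x'\n  src='http://example.com/b%20c(1).png'\n></p>"
                + "<img src=\"d.jpg\">";
        for (String src : extractSrcs(html)) {
            System.out.println(src);
        }
    }
}
```

    This is still a regex working on HTML, so it inherits the fragility mentioned in the other answer: for example, an img tag without a src attribute can cause the lazy .*? to run on into the next tag.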
