I have a problem with my parser. I want to read an image link on a website, and this normally works fine. But today I got a link that contains special chars and the usual regex di
Your regex should look like this:
String regex = "<img .*src=\"(.*?)\" .*>";
By default, the . character matches any character except line terminators. Therefore, your pattern won't match if there are newlines inside the img tag.
Use Pattern.compile(..., Pattern.DOTALL) or prepend your pattern with (?s).
In dotall mode, the expression . matches any character, including a line terminator. By default this expression does not match line terminators.
http://docs.oracle.com/javase/1.5.0/docs/api/java/util/regex/Pattern.html#DOTALL
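To illustrate the difference, here is a minimal sketch that runs the pattern from the earlier answer with and without Pattern.DOTALL. The class name DotallDemo, the helper firstSrc, and the sample HTML string are made up for this example and are not from the original question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DotallDemo {
    // Returns the first captured src value, or null if the pattern does not match.
    // 'flags' is passed straight to Pattern.compile (0 means no flags).
    static String firstSrc(String html, int flags) {
        Pattern p = Pattern.compile("<img .*src=\"(.*?)\" .*>", flags);
        Matcher m = p.matcher(html);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Hypothetical img tag with a newline between the attributes.
        String html = "<img class=\"pic\"\n     src=\"http://example.com/a.png\" alt=\"\">";

        // Default mode: '.' stops at the newline, so nothing is found.
        System.out.println(firstSrc(html, 0));

        // DOTALL mode: '.' also matches the newline, so the src is captured.
        System.out.println(firstSrc(html, Pattern.DOTALL));
    }
}
```

Writing "(?s)" at the start of the pattern string should behave the same as passing Pattern.DOTALL.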
This is probably caused by the newline within the tag. The . character won't match it.
Have you considered not using regex to parse HTML? Parsing HTML with regex is notoriously fragile. Please consider using a parsing library such as JSoup for this.
You should actually use <img\\s.*?\\bsrc=["'](.*?)["'].*?> with the (?s) modifier. Note that the . must stay unescaped; an escaped \\. would match only a literal dot.
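A rough sketch of how such a pattern could be used with the (?s) modifier follows. It assumes the . is left unescaped so that it matches any character, and it escapes the double quote for the Java string literal. The names ImgSrcDemo and firstSrc and the sample tag are invented for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ImgSrcDemo {
    // (?s) enables DOTALL inline; ["'] accepts single or double quotes around the value.
    static final Pattern IMG_SRC =
        Pattern.compile("(?s)<img\\s.*?\\bsrc=[\"'](.*?)[\"'].*?>");

    // Returns the first src value found in the given HTML, or null if there is none.
    static String firstSrc(String html) {
        Matcher m = IMG_SRC.matcher(html);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Hypothetical tag: attributes on separate lines, single quotes, special chars in the URL.
        String html = "<img\n alt='x'\n src='http://example.com/b.png?x=1&y=2'>";
        System.out.println(firstSrc(html));
    }
}
```

The lazy group stops at the first quote of either kind, so a URL that itself contains a quote character would be cut short. That is one of the reasons an HTML parser is the more robust choice.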