Java Regex doesn't work with special chars


I have a problem with my parser. I want to read an image link on a website, and this normally works fine. But today I got a link that contains special characters, and the usual regex di

4 Answers

臣服心动

2021-01-28 17:39
Your regex should look like this:
```
String regex = "<img .*src=\"(.*?)\" .*>";
```
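A minimal sketch of how that pattern might be used with `Pattern` and `Matcher`, assuming a single-line tag. The class name `ImgSrcRegex`, the helper `srcOf`, and the sample URL are hypothetical. It shows that characters such as `?`, `&`, `=` and `%` in the URL need no special handling: they only have a special meaning when they appear in the pattern, not in the text being searched.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ImgSrcRegex {
    // The pattern from the answer above. It assumes the src value is in double
    // quotes and is followed by a space and at least one more attribute.
    static final Pattern IMG = Pattern.compile("<img .*src=\"(.*?)\" .*>");

    // Returns the captured src value, or null when the pattern finds no match.
    static String srcOf(String html) {
        Matcher m = IMG.matcher(html);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Hypothetical single-line tag whose URL contains ?, &, = and %.
        String tag = "<img class=\"x\" src=\"https://example.com/img.php?id=42&size=m%20l\" alt=\"photo\">";
        System.out.println(srcOf(tag)); // https://example.com/img.php?id=42&size=m%20l
    }
}
```

Because the pattern requires a space after the closing quote, a tag such as `<img src="a.png">`, where `src` is the last attribute, is not matched and `srcOf` returns `null`.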
无人共我

2021-01-28 17:43

By default, the `.` character matches any character except a line terminator. Therefore, your pattern won't match if the `img` tag contains a line break.

Use `Pattern.compile(..., Pattern.DOTALL)` or prefix your pattern with `(?s)`. From the `Pattern.DOTALL` documentation:

> In dotall mode, the expression . matches any character, including a line terminator. By default this expression does not match line terminators.

http://docs.oracle.com/javase/1.5.0/docs/api/java/util/regex/Pattern.html#DOTALL
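A small sketch of the difference, reusing the pattern from the first answer on a hypothetical tag that is split across two lines. The class `DotAllDemo` and the helper `findSrc` are illustrative names, not part of the original answer. Without the flag nothing matches; with `Pattern.DOTALL` or the inline `(?s)`, the same pattern finds the URL.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DotAllDemo {
    // Pattern from the first answer; assumes a double-quoted src followed by a space.
    static final String REGEX = "<img .*src=\"(.*?)\" .*>";

    // Returns the captured src value, or null if the pattern does not match.
    static String findSrc(Pattern p, String html) {
        Matcher m = p.matcher(html);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Hypothetical tag with a line break between its attributes.
        String html = "<img class=\"thumb\"\n     src=\"http://example.com/pic.png?w=100\" alt=\"pic\">";

        Pattern plain = Pattern.compile(REGEX);                  // '.' stops at '\n'
        Pattern dotAll = Pattern.compile(REGEX, Pattern.DOTALL); // flag passed to compile
        Pattern inline = Pattern.compile("(?s)" + REGEX);        // same flag, embedded

        System.out.println(findSrc(plain, html));  // null
        System.out.println(findSrc(dotAll, html)); // http://example.com/pic.png?w=100
        System.out.println(findSrc(inline, html)); // http://example.com/pic.png?w=100
    }
}
```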

醉酒成梦

2021-01-28 17:47

This is probably caused by a newline inside the tag; the `.` character won't match it.

Have you considered not using regex to parse HTML? Regex-based HTML parsing is notoriously fragile. Please consider using a parsing library such as JSoup instead.
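As a dependency-free sketch of the parse-instead-of-regex idea, the example below swaps in the JDK's own `javax.swing.text.html.parser.ParserDelegator`, an old HTML 3.2 era parser, rather than JSoup. The class name `ParserImgSrc` and the sample markup are hypothetical. Because the parser tokenizes attributes itself, line breaks inside the tag and attribute order don't matter.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

public class ParserImgSrc {
    // Collects the src attribute of every <img> tag using the JDK's built-in parser.
    static List<String> imageSources(String html) {
        List<String> sources = new ArrayList<>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleSimpleTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                // <img> has no end tag, so the parser reports it as a "simple" tag.
                if (tag == HTML.Tag.IMG) {
                    Object src = attrs.getAttribute(HTML.Attribute.SRC);
                    if (src != null) {
                        sources.add(src.toString());
                    }
                }
            }
        };
        try {
            // true = ignore any charset declared inside the document
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sources;
    }

    public static void main(String[] args) {
        // Hypothetical markup: the tag spans lines and src comes after alt.
        String html = "<p>\n<img alt=\"a\"\n     src=\"/pics/photo(1).jpg?size=large\">\n</p>";
        System.out.println(imageSources(html));
    }
}
```

With JSoup on the classpath, the equivalent is roughly `Jsoup.parse(html).select("img[src]")` followed by `attr("src")` on each element. JSoup follows the WHATWG HTML5 parsing rules, which the JDK parser predates, so it is the better choice for real-world pages.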

离开以前

2021-01-28 17:55

You should actually use `<img\\s.*?\\bsrc=["'](.*?)["'].*?>` (backslashes doubled as in a Java string literal) with the `(?s)` modifier.
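A minimal sketch applying this approach to every `<img>` on a page. It compiles the pattern as `(?s)<img\s.*?\bsrc=["'](.*?)["'].*?>`; the class name `LazyImgSrc` and the sample HTML are hypothetical. `(?s)` lets `.` span line breaks, and the reluctant `.*?` quantifiers keep each match as short as possible, so both double- and single-quoted `src` values are captured, one per tag.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LazyImgSrc {
    // (?s): '.' also matches line terminators.
    // .*? : reluctant, so each match stops at the first src= and the first '>'.
    static final Pattern IMG_SRC =
            Pattern.compile("(?s)<img\\s.*?\\bsrc=[\"'](.*?)[\"'].*?>");

    // Returns the src value of every match, in document order.
    static List<String> imageSources(String html) {
        List<String> sources = new ArrayList<>();
        Matcher m = IMG_SRC.matcher(html);
        while (m.find()) {
            sources.add(m.group(1));
        }
        return sources;
    }

    public static void main(String[] args) {
        // Hypothetical page: one double-quoted src, one single-quoted src on a later line.
        String html = "<p><img src=\"/a.png\" alt=\"a\"></p>\n"
                + "<img\n  class='wide'\n  src='/b (1).jpg?x=%C3%BC'>";
        System.out.println(imageSources(html)); // [/a.png, /b (1).jpg?x=%C3%BC]
    }
}
```

Caveats: `["']` accepts either quote at each end, so a double-quoted value containing an apostrophe is cut short, and if an `<img>` has no `src`, the reluctant `.*?` can run past its `>` into the next tag.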
