I am looking for a regular expression that can extract the src attribute (matched case-insensitively) from the following HTML snippets in Java.
This answer is for people arriving from Google, since the question is old.
Copying cletus's pattern as-is gave me an error.
Modifying it and passing the string src\\s*=\\s*([\"'])?([^\"']*)
as the argument to Pattern.compile
worked for me.
Here is the full example:
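import java.util.regex.Matcher;
import java.util.regex.Pattern;

String htmlString = "<div class=\"current\"><img src=\"img/HomePageImages/Paris.jpg\"></div>"; // Sample HTML
// Group 1: optional opening quote; group 2: the attribute value
String ptr = "src\\s*=\\s*([\"'])?([^\"']*)";
Pattern p = Pattern.compile(ptr);
Matcher m = p.matcher(htmlString);
if (m.find()) {
    String src = m.group(2); // Result: img/HomePageImages/Paris.jpg
}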
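The example above returns only the first match and is case-sensitive, while the question asks for a case-insensitive match. A minimal sketch of one way to collect every src value, assuming the same pattern compiled with Pattern.CASE_INSENSITIVE; the class name SrcExtractor and the helper extractSrcValues are made up for illustration, and the usual limitations of regex-based HTML handling still apply:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SrcExtractor {
    // Same pattern as above, compiled once and matched case-insensitively.
    private static final Pattern SRC =
            Pattern.compile("src\\s*=\\s*([\"'])?([^\"']*)", Pattern.CASE_INSENSITIVE);

    // Hypothetical helper: collects group 2 (the attribute value) of every match.
    public static List<String> extractSrcValues(String html) {
        List<String> values = new ArrayList<>();
        Matcher m = SRC.matcher(html);
        while (m.find()) {
            values.add(m.group(2));
        }
        return values;
    }

    public static void main(String[] args) {
        String html = "<div><img SRC=\"a.jpg\"><IMG src='b.png'></div>";
        System.out.println(extractSrcValues(html)); // prints [a.jpg, b.png]
    }
}
```

Note that this pattern matches any src attribute, not only those on img tags, so script or iframe sources would be collected as well.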
One possibility:
String imgRegex = "<img[^>]+src\\s*=\\s*['\"]([^'\"]+)['\"][^>]*>";
This works if matched case-insensitively. It's a bit of a mess, and deliberately ignores the case where quotes aren't used. To represent it without worrying about string escapes:
<img[^>]+src\s*=\s*['"]([^'"]+)['"][^>]*>
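This matches:
- <img
- one or more characters that aren't > (i.e. possible other attributes)
- src
- optional whitespace
- =
- optional whitespace
- an opening delimiter of ' or "
- the image source (one or more characters that aren't ' or "), captured as group 1
- a closing delimiter of ' or "
- any characters that aren't > (more possible attributes)
- > to close the tag
Things to note:
- If you want to capture the src= as well, move the open bracket further left :-)
- It doesn't cope with awkward input (e.g. a > inside an attribute value, or image sources that include ' or ").
This question comes up a lot here.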
Regular expressions are a bad way of handling this problem. Do yourself a favour and use an HTML parser of some kind.
Regexes are flaky for parsing HTML. You'll end up with a complicated expression that behaves unexpectedly in corner cases, and those corner cases will come up.
Edit: If your HTML is that simple then:
// Group 1: optional opening quote; group 2: the value (stops at a space or quote)
Pattern p = Pattern.compile("src\\s*=\\s*([\"'])?([^ \"']*)");
Matcher m = p.matcher(str);
if (m.find()) {
    String src = m.group(2);
}
And there are any number of Java HTML parsers out there.
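As one illustration of the parser route, here is a minimal sketch that uses the HTML parser bundled with the JDK (the Swing classes in javax.swing.text.html), chosen here only because it needs no extra dependency. It targets old HTML 3.2, so a dedicated third-party parser is likely a better fit for real-world pages. The class name ImgSrcParser and the helper imgSources are invented for this example:

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

public class ImgSrcParser {
    // Hypothetical helper: returns the src attribute of every img tag found.
    public static List<String> imgSources(String html) throws IOException {
        final List<String> sources = new ArrayList<>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleSimpleTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                // img has no closing tag, so it is reported as a "simple" tag.
                if (tag == HTML.Tag.IMG) {
                    Object src = attrs.getAttribute(HTML.Attribute.SRC);
                    if (src != null) {
                        sources.add(src.toString());
                    }
                }
            }
        };
        // The third argument tells the parser to ignore charset declarations.
        new ParserDelegator().parse(new StringReader(html), callback, true);
        return sources;
    }

    public static void main(String[] args) throws IOException {
        String html = "<div class=\"current\"><IMG SRC=\"img/HomePageImages/Paris.jpg\"></div>";
        System.out.println(imgSources(html));
    }
}
```

The parser takes care of tag and attribute case, quoting styles, and attribute order, which are exactly the corner cases the regex versions struggle with.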
You mean the src attribute of the img tag? In that case you can go with the following:
<[Ii][Mm][Gg]\\s*([Ss][Rr][Cc]\\s*=\\s*[\"'].*?[\"'])
That should work. The expression src='...' is in parentheses, so it forms a capturing group and can be processed separately. Note that the pattern only matches when src is the first attribute of the img tag.
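A short sketch of how the group might be read back in Java. SRC_EXPR is the pattern from this answer; SRC_VALUE is an assumed variant (not part of the answer) in which the parentheses wrap only the quoted value. The class name ImgSrcGroups and the helper firstGroup are hypothetical:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ImgSrcGroups {
    // The pattern from the answer: group 1 is the whole src="..." expression.
    public static final Pattern SRC_EXPR = Pattern.compile(
            "<[Ii][Mm][Gg]\\s*([Ss][Rr][Cc]\\s*=\\s*[\"'].*?[\"'])");

    // Assumed variant: the group wraps only the value between the quotes.
    public static final Pattern SRC_VALUE = Pattern.compile(
            "<[Ii][Mm][Gg]\\s*[Ss][Rr][Cc]\\s*=\\s*[\"'](.*?)[\"']");

    // Hypothetical helper: returns group 1 of the first match, or null if none.
    public static String firstGroup(Pattern pattern, String html) {
        Matcher m = pattern.matcher(html);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String html = "<IMG SRC=\"logo.png\" alt=\"logo\">";
        System.out.println(firstGroup(SRC_EXPR, html));  // prints SRC="logo.png"
        System.out.println(firstGroup(SRC_VALUE, html)); // prints logo.png
    }
}
```

With input such as <img alt="x" src="a.png">, both patterns find nothing, because another attribute sits between img and src.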