I know my question might look like a duplicate of this question, but it's not.
I am trying to match a class name inside html text that comes from the
Regular expressions are not a good fit for parsing HTML. HTML is not regular.
jQuery can be a very good fit here.
var html = 'Your HTML here...';
$('<div>' + html + '</div>').find('[class~="b"]').each(function () {
    console.log(this);
});
The selector [class~="b"] will select any element that has a class attribute containing the word b. The initial HTML is wrapped inside a div to make the find method work properly.
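To make the "containing the word" rule concrete, here is a minimal sketch of the test that [attr~="word"] performs, written in plain JavaScript. The helper name attrContainsWord is hypothetical; it is not part of jQuery or the DOM API.

```javascript
// Hypothetical helper: mimics the test behind the [attr~="word"] selector.
function attrContainsWord(attrValue, word) {
    // An empty word, or one containing whitespace, never matches.
    if (word === '' || /\s/.test(word)) {
        return false;
    }
    // The attribute value is treated as a whitespace-separated list of words.
    return attrValue.split(/\s+/).includes(word);
}

console.log(attrContainsWord('a b c d', 'b')); // true
console.log(attrContainsWord('abcd', 'b'));    // false
```

This is why the selector matches class="a b c d" but not class="abcd".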
This may not be the solution you are after, but if you aren't set on using a full regex match, you could do this (assuming your examples are representative of the data you will be parsing):
function hasTheClass(html_string, classname) {
    // Assumes class is the first attribute in the string and its value is quoted.
    // ~ maps -1 to 0, so !!~ turns -1 into false, and any other index into true.
    return !!~html_string.split("=")[1].split(/[\'\"]/)[1].split(" ").indexOf(classname);
}
hasTheClass("<div class='a b c d'></div>", 'b'); // returns true
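The split-based version only looks at the first attribute, so it returns false for input such as <div id="x" class="a b c d">. A slightly more tolerant sketch is shown below, assuming the class value is always quoted. The name hasClassAnywhere is hypothetical, and the pattern is an illustration rather than a complete HTML parser.

```javascript
// Hypothetical helper: inspects the first quoted class attribute in html_string.
function hasClassAnywhere(html_string, classname) {
    // \s requires whitespace before "class"; \s* tolerates spaces around "=".
    // (["']) captures the opening quote and \1 requires the same closing quote.
    var m = /\sclass\s*=\s*(["'])(.*?)\1/i.exec(html_string);
    if (!m) {
        return false; // no quoted class attribute found
    }
    return m[2].split(/\s+/).indexOf(classname) !== -1;
}

console.log(hasClassAnywhere('<div id="x" class="a b c d"></div>', 'b')); // true
console.log(hasClassAnywhere("<div class = 'abcd'></div>", 'b'));         // false
```

It still checks only the first class attribute it finds, so it suits single-tag strings like the ones in the question.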
Test it here: https://regex101.com/r/vnOFjm/1
regexp: (?:class|className)=(?:["']\W+\s*(?:\w+)\()?["']([^'"]+)['"]
const regex = /(?:class|className)=(?:["']\W+\s*(?:\w+)\()?["']([^'"]+)['"]/gmi;
const str = `<div id="content" class="container">
<div style="overflow:hidden;margin-top:30px">
<div style="width:300px;height:250px;float:left">
<ins class="adsbygoogle turbo" style="display:inline-block !important;width:300px;min-height:250px; display: none !important;" data-ad-client="ca-pub-1904398025977193" data-ad-slot="4723729075" data-color-link="2244BB" qgdsrhu="" hidden=""></ins>
<img src="http://static.teleman.pl/images/pixel.gif?show,753804,20160812" alt="" width="0" height="0" hidden="" style="display: none !important;">
</div>`;
let m;
while ((m = regex.exec(str)) !== null) {
    // This is necessary to avoid infinite loops with zero-width matches
    if (m.index === regex.lastIndex) {
        regex.lastIndex++;
    }
    // The result can be accessed through the `m`-variable.
    m.forEach((match, groupIndex) => {
        console.log(`Found match, group ${groupIndex}: ${match}`);
    });
}
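If you only need the class names rather than the raw match objects, the loop above can be condensed with String.prototype.matchAll. The sketch below uses a simplified pattern (an assumption, not the exact regex from this answer) that handles plain quoted class or className values; extractClassLists is a hypothetical name.

```javascript
// Hypothetical helper: returns one array of class names per class attribute.
function extractClassLists(html) {
    // Simplified pattern: class or className, optional spaces around "=",
    // then a value wrapped in matching single or double quotes.
    const re = /\b(?:class|className)\s*=\s*(["'])(.*?)\1/gi;
    return Array.from(html.matchAll(re), (m) =>
        m[2].split(/\s+/).filter(Boolean) // drop empty strings from extra spaces
    );
}

const sample = '<div id="content" class="container">' +
    "<ins class='adsbygoogle turbo'></ins></div>";
console.log(JSON.stringify(extractClassLists(sample)));
// [["container"],["adsbygoogle","turbo"]]
```

Because matchAll requires the g flag and returns an iterator of match arrays, there is no need to manage lastIndex by hand.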
Use the browser to your advantage:
var str = '<div class=\'a b c d\'></div>\
<!-- or -->\
<div class="a b c d"></div>\
<!-- There might be spaces after and before the = (the equal sign) -->';
var wrapper = document.createElement('div');
wrapper.innerHTML = str;
var elements = wrapper.getElementsByClassName('b');
if (elements.length) {
    // there are elements with class b
}
Btw, getElementsByClassName() is not well supported in IE before version 9; check this answer for an alternative.
Using a regex, this pattern should work for you:
var r = new RegExp("(<\\w+?\\s+?class\\s*=\\s*['\"][^'\"]*?\\b)" + key + "\\b", "i");
The parenthesized part of the pattern creates a backreference (capture group), which will be accessible using $1 (see examples).
Using "i" makes the matching case-insensitive. You can omit "i" for case-sensitive matching.
E.g.
var oldClass = "b";
var newClass = "e";
var r = new RegExp("..." + oldClass + "...");
"<div class='a b c d'></div>".replace(r, "$1" + newClass);
// ^-- returns: <div class='a e c d'></div>
"<div class=\"a b c d\"></div>".replace(r, "$1" + newClass);
// ^-- returns: <div class="a e c d"></div>
"<div class='abcd'></div>".replace(r, "$1" + newClass);
// ^-- returns: <div class='abcd'></div> // <-- NO change
NOTE:
For the above regex to work there must be no ' or " inside the class string.
I.e. <div class="a 'b' c d"... will NOT match.
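The pattern can be wrapped in a small helper, sketched below. The names escapeRegExp and replaceClass are hypothetical. Escaping the class name is an addition not present in the answer above; it keeps characters such as "." in the key from being read as regex syntax.

```javascript
// Hypothetical helper: escapes regex metacharacters in a literal string.
function escapeRegExp(text) {
    return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: replaces oldClass with newClass in the first matching tag.
function replaceClass(html, oldClass, newClass) {
    var r = new RegExp(
        "(<\\w+?\\s+?class\\s*=\\s*['\"][^'\"]*?\\b)" + escapeRegExp(oldClass) + "\\b",
        "i"
    );
    // A function replacement keeps any "$" in newClass from being
    // interpreted as a replacement pattern such as $1 or $&.
    return html.replace(r, function (match, prefix) {
        return prefix + newClass;
    });
}

console.log(replaceClass("<div class='a b c d'></div>", "b", "e"));
// <div class='a e c d'></div>
console.log(replaceClass("<div class='abcd'></div>", "b", "e"));
// <div class='abcd'></div>
```

Keep in mind that \b treats "-" as a word boundary, so searching for b also matches the b in a hyphenated class name such as a-b.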