Question
I have a string like
<span title="use a <label>">Some Content</span>
<span title="use a <div>">Some Other Content</span>
I need a regex that gets only "Some Content" or "Some Other Content", ignoring the tags, even when a tag has other tags inside it (as in the title attributes above).
Answer 1:
Use a document parser and DOM methods to get the content, not regular expressions. Regex is decidedly the wrong tool for this job. Even if you can get a regex that works, it will be difficult to understand and very brittle. The solution that follows is much more robust, easier to understand, and easier to debug.
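To make the brittleness concrete, here is a sketch of the most obvious regex approach: strip anything that looks like a tag. stripTagsNaive is a hypothetical helper for illustration only. It assumes a ">" never appears inside an attribute value, and the input from the question breaks that assumption.

```javascript
// Deliberately naive tag stripper: treats every "<...>" run as a tag.
// ASSUMPTION (violated by the question's input): no ">" inside attribute values.
function stripTagsNaive(html) {
  return html.replace(/<[^>]*>/g, "");
}

const sample = '<span title="use a <label>">Some Content</span>';

// The first match stops at the ">" that closes "<label>" inside the title
// attribute, so the leftover '">' leaks into the result.
console.log(stripTagsNaive(sample)); // ">Some Content
```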
Start by creating a parser and parsing the document fragment:
var parser = new DOMParser();
var doc = parser.parseFromString(
'<span title="use a <label>">Some Content</label><span title="use a <div>">Some Other Content</label>',
"text/html");
You can see the result by inspecting doc.documentElement, which gives us:
<html>
  <head></head>
  <body>
    <span title="use a <label>">
      Some Content
      <span title="use a <div>">
        Some Other Content
      </span>
    </span>
  </body>
</html>
Because the tags aren't closed properly (each span ends with </label>), the parser nests the second span inside the first, but that doesn't matter: the text nodes are still there.
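The parser copes with this input because, inside a tag, it treats a ">" within a quoted attribute value as ordinary text. The sketch below imitates only that one rule with a hand-written scanner. extractTextRuns is a hypothetical name, and this is not a substitute for DOMParser: it ignores comments, scripts, entities, unquoted attributes and every other HTML parsing rule, and it builds no tree.

```javascript
// Minimal scanner: collects text outside tags, and inside a tag it does not
// let a ">" that sits within "..." or '...' end the tag.
// ASSUMPTION: attribute values are quoted; no comments, <script>, or stray "<".
function extractTextRuns(html) {
  const runs = [];
  let text = "";
  let i = 0;
  while (i < html.length) {
    if (html[i] === "<") {
      // Flush any non-blank text collected before this tag.
      if (text.trim() !== "") runs.push(text);
      text = "";
      // Skip to the ">" that really closes the tag, honoring quotes.
      let quote = null;
      i++;
      while (i < html.length) {
        const ch = html[i];
        if (quote) {
          if (ch === quote) quote = null; // closing quote
        } else if (ch === '"' || ch === "'") {
          quote = ch; // opening quote
        } else if (ch === ">") {
          break; // real end of the tag
        }
        i++;
      }
      i++; // step past ">"
    } else {
      text += html[i];
      i++;
    }
  }
  if (text.trim() !== "") runs.push(text);
  return runs;
}

const input =
  '<span title="use a <label>">Some Content</label>' +
  '<span title="use a <div>">Some Other Content</label>';
console.log(extractTextRuns(input)); // [ 'Some Content', 'Some Other Content' ]
```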
Next, we use a TreeWalker to extract all of the text nodes. You can create one with createTreeWalker, passing NodeFilter.SHOW_TEXT as the second argument:
var walker = doc.createTreeWalker(
  doc.documentElement,   // root
  NodeFilter.SHOW_TEXT,  // what to show
  null,                  // filter
  false);                // expandEntityReferences (obsolete; modern browsers ignore it)
We can then walk the tree and collect all of the walked nodes:
var node;
var textNodes = [];
while (node = walker.nextNode()) {
  textNodes.push(node);
}
Finally, we get the desired array:
var content = textNodes.map(x => x.textContent);
content is an array containing ["Some Content", "Some Other Content"], the desired result set.
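Outside a browser there is no DOMParser or TreeWalker, but what the walker loop does is easy to see on a plain object: nextNode() visits nodes in document order (depth-first, pre-order), and SHOW_TEXT keeps only the text nodes. In this sketch the object shape ({ tag, children } for elements, { text } for text nodes) and collectTextNodes are hypothetical, chosen to mirror the parsed tree printed earlier.

```javascript
// Hypothetical plain-object stand-in for the parsed <body> shown earlier.
const tree = {
  tag: "body",
  children: [
    {
      tag: "span",
      children: [
        { text: "Some Content" },
        { tag: "span", children: [{ text: "Some Other Content" }] },
      ],
    },
  ],
};

// Depth-first, pre-order traversal that keeps only text nodes: the same
// order in which a SHOW_TEXT TreeWalker returns them from nextNode().
function collectTextNodes(node, out = []) {
  if (node.text !== undefined) {
    out.push(node);
    return out;
  }
  for (const child of node.children ?? []) {
    collectTextNodes(child, out);
  }
  return out;
}

const textNodes = collectTextNodes(tree);
console.log(textNodes.map((n) => n.text)); // [ 'Some Content', 'Some Other Content' ]
```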
Answer 2:
Maybe this gives you an idea.
Regex: ">(.*)</
Match 1: full match (26-42) ">Some Content</, group 1: Some Content
Match 2: full match (73-95) ">Some Other Content</, group 1: Some Other Content
https://regex101.com/r/6VArPY/1
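The demo puts each span on its own line, and "." does not match a newline, so the greedy (.*) happens to stop at the right "</". If both spans are on one line (as in the string passed to DOMParser in the first answer), the greedy version runs on to the last "</" on the line. This sketch compares it with the lazy (.*?) variant, using String.prototype.matchAll (ES2020, so it assumes a reasonably recent browser or Node).

```javascript
// The same two spans, concatenated onto a single line.
const oneLine =
  '<span title="use a <label>">Some Content</label>' +
  '<span title="use a <div>">Some Other Content</label>';

// Greedy: ".*" backtracks only as far as the LAST "</", so both spans
// collapse into one match.
const greedy = [...oneLine.matchAll(/">(.*)<\//g)].map((m) => m[1]);

// Lazy: ".*?" stops at the FIRST "</" after each '">'.
const lazy = [...oneLine.matchAll(/">(.*?)<\//g)].map((m) => m[1]);

console.log(greedy.length); // 1
console.log(greedy[0]); // Some Content</label><span title="use a <div>">Some Other Content
console.log(lazy); // [ 'Some Content', 'Some Other Content' ]
```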
Answer 3:
We could use a simple expression and collect the desired text content, for example:
">(.+?)<\/
The text we want is captured by the (.+?) group.
const regex = /">(.+?)<\//gm;
const str = `<span title="use a <label>">Some Content</label>
<span title="use a <div>">Some Other Content</label>`;
let m;
while ((m = regex.exec(str)) !== null) {
  // This is necessary to avoid infinite loops with zero-width matches
  if (m.index === regex.lastIndex) {
    regex.lastIndex++;
  }
  // The result can be accessed through the `m`-variable.
  m.forEach((match, groupIndex) => {
    console.log(`Found match, group ${groupIndex}: ${match}`);
  });
}
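If you only need the captured text as an array (the same shape as the content array in the DOM-based answer), a shorter sketch uses String.prototype.matchAll (ES2020) instead of the exec loop. matchAll requires the g flag and advances past empty matches on its own, so the lastIndex guard isn't needed. The m flag is also dropped because the pattern has no ^ or $ anchors.

```javascript
const str = `<span title="use a <label>">Some Content</label>
<span title="use a <div>">Some Other Content</label>`;

// Each item from matchAll is a match array; m[1] is the (.+?) group.
const contents = Array.from(str.matchAll(/">(.+?)<\//g), (m) => m[1]);

console.log(contents); // [ 'Some Content', 'Some Other Content' ]
```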
If this expression doesn't fit your needs, it can be modified and tested on regex101.com.
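One case worth checking before adapting the expression: it only copes with markup inside the attribute value. If the element's content itself contains a tag, the lazy group stops at the first "</", which belongs to the inner tag. The input below is a hypothetical example, not from the question.

```javascript
// Hypothetical input whose content contains an inline <b> element.
const nested = '<span title="use a <label>">Some <b>bold</b> Content</span>';

const found = Array.from(nested.matchAll(/">(.+?)<\//g), (m) => m[1]);

// The capture ends at "</b>", so it includes the opening <b> tag
// and misses " Content" entirely.
console.log(found); // [ 'Some <b>bold' ]
```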
Source: https://stackoverflow.com/questions/56268174/regex-for-matching-html-tags-with-specific-attributes