RegEx for matching HTML tags with specific attributes [duplicate]

Question

I have a string like

<span title="use a <label>">Some Content</span>
<span title="use a <div>">Some Other Content</span>

I need a regex that extracts only the text (Some Content or Some Other Content) and ignores the tags, even if the tags contain other tags inside them.

Answer 1:

Use a document parser and DOM methods to get the content, not regular expressions. Regex is decidedly the wrong tool for this job. Even if you can get a regex that works, it will be difficult to understand and very brittle. The solution that follows is much more robust, easier to understand, and easier to debug.

Start by creating a parser and parsing the document fragment:

var parser = new DOMParser();
var doc = parser.parseFromString(
    '<span title="use a <label>">Some Content</label><span title="use a <div>">Some Other Content</label>',
    "text/html");

You can see the result by inspecting doc.documentElement, which gives us:

<html>
    <head></head>
    <body>
        <span title="use a <label>">
            Some Content
            <span title="use a <div>">
                Some Other Content
            </span>
        </span>
    </body>
</html>

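Because the closing tags don't match the opening tags (</label> instead of </span>), the parser ignores them and nests the second span inside the first. That doesn't matter here: the text is still present as text nodes.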

Next, we use a document walker to extract all of the text nodes. You can create a new walker using createTreeWalker, passing in NodeFilter.SHOW_TEXT:

var walker = doc.createTreeWalker(
    doc.documentElement,    // root
    NodeFilter.SHOW_TEXT,   // what to show
    null,                   // filter
    false);                 // entityReferenceExpansion (obsolete, ignored by modern browsers)

We can then walk the tree and collect all of the walked nodes:

var node;
var textNodes = []; 
while (node = walker.nextNode()) {
    textNodes.push(node);
}

Finally, we get the desired array:

var content = textNodes.map(x => x.textContent);

content is now an array containing ["Some Content", "Some Other Content"], the desired result set.
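
The steps above can be folded into one function. This is only a sketch: the name extractText is made up for illustration, it walks doc.body rather than doc.documentElement, and it trims and drops whitespace-only text nodes, which the answer's code does not do. DOMParser exists in browsers but not in plain Node.js, so the demo call is guarded.

```javascript
// Hypothetical helper (not part of the original answer): returns the trimmed,
// non-empty text of every text node found in an HTML fragment.
function extractText(html) {
    const doc = new DOMParser().parseFromString(html, "text/html");
    // Walk only text nodes below <body>.
    const walker = doc.createTreeWalker(doc.body, NodeFilter.SHOW_TEXT);
    const texts = [];
    let node;
    while ((node = walker.nextNode()) !== null) {
        const text = node.textContent.trim();
        if (text !== "") {
            texts.push(text);   // skip whitespace-only nodes between tags
        }
    }
    return texts;
}

// Browser-only demo; skipped where DOMParser is unavailable (e.g. Node.js).
if (typeof DOMParser !== "undefined") {
    console.log(extractText(
        '<span title="use a <label>">Some Content</span>\n' +
        '<span title="use a <div>">Some Other Content</span>'));
}
```

Trimming is a design choice for this sketch: real documents usually contain newline and indentation text nodes between elements, and those are rarely wanted in the result.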

Answer 2:

Maybe this will give you an idea.

Regex: ">(.*)</

Match 1
Full match  26-42   ">Some Content</
Group 1.    n/a Some Content
Match 2
Full match  73-95   ">Some Other Content</
Group 1.    n/a Some Other Content

https://regex101.com/r/6VArPY/1
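
One caveat with this pattern: the greedy (.*) only produces two separate matches because each span sits on its own line and . does not match a newline by default. The sketch below assumes a variation of the input where both spans are on one line (the variable names are illustrative) and compares the greedy quantifier with a lazy one.

```javascript
// Both spans on a single line (an assumed variation of the question's input).
const oneLine =
    '<span title="use a <label>">Some Content</span>' +
    '<span title="use a <div>">Some Other Content</span>';

// Greedy: (.*) runs to the LAST "</" on the line, swallowing both spans.
const greedy = [...oneLine.matchAll(/">(.*)<\//g)].map(m => m[1]);

// Lazy: (.*?) stops at the FIRST "</" after each `">`.
const lazy = [...oneLine.matchAll(/">(.*?)<\//g)].map(m => m[1]);

console.log(greedy.length); // 1
console.log(lazy);          // [ 'Some Content', 'Some Other Content' ]
```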

Answer 3:

A simple expression with a lazy quantifier is enough to collect the desired text content:

">(.+?)<\/

The text we want is captured by the (.+?) group; the lazy quantifier stops at the first </ that follows.

const regex = /">(.+?)<\//gm;
const str = `<span title="use a <label>">Some Content</label>
<span title="use a <div>">Some Other Content</label>`;
let m;

while ((m = regex.exec(str)) !== null) {
    // This is necessary to avoid infinite loops with zero-width matches
    if (m.index === regex.lastIndex) {
        regex.lastIndex++;
    }
    
    // The result can be accessed through the `m`-variable.
    m.forEach((match, groupIndex) => {
        console.log(`Found match, group ${groupIndex}: ${match}`);
    });
}
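
If only the captured text is needed, the loop above can be written more compactly with String.prototype.matchAll. This is a sketch using the question's input; the names input and contents are illustrative. The pattern is still tied to this exact shape of markup: it needs the text to follow directly after a double-quoted attribute (">), so a bare <span> with no attributes would not match.

```javascript
const input = `<span title="use a <label>">Some Content</span>
<span title="use a <div>">Some Other Content</span>`;

// Keep only capture group 1: the text between `">` and the next `</`.
const contents = Array.from(input.matchAll(/">(.+?)<\//g), m => m[1]);

console.log(contents); // [ 'Some Content', 'Some Other Content' ]
```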