RegEx for matching HTML tags with specific attributes [duplicate]

寵の児 posted on 2020-01-30 13:15:27

Question


I have a string like

<span title="use a <label>">Some Content</span>
<span title="use a <div>">Some Other Content</span>

I need a regex that captures only Some Content or Some Other Content and ignores the tags, even if the tags have other tags inside them.


Answer 1:


Use a document parser and DOM methods to get the content, not regular expressions. Regex is decidedly the wrong tool for this job. Even if you can get a regex that works, it will be difficult to understand and very brittle. The solution that follows is much more robust, easier to understand, and easier to debug.

Start by creating a parser and parsing the document fragment:

var parser = new DOMParser();
var doc = parser.parseFromString(
    '<span title="use a <label>">Some Content</label><span title="use a <div>">Some Other Content</label>',
    "text/html");

You can see the result by inspecting doc.documentElement, which gives us:

<html>
    <head></head>
    <body>
        <span title="use a <label>">
            Some Content
            <span title="use a <div>">
                Some Other Content
            </span>
        </span>
    </body>
</html>

Because the spans in this string are closed with </label> instead of </span>, the parser nests the second span inside the first. That doesn't matter here: the text nodes are still there to be collected.

Next, we use a document walker to extract all of the text nodes. You can create a new walker using createTreeWalker, passing in NodeFilter.SHOW_TEXT:

var walker = doc.createTreeWalker(
    doc.documentElement,    // root
    NodeFilter.SHOW_TEXT,   // what to show
    null,                   // filter
    false);                 // entity reference expansion (obsolete; ignored by current browsers)

We can then walk the tree and collect all of the walked nodes:

var node;
var textNodes = []; 
while (node = walker.nextNode()) {
    textNodes.push(node);
}

Finally, we get the desired array:

var content = textNodes.map(x => x.textContent);

The content variable now holds ["Some Content", "Some Other Content"], the desired result set.
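For comparison, the same text can be collected without a TreeWalker by recursing over childNodes. This is only a sketch: collectText, text, element and mockBody are hypothetical names, and the mock tree stands in for the parsed document shown above so that the snippet can run outside a browser. In a browser you would pass doc.body instead.

```javascript
// Hypothetical helper (not part of the DOM API): gathers the text of every
// text node under `node`, in document order.
const TEXT_NODE = 3; // same value as Node.TEXT_NODE in browsers

function collectText(node, out = []) {
    if (node.nodeType === TEXT_NODE) {
        out.push(node.textContent);
    } else {
        // Elements and documents expose their children via childNodes.
        for (const child of Array.from(node.childNodes || [])) {
            collectText(child, out);
        }
    }
    return out;
}

// Minimal stand-in for the parsed tree (assumption: only the three
// properties read by collectText are modelled).
const text = (s) => ({ nodeType: TEXT_NODE, textContent: s, childNodes: [] });
const element = (...children) => ({ nodeType: 1, childNodes: children });

// Outer span holding its own text plus the nested second span.
const mockBody = element(
    element(text("Some Content"), element(text("Some Other Content"))));

console.log(collectText(mockBody));
```

The TreeWalker version is usually the better choice on a real document, since the browser does the filtering for you; the recursive form mainly shows what the walker is doing.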




Answer 2:


Maybe this gives you some idea.

Regex: ">(.*)</

Match 1
Full match  26-42   ">Some Content</
Group 1.    n/a Some Content
Match 2
Full match  73-95   ">Some Other Content</
Group 1.    n/a Some Other Content

https://regex101.com/r/6VArPY/1
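One caveat with this pattern: .* is greedy, and it separates the two values only because each span sits on its own line (the dot does not match a newline by default). A minimal sketch of the difference, assuming a hypothetical single-line input (oneLine) that is not taken from the question:

```javascript
// Greedy and lazy variants of the same pattern.
const greedy = /">(.*)<\//g;
const lazy = /">(.*?)<\//g;

// Hypothetical input: two spans on ONE line.
const oneLine = '<span title="a">One</span><span title="b">Two</span>';

// Collect capture group 1 from every match.
const capture = (re, s) => Array.from(s.matchAll(re), (m) => m[1]);

console.log(capture(greedy, oneLine)); // [ 'One</span><span title="b">Two' ]
console.log(capture(lazy, oneLine));   // [ 'One', 'Two' ]
```

Even the lazy form still depends on the text being preceded by "> and followed by </, so it remains tied to this exact markup.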




Answer 3:


We might just use a simple expression to collect the desired text content, for example:

">(.+?)<\/

Our data is captured by the (.+?) group.

const regex = /">(.+?)<\//gm;
const str = `<span title="use a <label>">Some Content</label>
<span title="use a <div>">Some Other Content</label>`;
let m;

while ((m = regex.exec(str)) !== null) {
    // This is necessary to avoid infinite loops with zero-width matches
    if (m.index === regex.lastIndex) {
        regex.lastIndex++;
    }
    
    // The result can be accessed through the `m`-variable.
    m.forEach((match, groupIndex) => {
        console.log(`Found match, group ${groupIndex}: ${match}`);
    });
}
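If only the captured text is needed, the exec loop above can be condensed with String.prototype.matchAll, which requires the g flag. The following is a sketch using the same pattern and input; pattern, input and contents are just illustrative names:

```javascript
const pattern = /">(.+?)<\//g;
const input = `<span title="use a <label>">Some Content</label>
<span title="use a <div>">Some Other Content</label>`;

// matchAll yields one match array per hit; index 1 is the capturing group.
const contents = Array.from(input.matchAll(pattern), (m) => m[1]);

console.log(contents); // [ 'Some Content', 'Some Other Content' ]
```

Because matchAll works on a copy of the regex, there is no lastIndex bookkeeping to handle here.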

If this expression isn't quite what you need, it can be modified and tested on regex101.com.


Source: https://stackoverflow.com/questions/56268174/regex-for-matching-html-tags-with-specific-attributes
