Parse an HTML string with JS

逝去的感伤 2020-11-21 07:17

I searched for a solution but nothing was relevant, so here is my problem:

I want to parse a string which contains HTML text. I want to do it in JavaScript.

  • 2020-11-21 07:24
    // A new Range starts at the document, so createContextualFragment
    // parses markup as if it were inside <body>
    const parse = Range.prototype.createContextualFragment.bind(document.createRange());
    
    document.body.appendChild( parse('<p><strong>Today is:</strong></p>') );
    document.body.appendChild( parse(`<p style="background: #eee">${new Date()}</p>`) );
    


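    The markup is parsed in the context of the node at the start of the Range: only elements that are valid children of that context node are created, and tags that are invalid there are typically dropped, leaving just their text. This can give unexpected results: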

    // A new Range starts at the document, so markup is parsed as if inside <body>
    const parseRange = document.createRange();
    const parse = Range.prototype.createContextualFragment.bind(parseRange);
    
    // Each returns a fragment holding only the text "1 2", because <td>, <tr>
    // and <tbody> are not valid children of <body>
    parse('<td>1</td> <td>2</td>');
    parse('<tr><td>1</td> <td>2</td></tr>');
    parse('<tbody><tr><td>1</td> <td>2</td></tr></tbody>');
    
    // Each returns a fragment containing a <table>, which is a valid child of <body>
    parse('<table> <td>1</td> <td>2</td> </table>');
    parse('<table> <tr> <td>1</td> <td>2</td> </tr> </table>');
    parse('<table> <tbody> <td>1</td> <td>2</td> </tbody> </table>');
    
    // Make a detached <tr> the start of the Range, i.e. the parsing context
    parseRange.setStart(document.createElement('tr'), 0);
    
    // Each returns a fragment containing the two <td> elements
    parse('<td>1</td> <td>2</td>');
    parse('<tr> <td>1</td> <td>2</td> </tr>');
    parse('<tbody> <td>1</td> <td>2</td> </tbody>');
    parse('<table> <td>1</td> <td>2</td> </table>');
    
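    Because the context node decides what survives, one option is to pick it from the markup itself. A minimal sketch, assuming the fragment's first start tag tells you where it belongs: `contextTagFor` and `parseInContext` are hypothetical helpers (not part of the DOM API), and the mapping only covers the table-related tags that `<body>` rejects. `contextTagFor` is plain JavaScript; `parseInContext` needs a browser `document`.

```javascript
// Hypothetical mapping from a leading tag to a parent element that can
// legally contain it. Only table-related cases are covered (an assumption);
// everything else is parsed in the default <body> context.
const CONTEXT_FOR = new Map([
  ['td', 'tr'],
  ['th', 'tr'],
  ['tr', 'tbody'],
  ['tbody', 'table'],
  ['thead', 'table'],
  ['tfoot', 'table'],
  ['caption', 'table'],
  ['colgroup', 'table'],
  ['col', 'colgroup'],
]);

// Returns the tag name to use as the parsing context for `markup`,
// based on its first start tag (leading whitespace is skipped).
function contextTagFor(markup) {
  const match = /^\s*<([a-z][a-z0-9-]*)/i.exec(markup);
  if (match === null) return 'body';
  // A Map avoids accidental hits on Object.prototype keys like "constructor".
  return CONTEXT_FOR.get(match[1].toLowerCase()) || 'body';
}

// Browser-only: parse `markup` with a detached context element as the
// start of the Range, as in the <tr> example above.
function parseInContext(markup) {
  const range = document.createRange();
  range.setStart(document.createElement(contextTagFor(markup)), 0);
  return range.createContextualFragment(markup);
}
```

    With this, `parseInContext('<td>1</td> <td>2</td>')` should use a `<tr>` context and keep both cells, while markup such as `<p>hi</p>` falls back to `<body>`.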
  • 2020-11-21 07:25

    Create a dummy DOM element and add the string to it. Then, you can manipulate it like any DOM element.

    var el = document.createElement( 'html' );
    el.innerHTML = "<html><head><title>titleTest</title></head><body><a href='test0'>test01</a><a href='test1'>test02</a><a href='test2'>test03</a></body></html>";
    
    el.getElementsByTagName( 'a' ); // Live HTMLCollection of your anchor elements
    

    Edit: adding a jQuery answer to please the fans!

    var el = $( '<div></div>' );
    el.html("<html><head><title>titleTest</title></head><body><a href='test0'>test01</a><a href='test1'>test02</a><a href='test2'>test03</a></body></html>");
    
    $('a', el) // All the anchor elements
    
  • 2020-11-21 07:26

    It's quite simple:

    var parser = new DOMParser();
    var htmlDoc = parser.parseFromString(txt, 'text/html');
    // do whatever you want with htmlDoc.getElementsByTagName('a');
    

    According to MDN, to do this in Chrome you need to parse as XML, like so:

    var parser = new DOMParser();
    var htmlDoc = parser.parseFromString(txt, 'text/xml');
    // do whatever you want with htmlDoc.getElementsByTagName('a');
    

    `text/html` parsing is currently unsupported by WebKit, so you'd have to follow Florian's answer instead, and it is not known to work in most mobile browsers.

    Edit: `text/html` parsing is now widely supported.

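    If you do parse with `'text/xml'`, keep in mind that XML parsing is strict: per MDN, `parseFromString` does not throw on malformed input but returns a document containing a `<parsererror>` element. A minimal sketch of checking for that; `hasXmlParseError` and `parseXmlStrict` are hypothetical names, and `parseXmlStrict` needs a browser `DOMParser`.

```javascript
// Assumption: on malformed XML, browsers return a document that contains
// a <parsererror> element (its exact placement differs between browsers).
function hasXmlParseError(doc) {
  return doc.getElementsByTagName('parsererror').length > 0;
}

// Browser-only, hypothetical wrapper: parse as XML and throw on failure
// instead of silently working with the error document.
function parseXmlStrict(txt) {
  const doc = new DOMParser().parseFromString(txt, 'text/xml');
  if (hasXmlParseError(doc)) {
    throw new SyntaxError('Markup is not well-formed XML');
  }
  return doc;
}
```

    Unclosed void tags such as `<br>` are a common trigger, since they are valid HTML but not well-formed XML.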
  • 2020-11-21 07:27

    With this simple code you can do that:

    let el = $('<div></div>');
    $(document.body).append(el);
    el.html(`<html><head><title>titleTest</title></head><body><a href='test0'>test01</a><a href='test1'>test02</a><a href='test2'>test03</a></body></html>`);
    console.log(el.find('a[href="test0"]'));
    
  • 2020-11-21 07:28

    If you're open to using jQuery, it has some nice facilities for creating detached DOM elements from strings of HTML. These can then be queried through the usual means, e.g.:

    var html = "<html><head><title>titleTest</title></head><body><a href='test0'>test01</a><a href='test1'>test02</a><a href='test2'>test03</a></body></html>";
    var anchors = $('<div/>').append(html).find('a').get();
    

    Edit: just saw @Florian's answer, which is correct. This is basically exactly what he said, but with jQuery.

  • 2020-11-21 07:30

    The following function parseHTML returns either:

    • a Document when your markup starts with a doctype, or

    • a DocumentFragment when it doesn't.


    The code:

    function parseHTML(markup) {
        if (markup.toLowerCase().trim().indexOf('<!doctype') === 0) {
            // Full document: parse it into a separate HTML document
            var doc = document.implementation.createHTMLDocument("");
            doc.documentElement.innerHTML = markup;
            return doc;
        } else if ('content' in document.createElement('template')) {
            // <template> is supported: its content is an inert DocumentFragment
            var template = document.createElement('template');
            template.innerHTML = markup;
            return template.content;
        } else {
            // <template> is not supported: parse inside a <body> element,
            // then move the parsed nodes into a DocumentFragment
            var docfrag = document.createDocumentFragment();
            var body = document.createElement('body');
            body.innerHTML = markup;
            while (body.firstChild) {
                docfrag.appendChild(body.firstChild);
            }
            return docfrag;
        }
    }
    

    How to use:

    var links = parseHTML('<!doctype html><html><head></head><body><a>Link 1</a><a>Link 2</a></body></html>').getElementsByTagName('a');
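    The doctype test in `parseHTML` lowercases and trims the whole string just to inspect its start. A minimal sketch of the same check as a standalone predicate, which should behave the same for ordinary input: `hasDoctype` is a hypothetical name, and the regular expression's `\s*` skips the same leading whitespace that `trim()` removes.

```javascript
// Hypothetical helper mirroring parseHTML's first branch: optional leading
// whitespace, then "<!doctype" in any letter case, without copying the string.
function hasDoctype(markup) {
  return /^\s*<!doctype/i.test(markup);
}
```

    The first branch of `parseHTML` could then be written as `if (hasDoctype(markup))`.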
    