Split a string of HTML into an array by particular tags

遇见更好的自我 2021-01-05 00:41

Given this HTML as a string "html", how can I split it into an array where each header marks the start of an element?

Begin with this:

4 Answers
  • 2021-01-05 01:00

    In your example you can use:

    /
      <h   // Match literal <h
      (.)  // Match any character and save it in a group
      >    // Match literal >
      .*?  // Match any character zero or more times, non-greedy
      <\/h // Match literal </h
      \1   // Match what was previously captured by (.)
      >    // Match literal >
    /g
    
    var str = '<h1>A</h1><h2>B</h2><p>Foobar</p><h3>C</h3>'
    str.match(/<h(.)>.*?<\/h\1>/g); // ["<h1>A</h1>", "<h2>B</h2>", "<h3>C</h3>"]
    

    But please don't parse HTML with regular expressions; see "RegEx match open tags except XHTML self-contained tags".
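
    Note that the pattern above returns only the headings themselves, so the `<p>Foobar</p>` that follows `<h2>` is dropped. If each heading should instead start a new array element, one possible sketch is to split on a lookahead. The helper name `splitAtHeadings` is hypothetical, and the sketch assumes that headings are top-level and that nothing resembling a heading tag appears inside attributes or comments, so the same caveat about regular expressions and HTML applies.

    ```javascript
    // Hypothetical helper: cut the string before every opening <h1>..<h6> tag.
    function splitAtHeadings(html) {
      // The lookahead (?=...) matches the empty position in front of a heading
      // tag, so split() cuts there without consuming any characters.
      return html.split(/(?=<h[1-6][\s>])/i).filter(function (part) {
        return part !== "";
      });
    }

    var sections = splitAtHeadings('<h1>A</h1><h2>B</h2><p>Foobar</p><h3>C</h3>');
    // sections: ["<h1>A</h1>", "<h2>B</h2><p>Foobar</p>", "<h3>C</h3>"]
    ```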

  • 2021-01-05 01:09

    I'm sure someone could simplify the for loop that puts the angle brackets back in, but this is how I'd do it.

    var html = '<h1>A</h1><h2>B</h2><p>Foobar</p><h3>C</h3>';
    
    //split on ><
    var arr = html.split(/></g);
    
    //split removes the >< so we need to determine where to put them back in.
    for(var i = 0; i < arr.length; i++){
      if(arr[i].substring(0, 1) != '<'){
        arr[i] = '<' + arr[i];
      }
    
      if(arr[i].slice(-1) != '>'){
        arr[i] = arr[i] + '>';
      }
    }
    

    Alternatively, we could remove the first and last bracket, do the split, and then add the angle brackets back to every element.

    var html = '<h1>A</h1><h2>B</h2><p>Foobar</p><h3>C</h3>';
    
    //remove first and last characters
    html = html.substring(1, html.length-1);
    
    //do the split on ><
    var arr = html.split(/></g);
    
    //add the brackets back in
    for(var i = 0; i < arr.length; i++){
        arr[i] = '<' + arr[i] + '>';
    }
    

    Of course, this will fail with elements that have no content, such as <p></p>, because the split then lands between the opening and closing tag.
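
    The loop that restores the brackets can be avoided by splitting on the empty position between `>` and `<` rather than on the characters themselves. This is only a sketch: it assumes an engine with lookbehind support (ES2018), the name `splitBetweenTags` is made up for illustration, and it still breaks apart empty elements in the same way.

    ```javascript
    // Hypothetical helper: split where a ">" is immediately followed by a "<".
    function splitBetweenTags(html) {
      // (?<=>) requires a ">" just before the position, (?=<) a "<" just after.
      // Both are zero-width, so the brackets stay in the resulting pieces.
      return html.split(/(?<=>)(?=<)/);
    }

    var pieces = splitBetweenTags('<h1>A</h1><h2>B</h2><p>Foobar</p><h3>C</h3>');
    // pieces: ["<h1>A</h1>", "<h2>B</h2>", "<p>Foobar</p>", "<h3>C</h3>"]
    ```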

  • 2021-01-05 01:12

    Hi, I used this function to convert an HTML string into an array of tag fragments:

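      // Written as a plain function so it runs outside a class body
      function getArrayTagsHtmlString(str){
        let htmlSplit = str.split(">")
        let arrayElements = []
        htmlSplit.forEach((element)=>{
          // split(">") leaves an empty string after the final ">", so skip it
          if (element === "") return
          // put back the ">" that split() removed from each tag fragment
          let nodeElement = element.includes("<") ? element + ">" : element
          arrayElements.push(nodeElement)
        })
        return arrayElements
      }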
    

    Happy coding
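
    As a variation on the same idea, a single `match` call can separate tags from text, so that text is not glued to the closing tag that follows it. This is a rough sketch rather than a parser: `tokenizeHtml` is a hypothetical name, and the pattern assumes that no `>` appears inside attribute values.

    ```javascript
    // Hypothetical helper: return tags and text runs as separate tokens.
    function tokenizeHtml(str) {
      // Each token is either a tag "<...>" or a run of text containing no "<".
      // match() returns null when nothing matches, hence the fallback to [].
      return str.match(/<[^>]+>|[^<]+/g) || [];
    }

    var tokens = tokenizeHtml('<h1>A</h1><p>Foobar</p>');
    // tokens: ["<h1>", "A", "</h1>", "<p>", "Foobar", "</p>"]
    ```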

  • 2021-01-05 01:14

    From the comments to the question, this seems to be the task:

    I'm taking dynamic markdown that I'm scraping from GitHub. Then I want to render it to HTML, but wrap every title element in a ReactJS <WayPoint> component.

    The following is a completely library-agnostic, DOM-API based solution.

    function waypointify(html) {
        var div = document.createElement("div"), nodes;
    
        // parse HTML and convert into an array (instead of NodeList)
        div.innerHTML = html;
        nodes = [].slice.call(div.childNodes);
    
        // add <waypoint> elements and distribute nodes by headings
        div.innerHTML = "";
        nodes.forEach(function (node) {
            if (!div.lastChild || /^h[1-6]$/i.test(node.nodeName)) {
                div.appendChild( document.createElement("waypoint") );
            }
            div.lastChild.appendChild(node);
        });
    
        return div.innerHTML;
    }
    

    Doing the same with a modern library in fewer lines of code is absolutely possible; see it as a challenge.

    This is what it produces with your sample input:

    <waypoint><h1>A</h1></waypoint>
    <waypoint><h2>B</h2><p>Foobar</p></waypoint>
    <waypoint><h3>C</h3></waypoint>
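
    The grouping step of `waypointify` does not itself depend on the DOM, and pulling it out yields the array the question asks for. This is a minimal sketch, assuming each node exposes `nodeName` as DOM nodes do; `groupByHeading` is a hypothetical helper, and the plain objects below only stand in for real nodes.

    ```javascript
    // Hypothetical helper: group a flat list of nodes into sections by heading.
    function groupByHeading(nodes) {
      var groups = [];
      nodes.forEach(function (node) {
        // Start a new group at the very first node and at every h1..h6
        if (groups.length === 0 || /^h[1-6]$/i.test(node.nodeName)) {
          groups.push([]);
        }
        groups[groups.length - 1].push(node);
      });
      return groups;
    }

    // Stand-ins for div.childNodes; in a browser, pass [].slice.call(div.childNodes)
    var sample = [
      { nodeName: "H1" }, { nodeName: "H2" }, { nodeName: "P" }, { nodeName: "H3" }
    ];
    var names = groupByHeading(sample).map(function (group) {
      return group.map(function (node) { return node.nodeName; });
    });
    // names: [["H1"], ["H2", "P"], ["H3"]]
    ```

    In a browser, mapping each group to its elements' `outerHTML` and joining the results would give one HTML string per section. Text nodes have no `outerHTML`, so they would need separate handling.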
    