Strip HTML from Text in JavaScript

北荒 2020-11-21 05:08

Is there an easy way to take a string of HTML in JavaScript and strip out the HTML?

  • 2020-11-21 05:15

    For an easier solution, try this: https://css-tricks.com/snippets/javascript/strip-html-tags-in-javascript/

    var StrippedString = OriginalString.replace(/(<([^>]+)>)/ig,"");
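    The one-liner above removes tags but leaves HTML entities untouched, so text like "Fish &amp; chips" stays encoded after stripping. A minimal sketch, assuming only a handful of common entities need decoding (the names stripHtmlAndDecode and ENTITY_MAP, and the entity list, are illustrative and not part of the CSS-Tricks snippet):

```javascript
// Strip tags with the same pattern as the one-liner, then decode a few common
// entities. stripHtmlAndDecode and ENTITY_MAP are illustrative names.
const ENTITY_MAP = {
  "&lt;": "<",
  "&gt;": ">",
  "&quot;": '"',
  "&#39;": "'",
  "&nbsp;": "\u00a0",
};

function stripHtmlAndDecode(html) {
  // Equivalent to /(<([^>]+)>)/ig: removes every <...> sequence.
  const withoutTags = html.replace(/<[^>]+>/g, "");
  // Decode everything except &amp; first...
  const decoded = withoutTags.replace(
    /&(?:lt|gt|quot|#39|nbsp);/g,
    (entity) => ENTITY_MAP[entity]
  );
  // ...and &amp; last, so "&amp;lt;" becomes the literal text "&lt;", not "<".
  return decoded.replace(/&amp;/g, "&");
}

console.log(stripHtmlAndDecode("<p>Fish &amp; chips</p>")); // Fish & chips
console.log(stripHtmlAndDecode("<code>a &lt; b</code>")); // a < b
```

    Decoding runs after stripping on purpose, so a decoded "<" is never mistaken for a tag. The result is plain text and has to be escaped again before it is inserted back into HTML.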
    
  • 2020-11-21 05:16

    You can safely strip HTML tags using the iframe sandbox attribute.

    The idea is that instead of running a regex over the string, we take advantage of the browser's native parser: inject the text into a DOM element, then read that element's textContent/innerText property.

    The best-suited element to inject the text into is a sandboxed iframe, because the sandbox prevents arbitrary code execution (also known as XSS) while the markup is parsed.

    The downside of this approach is that it only works in browsers.

    Here's what I came up with (not battle-tested):

    const stripHtmlTags = (() => {
      const sandbox = document.createElement("iframe");
      sandbox.sandbox = "allow-same-origin"; // <--- This is the key
      sandbox.style.setProperty("display", "none", "important");
    
      // Inject the sandbox into the current document
      document.body.appendChild(sandbox);
    
      // Get the sandbox's document
      const sandboxContext = sandbox.contentWindow.document;
    
      return (untrustedString) => {
        if (typeof untrustedString !== "string") return "";
    
        // Write the untrusted string into the iframe's body
        sandboxContext.open();
        sandboxContext.write(untrustedString);
        sandboxContext.close();
    
        // Get the string without HTML
        return sandboxContext.body.textContent || sandboxContext.body.innerText || "";
      };
    })();
    

    Usage (demo):

    console.log(stripHtmlTags(`<img onerror='alert("could run arbitrary JS here")' src='bogus'>XSS injection :)`));
    console.log(stripHtmlTags(`<script>alert("awdawd");</` + `script>Script tag injection :)`));
    console.log(stripHtmlTags(`<strong>I am bold text</strong>`));
    console.log(stripHtmlTags(`<html>I'm a HTML tag</html>`));
    console.log(stripHtmlTags(`<body>I'm a body tag</body>`));
    console.log(stripHtmlTags(`<head>I'm a head tag</head>`));
    console.log(stripHtmlTags(null));
    
  • 2020-11-21 05:19

    The code below keeps the HTML tags you allow and strips all others. It mirrors PHP's strip_tags(): pass the tags to keep as a string such as '<a><b>'.

    function strip_tags(input, allowed) {
    
      allowed = (((allowed || '') + '')
        .toLowerCase()
        .match(/<[a-z][a-z0-9]*>/g) || [])
        .join(''); // making sure the allowed arg is a string containing only tags in lowercase (<a><b><c>)
    
      var tags = /<\/?([a-z][a-z0-9]*)\b[^>]*>/gi,
          commentsAndPhpTags = /<!--[\s\S]*?-->|<\?(?:php)?[\s\S]*?\?>/gi;
    
      return input.replace(commentsAndPhpTags, '')
          .replace(tags, function($0, $1) {
              return allowed.indexOf('<' + $1.toLowerCase() + '>') > -1 ? $0 : '';
          });
    }
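    One caveat with strip_tags: attributes on allowed tags are kept verbatim, so with allowed = '<b>' an input like <b onclick="steal()"> passes through intact. A minimal variant, assuming you want each allowed tag rebuilt without attributes (keepTagsWithoutAttributes is a hypothetical name, and it takes an array of tag names instead of the '<a><b>' string):

```javascript
// Keep only the listed tag names and rebuild each kept tag without attributes,
// so handlers such as onclick cannot pass through. Hypothetical helper.
function keepTagsWithoutAttributes(input, allowedNames) {
  const allowed = new Set(allowedNames.map((name) => name.toLowerCase()));
  // Same tag pattern as strip_tags, with the optional "/" captured separately.
  const tagPattern = /<(\/?)([a-z][a-z0-9]*)\b[^>]*>/gi;
  // Drop HTML comments first, as strip_tags does.
  const withoutComments = input.replace(/<!--[\s\S]*?-->/g, "");
  return withoutComments.replace(tagPattern, (match, slash, name) => {
    const lower = name.toLowerCase();
    // <b class="x"> becomes <b>, </b> stays </b>, disallowed tags vanish.
    return allowed.has(lower) ? `<${slash}${lower}>` : "";
  });
}

console.log(
  keepTagsWithoutAttributes('<p>Hi <b onclick="steal()">there</b></p>', ["b"])
); // Hi <b>there</b>
```

    Like the original, this is regex-based: a ">" inside a quoted attribute value (title="a>b") ends the match early, so treat the output as best-effort cleanup rather than a sanitizer.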
    
  • 2020-11-21 05:21

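    Using jQuery (the string is parsed in the current document, so handlers such as <img onerror=...> can still run; use this only on trusted input):

    function stripTags(textToEscape) {
        return $('<p></p>').html(textToEscape).text();
    }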
    
  • 2020-11-21 05:22

    This should work in any JavaScript environment (Node.js included), since it uses only regular expressions.

    const text = `
    <html lang="en">
      <head>
        <style type="text/css">*{color:red}</style>
        <script>alert('hello')</script>
      </head>
      <body><b>This is some text</b><br/></body>
    </html>`;
    
    const stripped = text
        // Remove style tags and their content ([\s\S]*? also spans line breaks)
        .replace(/<style[^>]*>[\s\S]*?<\/style>/gi, '')
        // Remove script tags and their content
        .replace(/<script[^>]*>[\s\S]*?<\/script>/gi, '')
        // Remove all opening, closing and orphan HTML tags
        .replace(/<[^>]+>/g, '')
        // Remove leading spaces and repeated CR/LF
        .replace(/([\r\n]+ +)+/g, '');
    
    console.log(stripped);
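    A pattern such as <script[^>]*>.*<\/script> only works when the whole element sits on one line, because "." does not match line breaks (the m flag does not change that; it only affects ^ and $). A small comparison on a multi-line script block, with hypothetical names page and stripWith:

```javascript
// A script element spread over several lines, as it usually is in real pages.
const page = "<p>Hello</p>\n<script>\n  alert('hi');\n</script>\n<p>World</p>";

// "." never matches "\n", so this pattern cannot reach the closing </script>.
const dotPattern = /<script[^>]*>.*<\/script>/gi;
// [\s\S] matches any character, newlines included; the lazy *? stops at the
// first </script>, so two scripts on one page are removed separately.
const anyCharPattern = /<script[^>]*>[\s\S]*?<\/script>/gi;

// Remove script elements with the given pattern, then every remaining tag.
const stripWith = (html, scriptPattern) =>
  html.replace(scriptPattern, "").replace(/<[^>]+>/g, "");

console.log(JSON.stringify(stripWith(page, dotPattern)));
// "Hello\n\n  alert('hi');\n\nWorld"  (the script body leaks into the text)
console.log(JSON.stringify(stripWith(page, anyCharPattern)));
// "Hello\n\nWorld"
```

    In engines that support ES2018, the s (dotAll) flag is an alternative: /<script[^>]*>.*?<\/script>/gis behaves like the [\s\S]*? version.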
    
  • 2020-11-21 05:23

    I made some modifications to Jibberboy2000's original script. I hope it will be useful to someone.

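    let str = '**ANY HTML CONTENT HERE**';
    
    // Turn <br>, <br/> and <br /> into line breaks
    str = str.replace(/<\s*br\s*\/?\s*>/gi, "\n");
    // Replace each link with "text (Link->url)"; [^>]* keeps the match inside one <a ...> tag
    str = str.replace(/<\s*a\s[^>]*href="(.*?)"[^>]*>(.*?)<\/a>/gi, " $2 (Link->$1) ");
    // Replace every remaining tag with a line break
    str = str.replace(/<\s*\/*.+?>/gi, "\n");
    // Collapse runs of spaces
    str = str.replace(/ {2,}/g, " ");
    // Collapse line breaks (and any whitespace after them) into a blank line
    str = str.replace(/\n+\s*/g, "\n\n");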
    