How can I remove attributes from an html tag?

醉梦人生 2020-11-29 09:52

How can I use PHP to strip all/any attributes from a tag, say a paragraph tag?

<p class="one" otherrandomattribute="two">

to

<p>

6 Answers
  • 2020-11-29 09:57

    You might also look into HTML Purifier. True, it's quite bloated and might be overkill if this specific example is your only concern, but it offers more or less 'bulletproof' purification of possibly hostile HTML. It is also highly configurable, so you can choose which attributes to allow or disallow.

    http://htmlpurifier.org/

  • 2020-11-29 09:58

    I honestly think that the only sane way to do this is to use a tag and attribute whitelist with the HTML Purifier library. Example script here:

    <html><body>
    
    <?php
    
    require_once '../includes/htmlpurifier-4.5.0-lite/library/HTMLPurifier/Bootstrap.php';
    spl_autoload_register(array('HTMLPurifier_Bootstrap', 'autoload'));
    
    $config = HTMLPurifier_Config::createDefault();
    $config->set('HTML.Allowed', 'p,b,a[href],i,br,img[src]');
    $config->set('URI.Base', 'http://www.example.com');
    $config->set('URI.MakeAbsolute', true);
    
    $purifier = new HTMLPurifier($config);
    
    $dirty_html = "
      <a href=\"http://www.google.de\">broken a href link</a
      fnord
    
      <x>y</z>
      <b>c</p>
      <script>alert(\"foo!\");</script>
    
      <a href=\"javascript:alert(history.length)\">Anzahl besuchter Seiten</a>
      <img src=\"www.example.com/bla.gif\" />
      <a href=\"http://www.google.de\">missing end tag
     ende 
    ";
    
    $clean_html = $purifier->purify($dirty_html);
    
    print "<h1>dirty</h1>";
    print "<pre>" . htmlentities($dirty_html) . "</pre>";
    
    print "<h1>clean</h1>";
    print "<pre>" . htmlentities($clean_html) . "</pre>";
    
    ?>
    
    </body></html>
    

    This yields the following clean, standards-conforming HTML fragment:

    <a href="http://www.google.de">broken a href link</a>fnord
    
    y
    <b>c
    <a>Anzahl besuchter Seiten</a>
    <img src="http://www.example.com/www.example.com/bla.gif" alt="bla.gif" /><a href="http://www.google.de">missing end tag
    ende 
    </a></b>
    

    In your case the whitelist would be:

    $config->set('HTML.Allowed', 'p');
    
  • 2020-11-29 10:03

    Although there are better ways, you could strip the attributes (arguments) from HTML tags with a regular expression:

    <?php
    function stripArgumentFromTags( $htmlString ) {
        $regEx = '/([^<]*<\s*[a-z](?:[0-9]|[a-z]{0,9}))(?:(?:\s*[a-z\-]{2,14}\s*=\s*(?:"[^"]*"|\'[^\']*\'))*)(\s*\/?>[^<]*)/i'; // match any start tag
    
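        // With PREG_SPLIT_DELIM_CAPTURE the result holds the text between matches plus
        // the two captured groups; the attribute run itself is not captured and drops out.
        $chunks = preg_split($regEx, $htmlString, -1, PREG_SPLIT_DELIM_CAPTURE);
    
        // Join every chunk, including the first one, so text before the first tag is kept.
        return implode('', $chunks);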
    }
    ?>
    

    The above could probably be written in fewer characters, but it does the job (quick and dirty).
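
    As a rough illustration of a shorter variant, the same idea can be sketched with a single preg_replace call. The function name stripAllAttributes is made up for this example, and the pattern assumes that no attribute value contains a ">" character; it is a sketch under those assumptions, not a robust HTML parser.

```php
<?php
// Hypothetical helper, not taken from the answer above.
// Assumption: attribute values never contain a literal ">".
function stripAllAttributes(string $html): string
{
    // Group 1 is the tag name, group 2 an optional "/" of a self-closing tag.
    // Everything between the tag name and the end of the tag is discarded.
    return preg_replace('/<([a-z][a-z0-9]*)[^>]*?(\/?)>/i', '<$1$2>', $html);
}

echo stripAllAttributes('<p class="one" otherrandomattribute="two">Hello</p>');
```

    Closing tags are left alone because the pattern requires a letter directly after the "<".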

  • 2020-11-29 10:03

    HTML Purifier is one of the better tools for sanitizing HTML with PHP.

  • 2020-11-29 10:06

    Strip attributes using SimpleXML (Standard in PHP5)

    <?php
    
    // define allowable tags
    $allowable_tags = '<p><a><img><ul><ol><li><table><thead><tbody><tr><th><td>';
    // define allowable attributes
    $allowable_atts = array('href','src','alt');
    
    // strip collector
    $strip_arr = array();
    
    // load XHTML with SimpleXML
    $data_sxml = simplexml_load_string('<root>'. $data_str .'</root>', 'SimpleXMLElement', LIBXML_NOERROR | LIBXML_NOXMLDECL);
    
    if ($data_sxml ) {
        // loop all elements with an attribute
        foreach ($data_sxml->xpath('descendant::*[@*]') as $tag) {
            // loop attributes
            foreach ($tag->attributes() as $name=>$value) {
                // check for allowable attributes
                if (!in_array($name, $allowable_atts)) {
                    // set attribute value to empty string
                    $tag->attributes()->$name = '';
                    // collect attribute patterns to be stripped
                    $strip_arr[$name] = '/ '. $name .'=""/';
                }
            }
        }
    }
    
    // strip unallowed attributes and root tag (only when the input parsed as XML)
    if ($data_sxml) {
        $data_str = strip_tags(preg_replace($strip_arr, '', $data_sxml->asXML()), $allowable_tags);
    }
    
    ?>
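
    The SimpleXML approach only works when the input parses as XML. For sloppier markup, a similar whitelist idea can be sketched with PHP's DOM extension, whose HTML parser is more forgiving. The function name stripAttributesWithDom and the wrapper div are assumptions made for this example; treat it as a minimal sketch rather than a sanitizer for hostile input.

```php
<?php
// Hypothetical helper, not part of the answer above: strips attributes via DOM.
function stripAttributesWithDom(string $html, array $allowed = []): string
{
    $doc = new DOMDocument();

    // Collect parser warnings about sloppy markup instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    // The XML declaration is a common hint so the parser reads the input as
    // UTF-8; the <div> gives the fragment a single parent to read back from.
    $doc->loadHTML('<?xml encoding="UTF-8"><div>' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $wrapper = $doc->getElementsByTagName('div')->item(0);
    if ($wrapper === null) {
        return '';
    }

    $allowed = array_map('strtolower', $allowed);

    foreach ($wrapper->getElementsByTagName('*') as $element) {
        // Walk backwards because removing an attribute shrinks the live list.
        for ($i = $element->attributes->length - 1; $i >= 0; $i--) {
            $attribute = $element->attributes->item($i);
            if (!in_array(strtolower($attribute->nodeName), $allowed, true)) {
                $element->removeAttributeNode($attribute);
            }
        }
    }

    // Serialize only the children of the wrapper, not the wrapper itself.
    $result = '';
    foreach ($wrapper->childNodes as $child) {
        $result .= $doc->saveHTML($child);
    }
    return $result;
}

echo stripAttributesWithDom('<p class="one" otherrandomattribute="two">Hello</p>');
```

    Passing array('href') as the second argument would keep link targets while dropping everything else. Input that itself contains an unbalanced closing div would confuse the wrapper, so this sketch assumes reasonably balanced fragments.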
    
  • 2020-11-29 10:09

    Here is one function that will let you strip all attributes except ones you want:

    function stripAttributes($s, $allowedattr = array()) {
      if (preg_match_all("/<[^>]*\\s([^>]*)\\/*>/msiU", $s, $res, PREG_SET_ORDER)) {
       foreach ($res as $r) {
         $tag = $r[0];
         $attrs = array();
         preg_match_all("/\\s.*=(['\"]).*\\1/msiU", " " . $r[1], $split, PREG_SET_ORDER);
         foreach ($split as $spl) {
          $attrs[] = $spl[0];
         }
         $newattrs = array();
         foreach ($attrs as $a) {
          $tmp = explode("=", $a);
          if (trim($a) != "" && (!isset($tmp[1]) || (trim($tmp[0]) != "" && !in_array(strtolower(trim($tmp[0])), $allowedattr)))) {
    
          } else {
              $newattrs[] = $a;
          }
         }
         $attrs = implode(" ", $newattrs);
         $rpl = str_replace($r[1], $attrs, $tag);
         $s = str_replace($tag, $rpl, $s);
       }
      }
      return $s;
    }
    

    For the example in the question, it would be:

    echo stripAttributes('<p class="one" otherrandomattribute="two">');
    

    Or, if you want to keep, for example, the "class" attribute:

    echo stripAttributes('<p class="one" otherrandomattribute="two">', array('class'));
    

    Or

    Suppose you are sending a message to an inbox and the message was composed with CKEditor. You can pass the submitted text through the function and assign the result to the $message variable before sending. Note that stripAttributes() removes the unnecessary attributes from the HTML tags. I tried it and it works fine: I only saw the formatting I had added, such as bold.

    $message = stripAttributes($_POST['message']);
    

    Or you can echo $message for a preview.
