Generate plain text using PHP

花落未央 2021-01-21 06:06

I'm using a service that returns a generated string. The strings usually look like this:

Hello   Mr   John Doe, you are now registered \t.
Hello          


        
1 answer
  • 2021-01-21 06:32

    If I understood your case correctly, you basically want to convert from HTML to plain text.

    Depending on the complexity of your input and the robustness and accuracy needed, you have a couple of options:

    • Use strip_tags() to remove HTML tags, mb_convert_encoding() with HTML-ENTITIES as the source encoding to decode entities, and either strtr() or preg_replace() for any additional replacements:

      $html = "<p>Hello &nbsp; Mr &nbsp; John Doe, you are now registered.
          Hello &nbsp; Mr &nbsp; John Doe, your phone number is &nbsp; 555-555-555 &nbsp;
          Test: &euro;/&eacute;</p>";
      
      $plain_text = $html;
      $plain_text = strip_tags($plain_text);
      $plain_text = mb_convert_encoding($plain_text, 'UTF-8', 'HTML-ENTITIES');
      $plain_text = strtr($plain_text, [
          "\t" => ' ',
          "\r" => ' ',
          "\n" => ' ',
      ]);
      $plain_text = preg_replace('/\s+/u', ' ', $plain_text);
      
      var_dump($html, $plain_text);
      
    • Use a proper DOM parser, optionally followed by preg_replace() for further cleanup:

      $html = "<p>Hello &nbsp; Mr &nbsp; John Doe, you are now registered.
          Hello &nbsp; Mr &nbsp; John Doe, your phone number is &nbsp; 555-555-555 &nbsp;
          Test: &euro;/&eacute;</p>";
      
      $dom = new DOMDocument();
      libxml_use_internal_errors(true);
      $dom->loadHTML($html);
      libxml_use_internal_errors(false);
      $xpath = new DOMXPath($dom);
      
      $plain_text = '';
      foreach ($xpath->query('//text()') as $textNode) {
          $plain_text .= $textNode->nodeValue;
      }
      $plain_text = preg_replace('/\s+/u', ' ', $plain_text);
      
      var_dump($html, $plain_text);
      

    Both solutions should print something like this:

    string(169) "<p>Hello &nbsp; Mr &nbsp; John Doe, you are now registered.
        Hello &nbsp; Mr &nbsp; John Doe, your phone number is &nbsp; 555-555-555 &nbsp;
        Test: &euro;/&eacute;</p>"
    string(107) "Hello Mr John Doe, you are now registered. Hello Mr John Doe, your phone number is 555-555-555 Test: €/é"
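
The first option can also be wrapped in a small reusable function. The sketch below is one possible variant, not the answer's exact code: the name html_to_plain_text() is hypothetical, and it assumes html_entity_decode() is an acceptable substitute for the mb_convert_encoding() call, which may be preferable on PHP 8.2 and later, where the HTML-ENTITIES encoding is deprecated.

```php
<?php
// Hypothetical helper; the function name is an assumption for illustration.
function html_to_plain_text(string $html): string
{
    // Remove the markup first, so that escaped text such as &lt;b&gt;
    // is not turned into a tag and stripped afterwards.
    $text = strip_tags($html);

    // Decode named and numeric entities (&nbsp;, &euro;, &#233;, ...) to UTF-8.
    $text = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');

    // Collapse runs of whitespace into a single space. U+00A0 (what &nbsp;
    // decodes to) is listed explicitly so the result does not depend on
    // how \s treats non-ASCII spaces. Fall back to the uncollapsed text if
    // preg_replace() returns null (for example on invalid UTF-8 input).
    $text = preg_replace('/[\s\x{00A0}]+/u', ' ', $text) ?? $text;

    return trim($text);
}

$html = "<p>Hello &nbsp; Mr &nbsp; John Doe, you are now registered.\n"
      . "    Test: &euro;/&eacute;</p>";

var_dump(html_to_plain_text($html));
```

Stripping tags before decoding entities keeps the same order as the answer's code. The trailing trim() is an addition that removes a leading or trailing space left over after collapsing.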
    