Generate pure text using PHP

花落未央 2021-01-21 06:06

I'm using a service that leaves me with a generated string. The strings usually look like this:

Hello   Mr   John Doe, you are now registered \\t.
Hello          


        
1 Answer
  •  不知归路
    2021-01-21 06:32

    If I understood your case correctly, you basically want to convert from HTML to plain text.

    Depending on the complexity of your input and the robustness and accuracy needed, you have a couple of options:

    • Use strip_tags() to remove HTML tags, mb_convert_encoding() with HTML-ENTITIES as the source encoding to decode entities, and either strtr() or preg_replace() to make any additional replacements:

      $html = "

      Hello   Mr   John Doe, you are now registered. Hello   Mr   John Doe, your phone number is   555-555-555   Test: €/é

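      ";

      // Work on a copy so the original markup stays available for comparison.
      $plain_text = $html;
      // Remove HTML tags.
      $plain_text = strip_tags($plain_text);
      // Decode HTML entities into UTF-8 characters.
      $plain_text = mb_convert_encoding($plain_text, 'UTF-8', 'HTML-ENTITIES');
      // Turn tabs and line breaks into plain spaces.
      $plain_text = strtr($plain_text, [
          "\t" => ' ',
          "\r" => ' ',
          "\n" => ' ',
      ]);
      // Collapse runs of whitespace into a single space.
      $plain_text = preg_replace('/\s+/u', ' ', $plain_text);
      var_dump($html, $plain_text);
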
    • Use a proper DOM parser, plus maybe preg_replace() for further tweaking:

      $html = "

      Hello   Mr   John Doe, you are now registered. Hello   Mr   John Doe, your phone number is   555-555-555   Test: €/é

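      ";

      $dom = new DOMDocument();
      // Suppress libxml warnings about imperfect markup while parsing.
      libxml_use_internal_errors(true);
      $dom->loadHTML($html);
      libxml_use_internal_errors(false);
      // Concatenate the value of every text node in the document.
      $xpath = new DOMXPath($dom);
      $plain_text = '';
      foreach ($xpath->query('//text()') as $textNode) {
          $plain_text .= $textNode->nodeValue;
      }
      // Collapse runs of whitespace into a single space.
      $plain_text = preg_replace('/\s+/u', ' ', $plain_text);
      var_dump($html, $plain_text);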

    Both solutions should print something like this:

    string(169) "

    Hello   Mr   John Doe, you are now registered. Hello   Mr   John Doe, your phone number is   555-555-555   Test: €/é

    "
    string(107) "Hello Mr John Doe, you are now registered. Hello Mr John Doe, your phone number is 555-555-555 Test: €/é"
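
    For a side-by-side check, here is a minimal sketch that wraps each option in a function. This is not the code from the answer above: the helper names strip_to_plain_text() and dom_to_plain_text() and the sample markup are made up for illustration. The first helper also swaps in html_entity_decode() for mb_convert_encoding(), on the assumption that you want to avoid the HTML-ENTITIES pseudo-encoding, which is deprecated as of PHP 8.2.

```php
<?php
// Hypothetical helpers, named for this sketch only.

// Option 1: strip tags, decode entities, collapse whitespace.
function strip_to_plain_text(string $html): string
{
    $text = strip_tags($html);
    // html_entity_decode() stands in for mb_convert_encoding(..., 'HTML-ENTITIES').
    $text = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');
    // Treat no-break spaces (from &nbsp;) like any other whitespace.
    return trim(preg_replace('/[\s\x{00A0}]+/u', ' ', $text));
}

// Option 2: let the DOM parser deal with tags and entities.
function dom_to_plain_text(string $html): string
{
    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // silence parser warnings
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);        // restore the earlier setting
    // textContent concatenates all text nodes below the root element.
    $text = $dom->documentElement->textContent;
    return trim(preg_replace('/[\s\x{00A0}]+/u', ' ', $text));
}

// Sample input, made up for illustration. It uses ASCII plus entities only,
// which sidesteps loadHTML()'s charset guessing for raw non-ASCII bytes.
$html = "<p>Hello &nbsp; Mr &nbsp; <b>John Doe</b>,\n\tyou are now registered. Test: &euro;/&eacute;</p>";

var_dump(strip_to_plain_text($html));
var_dump(dom_to_plain_text($html));
```

    For this input, both calls should give "Hello Mr John Doe, you are now registered. Test: €/é". Which option fits better depends on how messy the real markup is; the DOM version needs the dom extension but copes better with broken or nested HTML.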
