I'm using a service and end up with a generated string. Strings usually look like this:
Hello Mr John Doe, you are now registered \t.
Hello
If I understood your case correctly, you basically want to convert from HTML to plain text.
Depending on the complexity of your input and the robustness and accuracy needed, you have a couple of options:
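Use strip_tags() to remove HTML tags, html_entity_decode() to decode entities, and either strtr() or preg_replace() to make any additional replacements. (mb_convert_encoding() with HTML-ENTITIES as the source encoding can also decode entities, but that pseudo-encoding is deprecated as of PHP 8.2.)
$html = "Hello Mr John Doe, you are now registered.
Hello Mr John Doe, your phone number is 555-555-555
Test: €/é
";
$plain_text = $html;
// Remove the tags themselves
$plain_text = strip_tags($plain_text);
// Decode entities such as &euro; or &eacute; into UTF-8 characters
$plain_text = html_entity_decode($plain_text, ENT_QUOTES | ENT_HTML5, 'UTF-8');
// Literal replacements: turn tabs and line breaks into spaces
$plain_text = strtr($plain_text, [
    "\t" => ' ',
    "\r" => ' ',
    "\n" => ' ',
]);
// Collapse runs of whitespace (on its own this would also cover the strtr() step)
$plain_text = trim(preg_replace('/\s+/u', ' ', $plain_text));
var_dump($html, $plain_text);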
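strip_tags() simply deletes tags, so text from adjacent block elements can end up glued together: "<p>One</p><p>Two</p>" becomes "OneTwo". One possible "additional replacement" is to put a space in front of line breaks and closing block tags before stripping. Here is a minimal sketch of the first option wrapped in a function; the name html_to_plain_text() and the list of block tags are my own assumptions, not anything built into PHP:

```php
<?php
// Hypothetical helper; the block-level tag list below is an illustrative assumption.
function html_to_plain_text(string $html): string
{
    // Put a space before <br> and closing block tags so that
    // "<p>One</p><p>Two</p>" does not turn into "OneTwo" after strip_tags()
    $spaced = preg_replace('~<(?:br|/p|/div|/li|/h[1-6]|/tr|/td|/th)\b[^>]*>~i', ' $0', $html);
    $text = strip_tags($spaced);
    // Decode named and numeric entities into UTF-8 characters
    $text = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');
    // Collapse all whitespace runs (tabs and newlines included) into one space
    return trim(preg_replace('/\s+/u', ' ', $text));
}

echo html_to_plain_text('<p>Hello <b>Mr</b> John Doe,</p><p>Test: &euro;/&eacute;</p>'), "\n"; // Hello Mr John Doe, Test: €/é
```

Inline tags such as <b> are left alone on purpose, so "Hello <b>Mr</b>" keeps its original spacing; only tags that normally start a new line get an extra space.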
Use a proper DOM parser, plus maybe preg_replace() for further tweaking:
$html = "Hello Mr John Doe, you are now registered.
Hello Mr John Doe, your phone number is 555-555-555
Test: €/é
";
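$dom = new DOMDocument();
// Silence warnings about malformed markup, remembering the previous setting
$previous = libxml_use_internal_errors(true);
// loadHTML() may assume ISO-8859-1 when no charset is declared, so encode
// non-ASCII characters as numeric entities to keep UTF-8 text such as €/é intact
$dom->loadHTML(mb_encode_numericentity($html, [0x80, 0x10FFFF, 0, 0x1FFFFF], 'UTF-8'));
libxml_clear_errors();
libxml_use_internal_errors($previous);
$xpath = new DOMXPath($dom);
$plain_text = '';
// Concatenate every text node in document order
foreach ($xpath->query('//text()') as $textNode) {
    $plain_text .= $textNode->nodeValue;
}
// Collapse runs of whitespace into a single space
$plain_text = trim(preg_replace('/\s+/u', ' ', $plain_text));
var_dump($html, $plain_text);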
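The DOM approach can be packaged the same way. This is a sketch under a few assumptions: the name html_to_plain_text_dom() is hypothetical, and it additionally skips text inside <script> and <style> elements, which a plain //text() query would otherwise include. Non-ASCII characters are converted to numeric entities with mb_encode_numericentity() before parsing, so the result does not depend on how libxml guesses the input charset:

```php
<?php
// Hypothetical helper; ignoring <script>/<style> contents is an added assumption.
function html_to_plain_text_dom(string $html): string
{
    // DOMDocument::loadHTML() rejects an empty string in PHP 8
    if (trim($html) === '') {
        return '';
    }
    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    // Numeric entities keep UTF-8 characters intact whatever charset libxml assumes
    $dom->loadHTML(mb_encode_numericentity($html, [0x80, 0x10FFFF, 0, 0x1FFFFF], 'UTF-8'));
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $text = '';
    // Text nodes in document order, excluding script and style contents
    foreach ($xpath->query('//text()[not(ancestor::script) and not(ancestor::style)]') as $node) {
        $text .= $node->nodeValue;
    }
    // Collapse whitespace runs into a single space
    return trim(preg_replace('/\s+/u', ' ', $text));
}

echo html_to_plain_text_dom('<p>Caf&eacute; <script>var x = 1;</script>ok</p>'), "\n"; // Café ok
```

Like the original loop, this joins text nodes without a separator, so adjacent block elements can still run together; the block-tag spacing trick from the strip_tags() sketch could be applied to the input first if that matters.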
Both solutions should print something like this:
string(169) "Hello Mr John Doe, you are now registered.
Hello Mr John Doe, your phone number is 555-555-555
Test: €/é
"
string(107) "Hello Mr John Doe, you are now registered. Hello Mr John Doe, your phone number is 555-555-555 Test: €/é"