I have a string:
Verslo centrai Lietuvos nekilnojamojo turto plėtros asociacijos konkurse ...
This decodes HTML entities, then strips tags and removes the listed whitespace characters (including line breaks), leaving only the text:
strip_tags(preg_replace('/<[^>]*>/','',str_replace(array(" ","\n","\r"),"",html_entity_decode($YOUR_STRING,ENT_QUOTES,'UTF-8'))));
As of PHP 7.4.0, strip_tags() also accepts an array of allowed tags,
so this:
<?php
$html = '<div id="my-div"><p>text<strong><a href="#link"></a></strong></p></div>';
echo strip_tags($html, ['p', 'a']); // keep the p and a tags
returns this:
<p>text<a href="#link"></a></p>
Note that only the disallowed tags have been removed.
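On PHP versions before 7.4.0 the allowed tags have to be passed as a single string instead. A minimal sketch, reusing the same input, that compares the two forms (the variable names $viaString and $viaArray are only illustrative, and the array call needs PHP 7.4.0 or later):

```php
<?php
// Same input as in the answer above.
$html = '<div id="my-div"><p>text<strong><a href="#link"></a></strong></p></div>';

// String form of the second argument: list each allowed tag in angle brackets.
$viaString = strip_tags($html, '<p><a>');

// Array form of the second argument (PHP 7.4.0 and later only).
$viaArray = strip_tags($html, ['p', 'a']);

// Both calls should keep <p> and <a> and drop <div> and <strong>.
var_dump($viaString === $viaArray);
```

If the code has to run on older PHP versions as well, the string form is the safer choice.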
Since your HTML is malformed, you could use a preg_replace()
approach instead:
$text = '<p justify;"="">Verslo centrai Lietuvos nekilnojamojo turto plėtros asociacijos konkurse ... </p>';
$content = preg_replace('/<[^>]*>/', '', $text);
var_dump($content);
// string(108) "Verslo centrai Lietuvos nekilnojamojo turto plėtros asociacijos konkurse ... "
The strip_tags() docs say: "Because strip_tags() does not actually validate the HTML, partial or broken tags can result in the removal of more text/data than expected."
Also, the second parameter is $allowable_tags.
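To see how that warning applies to the string from the question, a small sketch like the following runs both approaches on the same malformed input. What strip_tags() returns here depends on how its parser reads the stray quotes inside the tag, so no result is claimed for it; the variable names are only illustrative.

```php
<?php
// Malformed opening tag, as in the question.
$text = '<p justify;"="">Verslo centrai Lietuvos nekilnojamojo turto plėtros asociacijos konkurse ... </p>';

// strip_tags() does not validate HTML, so a broken tag may swallow more than expected.
$viaStripTags = strip_tags($text);

// The regex removes every run from "<" up to the next ">", whatever is in between.
$viaRegex = preg_replace('/<[^>]*>/', '', $text);

var_dump($viaStripTags);
var_dump($viaRegex);
```

Comparing the two dumps shows whether strip_tags() removed more than the tags themselves for this particular input.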
This replaces all HTML tags (see https://regex101.com/r/jM9oS4/4):
preg_replace('/<(|\/)(?!\?).*?(|\/)>/',$replacement,$string);
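The negative lookahead (?!\?) in that pattern means a "<" followed directly by "?" is not treated as a tag. A short sketch on a made-up input string (the sample string and the empty replacement are assumptions for illustration):

```php
<?php
// Hypothetical sample: an XML declaration followed by ordinary tags.
$string = '<?xml version="1.0"?><b>x</b><br/>';
$replacement = '';

// Opening, closing, and self-closing tags match; "<?" sequences are skipped.
$result = preg_replace('/<(|\/)(?!\?).*?(|\/)>/', $replacement, $string);

var_dump($result);
```

With this input the declaration and the text x should remain, while the b and br tags are removed.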
Try it like this:
$content = strip_tags($text);
Or you can do it with a regular expression like this:
$content = preg_replace('/<[^>]*>/', '', $text);
With $content = strip_tags($text, '<p>'); you are allowing the <p> tag to remain in the string.
For more information, see http://php.net/manual/en/function.strip-tags.php
Since the HTML is poorly formatted, you probably need to either write your own regexp to remove tags or clean up the HTML before trying to remove tags.
You could try this to remove everything that "looks like" a tag:
$str = preg_replace("/<.*?>/", " ", $str);
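Replacing each tag with a space keeps neighbouring words apart but can leave runs of spaces behind. One possible way to tidy that up, sketched as a hypothetical helper named html_to_text() (not a PHP built-in), is to decode entities and collapse the whitespace afterwards:

```php
<?php
// Hypothetical helper for illustration; the name and steps are assumptions.
function html_to_text(string $html): string
{
    // Replace anything that looks like a tag with a space (s flag: tags may span lines).
    $text = preg_replace('/<.*?>/s', ' ', $html);

    // Turn entities such as &amp; back into characters.
    $text = html_entity_decode($text, ENT_QUOTES, 'UTF-8');

    // Collapse runs of whitespace into one space and trim the ends.
    return trim(preg_replace('/\s+/', ' ', $text));
}

echo html_to_text('<p justify;"="">Verslo centrai</p><p>Lietuvos</p>');
// prints: Verslo centrai Lietuvos
```

Decoding after the tag removal means an entity such as &lt; cannot be mistaken for the start of a tag.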