I've been confused. So here's my problem: I have a text like this:
Head of Pekalongan Regency , Dra. Hj.. Siti Qomariy
Did you try the strip_tags() function?
<?php
$s = "<ORGANIZATION>Head of Pekalongan Regency</ORGANIZATION>, Dra. Hj.. Siti Qomariyah , MA and her staff were greeted by <ORGANIZATION>Rector of IPB</ORGANIZATION> Prof. Dr. Ir. H. Herry Suhardiyanto , M.Sc. and <ORGANIZATION>officials of IPB</ORGANIZATION> in the guest room.";
$r = strip_tags($s);
var_dump($r);
?>
preg_match will only return the first match, and your current code will fail if:
Instead, try this:
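function get_text_between_tags($string, $tagname) {
    // Escape the tag name so it is matched literally inside the pattern.
    $tag = preg_quote($tagname, '/');
    // i: case-insensitive; s: "." also matches newlines; (.*?) is non-greedy.
    $pattern = "/<$tag\b[^>]*>(.*?)<\/$tag>/is";
    preg_match_all($pattern, $string, $matches);
    // $matches[1] holds the captured text of every match.
    if (!empty($matches[1])) {
        return $matches[1];
    }
    return array();
}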
This is an acceptable use of regexes for parsing, because it is a clearly defined case. Note, however, that it will fail if, for whatever reason, there is a > inside an attribute value of the tag.
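To make that concrete, here is a minimal sketch of the regex approach run on a shortened version of the sample text and on an input with a > inside an attribute value. The function name `texts_between_tags_regex` and the `<a title=...>` input are made up for this illustration; the pattern is the one from the function above.

```php
<?php
// Hypothetical name for this sketch; the pattern is the same as in the answer.
function texts_between_tags_regex($string, $tagname) {
    $pattern = "/<$tagname\b[^>]*>(.*?)<\/$tagname>/is";
    preg_match_all($pattern, $string, $matches);
    return !empty($matches[1]) ? $matches[1] : array();
}

// Shortened version of the sample text from the question.
$s = "<ORGANIZATION>Head of Pekalongan Regency</ORGANIZATION>, Dra. Hj.. Siti Qomariyah , MA "
   . "and her staff were greeted by <ORGANIZATION>Rector of IPB</ORGANIZATION> Prof.";
$orgs = texts_between_tags_regex($s, "ORGANIZATION");
print_r($orgs); // both organization names, in document order

// Made-up input: the ">" inside title="..." ends the opening tag too early,
// so part of the attribute leaks into the captured text.
$broken = texts_between_tags_regex('<a title="x > y">text</a>', "a");
var_dump($broken[0]); // ' y">text' rather than 'text'
```

The first call returns every match, which is the point of switching from preg_match to preg_match_all; the second shows the limitation described above.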
If you prefer to avoid the wrath of the pony, try this:
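function get_text_between_tags($string, $tagname) {
    // Collect parser warnings (e.g. for non-HTML tags such as <ORGANIZATION>) instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($string);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    // The HTML parser lowercases element names, so look the tag up in lowercase.
    $tags = $dom->getElementsByTagName(strtolower($tagname));
    $out = array();
    foreach ($tags as $tag) {
        $out[] = $tag->nodeValue;
    }
    return $out;
}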
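As a rough check that the DOM approach copes with a > inside an attribute value, the case that trips up the regex, here is a small sketch. The function name `texts_between_tags_dom` and the sample `<a>` input are invented for this example, and silencing libxml warnings is an assumption about what you want, not part of the original answer.

```php
<?php
// Hypothetical name for this sketch; same DOM approach as the answer.
function texts_between_tags_dom($string, $tagname) {
    libxml_use_internal_errors(true); // assumption: keep parser warnings quiet
    $dom = new DOMDocument();
    $dom->loadHTML($string);
    libxml_clear_errors();
    $out = array();
    foreach ($dom->getElementsByTagName($tagname) as $node) {
        $out[] = $node->nodeValue;
    }
    return $out;
}

// Made-up input: the first link has a ">" inside its title attribute.
$links = texts_between_tags_dom('<a title="x > y">first</a> and <a href="#">second</a>', "a");
print_r($links); // the text of both links: "first" and "second"
```

Because the parser understands quoted attribute values, the attribute content does not leak into the result.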