Non-greedy regex

I need to get the values inside some tags in a comment in a PHP file like this:

php code
/* this is a comment
!-
<titulo>titulo3</titulo>
<funcion>
   <descripcion>esta es la descripcion de la funcion 6</descripcion>
</funcion>
<funcion>
   <descripcion>esta es la descripcion de la funcion 7</descripcion>
</funcion>
<otros>
   <descripcion>comentario de otros 2a hoja</descripcion>
</otros>
-!
*/
some php code

As you can see, the file has newlines and repeated tags like <funcion></funcion>, and I need to get every single one of them, so I tried something like this:

preg_match_all("/(<funcion>)(.*)(<\/funcion>)/s",$file,$matches);

This works across newlines, but it is greedy, so I searched around and found these two solutions:

preg_match_all("/(<funcion>)(.*?)(<\/funcion>)/s",$file,$matches);
preg_match_all("/(<funcion>)(.*)(<\/funcion>)/sU",$file,$matches);

but neither of them works for me, and I don't know why.
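For reference, here is a minimal sketch that runs the greedy pattern and the two suggested alternatives on a shortened copy of the sample (assumed to be already loaded into a string) and compares how many matches each one finds:

```php
<?php
// Shortened copy of the sample: only the two <funcion> blocks.
$file = <<<'EOT'
<funcion>
   <descripcion>esta es la descripcion de la funcion 6</descripcion>
</funcion>
<funcion>
   <descripcion>esta es la descripcion de la funcion 7</descripcion>
</funcion>
EOT;

// Greedy: .* runs to the LAST </funcion>, so both blocks end up in one match.
$greedy = preg_match_all("/(<funcion>)(.*)(<\/funcion>)/s", $file, $g);

// Lazy: .*? stops at the FIRST </funcion> after each <funcion>.
$lazy = preg_match_all("/(<funcion>)(.*?)(<\/funcion>)/s", $file, $l);

// The U modifier inverts greediness, so a plain .* behaves like .*? here.
$ungreedy = preg_match_all("/(<funcion>)(.*)(<\/funcion>)/sU", $file, $u);

echo "greedy: $greedy, lazy: $lazy, ungreedy: $ungreedy\n";
```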

Try using [\s\S], which matches any whitespace or non-whitespace character (that is, any character, including newlines), instead of the dot (.). Also, there is no need to put <funcion> and </funcion> in capture groups.

/<funcion>([\s\S]*?)<\/funcion>/s
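A minimal sketch of this pattern, assuming the text is already in $file. Because [\s\S] matches newlines on its own, the s modifier is left out here; in the pattern above it is harmless but redundant.

```php
<?php
$file = <<<'EOT'
<funcion>
   <descripcion>esta es la descripcion de la funcion 6</descripcion>
</funcion>
<funcion>
   <descripcion>esta es la descripcion de la funcion 7</descripcion>
</funcion>
EOT;

// [\s\S] matches any character, including newlines, even without the s modifier.
$count = preg_match_all('/<funcion>([\s\S]*?)<\/funcion>/', $file, $matches);

// Only the inner content is captured, so $matches[1] contains no <funcion> tags.
foreach ($matches[1] as $inner) {
    echo trim($inner), "\n";
}
```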

Also, keep in mind that the best way to do this is to parse the XML with an XML parser. Even if the file is not an XML document, as you mentioned in your comment, you can extract the part that should be parsed and run an XML parser on it.
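A minimal sketch of that approach with SimpleXML, assuming the block always sits between the !- and -! markers shown in the question and becomes well-formed XML once it is wrapped in a single root element (the root name doc is arbitrary):

```php
<?php
// Sample laid out like the file in the question.
$file = <<<'EOT'
php code
/* this is a comment
!-
<titulo>titulo3</titulo>
<funcion>
   <descripcion>esta es la descripcion de la funcion 6</descripcion>
</funcion>
<funcion>
   <descripcion>esta es la descripcion de la funcion 7</descripcion>
</funcion>
<otros>
   <descripcion>comentario de otros 2a hoja</descripcion>
</otros>
-!
*/
some php code
EOT;

// 1. Cut out the text between the "!-" and "-!" markers with plain string functions.
$start = strpos($file, '!-');
$end = $start === false ? false : strpos($file, '-!', $start + 2);
if ($end === false) {
    throw new RuntimeException('markers not found');
}
$fragment = substr($file, $start + 2, $end - $start - 2);

// 2. The fragment has several top-level elements, so wrap it in one root
//    element ("doc" is an arbitrary name) to make it well-formed XML.
$xml = simplexml_load_string('<doc>' . $fragment . '</doc>');
if ($xml === false) {
    throw new RuntimeException('fragment is not well-formed XML');
}

// 3. Read the <funcion> elements through the normal SimpleXML API.
$descriptions = [];
foreach ($xml->funcion as $funcion) {
    $descriptions[] = (string) $funcion->descripcion;
}
print_r($descriptions);
```

Unlike the regex, this keeps working if the tags are reformatted or nested differently, as long as the extracted part stays well-formed.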

This expression from your question:

preg_match_all("/(<funcion>)(.*?)(<\/funcion>)/s", $file, $matches);
print_r($matches);

This will work, but ONLY IF $file is a string containing the XML; if it's a file name, you have to get the contents first:

preg_match_all("/(<funcion>)(.*?)(<\/funcion>)/s", file_get_contents($file), $matches);
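A self-contained sketch of that difference; the temporary file and its contents exist only for the demo:

```php
<?php
// Demo only: write a small sample to a temporary file.
$path = tempnam(sys_get_temp_dir(), 'funcion');
if ($path === false) {
    throw new RuntimeException('could not create a temporary file');
}
file_put_contents($path, "<funcion>\n   <descripcion>esta es la descripcion de la funcion 6</descripcion>\n</funcion>\n");

// Searching the variable that holds the file NAME finds nothing...
$countOnName = preg_match_all("/(<funcion>)(.*?)(<\/funcion>)/s", $path, $unused);

// ...while searching the file CONTENTS finds the block.
$count = preg_match_all("/(<funcion>)(.*?)(<\/funcion>)/s", file_get_contents($path), $matches);

unlink($path);
echo "file name: $countOnName match(es), contents: $count match(es)\n";
```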

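Also, keep in mind that PCRE limits backtracking (the pcre.backtrack_limit ini setting), so a non-greedy pattern run over a large input can fail. In that case preg_match_all() returns false, and preg_last_error() reports the cause (for example PREG_BACKTRACK_LIMIT_ERROR).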
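A minimal sketch of checking for that failure instead of letting it look like "no matches"; the helper name matchAllFunciones is hypothetical:

```php
<?php
// Hypothetical helper: returns the inner content of every <funcion> element and
// throws instead of letting a PCRE failure look like "no matches".
function matchAllFunciones(string $text): array
{
    $count = preg_match_all('/<funcion>(.*?)<\/funcion>/s', $text, $matches);
    if ($count === false) {
        // For example PREG_BACKTRACK_LIMIT_ERROR once pcre.backtrack_limit is exceeded.
        throw new RuntimeException('preg_match_all() failed with error code ' . preg_last_error());
    }
    return $matches[1];
}

print_r(matchAllFunciones("<funcion>uno</funcion>\n<funcion>dos</funcion>"));
```

If the input is legitimately large, the limit can be raised with ini_set('pcre.backtrack_limit', ...), although a pattern that backtracks less is usually the better fix.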

Try this:

 /<funcion>((.|\n)*?)<\/funcion>/i

$string = "<titulo>titulo3</titulo>
<funcion>
   <descripcion>esta es la descripcion de la funcion 6</descripcion>
</funcion>
<funcion>
   <descripcion>esta es la descripcion de la funcion 7</descripcion>
</funcion>
<otros>
   <descripcion>comentario de otros 2a hoja</descripcion>
</otros>";

$result = preg_match_all('/<funcion>((.|\n)*?)<\/funcion>/i', $string, $m);
// $m[0] holds the full matches, including the <funcion> tags; $m[1] holds only the inner content.
print_r($m[0]);

Viewed in a browser, which hides the tags, this outputs:

Array
(
    [0] => 
   esta es la descripcion de la funcion 6

    [1] => 
   esta es la descripcion de la funcion 7

)
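The tags are missing from that output only because the browser treats them as HTML. A sketch on a hypothetical one-block sample that escapes the output to show what $m actually contains:

```php
<?php
$string = "<funcion>\n   <descripcion>esta es la descripcion de la funcion 6</descripcion>\n</funcion>";

preg_match_all('/<funcion>((.|\n)*?)<\/funcion>/i', $string, $m);

// $m[0]: whole matches, tags included (escaped so a browser displays them).
echo htmlspecialchars($m[0][0]), "\n";

// $m[1]: only the text between <funcion> and </funcion>.
echo htmlspecialchars($m[1][0]), "\n";

// $m[2]: the inner (.|\n) group keeps only its LAST repetition, a single
// character, so it is of little use; (?:.|\n) would avoid capturing it.
var_dump($m[2][0]);
```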


If the structure is always exactly like that (the content always indented inside the tags), you can easily match it with /\n[\s]+([^\n]+(\n[\s]+)*)\n/.

I tend to avoid "lazy" ("non-greedy") quantifiers. They look like a bit of a hack to me, and they are not available in every regex flavor, nor implemented the same way everywhere. Since you don't seem to need one in this case, I would suggest not using it.
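If you want to avoid the lazy quantifier without depending on indentation, one option (a swapped-in technique, not part of this answer) is a greedy "tempered" pattern that refuses to step over </funcion>. It relies on a negative lookahead instead, which is also not available in every regex flavor, although PCRE supports it:

```php
<?php
$file = <<<'EOT'
<funcion>
   <descripcion>esta es la descripcion de la funcion 6</descripcion>
</funcion>
<funcion>
   <descripcion>esta es la descripcion de la funcion 7</descripcion>
</funcion>
EOT;

// No lazy quantifier: the greedy * accepts any character except a "<" that
// starts "</funcion>", so it cannot run past the first closing tag.
// [^<] also matches newlines, so no s modifier is needed.
$pattern = '/<funcion>((?:[^<]|<(?!\/funcion>))*)<\/funcion>/';
$count = preg_match_all($pattern, $file, $matches);
print_r(array_map('trim', $matches[1]));
```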

Try this:

// The "/" in </funcion> must be escaped, because "/" is also the pattern delimiter.
$regexp = '/<funcion>\n[\s]+([^\n]+(\n[\s]+)*)\n<\/funcion>/';
$works = preg_match_all($regexp, $file, $matches);
echo '<pre>';
print_r($matches);

The $matches[1] array will then hold the contents of each "funcion" tag.
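A runnable sketch of this on the two <funcion> blocks from the question, assuming LF line endings and exactly one indented line of content per tag. If the / in </funcion> is left unescaped, PHP emits an "Unknown modifier" warning and preg_match_all() returns false.

```php
<?php
$file = <<<'EOT'
<funcion>
   <descripcion>esta es la descripcion de la funcion 6</descripcion>
</funcion>
<funcion>
   <descripcion>esta es la descripcion de la funcion 7</descripcion>
</funcion>
EOT;

// Indentation-based pattern with the "/" in </funcion> escaped.
$regexp = '/<funcion>\n[\s]+([^\n]+(\n[\s]+)*)\n<\/funcion>/';
$works = preg_match_all($regexp, $file, $matches);

// One entry per <funcion>: the indented content line, without the tags.
print_r($matches[1]);
```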

Of course, it would be better to pre-filter the content and apply the regex only to the comment contents, to avoid false matches elsewhere in the file.
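A sketch of that pre-filtering, assuming the tags only matter inside /* ... */ comments. The second <funcion> block in the sample (inside a PHP string) is hypothetical and exists only to show a match that the filter excludes:

```php
<?php
$file = <<<'EOT'
php code
/* this is a comment
!-
<funcion>
   <descripcion>esta es la descripcion de la funcion 6</descripcion>
</funcion>
-!
*/
$html = '<funcion>
   <descripcion>not part of the comment</descripcion>
</funcion>';
EOT;

$regexp = '/<funcion>\n[\s]+([^\n]+(\n[\s]+)*)\n<\/funcion>/';

// Without filtering, the <funcion> inside the PHP string literal matches too.
$unfiltered = preg_match_all($regexp, $file, $all);

// Step 1: keep only the text inside /* ... */ comments ("~" is used as the
// delimiter so the slashes in the pattern need no escaping).
preg_match_all('~/\*(.*?)\*/~s', $file, $comments);
$commentText = implode("\n", $comments[1]);

// Step 2: run the <funcion> pattern on the comment text only.
preg_match_all($regexp, $commentText, $matches);

echo "unfiltered: $unfiltered, filtered: ", count($matches[1]), "\n";
print_r($matches[1]);
```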

Have fun.

Source: https://stackoverflow.com/questions/15150175/non-greedy-regex

Tags: php, regex, regex-greedy, non-greedy