php preg_match return position of last match

南旧 2021-01-11 17:32

With

preg_match($pattern, $subject, $matches, PREG_OFFSET_CAPTURE); 

is it possible to search a string in reverse, i.e. return the position of the last match?

4 Answers
  • 2021-01-11 18:17

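    preg_match does not support reverse searching because it is not necessary.

    You can write a regular expression that begins with a greedy .* (greedy is the default), for example .*\Kstuff or .*(stuff). The greedy prefix consumes as much of the subject as it can before backtracking, so what you get is the last occurrence of your match. Note that (?<=.*)stuff would be a lookbehind, not a lookahead, and PCRE rejects an unbounded lookbehind like that one.

    Detailed information is in the official documentation for preg_match.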
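
A minimal sketch of the greedy-prefix idea, combined with PREG_OFFSET_CAPTURE to get the position the question asks for. The sample string is hypothetical and the capture-group form is only one way to write it:

```php
<?php
// Hypothetical sample data, not from the original answer.
$subject = 'stuff, more stuff, and yet more stuff here';

// The greedy .* first runs to the end of the line, then backtracks
// until "stuff" can match, so group 1 holds the LAST occurrence.
if (preg_match('/.*(stuff)/', $subject, $matches, PREG_OFFSET_CAPTURE)) {
    // With PREG_OFFSET_CAPTURE each group is a [text, byte offset] pair.
    [$text, $offset] = $matches[1];
    echo "$text at byte offset $offset", PHP_EOL;
}
```

For a fixed string like this, strrpos would give the same offset; the regex form only pays off when the target is a real pattern.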

  • 2021-01-11 18:18

    "Greedy" is the key word here. * is greedy by default; *? limits greediness to the bare minimum.

    So the solution is to use the combination, e.g. (searching for last period followed by a whitespace):

    /^.*\.\s(.*?)$/s
    
    • ^ is the beginning of text
    • .* eats as much as it can, including matching patterns
    • \.\s is the period followed by a whitespace (what I am looking for)
    • (.*?) eats as little as possible. Capture group () so I could address it as a match group.
    • $ end of text
    • s - makes the dot match newlines as well. It does not change ^ and $: without the m modifier they anchor at the start and end of the whole text rather than at each line
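
The pattern above could be exercised like this. It is a sketch on a hypothetical sample text; the offset arithmetic assumes the separator is exactly one period plus one whitespace byte, which is what \.\s matches:

```php
<?php
// Hypothetical sample text, not taken from the original answer.
$text = "First sentence. Second one.\nThird sentence. Tail after the last period";

if (preg_match('/^.*\.\s(.*?)$/s', $text, $m, PREG_OFFSET_CAPTURE)) {
    // Group 1 is everything after the last "period + whitespace" pair.
    [$tail, $tailOffset] = $m[1];
    // The two-byte separator sits immediately before the captured tail.
    $periodOffset = $tailOffset - 2;
    echo "last separator at byte $periodOffset, tail: $tail", PHP_EOL;
}
```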
  • 2021-01-11 18:19

    I did not understand exactly what you want, because the answer depends on how many groups are captured. I wrote a function that returns the offset of the last match for a given group number. My pattern has 3 groups: group 0 is the full match and the other two are sub-groups.

    Pattern sample code:

    $pattern = "/<a[^\x3e]{0,}href=\x22([^\x22]*)\x22>([^\x3c]*)<\/a>/";
    

    HTML sample code:

    $subject = '<ul>
    <li>Search Engines</li>
    <li><a href="https://www.google.com/">Google</a></li>
    <li><a href="http://www.bing.com/">Bing</a></li>
    <li><a href="https://duckduckgo.com/">DuckDuckGo</a></li>
    </ul>';
    

    My function returns the offset of the last match; the $number argument selects which group's offset you get:

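    function get_offset_last_match( $pattern, $subject, $number ) {
        // preg_match_all returns 0 when nothing matches and false on error
        if ( preg_match_all( $pattern, $subject, $matches, PREG_OFFSET_CAPTURE ) == false ) {
            return false;
        }
        // each capture is a [text, offset] pair; [1] is the byte offset
        return $matches[$number][count( $matches[0] ) - 1][1];
    }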
    

    You can get detailed information about preg_match_all here on official documentation.

    Using my pattern for example:

    0 => all text
    1 => href value
    2 => innerHTML

    echo '<pre>';
    echo get_offset_last_match( $pattern, $subject, 0 ) . PHP_EOL; // all text
    echo get_offset_last_match( $pattern, $subject, 1 ) . PHP_EOL; // href value
    echo get_offset_last_match( $pattern, $subject, 2 ) . PHP_EOL; // innerHTML
    echo '</pre>';
    die();
    

    Output is:

    140
    149
    174
    

    My function (text):

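    function get_text_last_match( $pattern, $subject, $number ) {
        // preg_match_all returns 0 when nothing matches and false on error
        if ( preg_match_all( $pattern, $subject, $matches, PREG_OFFSET_CAPTURE ) == false ) {
            return false;
        }
        // each capture is a [text, offset] pair; [0] is the matched text
        return $matches[$number][count( $matches[0] ) - 1][0];
    }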
    

    Sample code:

    echo '<textarea style="font-family: Consolas; font-size: 14px; height: 200px; tab-size: 4; width: 90%;">';
    echo 'ALL   = ' . get_text_last_match( $pattern, $subject, 0 ) . PHP_EOL; // all text
    echo 'HREF  = ' . get_text_last_match( $pattern, $subject, 1 ) . PHP_EOL; // href value
    echo 'INNER = ' . get_text_last_match( $pattern, $subject, 2 ) . PHP_EOL; // innerHTML
    echo '</textarea>';
    

    Output is:

    ALL   = <a href="https://duckduckgo.com/">DuckDuckGo</a>
    HREF  = https://duckduckgo.com/
    INNER = DuckDuckGo
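
Since the two helpers differ only in the final index, they could be folded into one call that returns both values. The sketch below is an illustration only: get_last_match is a hypothetical name, the subject is a shortened one-line version of the HTML sample, and the pattern is a simplified spelling of the one above.

```php
<?php
// Hypothetical helper, not part of the original answer: returns
// ['text' => ..., 'offset' => ...] for the last match, or null.
function get_last_match(string $pattern, string $subject, int $group = 0): ?array
{
    $count = preg_match_all($pattern, $subject, $matches, PREG_OFFSET_CAPTURE);
    if (!$count || !isset($matches[$group])) {
        return null; // no match, regex error, or unknown group number
    }
    // Each capture is a [text, byte offset] pair; take the last one.
    [$text, $offset] = $matches[$group][$count - 1];
    return ['text' => $text, 'offset' => $offset];
}

$pattern = '/<a[^>]*href="([^"]*)">([^<]*)<\/a>/';
$subject = '<li><a href="https://www.google.com/">Google</a></li>'
         . '<li><a href="https://duckduckgo.com/">DuckDuckGo</a></li>';

$last = get_last_match($pattern, $subject, 1); // group 1 = href value
echo $last['text'], ' @ ', $last['offset'], PHP_EOL;
```

Returning null instead of false keeps a legitimate offset of 0 from being confused with "nothing found" in a loose comparison.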
    
  • 2021-01-11 18:26

    PHP doesn't have a regex method that searches a string from right to left (as .NET does). There are several possible recipes to solve that (this list isn't exhaustive, but it may give you ideas for your own workaround):

    • using preg_match_all with PREG_SET_ORDER flag and end($matches) will give you the last match set
    • reversing the string with strrev and building a "reversed" pattern to be used with preg_match
    • using preg_match and building a pattern that ensures there are no more occurrences of the searched mask between the match and the end of the string
    • using a greedy quantifier before the target and \K to start the match result at the position you want. Once the end of the string is reached, the regex engine will backtrack until it finds a match.

    Examples with the string $str = 'xxABC1xxxABC2xx' for the pattern /x[A-Z]+\d/

    way 1: finds all matches and displays the last one.

    if ( preg_match_all('/x[A-Z]+\d/', $str, $matches, PREG_SET_ORDER) )
        print_r(end($matches)[0]);
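
Because the question asks for the position, way 1 can be combined with PREG_OFFSET_CAPTURE. This is a sketch of that combination, not part of the original answer:

```php
<?php
$str = 'xxABC1xxxABC2xx';
$flags = PREG_SET_ORDER | PREG_OFFSET_CAPTURE;

if (preg_match_all('/x[A-Z]+\d/', $str, $matches, $flags)) {
    // With PREG_SET_ORDER each element of $matches is one complete match
    // set; with PREG_OFFSET_CAPTURE each group is a [text, offset] pair.
    [$text, $offset] = end($matches)[0];
    echo "$text at offset $offset", PHP_EOL; // xABC2 at offset 8
}
```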
    


    way 2: finds the first match in the reversed string with a reversed pattern, and displays the reversed result.

    if ( preg_match('/\d[A-Z]+x/', strrev($str), $match) )
        print_r(strrev($match[0]));
    


    Note that it isn't always so easy to reverse a pattern.
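
If the position is needed as well, the offset found in the reversed string has to be mapped back to the original. A sketch of that conversion follows; it assumes a single-byte encoding, since strrev reverses bytes and would scramble multibyte UTF-8 characters:

```php
<?php
$str = 'xxABC1xxxABC2xx';

if (preg_match('/\d[A-Z]+x/', strrev($str), $match, PREG_OFFSET_CAPTURE)) {
    [$reversedText, $reversedOffset] = $match[0];
    // The match ends at $reversedOffset + length in the reversed string,
    // which is that many bytes counted from the END of the original.
    $offset = strlen($str) - ($reversedOffset + strlen($reversedText));
    $text = strrev($reversedText);
    echo "$text at offset $offset", PHP_EOL; // xABC2 at offset 8
}
```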

    way 3: jumps from x to x and checks with a negative lookahead that there is no other x[A-Z]+\d match before the end of the string.

    if ( preg_match('/x[A-Z]+\d(?!.*x[A-Z]+\d)/', $str, $match) )
        print_r($match[0]);
    


    variants:

    With a lazy quantifier

    if ( preg_match('/x[A-Z]+\d(?!.*?x[A-Z]+\d)/', $str, $match) )
        print_r($match[0]);
    

    or with a "tempered quantifier"

    if ( preg_match('/x[A-Z]+\d(?=(?:(?!x[A-Z]+\d).)*$)/', $str, $match) )
        print_r($match[0]);
    

    Choosing between these variants can be worthwhile when you know in advance where a match is most likely to occur.
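
One caveat with these lookahead patterns is that the dot does not match a newline by default, so on a multi-line subject the lookahead only inspects the rest of the current line. The sketch below uses a hypothetical two-line subject to show the difference the s modifier makes:

```php
<?php
// Hypothetical two-line subject, not from the original answer.
$str = "xABC1\nxABC2";

// Without the s modifier the lookahead's .* stops at the newline,
// so the match on the first line passes as the "last" one.
preg_match('/x[A-Z]+\d(?!.*x[A-Z]+\d)/', $str, $withoutS);

// With s, the dot crosses newlines and the true last match is found.
preg_match('/x[A-Z]+\d(?!.*x[A-Z]+\d)/s', $str, $withS);

echo $withoutS[0], PHP_EOL; // xABC1
echo $withS[0], PHP_EOL;    // xABC2
```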

    way 4: goes to the end of the string and backtracks until it finds an x[A-Z]+\d match. The \K removes the start of the string from the match result.

    if ( preg_match('/^.*\Kx[A-Z]+\d/', $str, $match) )
        print_r($match[0]);
    

    way 4 (a more hand-driven variant): to limit backtracking steps, you can greedily advance from the start of the string, atomic group by atomic group, and backtrack in the same way by atomic groups, instead of by characters.

    if ( preg_match('/^[^x]*+(?>x[^x]*)*\Kx[A-Z]+\d/', $str, $match) )
        print_r($match[0]);
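
Both way 4 patterns also yield the position when PREG_OFFSET_CAPTURE is added, because \K moves the reported start of the match. A short sketch that runs the two patterns side by side (the labels and the $results array are only for illustration):

```php
<?php
$str = 'xxABC1xxxABC2xx';
$patterns = [
    'greedy prefix' => '/^.*\Kx[A-Z]+\d/',
    'atomic steps'  => '/^[^x]*+(?>x[^x]*)*\Kx[A-Z]+\d/',
];

$results = [];
foreach ($patterns as $label => $pattern) {
    if (preg_match($pattern, $str, $match, PREG_OFFSET_CAPTURE)) {
        // \K resets the reported start, so the offset points at the
        // matched run itself rather than at the start of the subject.
        [$text, $offset] = $match[0];
        $results[$label] = [$text, $offset];
        echo "$label: $text at offset $offset", PHP_EOL;
    }
}
```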
    