Parse a search string for phrases and keywords

滥情空心 2021-02-04 14:41

I need to parse a search string for keywords and phrases in PHP, for example:

string 1: value of "measured response" detect goal "method valuation" study

3 Answers
  • 2021-02-04 15:28

    There is no need for a regular expression. The built-in function str_getcsv can split a string using any given delimiter, enclosure and escape characters.

    It really is as simple as this:

    // where $string is the string to parse
    $array = str_getcsv($string, ' ', '"'); 
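
    As a quick check against the string from the question, the call can be wrapped as below. This is only a sketch: the helper name parse_search_terms is made up for illustration, the escape character is passed explicitly instead of relying on its default, and the filtering step is a defensive assumption in case repeated spaces leave empty fields behind.

    ```php
    <?php
    // Hypothetical helper; the name is not part of the original answer.
    function parse_search_terms(string $query): array
    {
        // Space is the delimiter, double quote is the enclosure,
        // and backslash is passed explicitly as the escape character.
        $fields = str_getcsv($query, ' ', '"', '\\');

        // Drop any empty or null fields, then reindex from 0.
        return array_values(array_filter(
            $fields,
            fn($field) => $field !== null && $field !== ''
        ));
    }

    $terms = parse_search_terms('value of "measured response" detect goal "method valuation" study');
    print_r($terms);
    ```

    For the question's string this prints seven entries, with "measured response" and "method valuation" each kept as a single element.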
    
  • 2021-02-04 15:34
    preg_match_all('/(?<!")\b\w+\b|(?<=")\b[^"]+/', $subject, $result, PREG_PATTERN_ORDER);
    foreach ($result[0] as $match) {
        // $match is one keyword, or one phrase without its surrounding quotes
    }
    

    This should yield the results you are looking for.

    Explanation:

    # (?<!")\b\w+\b|(?<=")\b[^"]+
    # 
    # Match either the regular expression below (attempting the next alternative only if this one fails) «(?<!")\b\w+\b»
    #    Assert that it is impossible to match the regex below with the match ending at this position (negative lookbehind) «(?<!")»
    #       Match the character “"” literally «"»
    #    Assert position at a word boundary «\b»
    #    Match a single character that is a “word character” (letters, digits, and underscores) «\w+»
    #       Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
    #    Assert position at a word boundary «\b»
    # Or match regular expression number 2 below (the entire match attempt fails if this one fails to match) «(?<=")\b[^"]+»
    #    Assert that the regex below can be matched, with the match ending at this position (positive lookbehind) «(?<=")»
    #       Match the character “"” literally «"»
    #    Assert position at a word boundary «\b»
    #    Match any character that is NOT a “"” «[^"]+»
    #       Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
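
    To see the pattern at work, here is a small sketch that runs it on the string from the question. The second sample string is an illustrative assumption, added to show one limitation: because \w does not match a hyphen, an unquoted hyphenated keyword comes back as two separate matches.

    ```php
    <?php
    // Pattern copied from the answer above; the sample strings are illustrative.
    $pattern = '/(?<!")\b\w+\b|(?<=")\b[^"]+/';

    $subject = 'value of "measured response" detect goal "method valuation" study';
    preg_match_all($pattern, $subject, $result, PREG_PATTERN_ORDER);
    print_r($result[0]); // keywords and phrases, in order of appearance

    // \w does not match "-", so "cost-benefit" is split into two matches.
    preg_match_all($pattern, 'cost-benefit "net present value"', $hyphenated);
    print_r($hyphenated[0]); // cost, benefit, net present value
    ```

    If keywords may contain punctuation, a pattern built on \S+ may be a better fit than one built on \w+.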
    
  • 2021-02-04 15:36
    $s = 'value of "measured response" detect goal "method valuation" study';
    preg_match_all('~(?|"([^"]+)"|(\S+))~', $s, $matches);
    print_r($matches[1]);
    

    Output:

    Array
    (
        [0] => value
        [1] => of
        [2] => measured response
        [3] => detect
        [4] => goal
        [5] => method valuation
        [6] => study
    )
    

    The trick here is to use a branch-reset group: (?|...|...). It's just like an alternation contained in a non-capturing group - (?:...|...) - except that within each branch the capturing-group numbers start at the same number. (For more info, see the PCRE docs and search for DUPLICATE SUBPATTERN NUMBERS.)

    Thus, the text we're interested in is always captured in group #1. You can retrieve the contents of group #1 for all matches via $matches[1]. (That's assuming the PREG_PATTERN_ORDER flag is set; I didn't specify it like @FailedDev did because it's the default. See the PHP docs for details.)
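
    The difference the branch-reset group makes can be sketched by running the same alternation with and without it. The variable names $plain and $reset are chosen only for this illustration.

    ```php
    <?php
    $s = 'value of "measured response" detect goal "method valuation" study';

    // Plain alternation: phrases land in group 1, bare words in group 2.
    preg_match_all('~"([^"]+)"|(\S+)~', $s, $plain);

    // Branch reset: both branches number their capture as group 1.
    preg_match_all('~(?|"([^"]+)"|(\S+))~', $s, $reset);

    // Index 0 holds the full matches, so subtract it to count capture groups.
    printf("plain: %d groups, reset: %d groups\n", count($plain) - 1, count($reset) - 1);
    print_r($reset[1]);
    ```

    With the plain alternation the caller has to merge two groups back together. With the branch reset, $reset[1] already holds every keyword and phrase in order.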
