regex php separate an exact word from string in different groups

闹比i 2021-01-18 23:17

I have tried everything I know but still can't figure out how to resolve this problem:

I have a string, for example:

  • "--included-- in selling price: 5 % vat
1 Answer
  • 2021-01-18 23:55

    Here is the solution:

    $s = "--included-- in product price: breakfast --excluded--: 5 % vat aed 10.00 destination fee per night 2 % municipality fee 3.5 % packaging fee 10 % warranty service charge";
    $results = [];
    // First pass: split the string into sections, one per --included--/--excluded--/--not included-- marker.
    if (preg_match_all('~(--(?:(?:not )?in|ex)cluded--)(?:\s+([a-zA-Z ]+))?:+\s*((?:(?!--(?:(?:not )?in|ex)cluded--).)*)~su', $s, $m, PREG_SET_ORDER)) {
        foreach ($m as $v) {
            // Pop Group 3 (the section body) so it can be parsed separately.
            $lastline = array_pop($v);
            // Second pass: split the body into items, each starting with an optional currency and a number.
            if (preg_match_all('~(?:(\b(?:usd|aed|mad)\b|\B€|\bus\$)\s*)?\d+(?:\.\d+)?(?:(?!(?1))\D)*~ui', $lastline, $details)) {
                $results[] = array_merge($v, $details[0]);
            } else {
                $results[] = $v;
            }
        }
    }
    print_r($results);
    

    See the PHP demo.

    Notes:

    The first regex extracts each match you need to parse. See the first regex demo. It means:

    • (--(?:(?:not )?in|ex)cluded--) - Group 1: a shorter version of (--excluded--|--included--|--not included--): --excluded--, --included-- or --not included--
    • (?:\s+([a-zA-Z ]+))? - an optional sequence: 1+ whitespaces and then Group 2: 1+ ASCII letters or spaces
    • :+ - 1 or more colons
    • \s* - 0+ whitespaces
    • ((?:(?!--(?:(?:not )?in|ex)cluded--).)*) - Group 3: any char, 0+ occurrences, as many as possible, not starting any of the three char sequences: --excluded--, --included--, --not included--
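
    To see how the three groups come out of the first regex, here is a minimal sketch run on a shortened input. The shortened string and the variable names are made up for illustration; the pattern itself is the one used in the answer.

```php
<?php
// Shortened, illustrative input (not the full string from the question).
$s = '--included-- in product price: breakfast --excluded--: 5 % vat';

// First regex from the answer.
$re = '~(--(?:(?:not )?in|ex)cluded--)(?:\s+([a-zA-Z ]+))?:+\s*((?:(?!--(?:(?:not )?in|ex)cluded--).)*)~su';

preg_match_all($re, $s, $m, PREG_SET_ORDER);

foreach ($m as $v) {
    // $v[1] = marker, $v[2] = optional label before the colon, $v[3] = section body.
    // Group 3 keeps any whitespace before the next marker, hence the trim().
    printf("[%s] [%s] [%s]\n", $v[1], $v[2], trim($v[3]));
}
// Prints:
// [--included--] [in product price] [breakfast]
// [--excluded--] [] [5 % vat]
```

    Group 2 is empty for the second section because --excluded-- is followed directly by the colon, so the optional label part does not participate in the match.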

    Then, the Group 3 value needs to be further parsed to grab all the details. The second regex is used here to match

    • (?:(\b(?:usd|aed|mad)\b|\B€|\bus\$)\s*)? - an optional occurrence of
      • (\b(?:usd|aed|mad)\b|\B€|\bus\$) - Group 1:
        • \b(?:usd|aed|mad)\b - usd, aed or mad as whole words
        • \B€ - a € sign not preceded with a word char
        • \bus\$ - us$ not preceded with a word char
      • \s* - 0+ whitespaces
    • \d+ - 1+ digits
    • (?:\.\d+)? - an optional sequence of . and 1+ digits
    • (?:(?!(?1))\D)* - any non-digit char, 0 or more occurrences, as many as possible, not starting the same pattern as in Group 1
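
    As a quick check of the second regex, the sketch below runs it on the body of the --excluded-- section from the sample string. The currency alternation lists each code once, which matches the same strings. The variable names and the trim() step are additions for readability, not part of the answer.

```php
<?php
// Body of the --excluded-- section (Group 3 of the first regex) from the sample string.
$body = '5 % vat aed 10.00 destination fee per night 2 % municipality fee 3.5 % packaging fee 10 % warranty service charge';

// Second regex: optional currency, a number, then non-digits up to the next currency or number.
$re = '~(?:(\b(?:usd|aed|mad)\b|\B€|\bus\$)\s*)?\d+(?:\.\d+)?(?:(?!(?1))\D)*~ui';

preg_match_all($re, $body, $details);

// Each match keeps the whitespace before the next item, so trim for display.
$items = array_map('trim', $details[0]);
print_r($items);
```

    With this input the body splits into five items: 5 % vat, aed 10.00 destination fee per night, 2 % municipality fee, 3.5 % packaging fee and 10 % warranty service charge. The (?!(?1)) check is what stops the first item before aed, so the currency stays attached to the amount that follows it.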