Regex in parentheses at the beginning

Submitted by 醉酒当歌 on 2019-12-11 02:14:00

Question


I have a regex trying to divide questions by speciality. Say I have the following regex:

(?P<speciality>[0-9x]+)

It works fine for this question (correct match: 7)

(7)Which of the following is LEAST to be considered as a risk factor for esophageal cancer?;

And for this (correct match: 8 and 13)

(8,13)30 year old woman with amenorrhea, low serum estrogen and high serum LH/FSH, the most likely diagnosis is:

But not for this one (incorrect match: 20).

First trimester spontaneous abortion (before 20 wk) is most commonly due to:

I only need the numbers in the parentheses at the beginning of the question; all other parentheses should be ignored. Is this possible with a regex alone (using a lookahead, perhaps)?


Answer 1:


If your regex flavor supports \G (continue exactly where the previous match ended) and \K (reset the start of the reported match), try:

(?:^\(|\G,)\K[\dx]+

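^\( matches an opening parenthesis at the start of the string, OR \G, matches a comma immediately after the previous match. \K then resets the start of the reported match, and [\dx]+ matches one or more digits or x characters (\d is shorthand for [0-9]). The matches will be in $0, the whole match.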

Test at regex101.com; Regex FAQ


PHP example

<?php
$str = "(1x,2,3x) abc (1,2x,3) d";

// Only the list at the very start is matched; the later "(1,2x,3)" is ignored.
preg_match_all('~(?:^\(|\G,)\K[\dx]+~', $str, $out);

print_r($out[0]);

Array
(
    [0] => 1x
    [1] => 2
    [2] => 3x
)

Test at eval.in
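Applied to the sample questions from the question, one line at a time, the same pattern should return 7 for the first line, 8 and 13 for the second, and nothing for the third. This is a sketch that assumes each question is available as a separate string; the variable names are illustrative.

```php
<?php
// Sample question lines copied from the question.
$questions = [
    '(7)Which of the following is LEAST to be considered as a risk factor for esophageal cancer?;',
    '(8,13)30 year old woman with amenorrhea, low serum estrogen and high serum LH/FSH, the most likely diagnosis is:',
    'First trimester spontaneous abortion (before 20 wk) is most commonly due to:',
];

$results = [];
foreach ($questions as $q) {
    // ^\( only succeeds at the very start of the line; \G, only continues
    // from the end of the previous match, so the chain stops at the ")".
    preg_match_all('~(?:^\(|\G,)\K[\dx]+~', $q, $out);
    $results[] = $out[0];
}

print_r($results); // [["7"], ["8", "13"], []]
```

The "30" right after "(8,13)" and the "20" inside "(before 20 wk)" are not reached, because neither is preceded by a leading "(" or by a comma that directly continues the previous match.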




Answer 2:


Perhaps something like this will work (you don't mention which regex flavor you're using, though I'm guessing it is PCRE from the use of the named group; and yes, it does use a positive lookahead):

/^\((?P<speciality>(?:[0-9x]+,?)+)(?=\))/mg

The caret ^ combined with the m (multiline) modifier, which makes ^ and $ match at the beginning and end of each line instead of only at the beginning and end of the string, ensures that the match starts at the beginning of a line. The lookahead (?=\)) then requires a closing parenthesis after the list without making it part of the match. The specialities are captured in the speciality named group. The only caveat is that when more than one speciality is given (as in your example starting with (8,13)), the capture is a comma-delimited list, just as in the source text; in that example, the capture is 8,13.

Please see Regex Demo here.
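A PHP sketch of this approach, assuming the questions arrive as one multi-line string (an assumption; the question does not say how they are stored). PHP's PCRE functions have no g flag, so preg_match_all() provides the global behaviour, and the comma-delimited captures are split with explode() afterwards.

```php
<?php
// The three sample questions joined into one multi-line string (assumed input format).
$text = "(7)Which of the following is LEAST to be considered as a risk factor for esophageal cancer?;\n"
      . "(8,13)30 year old woman with amenorrhea, low serum estrogen and high serum LH/FSH, the most likely diagnosis is:\n"
      . "First trimester spontaneous abortion (before 20 wk) is most commonly due to:";

// "m" lets ^ match at the start of every line; preg_match_all() replaces "g".
preg_match_all('~^\((?P<speciality>(?:[0-9x]+,?)+)(?=\))~m', $text, $m);

// Each capture is a comma-delimited list such as "8,13"; split it into items.
$specialities = array_map(function ($list) {
    return explode(',', $list);
}, $m['speciality']);

print_r($m['speciality']); // ["7", "8,13"]
print_r($specialities);    // [["7"], ["8", "13"]]
```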




Answer 3:


(?P<speciality>[0-9x]+) matches any nonempty sequence of digits and x characters anywhere in the input. The parentheses only delimit the capturing group; they do not match literal parentheses in the text.
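To see the problem concretely, a small PHP check (a sketch using preg_match_all() on two of the sample lines) shows the unanchored pattern picking up numbers from the question text itself, including the 30 in the second question.

```php
<?php
// The original pattern: no anchor, no literal parentheses.
$naive = '~(?P<speciality>[0-9x]+)~';

$q2 = '(8,13)30 year old woman with amenorrhea, low serum estrogen and high serum LH/FSH, the most likely diagnosis is:';
$q3 = 'First trimester spontaneous abortion (before 20 wk) is most commonly due to:';

// Every run of digits or "x" qualifies, wherever it appears in the line.
preg_match_all($naive, $q2, $m2);
preg_match_all($naive, $q3, $m3);

print_r($m2['speciality']); // 8, 13 and also 30 from "30 year old"
print_r($m3['speciality']); // 20 from "(before 20 wk)"
```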

To match one number (or several separated by commas) between parentheses at the beginning of the line, you could use something like this:

^\((\d+)(,(\d+))*\)
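A quick PHP check of this first pattern, using a hypothetical three-item prefix (8,13,21) that does not come from the question, shows what the numbered groups actually hold:

```php
<?php
// Answer 3's first pattern: one number, then any number of ",number" repetitions.
$pattern = '~^\((\d+)(,(\d+))*\)~';

// Hypothetical input with three items, used only to inspect the groups.
preg_match($pattern, '(8,13,21)Sample question text', $m);

// Groups 2 and 3 sit inside the repeated (...)*, so PCRE keeps only their
// final iteration: the middle "13" is matched but not captured anywhere.
print_r($m); // ["(8,13,21)", "8", ",21", "21"]
```

Only the first and last values survive in the captures.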

EDIT

A repeated capturing group, as in (,(\d+))*, only returns its last iteration, so to get all the values you need to capture the complete list of numbers and parse it afterwards:

^\((?P<specialities>(\d+)(,(\d+))*)\)

This captures one or more comma-separated numbers between parentheses.

I added the start-of-line anchor ^ so the list must appear at the beginning of the line.
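A sketch of the capture-then-parse step in PHP. The helper name extractSpecialities is hypothetical. Note that this pattern, as written, accepts only digits (\d), whereas the question's pattern also allowed x; replacing \d with [0-9x] would be needed for values like 1x.

```php
<?php
// Hypothetical helper: returns the numbers listed in the leading "(...)" of a question.
function extractSpecialities(string $question): array
{
    // Answer 3's revised pattern: the whole comma-separated list lands in one named group.
    if (!preg_match('~^\((?P<specialities>(\d+)(,(\d+))*)\)~', $question, $m)) {
        return []; // no parenthesised list at the start of the line
    }
    // Parse afterwards: split "8,13" into ["8", "13"].
    return explode(',', $m['specialities']);
}

print_r(extractSpecialities('(8,13)30 year old woman with amenorrhea, low serum estrogen and high serum LH/FSH, the most likely diagnosis is:'));
print_r(extractSpecialities('First trimester spontaneous abortion (before 20 wk) is most commonly due to:'));
```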

Demo



Source: https://stackoverflow.com/questions/28005315/regex-in-parenthesis-at-the-beginning
