Regex not returning the whole pattern in Python

伪装坚强ぢ 2021-01-22 06:11

I have the following code:

import re

haystack = "aaa months(3) bbb"
needle = re.compile(r'(months|days)\([\d]*\)')
instances = list(set(needle.findall(haystack)))
2 Answers
  • 2021-01-22 06:28
    needle = re.compile(r'((?:months|days)\([\d]*\))')
    

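    This fixes your problem.

    You were capturing only the months|days part, and findall returns the contents of the capturing group rather than the whole match when the pattern contains one.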
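
To see the difference, here is a minimal sketch that runs the pattern from the question and the corrected one against the same haystack. The variable names original and fixed are made up for this illustration.

```python
import re

haystack = "aaa months(3) bbb"

# Pattern from the question: the only capturing group is (months|days).
original = re.compile(r'(months|days)\([\d]*\)')
# Corrected pattern: one capturing group around the whole expression,
# with the inner alternation made non-capturing via (?:...).
fixed = re.compile(r'((?:months|days)\([\d]*\))')

original_result = original.findall(haystack)
fixed_result = fixed.findall(haystack)

print(original_result)  # ['months']
print(fixed_result)     # ['months(3)']
```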

    For this specific case, the following regex is a bit better:

    needle = re.compile(r'((?:months|days)\(\d+\))')
    

    This way you only get results that contain a number; previously, a result like months() would also match. If you want to ignore case so that Months or Days also match, add the re.IGNORECASE flag, like this:

    re.compile(r'((?:months|days)\(\d+\))', re.IGNORECASE)
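
Both points can be checked with a small sketch. The sample string and the names loose, strict, and strict_nocase are assumptions made up for this illustration; they are not from the question.

```python
import re

# Hypothetical input covering the three interesting cases.
sample = "months() Months(12) days(7)"

# \d* allows empty parentheses; \d+ requires at least one digit.
loose = re.compile(r'((?:months|days)\(\d*\))')
strict = re.compile(r'((?:months|days)\(\d+\))')
# Same strict pattern, but matching Months/Days in any letter case.
strict_nocase = re.compile(r'((?:months|days)\(\d+\))', re.IGNORECASE)

loose_result = loose.findall(sample)
strict_result = strict.findall(sample)
nocase_result = strict_nocase.findall(sample)

print(loose_result)   # ['months()', 'days(7)']
print(strict_result)  # ['days(7)']
print(nocase_result)  # ['Months(12)', 'days(7)']
```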
    

    Some explanation for the OP:

    A regular expression is built from many elements, and one of the most important is the capturing group, written "()". Sometimes we want to group without capturing, so we use "(?:)". There are other kinds of groups, but these are the most common.

    In this case we wrap the entire regular expression in a capturing group because you want to capture everything. When a pattern contains no capturing groups, findall returns the whole match (group 0) for each hit. Your pattern contained an explicit capturing group, so findall returned only that group instead of the whole match.

    Having wrapped the entire regular expression in a capturing group, we turn the inner group into a non-capturing one by adding ?: at its start, as shown above. We could also have skipped the outer group and only made the inner group non-capturing, since findall returns the whole match when no capturing group is present. I personally prefer explicit coding.
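
How findall shapes its results depending on the number of capturing groups can be sketched as follows. The haystack is extended with a second, made-up occurrence so that each list has more than one entry.

```python
import re

haystack = "aaa months(3) bbb days(10)"

# No capturing groups: each result is the whole match.
no_groups = re.findall(r'(?:months|days)\(\d+\)', haystack)
# One capturing group: each result is that group's text only.
one_group = re.findall(r'(months|days)\(\d+\)', haystack)
# Two capturing groups: each result is a tuple with one entry per group.
two_groups = re.findall(r'(months|days)\((\d+)\)', haystack)

print(no_groups)   # ['months(3)', 'days(10)']
print(one_group)   # ['months', 'days']
print(two_groups)  # [('months', '3'), ('days', '10')]
```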

    Further information about regular expressions can be found here: http://docs.python.org/library/re.html

  • 2021-01-22 06:49

    Parens are not just for grouping; they also form capture groups. What you want is re.compile(r'(?:months|days)\(\d+\)'). That uses a non-capturing group for the alternation, so findall returns the whole match rather than the subgroup matches you don't appear to want.
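
An alternative that neither answer mentions, offered only as a sketch: re.finditer yields match objects, and match.group(0) is always the whole match, no matter how many capturing groups the pattern contains. This leaves the pattern from the question unchanged. The haystack below adds made-up repeats to show the deduplication step.

```python
import re

haystack = "aaa months(3) bbb months(3) days(10)"
needle = re.compile(r'(months|days)\([\d]*\)')

# group(0) is the full match; group(1) would be just "months" or "days".
whole_matches = [m.group(0) for m in needle.finditer(haystack)]

# Deduplicate as the question does; sorted() makes the order deterministic.
instances = sorted(set(whole_matches))

print(whole_matches)  # ['months(3)', 'months(3)', 'days(10)']
print(instances)      # ['days(10)', 'months(3)']
```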
