Regular expression for simple math expressions

不思量自难忘° 2021-01-21 23:40

As an exercise, I was trying to come up with a regex to evaluate simple algebra such as

q = '23 * 345 - 123+65'

From here I want to get '23', …

4 answers
  • 2021-01-22 00:03

    Simply try this.

    import re
    q = '23 * 345 - 123+65'
    # Either a run of digits or a single arithmetic operator
    regexparse = r'(\d+)|[-+*/]'
    for m in re.finditer(regexparse, q):
        print(m.group(0))
    

    output:

    23
    *
    345
    -
    123
    +
    65
    
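A variation on the finditer approach above, sketched under my own assumptions: if each alternative is a named group, Match.lastgroup reports which kind of token matched, so you get the token type along with the text. The group names NUM, OP, SKIP and MISMATCH and the tokenize() helper are hypothetical; the MISMATCH branch makes stray characters raise an error instead of being silently skipped (this follows the "Writing a Tokenizer" pattern in the Python re documentation).

```python
import re

# Hypothetical token names; each alternative is a named group.
TOKEN_RE = re.compile(r'''
    (?P<NUM>\d+)         # one or more digits
  | (?P<OP>[-+*/])       # a single arithmetic operator
  | (?P<SKIP>\s+)        # whitespace, ignored below
  | (?P<MISMATCH>.)      # anything else is an error
''', re.VERBOSE)


def tokenize(expr):
    """Return a list of (kind, text) pairs for a simple arithmetic expression."""
    tokens = []
    for m in TOKEN_RE.finditer(expr):
        kind = m.lastgroup  # name of the group that matched
        if kind == 'SKIP':
            continue
        if kind == 'MISMATCH':
            raise ValueError(f'unexpected character {m.group()!r} at {m.start()}')
        tokens.append((kind, m.group()))
    return tokens


print(tokenize('23 * 345 - 123+65'))
```

For the question's string this yields pairs such as ('NUM', '23') and ('OP', '*'), in order.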
  • 2021-01-22 00:08

    This is your regex:

    (\d+\s*(\*|\/|\+|\-)\s*)+(\d+\s*)
    

    (\d+\s*(\*|\/|\+|\-)\s*) matches the first part of your expression, '23 * ', and stores '*' in the second group.

    Then the + makes it repeat, but because a repeated capture group retains only its last match, it discards '23 * ' and '*' and instead captures '345 - ' in the first group and '-' in the second.

    The + repeats once more, discarding the previous captures and instead capturing '123+' in the first group and '+' in the second.

    Next, the + cannot repeat any more (no operator follows 65), so it stops, and (\d+\s*) matches '65' in the third group.


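    This is by design: a repeated capture group stores only its last capture. Python's re and most other engines behave this way, although a few, such as .NET (via Group.Captures) and the third-party regex module for Python (via captures()), also keep the earlier repetitions.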
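To see this in Python, here is a minimal check, assuming Python 3's re module, that runs the regex from this answer on the question's string and prints what each group holds after the match. Group 1 keeps only its last repetition, and m.span(1) confirms it points at the final '123+'.

```python
import re

q = '23 * 345 - 123+65'
pattern = r'(\d+\s*(\*|\/|\+|\-)\s*)+(\d+\s*)'

m = re.fullmatch(pattern, q)
# Each group reports only its last capture, not every repetition.
print(m.groups())   # ('123+', '+', '65')
print(m.span(1))    # (11, 15): the position of the final repetition only
```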


    Further elaboration:

    There's a difference between matching repeatedly and capturing repeatedly. Try (\d)+ on 12345 and you will see that only 5 is captured. That's because each pair of parentheses is assigned a single group number: the first pair is group 1, and if group 1 matches many times, only one capture can be kept, namely the last. This is how regex works, unfortunately; as the Python docs put it:

    If a group matches multiple times, only the last match is accessible
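The (\d)+ example is easy to reproduce. The sketch below, assuming Python 3's re, contrasts it with re.findall, which returns every match of a single-digit pattern instead of one capture per group.

```python
import re

m = re.fullmatch(r'(\d)+', '12345')
print(m.group(0))  # 12345 (the whole string was matched)
print(m.group(1))  # 5 (group 1 kept only its last capture)

# To keep every piece, match the smaller pattern repeatedly instead.
print(re.findall(r'\d', '12345'))  # ['1', '2', '3', '4', '5']
```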


    If you want to get your desired output, you might use re.findall and match with \d+|[+/*-]:

    import re
    q = '23 * 345 - 123+65'
    regexparse = r'\d+|[+/*-]'
    elem = re.findall(regexparse, q)
    print(elem)
    #=> ['23', '*', '345', '-', '123', '+', '65']
    
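One caveat, offered as a sketch rather than part of the original answer: re.findall only picks out tokens and never checks that they form a valid expression, so '23 ** 5' would still give ['23', '*', '*', '5']. Assuming you want both, you can validate the whole string with re.fullmatch first, reusing the shape of the original regex but with non-capturing groups. The helper name parse_tokens and the exact validation pattern are my own assumptions.

```python
import re

# Whole-expression shape: a number, then zero or more (operator, number) pairs.
# (?:...) groups are non-capturing, so nothing is lost to repetition here.
EXPR_RE = re.compile(r'\s*\d+(?:\s*[-+*/]\s*\d+)*\s*')
TOKEN_RE = re.compile(r'\d+|[-+*/]')


def parse_tokens(expr):
    """Validate expr, then return its numbers and operators in order."""
    if EXPR_RE.fullmatch(expr) is None:
        raise ValueError(f'not a simple arithmetic expression: {expr!r}')
    return TOKEN_RE.findall(expr)


print(parse_tokens('23 * 345 - 123+65'))  # ['23', '*', '345', '-', '123', '+', '65']
```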
  • 2021-01-22 00:12

    Your regex is confusing. Better to use re.split() for this purpose:

    import re
    q = '23 * 345 - 123+65'
    # The capturing group makes re.split() keep the operators
    print(re.split(r'\s*([-+/*])\s*', q))
    

    Outputs:

    ['23', '*', '345', '-', '123', '+', '65']
    
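Why this works: when the split pattern contains a capturing group, re.split() also returns the text matched by that group, which is what keeps the operators in the list. A quick comparison, assuming Python 3:

```python
import re

q = '23 * 345 - 123+65'

# Without a group, the separators (operators and surrounding spaces) are dropped.
print(re.split(r'\s*[-+/*]\s*', q))      # ['23', '345', '123', '65']

# With a capturing group, the captured operator is kept between the pieces.
print(re.split(r'\s*([-+/*])\s*', q))    # ['23', '*', '345', '-', '123', '+', '65']

# A non-capturing group behaves like no group at all.
print(re.split(r'\s*(?:[-+/*])\s*', q))  # ['23', '345', '123', '65']
```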
  • 2021-01-22 00:19

    I can only speak about regex in general, as I don't know Python, but your problem is that in

    (\d+\s*[\*/+-]\s*)+(\d+\s*)
    

    This portion

    (\d+\s*[\*/+-]\s*)+
    

    is repeated, and once matching has finished you only see the final repetition: for your input, that group holds just '123+'.

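If you do need every repetition of that portion, one workaround in Python's re is to drop the outer + and iterate over the inner pattern with re.finditer, which yields one match per repetition. A short sketch, assuming Python 3; note that the final number, 65, is not included because no operator follows it.

```python
import re

q = '23 * 345 - 123+65'
# The repeated portion on its own, without the outer '+' quantifier.
part = re.compile(r'\d+\s*[*/+-]\s*')

pieces = [m.group(0) for m in part.finditer(q)]
print(pieces)  # ['23 * ', '345 - ', '123+']
```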