How to reduce complexity in regex?

面向向阳花 2021-01-25 23:29

I have a regex that finds all kinds of money amounts denoted in dollars, such as $290, USD240, $234.45, 234.5$ and 234.6usd.

(\\$)[0-9]+\\.?([0-9]         


        
3 Answers
  • 2021-01-25 23:54

    By reducing the complexity you also reduce the correctness. The following regex works correctly, although it does not handle letter case on its own (that can be managed with a case-insensitive flag). None of the other current answers here has a correct subpattern for the decimal number.

    ^\s*(?:(?:(?:-?(?:usd|\$)|(?:usd|\$)-)(?:(?:0|[1-9]\d*)?(?:\.\d+)?(?<=\d)))|(?:-?(?:(?:0|[1-9]\d*)?(?:\.\d+)?(?<=\d))(?:usd|\$)))\s*$
    

    Look here at the test results.

    First make the pattern correct, and only then try to shorten it.
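
The strict pattern above can be checked against the amounts from the question with a short Python sketch. The `re.IGNORECASE` flag, the helper name `is_amount` and the sample lists are assumptions added for illustration; they are not part of the answer.

```python
import re

# Strict pattern from this answer, split across lines only for readability.
# re.IGNORECASE is an added assumption so both "usd" and "USD" are accepted.
STRICT = re.compile(
    r"^\s*(?:(?:(?:-?(?:usd|\$)|(?:usd|\$)-)"
    r"(?:(?:0|[1-9]\d*)?(?:\.\d+)?(?<=\d)))"
    r"|(?:-?(?:(?:0|[1-9]\d*)?(?:\.\d+)?(?<=\d))(?:usd|\$)))\s*$",
    re.IGNORECASE,
)


def is_amount(text: str) -> bool:
    """Return True if the whole string is a dollar amount (hypothetical helper)."""
    return STRICT.match(text) is not None


# Sample inputs chosen for illustration.
valid = ["$290", "USD240", "$234.45", "234.5$", "234.6usd", "$.0", "-$5"]
invalid = ["$", "$5.", "$007", "290", "$5$"]

for sample in valid + invalid:
    print(f"{sample!r}: {is_amount(sample)}")
```

The lookbehind `(?<=\d)` is what rejects a bare `$`: both the integer part and the fractional part are optional, so the pattern must separately insist that the number ends in a digit.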

  • 2021-01-25 23:55

    It is possible to make the regex a bit shorter by collapsing the currency indicators:
    Instead of matching "USD amount OR $ amount", you can match "(USD OR $) amount". This results in the following regex:

    ((\$|usd)[0-9]+\.?([0-9]*))|([0-9]+\.?[0-9]*(\$|usd))
    

    I'm not sure whether you'll find this less complex, but at least it's easier to read because it's shorter.

    The character set [0-9] can also be replaced by \d -- the character class which matches any digit -- making the regex even shorter.
    Doing this, the regex will look as follows:

    ((\$|usd)\d+\.?\d*)|(\d+\.?\d*(\$|usd))
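
As a rough check of the `[0-9]` to `\d` substitution, the Python sketch below compares both spellings on the question's examples. One caveat worth knowing: for `str` patterns in Python 3, `\d` also matches non-ASCII decimal digits unless `re.ASCII` is set, so the two forms are not strictly identical in every engine. The function name `same_verdict` and the sample inputs are assumptions made for illustration.

```python
import re

# The two spellings from this answer: explicit [0-9] sets versus \d.
BRACKETS = re.compile(r"((\$|usd)[0-9]+\.?([0-9]*))|([0-9]+\.?[0-9]*(\$|usd))")
SHORTHAND = re.compile(r"((\$|usd)\d+\.?\d*)|(\d+\.?\d*(\$|usd))")
# Same \d pattern, but restricted to ASCII digits (an added assumption).
SHORTHAND_ASCII = re.compile(SHORTHAND.pattern, re.ASCII)


def same_verdict(text: str) -> bool:
    """True if both spellings agree on whether the whole string matches."""
    return bool(BRACKETS.fullmatch(text)) == bool(SHORTHAND.fullmatch(text))


samples = ["$290", "usd240", "$234.45", "234.5$", "234.6usd", "abc"]
print(all(same_verdict(s) for s in samples))

# "$" followed by the Arabic-Indic digits three and four.
arabic = "$\u0663\u0664"
print(bool(BRACKETS.fullmatch(arabic)))
print(bool(SHORTHAND.fullmatch(arabic)))
print(bool(SHORTHAND_ASCII.fullmatch(arabic)))
```

If only ASCII digits should count as money amounts, keeping `[0-9]` or passing `re.ASCII` avoids the difference.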
    

    Update:

    • According to @Toto, this regex performs better with non-capturing groups. I also removed the unnecessary outer capture groups, as pointed out by @Simon MᶜKenzie:

      (?:\$|usd)\d+\.?\d*|\d+\.?\d*(?:\$|usd)
      
    • Amounts like $.0 are not matched by the regex, as @Gangnus pointed out. I updated the regex to fix this:

      ((\$|usd)((\d+\.?\d*)|(\.\d+)))|(((\d+\.?\d*)|(\.\d+))(\$|usd))
      

      Note that I changed \d+\.?\d* into ((\d+\.?\d*)|(\.\d+)): It now either matches one or more digits, optionally followed by a dot, followed by zero or more digits; OR a dot followed by one or more digits.

      Without unnecessary capturing groups and using non-capturing groups:

      (?:\$|usd)(?:\d+\.?\d*|\.\d+)|(?:\d+\.?\d*|\.\d+)(?:\$|usd)
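
Besides performance, the choice between capturing and non-capturing groups changes what some APIs return. The sketch below uses Python's `re.findall` to show this with the patterns from this answer; the sample sentence and the `re.IGNORECASE` flag are assumptions added for illustration.

```python
import re

# Earlier capturing version and the final non-capturing version from this answer.
CAPTURING = re.compile(r"((\$|usd)\d+\.?\d*)|(\d+\.?\d*(\$|usd))", re.IGNORECASE)
NON_CAPTURING = re.compile(
    r"(?:\$|usd)(?:\d+\.?\d*|\.\d+)|(?:\d+\.?\d*|\.\d+)(?:\$|usd)",
    re.IGNORECASE,
)

# Illustrative input text (not from the original question).
text = "Prices: $290, USD240, $234.45, 234.5$, 234.6usd and $.5"

# Without capture groups, findall returns the matched substrings directly.
amounts = NON_CAPTURING.findall(text)
print(amounts)

# With capture groups, findall returns one tuple of groups per match,
# so the caller has to pick the non-empty entries out again.
groups = CAPTURING.findall(text)
print(groups[0])
```

Note that neither pattern is anchored, so a trailing dot after an amount (for example at the end of a sentence) would be included in the match through the optional `\.?`.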
      
  • 2021-01-26 00:03

    Try this:

    ^(?:\$|usd)?(?:\d+\.?\d*)(?:\$|usd)?$
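
A quick Python check of this pattern suggests it may be more permissive than intended, because both currency markers are optional. The helper name `accepts`, the `re.IGNORECASE` flag and the sample inputs are assumptions made for illustration.

```python
import re

# Pattern from this answer; re.IGNORECASE is an added assumption.
ANCHORED = re.compile(r"^(?:\$|usd)?(?:\d+\.?\d*)(?:\$|usd)?$", re.IGNORECASE)


def accepts(text: str) -> bool:
    """Return True if the whole string matches (hypothetical helper)."""
    return ANCHORED.match(text) is not None


# Amounts from the question are accepted.
print(accepts("$290"), accepts("234.6usd"))

# A bare number and a doubly marked amount are accepted as well,
# since neither the prefix nor the suffix is required.
print(accepts("290"), accepts("$5$"))

# An amount without an integer part is rejected, and the anchors
# prevent matching an amount embedded in a longer sentence.
print(accepts("$.5"), accepts("I paid $290"))
```

Whether accepting a bare number is acceptable depends on the input being validated; the alternation used in the other answers requires exactly one currency marker.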
