Regular Expressions, understanding lookbehind in combination with the or operator

时光取名叫无心 2021-01-15 21:13

This is more a question of understanding than an actual problem. The situation is as follows: I have some float numbers (e.g. an amount of money) between two quotation

1 answer
  • 2021-01-15 21:44

    You are correct,

    (?<=\"[0-9]|\"[0-9]{2}|\"[0-9]{3})(,)(?=[0-9]{2}\")
    

    should be the right regex in this case.


    As for why you "don't need the \" for two and three digits": you actually do need it.

    (?<=\"[0-9]|[0-9]{2}|[0-9]{3})(,)(?=[0-9]{2}\")
    

    will also match 12,23" and 123,23", that is, numbers with no opening quotation mark, because only the first alternative requires the \".
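The difference can be checked with a small Python sketch. This is only an illustration, not the Sublime setup from the question: Python's `re` module accepts only fixed-width lookbehinds, so each alternative is written here as its own lookbehind, which should not change what is matched. The names `WITH_QUOTES`, `WITHOUT_QUOTES` and `has_match` are made up for this example.

```python
import re

# Every alternative requires the opening quotation mark.
WITH_QUOTES = re.compile(
    r'(?:(?<="[0-9])|(?<="[0-9]{2})|(?<="[0-9]{3}))(,)(?=[0-9]{2}")'
)
# Only the first alternative requires the opening quotation mark.
WITHOUT_QUOTES = re.compile(
    r'(?:(?<="[0-9])|(?<=[0-9]{2})|(?<=[0-9]{3}))(,)(?=[0-9]{2}")'
)


def has_match(pattern, text):
    """Return True if the pattern matches anywhere in text."""
    return pattern.search(text) is not None


# Print, for each sample, whether each pattern finds a comma to match.
for sample in ['"1,23"', '"12,23"', '12,23"', '"1234,23"']:
    print(sample, has_match(WITH_QUOTES, sample), has_match(WITHOUT_QUOTES, sample))
```

The last two samples are the interesting ones: the version without the extra quotation marks accepts them, because two or three digits before the comma are enough regardless of what precedes them.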


    EDIT: It looks like the problem is that Sublime doesn't allow variable-length lookbehinds, even when the alternatives are listed with |. That means (?<=\"[0-9]|\"[0-9]{2}|\"[0-9]{3}) will fail, because the alternatives differ in length: 2, 3 and 4 characters.

    This is because Sublime seems to use the Boost regex library, whose documentation states:

    Lookbehind

    (?<=pattern) consumes zero characters, only if pattern could be matched against the characters preceding the current position (pattern must be of fixed length).

    (?<!pattern) consumes zero characters, only if pattern could not be matched against the characters preceding the current position (pattern must be of fixed length).

    An alternative is to separate the lookbehinds:

    (?:(?<=\"[0-9])|(?<=\"[0-9]{2})|(?<=\"[0-9]{3}))(,)(?=[0-9]{2}\")
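Python's standard `re` module has a similar fixed-width restriction on lookbehinds, so it can serve as a rough stand-in for trying this out (an assumption for illustration; it is not the Boost engine that Sublime uses). In the sketch below, `COMBINED`, `SEPARATE` and `compiles` are hypothetical names.

```python
import re

# One lookbehind whose alternatives are 2, 3 and 4 characters long.
COMBINED = r'(?<="[0-9]|"[0-9]{2}|"[0-9]{3})(,)(?=[0-9]{2}")'
# Three lookbehinds, each of a single fixed length.
SEPARATE = r'(?:(?<="[0-9])|(?<="[0-9]{2})|(?<="[0-9]{3}))(,)(?=[0-9]{2}")'


def compiles(pattern):
    """Return True if Python's re module accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


print(compiles(COMBINED))   # rejected: the lookbehind is not fixed-width
print(compiles(SEPARATE))   # accepted: every lookbehind is fixed-width

# Replace the decimal comma only when 1 to 3 digits follow the opening quote.
print(re.sub(SEPARATE, ".", 'price: "123,45" and "1234,56"'))
```

Only the comma itself is part of the match, so the substitution leaves the digits and quotation marks untouched.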
    


    What can you do if you don't want to list all possible lengths?

    Some regex engines (including Perl's, Ruby's and Sublime's) offer a neat trick: \K. It roughly translates to "drop everything that was matched so far", so the text before \K is still required but is not part of the reported match. Therefore, you can match any , within a float number surrounded by quotation marks with:

    "\d+\K,(?=\d+")
    

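Engines without \K can get a similar effect for search-and-replace by capturing the prefix and writing it back. Python's built-in `re` module does not support \K, so the sketch below swaps in a capturing group instead; `FLOAT_COMMA` and `comma_to_dot` are names invented for this example.

```python
import re

# Capture the part that \K would discard, then re-insert it in the replacement.
FLOAT_COMMA = re.compile(r'("\d+),(?=\d+")')


def comma_to_dot(text):
    """Turn the decimal comma of quoted numbers into a dot."""
    return FLOAT_COMMA.sub(r'\1.', text)


print(comma_to_dot('"12345,67"; "1,5"; 12,5'))
```

Unlike the \K version, the match here includes the opening quotation mark and the digits before the comma, which is why the replacement has to put them back with \1.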
