Regular Expressions, understanding lookbehind in combination with the or operator

时光取名叫无心 2021-01-15 21:13

This is more a question of understanding than an actual problem. The situation is as follows: I have some float numbers (e.g. an amount of money) between two quotation

1 answer
  •  花落未央
    2021-01-15 21:44

    You are correct,

    (?<=\"[0-9]|\"[0-9]{2}|\"[0-9]{3})(,)(?=[0-9]{2}\")
    

    should be the right regex in this case.


    As for why you "don't need the \" for two and three digits": you actually do need it.

    (?<=\"[0-9]|[0-9]{2}|[0-9]{3})(,)(?=[0-9]{2}\")
    

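    This version also matches the comma in 12,23" and 123,23", even though no opening quotation mark precedes the digits, because only the one-digit alternative still requires the \".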
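To see the difference concretely, here is a small sketch in Python. Python's `re` module is used only for illustration (it is not the engine Sublime runs), and because it only accepts fixed-width lookbehinds, each alternative is written as its own lookbehind. The names `strict` and `loose` and the sample strings are made up for this example.

```python
import re

# Python's re module only accepts fixed-width lookbehinds, so every
# alternative is wrapped in its own lookbehind here.
strict = re.compile(r'(?:(?<="[0-9])|(?<="[0-9]{2})|(?<="[0-9]{3}))(,)(?=[0-9]{2}")')
loose = re.compile(r'(?:(?<="[0-9])|(?<=[0-9]{2})|(?<=[0-9]{3}))(,)(?=[0-9]{2}")')

samples = ['"12,23"', '12,23"', '123,23"']  # hypothetical inputs

for s in samples:
    # Columns: input, matched by strict pattern, matched by loose pattern
    print(s, bool(strict.search(s)), bool(loose.search(s)))
```

Only the first sample has an opening quotation mark, so it is the only one the strict pattern should accept, while the loose pattern accepts all three.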


    EDIT: It looks like the problem is that Sublime does not allow variable-length lookbehinds, even when the alternatives are listed with |. That means (?<=\"[0-9]|\"[0-9]{2}|\"[0-9]{3}) fails, because the alternatives have different lengths: 2, 3 and 4 characters.
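A similar restriction can be reproduced outside Sublime. As an illustration only, Python's built-in `re` module (a different engine from the one Sublime uses) also insists on fixed-width lookbehinds and rejects this pattern at compile time. The variable names are made up for this sketch.

```python
import re

# The single lookbehind with alternatives of length 2, 3 and 4.
pattern = r'(?<="[0-9]|"[0-9]{2}|"[0-9]{3})(,)(?=[0-9]{2}")'

try:
    re.compile(pattern)
    compiled = True
except re.error as exc:
    # Python's re refuses lookbehinds whose alternatives differ in width.
    compiled = False
    print("rejected:", exc)
```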

    This is because Sublime seems to use the Boost regex library, whose documentation states:

    Lookbehind

    (?<=pattern) consumes zero characters, only if pattern could be matched against the characters preceding the current position (pattern must be of fixed length).

    (?<!pattern) consumes zero characters, only if pattern could not be matched against the characters preceding the current position (pattern must be of fixed length).

    An alternative is to separate the lookbehinds:

    (?:(?<=\"[0-9])|(?<=\"[0-9]{2})|(?<=\"[0-9]{3}))(,)(?=[0-9]{2}\")
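As a rough check of the separated form, the sketch below runs it with Python's `re` module, which also requires each lookbehind to be fixed-width. The sample line and the name `separated` are assumptions for illustration, not taken from the question.

```python
import re

separated = re.compile(
    r'(?:(?<="[0-9])|(?<="[0-9]{2})|(?<="[0-9]{3}))'  # one fixed-width lookbehind per length
    r'(,)'                                            # the comma itself is the whole match
    r'(?=[0-9]{2}")'                                  # two digits and a closing quote must follow
)

line = '"1,50";"12,23";"123,99";"abc,de"'  # hypothetical sample line

# Lookarounds are zero-width, so each match covers only the comma.
positions = [m.start() for m in separated.finditer(line)]
matched_text = [m.group(0) for m in separated.finditer(line)]
print(positions, matched_text)
```

The comma in "abc,de" is skipped because no digits surround it.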
    


    What can you do if you don't want to list all possible lengths?

    There is a neat trick available in some regex engines (including Perl's, Ruby's and Sublime's): \K. Roughly, \K means "drop everything that was matched so far" from the reported match. Since the part before \K is matched normally rather than inside a lookbehind, it may have any length. You can therefore match the , within any float number surrounded by quotation marks with:

    "\d+\K,(?=\d+")
    

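In engines without \K, a similar effect is available by capturing the prefix and putting it back in the replacement. A minimal sketch in Python, whose built-in `re` module does not support \K; replacing the comma with a dot is only an assumed use case, and the input string is made up.

```python
import re

text = '"1234,56" and "7,5" but not 8,9'  # hypothetical input

# Capture the opening quote plus the integer digits and re-insert them
# with \1, so that effectively only the comma is replaced.
result = re.sub(r'("\d+),(?=\d+")', r'\1.', text)
print(result)
```

Unlike the \K version, the match here includes the prefix, so this only helps for search-and-replace, not for selecting the comma alone.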
