Regex101 vs Oracle Regex

Posted by 流过昼夜 on 2021-02-10 21:24:59

Question


My regex:

^\+?(-?)0*([[:digit:]]+,[[:digit:]]+?)0*$

It removes a leading + and the leading and trailing 0s from a decimal number.

I have tested it in regex101.

For the input +000099,8420000 and the substitution \1\2, it returns 99,842

I want the same result in Oracle database 11g:

select REGEXP_REPLACE('+000099,8420000','^\+?(-?)0*([[:digit:]]+,[[:digit:]]+?)0*$','\1\2') from dual;

But it returns 99,8420000 (the trailing 0s are still present...)

What am I missing?

EDIT

It behaves as if the quantifier just before the trailing 0* were greedy (+) rather than lazy (+?), although I definitely used the lazy one.


Answer 1:


The problem is well known to anyone who has worked with regex implementations based on Henry Spencer's library: lazy and greedy quantifiers should not be mixed in the same branch, since that leads to undefined behavior. The TRE regex engine used in R behaves the same way. You can mix lazy and greedy quantifiers to some extent, but you must always check that the result is consistent.
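To make the difference concrete, here is a minimal sketch in Python, whose re module is a backtracking engine and is assumed here to stand in for what regex101 does. It is not Oracle, and [[:digit:]] is written as [0-9] because Python's re does not support POSIX bracket classes. The names LAZY and GREEDY are only illustrative. The lazy pattern gives the result the asker expected, and the same pattern with a greedy + gives exactly the string Oracle returned.

```python
import re

# The asker's pattern, with [[:digit:]] written as [0-9] (assumption: Python's
# re module stands in for the backtracking engine used on regex101).
LAZY = r'^\+?(-?)0*([0-9]+,[0-9]+?)0*$'

# The same pattern with the lazy +? replaced by a greedy +.
GREEDY = r'^\+?(-?)0*([0-9]+,[0-9]+)0*$'

value = '+000099,8420000'

lazy_result = re.sub(LAZY, r'\1\2', value)
greedy_result = re.sub(GREEDY, r'\1\2', value)

print(lazy_result)    # 99,842      (what regex101 shows)
print(greedy_result)  # 99,8420000  (what the Oracle query returned)
```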

The solution is to only use lazy quantifiers inside the capturing group:

select REGEXP_REPLACE('+000099,8420000', '^\+?(-?)0*([0-9]+?,[0-9]+?)0*$','\1\2') as Result from dual


The [0-9]+?,[0-9]+? part matches one or more digits, as few as possible, followed by a comma and then one or more digits, again as few as possible.
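As a sanity check that the all-lazy pattern does not change the result in a backtracking engine, the sketch below runs it through Python's re module on a few inputs. The helper name trim and the sample values other than +000099,8420000 are made up for illustration, and this says nothing about how Oracle itself evaluates them.

```python
import re

# The pattern from the Oracle query above, with only lazy quantifiers
# inside the capturing group.
FIXED = r'^\+?(-?)0*([0-9]+?,[0-9]+?)0*$'


def trim(value: str) -> str:
    """Drop a leading +, leading zeros and trailing zeros (hypothetical helper)."""
    return re.sub(FIXED, r'\1\2', value)


# Illustrative inputs; only the first one comes from the question.
samples = ['+000099,8420000', '-0012,500', '7,0', '100,001']
results = {s: trim(s) for s in samples}

for original, trimmed in results.items():
    print(original, '->', trimmed)
# +000099,8420000 -> 99,842
# -0012,500 -> -12,5
# 7,0 -> 7,0
# 100,001 -> 100,001
```

The first [0-9]+? still captures the whole integer part because the comma after it is mandatory, so making it lazy costs nothing in a backtracking engine.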

Some more tests (select REGEXP_REPLACE('+00009,010020','[0-9]+,[0-9]+?([1-9])','\1') from dual yields +20) show that the first quantifier in a group sets the greediness for that whole group. In the fixed pattern above, the greediness of Group 0 (the whole match) is set to greedy by the first ? quantifier, and the greediness of Group 1 (i.e. ([0-9]+?,[0-9]+?)) is set by its first +?, which is lazy.
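For comparison, the same test can be run in a backtracking engine. The sketch below uses Python's re module (an assumption made for illustration, not Oracle's engine). There the lazy +? is honored and the result differs, while a fully greedy version of the pattern reproduces the +20 that Oracle reports, which is consistent with Oracle treating the +? as greedy.

```python
import re

value = '+00009,010020'

# Pattern from the Oracle test: greedy [0-9]+ first, then a lazy [0-9]+?.
lazy = re.sub(r'[0-9]+,[0-9]+?([1-9])', r'\1', value)

# Same pattern with the second quantifier made greedy.
greedy = re.sub(r'[0-9]+,[0-9]+([1-9])', r'\1', value)

print(lazy)    # +10020  (lazy: stops at the first non-zero digit, the 1)
print(greedy)  # +20     (greedy: runs on to the last non-zero digit, the 2)
```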



Source: https://stackoverflow.com/questions/45302698/regex101-vs-oracle-regex
