Why does a Java surrogate regexp match the hyphen-minus (-)?

春和景丽 2021-02-04 15:08

I am trying to find out why this Java regex ([\\ud800-\\udbff\\udc00-\\udfff]), used in replaceAll(regexp, ""), also removes the hyphen-minus character...

2 answers
  •  醉话见心
    2021-02-04 15:40

    If you make the range

    [\ud800-\udfff]
    

    or

    [\ud800-\udbff\udbff-\udfff]
    

    it will leave the hyphen untouched. Seems like a bug to me.

    Note that there is no reason for the double range: in your example \udc00 is just the next code point after \udbff, so you could skip the split entirely. If you make the two ranges overlap by one or more code points, it works again, but you could just as well use the single range (see my first example above).
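The three character classes discussed above can be compared side by side. This is a minimal sketch, not code from the thread: the class name `SurrogateRangeDemo` and the helper `strip` are hypothetical, and the input string `a-b` is an assumed test value. The output for the original two-range pattern is deliberately not annotated, since the thread only reports the behaviour and it may depend on the Java version.

```java
public class SurrogateRangeDemo {

    // Hypothetical helper: deletes every match of the given regex from the input.
    public static String strip(String input, String regex) {
        return input.replaceAll(regex, "");
    }

    public static void main(String[] args) {
        // Assumed sample input containing a hyphen-minus and no surrogates.
        String input = "a-b";

        // Pattern from the question: two adjacent ranges.
        String original = "[\\ud800-\\udbff\\udc00-\\udfff]";
        // First suggestion from the answer: one range over all surrogates.
        String single = "[\\ud800-\\udfff]";
        // Second suggestion from the answer: two ranges that overlap at \udbff.
        String overlapping = "[\\ud800-\\udbff\\udbff-\\udfff]";

        // The question reports that this one also removes the hyphen.
        System.out.println("original:    " + strip(input, original));
        // The single range does not contain the hyphen, so "a-b" is unchanged.
        System.out.println("single:      " + strip(input, single));
        // The answer reports that this one leaves the hyphen untouched as well.
        System.out.println("overlapping: " + strip(input, overlapping));
    }
}
```

One possible explanation, not confirmed in this thread: the `java.util.regex.Pattern` Javadoc says that two consecutive `\u` escapes forming a surrogate pair denote a single supplementary character. If that applies here, `\udbff\udc00` would be read as one code point, so the class would parse as a range from `\ud800` up to that code point, followed by a literal `-` and `\udfff`. That would also fit the observation that `\udbff\udbff` (two high surrogates, not a valid pair) does not trigger the problem.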
