Regex: Use start of line/end of line signs (^ or $) in different context

Submitted by Anonymous (unverified) on 2019-12-03 02:45:02

Question:

While working on a small regex task I ran into this problem. I have a string that is a list of tags and looks, for example, like this:
foo,bar,qux,garp,wobble,thud

What I needed to do was to check whether a certain tag, e.g. 'garp', was in this list. (What it finally matches is not really important, only whether there is a match or not.)

My first, rather naive attempt at this was to use the following regex:
[^,]garp[,$]

My idea was that before 'garp' there should be either the start of the line/string or a comma, and after 'garp' there should be either a comma or the end of the line/string.

Now, it is instantly obvious that this regex is wrong: both ^ and $ lose their anchor meaning inside a character class [ ]. A leading ^ negates the class, and $ is just a literal dollar sign.
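
To make the failure concrete, here is a minimal sketch in Python (chosen only for illustration; the question itself targets Oracle SQL). The sample strings other than the original tag list are made up for the demonstration.

```python
import re

# The pattern from the first attempt. Inside [...] a leading ^ negates the
# class and $ is a literal dollar sign, so neither one acts as an anchor.
naive = re.compile(r"[^,]garp[,$]")

tags = "foo,bar,qux,garp,wobble,thud"

# 'garp' is in the list, but it is preceded by a comma, which [^,] rejects.
missed = naive.search(tags) is None

# A longer tag ending in 'garp' is wrongly accepted, because [^,] consumes
# the 'x' in front of it.
false_hit = naive.search("foo,xgarp,bar") is not None

# A literal dollar sign after 'garp' also satisfies [,$].
dollar_hit = naive.search("xgarp$") is not None

print(missed, false_hit, dollar_hit)
```

So the pattern both misses the real tag and accepts strings it should reject.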

What I finally came up with is the following:
^garp$|^garp,|,garp,|,garp$

This regex just handles the 4 cases one by one (tag at the beginning of the list, in the middle, at the end, or as the only element of the list). It looks a bit ugly to me, and just for fun's sake I'd like to make it more elegant.

Is there a way to use the start of line/end of line characters (^ and $) inside character classes?
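
As a quick sanity check, the four-case pattern can be run against one sample per case plus two near misses. This is a small Python sketch; the sample strings are invented for illustration.

```python
import re

# The four-case pattern from the question, unchanged.
four_case = re.compile(r"^garp$|^garp,|,garp,|,garp$")

# Sample input mapped to the result we expect from the pattern.
samples = {
    "garp": True,            # only element of the list
    "garp,foo": True,        # first element
    "foo,garp,bar": True,    # middle element
    "foo,garp": True,        # last element
    "foo,xgarp,bar": False,  # 'garp' is only the tail of a longer tag
    "foo,garp-x": False,     # 'garp' is only the head of a longer tag
}

results = {s: four_case.search(s) is not None for s in samples}
print(results == samples)
```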

EDIT: OK, some more info was requested, so here it is: I'm using this within an Oracle SQL statement. This sadly does not allow any look-around assertions, but as I'm only interested in whether there is a match (and not in what is matched), this does not really affect me here. The tags can contain non-alphabetical characters like - or _, so \bgarp\b would not work. Also, one tag can contain another tag, as SilentGhost said, so /garp/ doesn't work either.

Answer 1:

You can't use ^ and $ in character classes in the way you wish: inside a class, $ is a literal dollar sign and a leading ^ negates the class. You can, however, use an alternation to achieve the same effect:

(^|,)garp(,|$) 
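
A minimal Python sketch of this alternation, wrapped in a helper so the tag is not hard-coded. The name has_tag and the use of re.escape are assumptions made for this example, not part of the answer.

```python
import re


def has_tag(tag_list: str, tag: str) -> bool:
    """Return True if `tag` is a whole element of the comma-separated `tag_list`.

    Hypothetical helper for illustration only.
    """
    # re.escape keeps tags containing regex metacharacters (., +, ...) literal.
    pattern = r"(^|,)" + re.escape(tag) + r"(,|$)"
    return re.search(pattern, tag_list) is not None


print(has_tag("foo,bar,qux,garp,wobble,thud", "garp"))
print(has_tag("foo,xgarp,bar", "garp"))
```

Because the anchors sit in groups outside any character class, they keep their usual meaning, and the pattern uses no look-arounds or \b.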


Answer 2:

You just need to use a word boundary (\b) instead of ^ and $:

\bgarp\b 
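
The question's edit notes that tags may contain characters such as - or _. The following Python sketch, with invented sample strings, shows how \b behaves for each of them.

```python
import re

word_boundary = re.compile(r"\bgarp\b")

# Plain alphanumeric tags behave as expected.
plain_hit = word_boundary.search("foo,garp,bar") is not None
plain_miss = word_boundary.search("foo,xgarp,bar") is None

# '-' is not a word character, so there is a boundary between 'p' and '-':
# the longer tag 'garp-x' is reported as if it were the tag 'garp'.
hyphen_false_hit = word_boundary.search("foo,garp-x,bar") is not None

# '_' is a word character, so 'garp_x' is correctly rejected.
underscore_ok = word_boundary.search("foo,garp_x,bar") is None

print(plain_hit, plain_miss, hyphen_false_hit, underscore_ok)
```

So \b is only safe when every tag consists purely of word characters (letters, digits and underscore).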


Answer 3:

Just use look-arounds to solve this:

(?<=^|,)garp(?=$|,) 

The difference between look-arounds and regular groups is that with regular groups the comma is part of the match, while with look-arounds it is not. In this case it doesn't make a difference, though.
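
The effect on the matched text can be seen in a short Python sketch. One assumption to flag: as far as I know, Python's re module only accepts fixed-width look-behinds, so the sketch swaps in negative look-arounds ("not preceded or followed by a non-comma character") rather than using the pattern above verbatim. Other engines may accept the original form.

```python
import re

# Swapped-in equivalent of (?<=^|,)garp(?=$|,) using negative look-arounds.
look_around = re.compile(r"(?<![^,])garp(?![^,])")

# Regular groups, for comparison.
grouped = re.compile(r"(^|,)garp(,|$)")

text = "foo,garp,bar"

# With look-arounds the commas are checked but not consumed.
look_around_match = look_around.search(text).group(0)

# With regular groups the commas become part of the match.
grouped_match = grouped.search(text).group(0)

print(repr(look_around_match), repr(grouped_match))
```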



Answer 4:

I'm a big regex fan, but in this case (a comma-separated string), although Mark Byers', SilentGhost's and reko_t's solutions all work, I'd rather suggest looking at a CSV parser.

Might be overkill for the job, but then we don't know the real requirements and the real data that needs to be handled.
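
As a rough idea of what that could look like, here is a sketch using Python's standard csv module. The helper name has_tag_csv is made up for this example, and whether a parser is available at all depends on where the check runs (the question is about Oracle SQL).

```python
import csv


def has_tag_csv(tag_list: str, tag: str) -> bool:
    """Parse `tag_list` as one CSV record and test for an exact field match.

    Hypothetical helper for illustration only.
    """
    # csv.reader accepts any iterable of lines; a one-element list is enough.
    rows = list(csv.reader([tag_list]))
    fields = rows[0] if rows else []
    return tag in fields


print(has_tag_csv("foo,bar,qux,garp,wobble,thud", "garp"))
```

Unlike the regex approaches, a parser also copes with quoted fields that contain commas, should the real data ever include them.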



Answer 5:

This could be optimized quite a bit with the SQL INSTR function (position of a substring); it doesn't need a regex.

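Just check whether any of the following holds:

  1. The string equals garp (the only item in the list)
  2. garp, is at position 1 of the string (first item in the list; INSTR positions are 1-based)
  3. ,garp is at position LENGTH(string) - LENGTH(',garp') + 1 [1] (last item in the list)
  4. The string contains ,garp, anywhere (in the middle of the list)

[1] The + 1 accounts for INSTR's 1-based positions. Two things to watch: INSTR returns 0 for "not found", so also require a result greater than 0, and it returns the first occurrence by default, so search backwards from the end (a negative start position) if an earlier tag may begin with garp.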
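
The four checks can be sketched in Python to pin down the position arithmetic. The helper instr below is a stand-in written for this example that mimics a 1-based substring search; it is not Oracle's INSTR. With 1-based positions, the expected position in step 3 works out to LENGTH(string) - LENGTH(',garp') + 1.

```python
def instr(haystack: str, needle: str, from_end: bool = False) -> int:
    """1-based position of needle in haystack, or 0 if absent.

    Illustrative stand-in for a SQL-style INSTR; from_end=True returns the
    last occurrence instead of the first.
    """
    index = haystack.rfind(needle) if from_end else haystack.find(needle)
    return index + 1  # find/rfind return -1 when absent, which maps to 0


def has_tag_by_position(tag_list: str, tag: str) -> bool:
    # 1. The tag is the only item in the list.
    if tag_list == tag:
        return True
    # 2. 'tag,' starts at position 1 (first item).
    if instr(tag_list, tag + ",") == 1:
        return True
    # 3. ',tag' ends exactly at the end of the string (last item).
    #    The last occurrence is used, and 0 ("not found") is excluded so a
    #    short string cannot match by accident.
    suffix = "," + tag
    last = instr(tag_list, suffix, from_end=True)
    if last > 0 and last == len(tag_list) - len(suffix) + 1:
        return True
    # 4. ',tag,' occurs anywhere (middle item).
    return instr(tag_list, "," + tag + ",") > 0


print(has_tag_by_position("foo,bar,qux,garp,wobble,thud", "garp"))
```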


