Regex capture every occurrence of a word within two delimiters


Say I have a long string of text, and I want to capture every time the word this is mentioned within rounded brackets. How could I do that? The following patt

4 Answers

感情败类

2021-01-05 16:25
The .* in your pattern is greedy: it matches as many characters as it can. So what you're actually doing here is greedily matching everything before and after a single occurrence of this found within parentheses. Your current match results probably look a little bit like the following:
```
["(odio this nibh euismod nulla, eget auctor orci nibh vel this nisi. Aliquam this erat volutpat)", "this"]
```
Where the first item in the array is the entire substring matched by the expression, and everything that follows are your regex's captured values.

If you want to match every occurrence of this inside the parentheses, one solution would be to first get a substring of everything inside the parentheses, then search for this in that substring:
```
# Match everything inside the parentheses
/\([^\)]*\)/

# Match all occurrences of the word 'this' inside a substring
/this/g
```
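A minimal sketch of that two-step approach in JavaScript, assuming the parentheses are not nested. The helper name `findInsideParens` and the sample string are made up for illustration and are not from the question:

```javascript
// Hypothetical helper: collect every "this" that sits inside a (...) group.
// Assumes parentheses are not nested.
function findInsideParens(text) {
  // Step 1: grab each parenthesised substring, e.g. "(this this bar)".
  const groups = text.match(/\([^)]*\)/g) || [];
  // Step 2: search each substring for the word and merge the results.
  return groups.flatMap((group) => group.match(/this/g) || []);
}

// Illustrative input: one "this" outside parentheses, five inside.
const sample = "this foo (this this bar) baz (this this this)";
const hits = findInsideParens(sample);
console.log(hits.length);
```

Note that /this/g also matches inside longer words such as "thistle"; /\bthis\b/g would restrict it to whole words if that matters for your text.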
迷失自我

2021-01-05 16:28

(this)

The pattern above works for me; try it on http://regex101.com

故里飘歌

2021-01-05 16:38

First off, don't be greedy.

```
/\(.*?(this).*?\)/g
```

Secondly, if you're aiming to count the number of occurrences of 'this', a regex is probably not the right tool here. The problem is that you need to match the closing delimiter to determine that the first 'this' is enclosed, which means that continuing to apply the regex will not match anything inside that already-consumed set of delimiters.

The regex I have above will catch things like:

foo (baz this bar) (foo this)

But it will not catch every occurrence in the following (it matches only twice, once for each set of delimiters):

foo (this this bar) baz (this this this)
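To see the limitation concretely, here is a small JavaScript check using the lazy pattern and the second example string from this answer. The variable names are only illustrative:

```javascript
// The lazy pattern from this answer, applied globally.
const pattern = /\(.*?(this).*?\)/g;
const text = "foo (this this bar) baz (this this this)";

// matchAll yields one match per non-overlapping match of the whole pattern,
// so each parenthesised group contributes a single captured "this".
const captures = [...text.matchAll(pattern)].map((m) => m[1]);
console.log(captures.length);
```

Each match consumes the whole parenthesised group, so the remaining occurrences inside that group are never visited.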

Try using a simple single-pass scanner instead of a regex. Another alternative is to use two regular expressions, one to separate the string into enclosed and non-enclosed sections, and another to search within the enclosed regions.
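A rough sketch of what such a single-pass scanner might look like in JavaScript. The function name `countEnclosed` is hypothetical, and the code assumes the parentheses are balanced (an unclosed opening parenthesis would make everything after it count as enclosed):

```javascript
// Hypothetical single-pass scanner: counts occurrences of `word`
// that appear while at least one "(" is still open.
// Assumes balanced parentheses; nesting is handled via a depth counter.
function countEnclosed(text, word) {
  if (word.length === 0) return 0; // nothing sensible to count
  let depth = 0;
  let count = 0;
  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    if (ch === "(") {
      depth++;
    } else if (ch === ")") {
      if (depth > 0) depth--; // ignore stray closing parentheses
    } else if (depth > 0 && text.startsWith(word, i)) {
      count++;
      i += word.length - 1; // skip past the word just counted
    }
  }
  return count;
}

console.log(countEnclosed("foo (this this bar) baz (this this this)", "this"));
```

Unlike the single lazy regex, this counts every enclosed occurrence because it never has to consume a whole group in one match.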

小鲜肉

2021-01-05 16:38

I wrapped every run of alphanumeric characters in markers using the sed regex below:

```
# cat testfile
aabc a1 +++    xyz 20   30 =40  -r
# cat testfile | sed -e "s/\([[:alnum:]]\{1,\}\)/<pre>\1<post>/g"
<pre>aabc<post> <pre>a1<post> +++    <pre>xyz<post> <pre>20<post>   <pre>30<post> =<pre>40<post>  -<pre>r<post>
#
```

Hope it helps.
