Regex capture every occurrence of a word within two delimiters

自闭症患者 2021-01-05 15:47

Say I have a long string of text, and I want to capture every time the word this is mentioned within rounded brackets. How could I do that? The following patt

4 answers
  • 2021-01-05 16:25

    Because .* is greedy, it matches as many characters as it can. So what your pattern is actually doing is consuming everything before and after a single occurrence of this found within the parentheses. Your current match results probably look something like the following:

    ["(odio this nibh euismod nulla, eget auctor orci nibh vel this nisi. Aliquam this erat volutpat)", "this"]
    

    The first item in the array is the entire substring matched by the expression, and every item after it is a value captured by one of the regex's groups.

    If you want to match every occurrence of this inside the parentheses, one solution would be to first get a substring of everything inside the parentheses, then search for this in that substring:

    # Match everything inside the parentheses
    /\([^\)]*\)/
    
    # Match all occurrences of the word 'this' inside a substring
    /this/g
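
    As a rough illustration of the two-step approach, here is a minimal JavaScript sketch. The sample text is made up, and the g flag on the first pattern is my addition so that every parenthesized group is collected, not just the first one.

```javascript
// Hypothetical sample input; only the two patterns come from the answer above.
const text = "this (odio this nibh, this nisi) lorem (no match here) ipsum (this)";

// Step 1: collect every parenthesized substring (g flag added to get all groups).
const groups = text.match(/\([^)]*\)/g) || [];

// Step 2: find every 'this' inside each substring; groups without a hit contribute nothing.
const hits = groups.flatMap((group) => group.match(/this/g) || []);

console.log(groups.length); // 3
console.log(hits.length);   // 3 (the leading 'this' outside the parentheses is not counted)
```

    Note that /this/g also matches inside longer words such as "thistle"; if that matters, /\bthis\b/g is one way to restrict it to whole words.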
    
  • 2021-01-05 16:28

    (this)

    The pattern above works for me; try it on http://regex101.com

  • 2021-01-05 16:38

    First off, don't be greedy.

    /\(.*?(this).*?\)/g

    Secondly, if you're aiming to count the number of occurrences of 'this', a regex is probably not the right tool here. The problem is that you need to match the closing delimiter to determine that the first 'this' is enclosed, which means that continuing to apply the regex will not match anything inside that already-consumed set of delimiters.

    The regex above will catch every occurrence in strings like:

    foo (baz this bar) (foo this)

    But not in strings like the following, where it will only match twice, once for each set of delimiters:

    foo (this this bar) baz (this this this)
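
    The behaviour described above can be checked with a short JavaScript sketch. The helper name countMatches is hypothetical; the pattern and both sample strings are the ones from this answer.

```javascript
// The non-greedy pattern from this answer.
const lazy = /\(.*?(this).*?\)/g;

// Hypothetical helper: number of matches the pattern produces for a string.
const countMatches = (s) => [...s.matchAll(lazy)].length;

console.log(countMatches("foo (baz this bar) (foo this)"));            // 2
console.log(countMatches("foo (this this bar) baz (this this this)")); // 2, although 'this' occurs 5 times
```

    Each match consumes its whole pair of parentheses, so only one 'this' per pair is ever captured.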

    Try using a simple single-pass scanner instead of a regex. Another alternative is to use two regular expressions, one to separate the string into enclosed and non-enclosed sections, and another to search within the enclosed regions.
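
    One possible shape for such a single-pass scanner, sketched in JavaScript. The function name countInsideParens is hypothetical, and the sketch assumes parentheses are balanced: a word after an unclosed "(" is still counted as enclosed.

```javascript
// Hypothetical helper: count occurrences of `word` that lie inside parentheses.
// Assumption: parentheses are balanced; `depth` tracks how many are currently open.
function countInsideParens(text, word) {
  if (word === "") return 0; // an empty word would otherwise never advance
  let depth = 0;
  let count = 0;
  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    if (ch === "(") {
      depth++;
    } else if (ch === ")") {
      if (depth > 0) depth--; // ignore stray closing parentheses
    } else if (depth > 0 && text.startsWith(word, i)) {
      count++;
      i += word.length - 1; // skip past the matched word
    }
  }
  return count;
}

console.log(countInsideParens("foo (this this bar) baz (this this this)", "this")); // 5
console.log(countInsideParens("this foo (baz this bar) (foo this)", "this"));       // 2
```

    Unlike the single regex, this counts every enclosed occurrence, because nothing is consumed beyond the word itself.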

  • 2021-01-05 16:38

    I wrapped every run of alphanumeric characters in markers using the sed expression below:

    # cat testfile 
    aabc a1 +++    xyz 20   30 =40  -r
    # cat testfile | sed -e "s/\([[:alnum:]]\{1,\}\)/<pre>\1<post>/g"
    <pre>aabc<post> <pre>a1<post> +++    <pre>xyz<post> <pre>20<post>   <pre>30<post> =<pre>40<post>  -<pre>r<post>
    #
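
    For comparison, roughly the same substitution in JavaScript. This is a sketch only: [A-Za-z0-9]+ stands in for the POSIX class [[:alnum:]]\{1,\} and is equivalent only for ASCII input, and $& in the replacement string refers to the whole match.

```javascript
// Same sample line as in the sed example above.
const line = "aabc a1 +++    xyz 20   30 =40  -r";

// Wrap each run of ASCII letters and digits in <pre>...<post> markers.
const wrapped = line.replace(/[A-Za-z0-9]+/g, "<pre>$&<post>");

console.log(wrapped);
```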
    

    Hope it helps.
