JavaScript regex for all words not between certain characters

借酒劲吻你 2021-01-17 04:44

I'm trying to return a count of all words NOT between square brackets. So given:

[don't match these words] but do match these

I get a c

3 Answers
  • 2021-01-17 05:16

    I would use something like \[[^\]]*\] to remove the text between square brackets, then split the remaining string on whitespace and count the resulting words.
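
    A minimal sketch of that approach in JavaScript, assuming "words" means whitespace-separated tokens. The function name countWordsOutsideBrackets is made up for illustration, and nested or unclosed brackets are not handled.

```javascript
// Hypothetical helper (name not from the answer): strip bracketed text, then count tokens.
function countWordsOutsideBrackets(text) {
  // Replace each complete [bracketed group] with a space so neighbouring words don't merge.
  const stripped = text.replace(/\[[^\]]*\]/g, " ");
  // Split on runs of whitespace; filter(Boolean) drops the empty string left by an empty input.
  return stripped.trim().split(/\s+/).filter(Boolean).length;
}

console.log(countWordsOutsideBrackets("[don't match these words] but do match these")); // 4
```

    Replacing with a space rather than an empty string is a deliberate choice here: it keeps "a[b]c" from collapsing into a single token.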

  • 2021-01-17 05:17

    Chris, resurrecting this question because it had a simple solution that wasn't mentioned. (Found your question while doing some research for a general question about how to exclude patterns in regex.)

    Here's our simple regex (see it at work on regex101, looking at the Group captures in the bottom right panel):

    \[[^\]]*\]|(\b\w+\b)
    

    The left side of the alternation matches complete [bracketed groups]. We will ignore these matches. The right side matches and captures words to Group 1, and we know they are the right words because they were not matched by the expression on the left.

    This program shows how to use the regex (see the count result in the online demo):

    <script>
    var subject = '[match ye not these words] but do match these';
    var regex = /\[[^\]]*\]|(\b\w+\b)/g;
    var group1Caps = [];
    var match = regex.exec(subject);
    
    // put Group 1 captures in an array
    while (match != null) {
        if( match[1] != null ) group1Caps.push(match[1]);
        match = regex.exec(subject);
    }
    
    
    document.write("<br>*** Number of Matches ***<br>");
    document.write(group1Caps.length);
    
    </script>
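
    The same technique can also be written as a small function that runs outside the browser, for example in Node. This is only a sketch: it assumes String.prototype.matchAll is available (ES2020), and countGroup1Words is a hypothetical name, not something from the original demo.

```javascript
// Hypothetical helper: count words captured in Group 1, ignoring bracketed groups.
function countGroup1Words(subject) {
  const regex = /\[[^\]]*\]|(\b\w+\b)/g;
  let count = 0;
  for (const match of subject.matchAll(regex)) {
    // Group 1 is undefined when the left side (a bracketed group) matched.
    if (match[1] !== undefined) count++;
  }
  return count;
}

console.log(countGroup1Words("[match ye not these words] but do match these")); // 4
```

    Note that \w does not match an apostrophe, so a word such as don't outside the brackets would be counted as two words by this pattern.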
    

    Reference

    How to match (or replace) a pattern except in situations s1, s2, s3...

  • 2021-01-17 05:35

    Ok, I think this should work:

    \[[^\]]+\](?:^|\s)([\w']+)(?!\])\b|(?:^|\s)([\w']+)(?!\])\b
    

    You can test it here:
    http://regexpal.com/

    If you need an alternative with text in square brackets coming after the main text, it could be added as a second alternative and the current second one would become third.
    It's a bit complicated but I can't think of a better solution right now.

    If you need to do something with the actual matches you will find them in the capturing groups.

    UPDATE:

    Explanation: So, we've got two options here:

    1. \[[^\]]+\](?:^|\s)([\w']+)(?!\])\b

    This is saying:

    • \[[^\]]+\] - match everything in square brackets (don't capture)
    • (?:^|\s) - followed by line start or a space. Looking at it now, the caret doesn't make sense here, so this becomes just \s
    • ([\w']+) - match and capture the following word characters, as long as the next character is not a closing bracket, which is what (?!\]) checks. That lookahead is probably unnecessary too, so let's try removing it
    • \b - and match word boundary

    2. (?:^|\s)([\w']+)(?!\])\b

    If option 1 doesn't match, just do the word matching without looking for square brackets, since the first part has already ensured they are not here.

    Ok, so I removed everything we don't need (it was left over because I tried quite a few options before it worked :-)). The revised regex is the one below:

    \[[^\]]+\]\s([\w']+)(?!\])\b|(?:^|\s)([\w']+)\b
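
    One way to check the revised regex against the question's sample, sketched under the assumption that a word is whatever lands in either capturing group. capturedWords is a hypothetical name. As mentioned earlier, brackets that come after the main text are not covered by this pattern, so the sketch only exercises the bracket-first case.

```javascript
// Hypothetical helper: collect the words captured by either group of the revised regex.
function capturedWords(text) {
  const regex = /\[[^\]]+\]\s([\w']+)(?!\])\b|(?:^|\s)([\w']+)\b/g;
  const words = [];
  for (const match of text.matchAll(regex)) {
    // Group 1 holds the word right after a bracketed group; group 2 holds any other word.
    const word = match[1] !== undefined ? match[1] : match[2];
    if (word !== undefined) words.push(word);
  }
  return words;
}

const words = capturedWords("[don't match these words] but do match these");
console.log(words.length); // 4
```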
    