I'm trying to return a count of all words NOT between square brackets. So given ..
[don't match these words] but do match these
I get a c
Ok, I think this should work:
\[[^\]]+\](?:^|\s)([\w']+)(?!\])\b|(?:^|\s)([\w']+)(?!\])\b
You can test it here:
http://regexpal.com/
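For reference, here is a minimal sketch of counting with this pattern. It assumes Python's re module rather than the JavaScript engine regexpal uses. The pattern only relies on character classes, a negative lookahead and \b, which both engines support. The function name count_words_outside_brackets is hypothetical.

```python
import re

# The regex from the answer above, unchanged.
PATTERN = r"\[[^\]]+\](?:^|\s)([\w']+)(?!\])\b|(?:^|\s)([\w']+)(?!\])\b"

def count_words_outside_brackets(text: str) -> int:
    # Hypothetical helper: each non-overlapping match is one word
    # found outside the square brackets.
    return len(re.findall(PATTERN, text))

sample = "[don't match these words] but do match these"
print(count_words_outside_brackets(sample))  # 4
```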
If you also need to handle text in square brackets that comes after the main text, you could add it as a second alternative, and the current second alternative would become the third.
It's a bit complicated, but I can't think of a better solution right now.
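A different technique, not the one used in this answer, avoids depending on where the brackets sit: match either a whole bracketed block or a single word, and keep only the captured words. This is a minimal Python sketch that assumes brackets are never nested. TOKEN and words_outside_brackets are illustrative names.

```python
import re

# Alternative technique (not this answer's regex): a bracketed block is
# matched and discarded as a whole; otherwise group 1 captures one word.
TOKEN = re.compile(r"\[[^\]]*\]|([\w']+)")

def words_outside_brackets(text: str) -> list[str]:
    # findall returns '' for matches where group 1 did not take part
    # (the bracketed blocks), so filter those out.
    return [w for w in TOKEN.findall(text) if w]

print(words_outside_brackets("[don't match these words] but do match these"))
print(words_outside_brackets("do match these [not these words]"))
```

Because a bracketed block is consumed wherever it appears, the second call also skips square-bracketed text that comes after the main text.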
If you need to do something with the actual matches, you will find them in the capturing groups (group 1 or group 2, depending on which alternative matched).
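Here is a minimal sketch of pulling the words out of the capturing groups, assuming Python's re module. The helper name is hypothetical. Each match fills either group 1 or group 2, so taking whichever one is set gives the word.

```python
import re

PATTERN = re.compile(
    r"\[[^\]]+\](?:^|\s)([\w']+)(?!\])\b|(?:^|\s)([\w']+)(?!\])\b"
)

def words_outside_brackets(text: str) -> list[str]:
    # Hypothetical helper: group 1 holds a word right after a bracketed
    # block, group 2 holds any other word; the unused group is None.
    return [m.group(1) or m.group(2) for m in PATTERN.finditer(text)]

print(words_outside_brackets("[don't match these words] but do match these"))
# ['but', 'do', 'match', 'these']
```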
UPDATE:
Explanation: we've got two alternatives here:
1. \[[^\]]+\](?:^|\s)([\w']+)(?!\])\b
This is saying:
\[[^\]]+\] - match everything in square brackets (don't capture)
(?:^|\s) - followed by line start or a space. Looking at it now, the caret doesn't make sense (line start can never come right after a closing bracket), so take it out and this becomes just \s
([\w']+) - match all following word characters, as long as
(?!\]) - the next character is not the closing bracket. This is probably also unnecessary now, so let's try removing the lookahead
\b - and match a word boundary
2. (?:^|\s)([\w']+)(?!\])\b
If option 1 doesn't match, do just the word matching, without looking for square brackets, as the first alternative has already taken care of them.
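The same piece-by-piece breakdown can live next to the pattern itself. This sketch assumes Python's re.VERBOSE flag, which ignores whitespace and #-comments outside character classes. The regex is the original two-alternative one, only reformatted; WORDS is an illustrative name.

```python
import re

# Original two-alternative regex, laid out with re.VERBOSE so each piece
# sits next to its explanation.
WORDS = re.compile(r"""
    \[[^\]]+\]   # option 1: everything in square brackets (not captured)
    (?:^|\s)     #   followed by line start or a space
    ([\w']+)     #   group 1: the following word characters
    (?!\])       #   as long as the next character is not a closing bracket
    \b           #   and a word boundary
  |
    (?:^|\s)     # option 2: line start or a space
    ([\w']+)     #   group 2: a word, with no bracket check in front
    (?!\])       #   not followed by a closing bracket
    \b           #   word boundary
""", re.VERBOSE)

sample = "[don't match these words] but do match these"
# findall yields (group1, group2) tuples; exactly one is non-empty.
print([g1 or g2 for g1, g2 in WORDS.findall(sample)])
# ['but', 'do', 'match', 'these']
```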
Ok, so I removed everything we don't need; it was only there because I tried quite a few options before one worked :-) The revised regex is the one below:
\[[^\]]+\]\s([\w']+)(?!\])\b|(?:^|\s)([\w']+)\b
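To check the revised pattern against the original, here is a small comparison sketch, assuming Python's re module. The words helper is hypothetical.

```python
import re

ORIGINAL = re.compile(r"\[[^\]]+\](?:^|\s)([\w']+)(?!\])\b|(?:^|\s)([\w']+)(?!\])\b")
REVISED = re.compile(r"\[[^\]]+\]\s([\w']+)(?!\])\b|(?:^|\s)([\w']+)\b")

def words(pattern: re.Pattern, text: str) -> list[str]:
    # Hypothetical helper: each match fills exactly one of the two groups.
    return [m.group(1) or m.group(2) for m in pattern.finditer(text)]

for text in ["[don't match these words] but do match these", "x [a b]"]:
    print(words(ORIGINAL, text), words(REVISED, text))
```

On the question's sentence, both patterns return the same four words. They differ when a bracketed block ends the text, as in "x [a b]". The original's lookahead in option 2 rejects "b" because a closing bracket follows it, but the revised pattern accepts "b". Trailing brackets are the case the answer already says needs an extra alternative.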