I am a regex supernoob (just reading my first articles about them), and at the same time working towards stronger use of vim. I would like to use a regex to search for all inst
An interesting feature of Vim regex is the presence of \zs
and \ze
. Other engines might have them too, but they're not very common.
The purpose of \zs
is to mark the start of the match, and \ze
the end of it. For example:
ab\zsc
matches c
, but only when it is preceded by ab
. Similarly:
a\zebc
matches a
only if you have bc
after it. You can mix both:
a\zsb\zec
matches b
only if it sits between a
and c
. You can also create zero-width matches, which are ideal for what you're trying to do:
:%s/:\zs\ze\S/ /
The match has no width, only a position, and you then substitute a space (" ") at that position. By the way, \S
means any non-whitespace character.
:\zs\ze\S
matches the position between a colon and something not a space.
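For anyone who wants to check the same idea outside Vim, here is a minimal sketch in Python. Python's `re` module has no `\zs` or `\ze`, so a lookbehind and a lookahead stand in for them here; the helper name `space_after_colon` and the sample line are made up for illustration. Note that `re.sub` replaces every match on the line, which corresponds to adding the `g` flag to the Vim command.

```python
import re

# Zero-width match: the position after ":" and before a non-whitespace character.
# (?<=:) stands in for Vim's ":\zs" and (?=\S) for "\ze\S" (an approximation,
# not the same syntax).
GAP = re.compile(r"(?<=:)(?=\S)")


def space_after_colon(line: str) -> str:
    """Insert a space at every colon that is directly followed by non-whitespace."""
    return GAP.sub(" ", line)


print(space_after_colon("key:value, other: fine"))
# prints: key: value, other: fine
```

Because the match consumes no characters, nothing has to be written back in the replacement except the space itself.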
:%s/:\(\S\)/: \1/g
\S
matches any character that is not whitespace, but you need to remember what that non-whitespace character is. This is what the \(\)
does. You can then refer to it using \1
in the replacement.
So you match a :
, some non-whitespace character and then replace it with a :
, a space, and the captured character.
Changing this to only modify the text when there's only one :
is fairly straightforward. As others have suggested, using some of the zero-width assertions will be useful.
:%s/:\@<!:[^:[:space:]]\@=/: /g
:\@<!
is a negative lookbehind: it succeeds at any position that is not preceded by a :
, including the start of the line. This is an important characteristic of the negative lookahead/lookbehind assertions: they don't require that a character actually be there, only that there isn't a :
.
:
matches the required colon.
[^:[:space:]]
introduces a couple more regex concepts.
The outer []
is a collection. A collection is used to match any of the characters listed inside. However, a leading ^
negates that match. So, [abc123]
will match a
, b
, c
, 1
, 2
, or 3
, but [^abc123]
matches anything but those characters.
[:space:]
is a character class. Character classes can only be used inside a collection. [:space:]
means, unsurprisingly, any whitespace. In most implementations, it relates directly to the result of the C library's isspace
function.
Tying that all together, the collection means "match any character that is not a :
or whitespace".
\@=
is the positive lookahead assertion. It applies to the previous atom (in this case the collection) and means that the collection is required for the pattern to be a successful match, but will not be part of the text that is replaced.
So, whenever the pattern matches, we just replace the :
with itself and a space.
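The same three-part pattern can be tried out in Python as a rough cross-check. This is a sketch under assumptions: `(?<!:)` and `(?=...)` are Python's spellings of the lookbehind and lookahead, `\s` is used as an approximation of `[:space:]`, and the name `space_after_lone_colon` is invented for the example.

```python
import re

# (?<!:)      not preceded by a colon (also true at the start of the line)
# :           the colon itself
# (?=[^:\s])  followed by a character that is neither a colon nor whitespace;
#             the lookahead checks it without making it part of the match
LONE_COLON = re.compile(r"(?<!:):(?=[^:\s])")


def space_after_lone_colon(line: str) -> str:
    """Add a space after a single colon, leaving '::' and ': ' untouched."""
    return LONE_COLON.sub(": ", line)


print(space_after_lone_colon("a:b ::c d: e"))
# prints: a: b ::c d: e
```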
You probably want to use :[^ ]
to match a colon followed by anything except a space. As mentioned by Matt, this will cause your substitution to replace the extra character as well.
There are several ways to avoid this; here are two that I find useful.
1) Surround the last part of the search term with parentheses \(\)
. This allows you to reference that part of the search in your replacement with \1
.
Your final replace string should look like this:
%s/:\([^ ]\)/: \1/g
2) End the search term early with \ze
This means that the entire search term must still be met for a match, but only the part before \ze
will be highlighted or replaced.
Your final replace string should look like this:
%s/:\ze[^ ]/: /g
You want to use a zero-width negative lookahead assertion, which is a fancy way of saying: require that the colon is not followed by a space, but don't include whatever does follow in the match:
:%s/: \@!/: /g
The \@!
is the negative lookahead.
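For comparison, Python's `re` spells the negative lookahead `(?!...)`. A minimal sketch of the equivalent substitution, with a hypothetical helper name `space_unless_followed_by_space`:

```python
import re


def space_unless_followed_by_space(line: str) -> str:
    # ":(?! )" matches a colon only when the next character is not a space.
    # The lookahead consumes nothing, so only the colon itself is replaced.
    return re.sub(r":(?! )", ": ", line)


print(space_unless_followed_by_space("key:value and key: value"))
# prints: key: value and key: value
```

A negative lookahead also succeeds when nothing follows at all, so a colon at the very end of a line gets a trailing space too; the earlier patterns that require a following `\S` do not have that effect.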