I want to replace all pairs of square brackets in a file, e.g., [some text], with \\macro{some text}, e.g.:
This is some [text].
Th
It took a little doing, but here:
sed -i.bkup 's/\[\([^]]*\)\]/\\macro{\1}/g' test.txt
Let's see if I can explain this regular expression:
\[ matches an opening square bracket. Since [ is a special regular expression character, the backslash means to match the literal character.
\(...\) is a capture group. It captures the part of the match I want to keep. I can have many capture groups, and in sed I can reference them as \1, \2, etc.
Inside the capture group \(...\), I have [^]]*.
The [^...] syntax means any character but the ones listed.
[^]] means any character but a closing square bracket.
* means zero or more of the preceding. That means I am capturing zero or more characters that are not closing square brackets.
\] matches the closing square bracket.
Let's look at the line this is [some] more [text]:
The pattern first matches the opening square bracket before some.
Starting with the s in some, the capture group takes as many characters as possible that are not closing square brackets. This means I am matching [some, but only capturing some.
Having matched [some, I now match the closing square bracket. That means I'm matching [some]. Note that regular expressions are normally greedy. I'll explain below why this is important.
The replacement string is \\macro{\1}. The \1 is replaced by my capture group. The \\ is just a backslash. Thus, I'll replace [some] with \macro{some}.
It would be much easier if I could be guaranteed a single set of square brackets in each line. Then I could have done this:
sed -i.bkup 's/\[\(.*\)\]/\\macro{\1}/g' test.txt
The capture group is now saying anything between two square brackets. However, the problem is that regular expressions are greedy, which means I would have matched from the s in some all the way to the final t in text. The x's below show the capture group. The [ and ] show the square brackets I'm matching on:
this is [some] more [text]
        [xxxxxxxxxxxxxxxx]
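The difference between the greedy pattern and the negated character class can be checked on the sample line without touching a file. This is a minimal sketch, assuming a POSIX sed; the variable names line, greedy, and bounded are only for illustration:

```shell
line='this is [some] more [text]'

# Greedy .* runs from the first [ to the last ], so only one replacement happens.
greedy=$(printf '%s\n' "$line" | sed 's/\[\(.*\)\]/\\macro{\1}/g')

# The negated class [^]]* stops at the first ], so each pair is replaced on its own.
bounded=$(printf '%s\n' "$line" | sed 's/\[\([^]]*\)\]/\\macro{\1}/g')

printf '%s\n' "$greedy"    # this is \macro{some] more [text}
printf '%s\n' "$bounded"   # this is \macro{some} more \macro{text}
```

Piping through sed without -i leaves the file alone, which makes it a convenient way to try a pattern before running it in place.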
This became more complex because I had to match on characters that have special meaning in regular expressions, so we see a lot of backslashing. Plus, I had to account for regular expression greediness, which gave me the nice-looking negated character class [^]]* to match anything that is not a closing bracket. Add in the square brackets before and after, \[[^]]*\], and don't forget the \(...\) capture group: \[\([^]]*\)\]
And you get one big mess of a regular expression.
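One way to keep the pieces readable is to assemble the pattern from named parts, mirroring the build-up described above. This is only a sketch; the variable names (inner, group, pattern, replacement) are made up for illustration and are not part of sed:

```shell
inner='[^]]*'                 # zero or more characters that are not ]
group='\('"$inner"'\)'        # wrap it in a capture group
pattern='\['"$group"'\]'      # add the literal brackets on both sides
replacement='\\macro{\1}'     # \\ is one backslash, \1 is the captured text

printf '%s\n' "$pattern"      # \[\([^]]*\)\]

result=$(printf '%s\n' 'this is [some] more [text]' | sed "s/$pattern/$replacement/g")
printf '%s\n' "$result"       # this is \macro{some} more \macro{text}
```

The parts are kept in single quotes so the shell passes the backslashes to sed unchanged; only the final sed script is double-quoted so the variables expand.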
sed -e 's/\[\([^]]*\)\]/\\macro{\1}/g' file.txt
This looks for an opening bracket, any number of characters that are not closing brackets, then a closing bracket. The group is captured by the parentheses and inserted into the replacement expression.
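Two edge cases follow from that description and may be worth checking before running the command on a real file: empty brackets still match, and an opening bracket inside the brackets counts as a "non-closing" character. A small sketch, where convert is a hypothetical helper defined only for this example:

```shell
convert() {
  # Hypothetical helper: apply the substitution to one line of text.
  printf '%s\n' "$1" | sed -e 's/\[\([^]]*\)\]/\\macro{\1}/g'
}

empty=$(convert 'a [] b')        # [^]]* may match zero characters
nested=$(convert '[a [b] c]')    # the match ends at the first ]

printf '%s\n' "$empty"    # a \macro{} b
printf '%s\n' "$nested"   # \macro{a [b} c]
```

If nested brackets can occur in the input, this pattern alone is probably not enough.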
Use groups:
sed 's|\[\([^]]*\)\]|\\macro{\1}|g' file
The following expression matches square brackets that contain only letters (a-z, A-Z) and spaces, and replaces them with \macro{<whatever was between the []>}:
sed -e 's/\[\([a-zA-Z ]*\)\]/\\macro{\1}/g'
In the expression, the \( ... \) form a match group that can be referenced later in the substitution as \1.
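Because this variant only accepts letters and spaces between the brackets, anything else (digits, punctuation) is left untouched. A quick sketch on a made-up sample line, assuming plain ASCII input:

```shell
line='see [some text] and [item 2]'

# Only the first pair matches; the digit in the second pair is outside [a-zA-Z ].
strict=$(printf '%s\n' "$line" | sed -e 's/\[\([a-zA-Z ]*\)\]/\\macro{\1}/g')

printf '%s\n' "$strict"   # see \macro{some text} and [item 2]
```

Whether that restriction is wanted depends on what the bracketed text in the file can contain.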