How to replace paired square brackets with other syntax with sed?

无人及你 2020-12-06 01:57

I want to replace all pairs of square brackets in a file, e.g., [some text], with \macro{some text}, e.g.:

This is some [text].
Th         


        
4 answers
  • 2020-12-06 02:03

    It took a little doing, but here:

    sed -i.bkup  's/\[\([^]]*\)\]/\\macro{\1}/g' test.txt
    

    Let's see if I can explain this regular expression:

    1. The \[ matches an opening square bracket. Since [ is a special character in regular expressions, the backslash makes it match the literal character.
    2. The \(...\) is a capture group. It captures the part of the match I want to keep. I can have several capture groups, and in sed I can reference them as \1, \2, etc.
    3. Inside the capture group \(...\), I have [^]]*.
      1. The [^...] syntax means any character except the ones listed.
      2. The [^]] means any character except a closing square bracket.
      3. The * means zero or more of the preceding item. That means I am capturing zero or more characters that are not closing square brackets.
    4. The \] matches the closing square bracket.
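    The "zero or more" in point 3.3 also covers an empty pair of brackets. A minimal sketch of that case, assuming a POSIX-style sed on the PATH; the sample line and variable names are made up for illustration:

```shell
# Hypothetical sample line: one empty and one non-empty bracket pair.
line='a [] b [x]'

# [^]]* may match zero characters, so "[]" is rewritten as well.
result=$(printf '%s\n' "$line" | sed 's/\[\([^]]*\)\]/\\macro{\1}/g')

printf '%s\n' "$result"
```

    If empty brackets should be left alone, requiring at least one character with [^]][^]]* in place of [^]]* is one way to do it.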

    Let's look at the line: this is [some] more [text]

    • In #1 above, I match the first opening square bracket, in front of the word some. It is not in a capture group, though. This is the first character I'm going to substitute.
    • I now start a capture group. I am capturing according to 3.2 and 3.3 above: starting with the letter s in some, as many characters as possible that are not closing square brackets. This means I am matching [some, but only capturing some.
    • In #4, I have ended my capture group. I've matched [some for substitution purposes, and now I'm matching the closing square bracket right after it. That means I'm matching [some]. Note that regular expressions are normally greedy. I'll explain below why this is important.
    • Now for the replacement string. This is much easier. It's \\macro{\1}. The \1 is replaced by my capture group. The \\ is just a literal backslash. Thus, I'll replace [some] with \macro{some}.
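    The walk-through can be checked by running the same expression on that sample line, without -i so that no file is modified. A small sketch; the variable names are only for illustration:

```shell
# The sample line from the walk-through above.
line='this is [some] more [text]'

# Same substitution as the command at the top of this answer, reading from a pipe.
result=$(printf '%s\n' "$line" | sed 's/\[\([^]]*\)\]/\\macro{\1}/g')

printf '%s\n' "$result"   # this is \macro{some} more \macro{text}
```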

    It would be much easier if I could be guaranteed a single set of square brackets in each line. Then I could have done this:

    sed -i.bkup 's/\[\(.*\)\]/\\macro{\1}/g' test.txt
    

    The capture group now says "anything between two square brackets". However, the problem is that regular expressions are greedy, which means the capture group would have run from the s in some all the way to the final t in text. The x's below show the capture group. The [ and ] show the square brackets I'm matching on:

     this is [some] more [text]
             [xxxxxxxxxxxxxxxx]
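
    The difference is easy to see side by side. A minimal sketch that runs the greedy .* pattern and the [^]]* pattern on the same line, both with a \macro{...} replacement; the variable names are only for illustration:

```shell
line='this is [some] more [text]'

# Greedy: .* runs to the LAST closing bracket on the line.
greedy=$(printf '%s\n' "$line" | sed 's/\[\(.*\)\]/\\macro{\1}/g')

# Bounded: [^]]* stops at the FIRST closing bracket after each opening one.
bounded=$(printf '%s\n' "$line" | sed 's/\[\([^]]*\)\]/\\macro{\1}/g')

printf 'greedy:  %s\n' "$greedy"    # this is \macro{some] more [text}
printf 'bounded: %s\n' "$bounded"   # this is \macro{some} more \macro{text}
```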
    

    This became more complex because I had to match characters that have special meaning in regular expressions, so we see a lot of backslashes. Plus, I had to account for regular expression greediness, which is why the odd-looking negated set [^]]* is there to match anything that is not a closing bracket. Add the square brackets before and after, \[[^]]*\], and don't forget the \(...\) capture group: \[\([^]]*\)\]. And you get one big mess of a regular expression.

  • 2020-12-06 02:08
    sed -e 's/\[\([^]]*\)\]/\\macro{\1}/g' file.txt
    

    This looks for an opening bracket, any number of characters that are not closing brackets, then a closing bracket. The group is captured by the escaped parentheses and inserted into the replacement expression as \1.

  • 2020-12-06 02:10

    Use capture groups:

    sed 's|\[\([^]]*\)\]|\\macro{\1}|g' file
    
  • 2020-12-06 02:14

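    The following expression matches square brackets whose content consists only of letters (a-z, A-Z) and spaces, and replaces each match with \macro{<whatever was between the []>}. Brackets containing anything else, such as digits or punctuation, are left unchanged.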

    sed -e 's/\[\([a-zA-Z ]*\)\]/\\macro{\1}/g'
    

    In the expression, the \( ... \) forms a match group that can be referenced later in the substitution as \1.
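
    Because the character class is limited to letters and spaces, this variant is stricter than [^]]*. A small sketch of the effect, using a made-up sample line and illustrative variable names:

```shell
# Hypothetical sample: the second bracket pair contains a digit.
line='see [some text] and [item 2]'

# Only brackets holding nothing but letters and spaces are rewritten.
result=$(printf '%s\n' "$line" | sed -e 's/\[\([a-zA-Z ]*\)\]/\\macro{\1}/g')

printf '%s\n' "$result"   # see \macro{some text} and [item 2]
```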
