How to replace all the blanks within square brackets with an underscore using sed?

Submitted by 冷眼眸甩不掉的悲伤 on 2019-12-06 05:46:57

There are two parts to the trickery needed:

  1. Stop replacing when you reach a close square bracket (but do it repeatedly on the line):

    s/\(\[[^] ]*\) /\1_/g
    

    This matches an open square bracket, followed by zero or more characters that are neither a blank nor a close square bracket, followed by a blank, and replaces that blank with an underscore. The 'g' suffix applies the pattern to every such sequence on the line, so each bracketed group has its first remaining blank replaced. Note, too, that this regex does not alter '[single-word] and context', whereas the original pattern would translate that to '[single-word]_and context', which is not the object of the exercise.

  2. Get sed to repeat the search from where this one started. Unfortunately, there isn't a truly good way to do that. Sed always resumes searching after the text that was substituted; and this is one occasion when we don't want that. Sometimes, you can get away with simply repeating the substitute operation. In this case, you have to repeat it every time the substitution succeeds, stopping when there are no more substitutions.
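To make the two points above concrete, here is a minimal sketch of what one pass does and why simply repeating the command is not a general fix. The sample line and the variable names are made up for illustration; they are not from the question.

```shell
# Illustrative input: one bracket with three words, one with two.
line='a [b c d] e [f g]'

# One pass: 'g' resumes matching after each replacement, so only the
# first blank inside each bracket is converted.
single_pass=$(printf '%s\n' "$line" | sed -e 's/\(\[[^] ]*\) /\1_/g')
printf '%s\n' "$single_pass"    # a [b_c d] e [f_g]

# Running the same command a second time fixes this input, but only
# because no bracket here contains more than two blanks.
two_passes=$(printf '%s\n' "$line" |
    sed -e 's/\(\[[^] ]*\) /\1_/g' -e 's/\(\[[^] ]*\) /\1_/g')
printf '%s\n' "$two_passes"     # a [b_c_d] e [f_g]
```

A bracket with N blanks needs N passes, which is why the repetition has to be driven by whether the last substitution succeeded.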

Two of the less well known operations in sed are the ':label' and the 't' commands. They were present in the 7th Edition of Unix (circa 1978), though, so they are not new features. The first simply identifies a position in the script which can be jumped to with 'b' (not wanted here) or 't':

[2addr]t [label]

Branch to the ':' function bearing the label if any substitutions have been made since the most recent reading of an input line or execution of a 't' function. If no label is specified, branch to the end of the script.
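As a small illustration of that description, the sketch below uses 't' without a label, so a successful substitution jumps straight to the end of the script. The input words and the ' (unchanged)' tag are invented for the example.

```shell
# 't' with no label branches to the end of the script when the preceding
# 's' changed the line, so the tagging command after it is skipped.
tagged=$(printf '%s\n' 'cat' 'dog' |
    sed -e 's/a/o/' -e 't' -e 's/$/ (unchanged)/')
printf '%s\n' "$tagged"
# cot
# dog (unchanged)
```

The substitution flag that 't' tests is reset when the next input line is read, which is why 'dog' is judged on its own and not affected by the change made to 'cat'.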

Marvellous; what we need is:

 sed -e ':redo; s/\(\[[^] ]*\) /\1_/g; t redo' data.file

Except that it doesn't work all on one line like that (at least, not on MacOS X, where sed reads everything up to the end of the line as part of the label). This did work admirably, though:

sed -e ':redo
        s/\(\[[^] ]*\) /\1_/g
        t redo' data.file

Or, as noted in the comments, you could write three separate '-e' options (which works on MacOS X):

 sed -e ':redo' -e 's/\(\[[^] ]*\) /\1_/g' -e 't redo' data.file

Given the data file:

a line with [one blank] word inside square brackets.
a line with [two blank] or [three blank] words inside square brackets.
a line with [no-blank] word inside square brackets.
a line with [multiple words in a single bracket] inside square brackets.
a line with [multiple words in a single bracket] [several times on one line]

the output from the sed script shown is:

a line with [one_blank] word inside square brackets.
a line with [two_blank] or [three_blank] words inside square brackets.
a line with [no-blank] word inside square brackets.
a line with [multiple_words_in_a_single_bracket] inside square brackets.
a line with [multiple_words_in_a_single_bracket] [several_times_on_one_line]

And, finally, reading the fine print in the question, if you need this done only in the first square-bracketed field on each line, then we need to ensure that there are no open square brackets before the one that starts the match. This variant works:

sed -e ':redo' -e 's/^\([^[]*\[[^] ]*\) /\1_/' -e 't redo' data.file

(The 'g' qualifier is gone; given the loop, it probably isn't needed in the other variants either. Its presence might make the process marginally more efficient, but the difference would most likely be impossible to detect. The pattern is now anchored to the start of the line (the caret) and allows zero or more characters that are not an open square bracket before the first open square bracket.)

Sample output:

a line with [one_blank] word inside square brackets.
a line with [two_blank] or [three blank] words inside square brackets.
a line with [no-blank] word inside square brackets.
a line with [multiple_words_in_a_single_bracket] inside square brackets.
a line with [multiple_words_in_a_single_bracket] [several times on one line]
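For a side-by-side check, this sketch runs both loops on one line with two bracketed fields. The variable names are illustrative, and the anchored pattern here uses `[^[]*` so that no open square bracket can precede the one that is matched.

```shell
line='a line with [two blank] or [three blank] words'

# Loop over every bracketed field on the line.
all_fields=$(printf '%s\n' "$line" |
    sed -e ':redo' -e 's/\(\[[^] ]*\) /\1_/g' -e 't redo')

# Anchored variant: only text containing no '[' may precede the bracket,
# so only the first bracketed field can ever match.
first_field=$(printf '%s\n' "$line" |
    sed -e ':redo' -e 's/^\([^[]*\[[^] ]*\) /\1_/' -e 't redo')

printf '%s\n' "$all_fields"     # a line with [two_blank] or [three_blank] words
printf '%s\n' "$first_field"    # a line with [two_blank] or [three blank] words
```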

This is easier in a language like Perl, which has "executable" substitutions:

perl -wne 's/(\[.*?])/ do { my $x = $1; $x =~ y, ,_,; $x } /ge; print'

Or to split it up more clearly:

sub replace_with_underscores {
    my $s = shift;
    $s =~ y/ /_/;
    $s
}
s/(\[.*?])/ replace_with_underscores($1) /ge;

The .*? is a non-greedy match (to avoid running two adjacent bracketed phrases together), and the e flag on the substitution causes the replacement to be evaluated as Perl code, so you can call a function to do the inner work.
