sed rare-delimiter (other than & | / ?…)

Question

I have to apply the Unix command sed to a string that can contain any kind of character (#, !, /, ?, &, @, |, and so on).

Is there a complex delimiter (for example, one made of two characters) that would let me avoid this error:

sed: -e expression #1, char 22: unknown option to `s'

Thanks in advance

Answer 1:

There is no such option for multi-character expression delimiters in sed, but I doubt you need that. The delimiter character should not occur in the pattern, but if it appears in the string being processed, it's not a problem. And unless you're doing something extremely weird, there will always be some character that doesn't appear in your search pattern that can serve as a delimiter.
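A quick check of this point, as a minimal sketch with a made-up sample string: the input below contains /, | and #, yet # still works as the delimiter because it occurs in neither the pattern nor the replacement.

```shell
# The input contains '#', but the pattern '/' and the replacement '-' do not,
# so '#' is a safe delimiter for this s command.
result=$(printf '%s\n' 'a/b|c#d' | sed 's#/#-#g')
printf '%s\n' "$result"   # prints: a-b|c#d
```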

Answer 2:

The characters in the input file are of no concern - sed parses them fine. There may be an issue, however, if you have most of the common characters in your pattern - or if your pattern may not be known beforehand.

At least on GNU sed, you can use a non-printable character that is highly improbable to exist in your pattern as a delimiter. For example, if your shell is Bash:

$ echo '|||' | sed s$'\001''|'$'\001''/'$'\001''g'

In this example, Bash replaces $'\001' with the character that has the octal value 001 - in ASCII it's the SOH character (start of heading).

Since such characters are control/non-printable characters, it's doubtful that they will exist in the pattern. Unless, that is, you are doing something weird like modifying binary files - or Unicode files without the proper locale settings.
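When the pattern arrives in a variable, the same idea might be wrapped in a small function. This is only a sketch: soh_replace is a hypothetical helper name, and the pattern is still interpreted as a basic regular expression, so & and \ in the replacement keep their special meaning.

```shell
# soh_replace is a hypothetical helper, not part of sed.
# It builds an s command whose delimiter is SOH (octal 001).
soh_replace() {
    d=$(printf '\001')          # the SOH control character
    sed "s${d}${1}${d}${2}${d}g"
}

# The pattern contains /, | and #, none of which clash with the SOH delimiter.
result=$(printf '%s\n' 'a/b|c#d' | soh_replace '/b|c#' '-')
printf '%s\n' "$result"   # prints: a-d
```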

Answer 3:

Another way to do this is to use Bash's shell parameter expansion, which can substitute text inside a variable.

${parameter/pattern/replace}  # substitute replace for pattern once

${parameter//pattern/replace}  # substitute replace for pattern everywhere

Here is a quite complex example that is difficult with sed:

$ parameter="Common sed delimiters: [sed-del]"
$ pattern="\[sed-del\]"
$ replace="[/_%:\\@]"
$ echo "${parameter//$pattern/$replace}"

result is:

Common sed delimiters: [/_%:\@]

However, this only works on Bash parameters, not on files, which is where sed excels.
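If the search text should be matched literally rather than as a glob pattern, quoting it inside the expansion seems to be enough. A minimal Bash-only sketch, with made-up variable names and values:

```shell
#!/usr/bin/env bash
text='path: /usr/local/bin'
pattern='/usr/local'
replace='/opt'
# Quoting "$pattern" makes Bash match it literally instead of as a glob;
# quoting "$replace" keeps the replacement text from being reinterpreted.
result=${text//"$pattern"/"$replace"}
printf '%s\n' "$result"   # prints: path: /opt/bin
```

The slashes in both values need no escaping, because Bash splits the expansion into pattern and replacement before it expands the variables.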

Answer 4:

You need the nested delimiter facility that Perl offers. It lets you match, substitute, and transliterate without worrying about the delimiter appearing in your contents. Since Perl can do what sed does, you should be able to use it for whatever you use sed for.

Consider this:

$ perl -nle 'print if /something/' inputs

Now if your something contains a slash, you have a problem. The way to fix this is to change the delimiter, preferably to a bracketing one. For example, you could have anything you like in the $WHATEVER shell variable (provided the brackets are balanced), which gets interpolated by the shell before Perl is even called here:

 $ perl -nle "print if m($WHATEVER)" /usr/share/dict/words

That works even if you have correctly nested parens in $WHATEVER. The four bracketing pairs which correctly nest like this in Perl are < >, ( ), [ ], and { }. They allow arbitrary contents that include the delimiter if that delimiter is balanced.

If it is not balanced, then do not use a delimiter at all. If the pattern is in a Perl variable, you don’t need to use the match operator provided you use the =~ operator, so:

$whatever = "some arbitrary string ( / # [ etc";
if ($line =~ $whatever) { ... }

Answer 5:

With the help of Jim Lewis, I finally added a test before calling sed:

# Use @ as the sed delimiter when the argument contains |, otherwise use |.
if printf '%s\n' "$1" | grep -q '|'; then
    grep ".*$1.*:" "$DB_FILE" | sed "s@^.*$1*.*\(:\)@@"
else
    grep ".*$1.*:" "$DB_FILE" | sed "s|^.*$1*.*\(:\)||"
fi
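The same test can be generalized to more than two candidates. The following is a sketch only: pick_delim is a hypothetical helper, and the candidate list is an arbitrary choice.

```shell
# pick_delim is a hypothetical helper: it prints the first candidate
# character that does not occur in its argument, or fails if all of them do.
pick_delim() {
    for c in '|' '@' '#' ',' ':' '%'; do
        case $1 in
            *"$c"*) ;;                          # candidate occurs, try the next
            *) printf '%s' "$c"; return 0 ;;
        esac
    done
    return 1
}

pattern='a|b@c'
replacement='X'
# Check pattern and replacement together, since both sit between delimiters.
d=$(pick_delim "$pattern$replacement")          # picks '#'
result=$(printf '%s\n' 'say a|b@c now' | sed "s${d}${pattern}${d}${replacement}${d}")
printf '%s\n' "$result"   # prints: say X now
```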

Thanks for the help.

Answer 6:

Escaping the delimiter inline for Bash to parse is cumbersome and hard to read (although in an address, a custom delimiter does need a leading backslash for sed's benefit the first time it appears in each expression).

To pull together thkala's answer and user4401178's comment:

DELIM=$(echo -en "\001");
sed -n "\\${DELIM}${STARTING_SEARCH_TERM}${DELIM},\\${DELIM}${ENDING_SEARCH_TERM}${DELIM}p" "${FILE}"

This example prints every line from the first line matching ${STARTING_SEARCH_TERM} through the next line matching ${ENDING_SEARCH_TERM}. It works as long as neither search term contains the SOH (start of heading) character, ASCII code 001.
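A small worked instance of that address range, with made-up search terms and input lines, might look like this:

```shell
DELIM=$(printf '\001')      # SOH, used as the address delimiter
START='/begin'              # hypothetical search terms containing slashes
END='/end'
# Inside double quotes, \\ becomes one backslash, which tells sed that the
# next character (SOH) is the custom delimiter of the address regex.
result=$(printf '%s\n' 'skip' 'x /begin' 'body' 'y /end' 'tail' |
    sed -n "\\${DELIM}${START}${DELIM},\\${DELIM}${END}${DELIM}p")
printf '%s\n' "$result"     # prints the three lines from 'x /begin' to 'y /end'
```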

Answer 7:

Wow. I totally did not know that you could use any character as a delimiter. At least half the time I use sed and BREs, it's on paths, code snippets, junk characters, things like that. I end up with a bunch of horribly unreadable escapes which I'm not even sure won't die on some combination I didn't think of. But if you can rule out some character class (or even just one character), you can use it as the delimiter:

echo '#01Y $#1+!' | sed -e 'sa$#1+ashita' -e 'su#01YuHolyug'

>>> Holy shit!

That's so much easier.

Answer 8:

There's no universal separator, but the separator can be escaped with a backslash so that sed does not treat it as a separator (unless you choose the backslash itself as the separator).

Depending on the actual application, it might be handy to just escape those characters in both pattern and replacement.

If you're in a bash environment, you can use bash substitution to escape sed separator, like this:

safe_replace () {
    sed "s/${1//\//\\\/}/${2//\//\\\/}/g"
}

It's mostly self-explanatory, except for the odd-looking expansion, which breaks down as follows:

${1//\//\\\/}
${            - bash expansion starts
  1           - first positional argument - the pattern
   //         - bash pattern substitution pattern separator "replace-all" variant
     \/       - literal slash
       /      - bash pattern substitution replacement separator
        \\    - literal backslash
          \/  - literal slash
            } - bash expansion ends

example use:

$ input="ka/pus/ta"
$ pattern="/pus/"
$ replacement="/re/"
$ safe_replace "$pattern" "$replacement" <<< "$input"
ka/re/ta
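Note that safe_replace only escapes the separator; & and \ in the replacement are still special to sed. A commonly used idiom for escaping all three is sketched below. It is not part of the answer above, escape_replacement is a hypothetical helper name, and it does not handle replacements that contain newlines.

```shell
# escape_replacement is a hypothetical helper: it puts a backslash in front
# of every &, / and \ so the text is safe as the replacement of s/.../.../.
# The helper's own sed command uses | as its delimiter to keep it readable.
escape_replacement() {
    printf '%s' "$1" | sed 's|[&/\]|\\&|g'
}

replacement='a&b/c\d'
escaped=$(escape_replacement "$replacement")
result=$(printf '%s\n' 'X' | sed "s/X/${escaped}/")
printf '%s\n' "$result"   # prints: a&b/c\d
```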

Source: https://stackoverflow.com/questions/4844854/sed-rare-delimiter-other-than

Tags

sed

delimiter