Howto split a string on a multi-character delimiter in bash?

萝らか妹 提交于 2019-11-26 21:55:29

问题


Why doesn't work the following bash code?

for i in $( echo "emmbbmmaaddsb" | split -t "mm"  )
do
    echo "$i"
done

expected output:

e
bb
aaddsb

回答1:


Since you're expecting newlines, you can simply replace all instances of mm in your string with a newline. In pure native bash:

in='emmbbmmaaddsb'
sep='mm'
printf '%s\n' "${in//$sep/$'\n'}"

If you wanted to do such a replacement on a longer input stream, you might be better off using awk, as bash's built-in string manipulation doesn't scale well to more than a few kilobytes of content. The gsub_literal shell function (backending into awk) given in BashFAQ #21 is applicable:

# Taken from http://mywiki.wooledge.org/BashFAQ/021

# usage: gsub_literal STR REP
# replaces all instances of STR with REP. reads from stdin and writes to stdout.
gsub_literal() {
  # STR cannot be empty
  [[ $1 ]] || return

  # string manip needed to escape '\'s, so awk doesn't expand '\n' and such
  awk -v str="${1//\\/\\\\}" -v rep="${2//\\/\\\\}" '
    # get the length of the search string
    BEGIN {
      len = length(str);
    }

    {
      # empty the output string
      out = "";

      # continue looping while the search string is in the line
      while (i = index($0, str)) {
        # append everything up to the search string, and the replacement string
        out = out substr($0, 1, i-1) rep;

        # remove everything up to and including the first instance of the
        # search string from the line
        $0 = substr($0, i + len);
      }

      # append whatever is left
      out = out $0;

      print out;
    }
  '
}

...used, in this context, as:

gsub_literal "mm" $'\n' <your-input-file.txt >your-output-file.txt



回答2:


A more general example, without replacing the multi-character delimiter with a single character delimiter is given below :

Using parameter expansions : (from the comment of @gniourf_gniourf)

#!/bin/bash

str="LearnABCtoABCSplitABCaABCString"
delimiter=ABC
s=$str$delimiter
array=();
while [[ $s ]]; do
    array+=( "${s%%"$delimiter"*}" );
    s=${s#*"$delimiter"};
done;
declare -p array

A more crude kind of way

#!/bin/bash

# main string
str="LearnABCtoABCSplitABCaABCString"

# delimiter string
delimiter="ABC"

#length of main string
strLen=${#str}
#length of delimiter string
dLen=${#delimiter}

#iterator for length of string
i=0
#length tracker for ongoing substring
wordLen=0
#starting position for ongoing substring
strP=0

array=()
while [ $i -lt $strLen ]; do
    if [ $delimiter == ${str:$i:$dLen} ]; then
        array+=(${str:strP:$wordLen})
        strP=$(( i + dLen ))
        wordLen=0
        i=$(( i + dLen ))
    fi
    i=$(( i + 1 ))
    wordLen=$(( wordLen + 1 ))
done
array+=(${str:strP:$wordLen})

declare -p array

Reference - Bash Tutorial - Bash Split String




回答3:


The recommended tool for character subtitution is sed's command s/regexp/replacement/ for one regexp occurence or global s/regexp/replacement/g, you do not even need a loop or variables.

Pipe your echo output and try to substitute the characters mm witht the newline character \n:

echo "emmbbmmaaddsb" | sed 's/mm/\n/g'

The output is:

e
bb
aaddsb



回答4:


With awk you can use the gsub to replace all regex matches.

As in your question, to replace all substrings of two or more 'm' chars with a new line, run:

echo "emmbbmmaaddsb" | awk '{ gsub(/mm+/, "\n" ); print; }'

e

bb

aaddsb

The ‘g’ in gsub() stands for “global,” which means replace everywhere.

You may also ask to print just N match, for example:

echo "emmbbmmaaddsb" | awk '{ gsub(/mm+/, " " ); print $2; }'

bb



来源:https://stackoverflow.com/questions/40686922/howto-split-a-string-on-a-multi-character-delimiter-in-bash

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!