How do I split a string on a delimiter in Bash?

Anonymous (unverified), submitted 2019-12-03 02:11:02

Question:

I have this string stored in a variable:

IN="bla@some.com;john@home.com" 

Now I would like to split the strings by ; delimiter so that I have:

ADDR1="bla@some.com"
ADDR2="john@home.com"

I don't necessarily need the ADDR1 and ADDR2 variables. If they are elements of an array that's even better.


After suggestions from the answers below, I ended up with the following, which is what I was after:

#!/usr/bin/env bash

IN="bla@some.com;john@home.com"

mails=$(echo $IN | tr ";" "\n")

for addr in $mails
do
    echo "> [$addr]"
done

Output:

> [bla@some.com]
> [john@home.com]

There was a solution involving setting the Internal Field Separator (IFS) to ;. I am not sure what happened to that answer. How do you reset IFS back to its default?

RE: the IFS solution, I tried this and it works; I save the old IFS and then restore it:

IN="bla@some.com;john@home.com"

OIFS=$IFS
IFS=';'
mails2=$IN
for x in $mails2
do
    echo "> [$x]"
done

IFS=$OIFS

BTW, when I tried

mails2=($IN)

I only got the first string when printing it in the loop; without parentheses around $IN it works.

Answer 1:

You can set the internal field separator (IFS) variable and then let read parse the input into an array. When the assignment is placed in front of a command, it only applies to that single command's environment (here, read). read then splits the input according to the IFS value into an array, which we can then iterate over.

IFS=';' read -ra ADDR <<< "$IN"

This parses one line of items separated by ; and pushes them into the array ADDR. To process the whole of $IN, one line of input at a time, with each line split on ;:

while IFS=';' read -ra ADDR; do
    for i in "${ADDR[@]}"; do
        # process "$i", e.g.:
        echo "> [$i]"
    done
done <<< "$IN"
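A self-contained sketch of both forms above, assuming Bash; the sample strings and the names ADDR and LINES are illustrative only. Because IFS=';' is written as a prefix, it is scoped to each read call, so the shell's own IFS is untouched afterwards:

```bash
#!/usr/bin/env bash
IN="bla@some.com;john@home.com"

# One line: split $IN on ';' into the array ADDR.
# The IFS=';' prefix applies to this read command only.
IFS=';' read -ra ADDR <<< "$IN"
for i in "${ADDR[@]}"; do
    echo "> [$i]"
done

# Several lines: each input line is split on ';' separately.
LINES=$'a;b\nc;d;e'
while IFS=';' read -ra ADDR; do
    # IFS is back to its default here, so ${ADDR[*]} is joined with spaces.
    echo "${#ADDR[@]} fields: ${ADDR[*]}"
done <<< "$LINES"
```

This prints the two addresses in brackets, then "2 fields: a b" and "3 fields: c d e".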


Answer 2:

Taken from Bash shell script split array:

IN="bla@some.com;john@home.com"
arrIN=(${IN//;/ })

Explanation:

This construction replaces all occurrences of ';' (the initial // means global replace) in the string IN with ' ' (a single space), then interprets the space-delimited string as an array (that's what the surrounding parentheses do).

The syntax used inside of the curly braces to replace each ';' character with a ' ' character is called Parameter Expansion.
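A short sketch of the difference between one and two slashes in this Parameter Expansion, assuming Bash; the string a;b;c is only an example:

```bash
#!/usr/bin/env bash
IN="a;b;c"

first=${IN/;/ }    # one slash: only the first ';' is replaced -> "a b;c"
all=${IN//;/ }     # two slashes: every ';' is replaced         -> "a b c"

arr=( $all )       # unquoted, so it is word-split on the default IFS
echo "$first"
echo "${#arr[@]} elements: ${arr[*]}"
```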

There are some common gotchas:

  1. If the original string has spaces, you will need to use IFS:
    • IFS=':'; arrIN=($IN); unset IFS;
  2. If the original string has spaces and the delimiter is a new line, you can set IFS with:
    • IFS=$'\n'; arrIN=($IN); unset IFS;
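To see the first gotcha concretely, here is a sketch assuming Bash and an illustrative string whose second field contains a space. The set -f / set +f pair is an addition, not part of the answer above: it turns off filename globbing so that a field such as * would not be expanded against the current directory.

```bash
#!/usr/bin/env bash
IN="bla@some.com;Full Name"

# Replacing ';' with ' ' also splits "Full Name" into two words.
broken=( ${IN//;/ } )

# Splitting on ';' through IFS keeps the space inside the field.
set -f                       # no filename globbing while splitting
IFS=';'; fixed=( $IN ); unset IFS
set +f

echo "broken: ${#broken[@]} elements"   # 3
echo "fixed:  ${#fixed[@]} elements"    # 2
```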


Answer 3:

If you don't mind processing them immediately, I like to do this:

for i in $(echo $IN | tr ";" "\n")
do
    # process "$i", e.g.:
    echo "$i"
done

You could use this kind of loop to initialize an array, but there's probably an easier way to do it. Hope this helps, though.
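One way to build an array from the tr output, sketched under the assumption of Bash (for process substitution). Reading the lines with while read rather than for also keeps fields that contain spaces intact, which a for loop over $(...) would split apart:

```bash
#!/usr/bin/env bash
IN="bla@some.com;Full Name"

addrs=()
# tr puts one field per line; IFS= and -r keep each line exactly as is.
# < <(...) keeps the loop in the current shell, so addrs survives it.
while IFS= read -r line; do
    addrs+=("$line")
done < <(printf '%s\n' "$IN" | tr ';' '\n')

printf '> [%s]\n' "${addrs[@]}"
```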



Answer 4:

Compatible answer

For this SO question, there are already many different ways to do this in bash. But bash has many special features, so-called bashisms, that work well but won't work in any other shell.

In particular, arrays, associative arrays, and pattern substitution are pure bashisms and may not work under other shells.

On my Debian GNU/Linux, the standard shell is not bash, and I know many people who like to use yet other shells.

Finally, in very small environments, there is a special tool with its own shell interpreter.

Requested string

The string sample in SO question is:

IN="bla@some.com;john@home.com" 

Because whitespace can change the result of the routine, I prefer to use this sample string, which contains some:

IN="bla@some.com;john@home.com;Full Name "

Split string based on delimiter in bash (version >=4.2)

Under pure bash, we may use arrays and IFS:

var="bla@some.com;john@home.com;Full Name "

oIFS="$IFS"
IFS=";"
declare -a fields=($var)
IFS="$oIFS"
unset oIFS

IFS=\; read -a fields <<<"$var"

With this last syntax, recent bash doesn't change $IFS for the current session, only for the current command:

set | grep ^IFS=
IFS=$' \t\n'

Now the string var is split and stored into an array (named fields):

set | grep ^fields=\\\|^var=
fields=([0]="bla@some.com" [1]="john@home.com" [2]="Full Name ")
var='bla@some.com;john@home.com;Full Name '

We can inspect the variables' contents with declare -p:

declare -p var fields
declare -- var="bla@some.com;john@home.com;Full Name "
declare -a fields=([0]="bla@some.com" [1]="john@home.com" [2]="Full Name ")

read is the quickest way to do the split, because there are no forks and no external resources called.

From there, you could use the syntax you already know for processing each field:

for x in "${fields[@]}";do
    echo "> [$x]"
    done
> [bla@some.com]
> [john@home.com]
> [Full Name ]

or drop each field after processing (I like this shifting approach):

while [ "$fields" ] ;do
    echo "> [$fields]"
    fields=("${fields[@]:1}")
    done
> [bla@some.com]
> [john@home.com]
> [Full Name ]

or even for simple printout (shorter syntax):

printf "> [%s]\n" "${fields[@]}"
> [bla@some.com]
> [john@home.com]
> [Full Name ]
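One caveat with the shifting loop: while [ "$fields" ] tests only the first element, so an empty field in the middle ends the loop early. A sketch that tests the element count instead, assuming Bash; print_fields is a hypothetical helper name:

```bash
#!/usr/bin/env bash
# Hypothetical helper: prints each argument, shifting the array as it goes.
print_fields() {
    local -a fields=("$@")
    # Loop on the number of elements, not on the value of the first one.
    while (( ${#fields[@]} )); do
        echo "> [${fields[0]}]"
        fields=("${fields[@]:1}")    # drop the first element
    done
}

IFS=';' read -ra parts <<< "a;;b"    # parts=( "a" "" "b" )
print_fields "${parts[@]}"
```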

Split string based on delimiter in POSIX shell

But if you want to write something usable under many shells, you must not use bashisms.

There is a syntax, used in many shells, for splitting a string at the first or last occurrence of a substring:

${var#*SubStr}  # drops the beginning of the string up to the first occurrence of `SubStr`
${var##*SubStr} # drops the beginning of the string up to the last occurrence of `SubStr`
${var%SubStr*}  # drops the end of the string from the last occurrence of `SubStr`
${var%%SubStr*} # drops the end of the string from the first occurrence of `SubStr`

(The absence of this in the other answers is the main reason I published mine ;)

As pointed out by Score_Under:

# and % delete the shortest possible matching string, and

## and %% delete the longest possible.
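Applied to a concrete value, the four operators give the following; a small sketch that should run in any POSIX shell, with a;b;c as an illustrative string:

```sh
#!/bin/sh
var="a;b;c"

echo "${var#*;}"     # shortest '*;' prefix removed -> b;c
echo "${var##*;}"    # longest '*;' prefix removed  -> c
echo "${var%;*}"     # shortest ';*' suffix removed -> a;b
echo "${var%%;*}"    # longest ';*' suffix removed  -> a
```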

This little sample script works well under several shells, and was tested under Mac OS's bash too:

var="bla@some.com;john@home.com;Full Name "
while [ "$var" ] ;do
    iter=${var%%;*}
    echo "> [$iter]"
    [ "$var" = "$iter" ] && \
        var='' || \
        var="${var#*;}"
  done
> [bla@some.com]
> [john@home.com]
> [Full Name ]

Have fun!



Answer 5:

How about this approach:

IN="bla@some.com;john@home.com"

set -- "$IN"
IFS=";"; declare -a Array=($*)
echo "${Array[@]}"
echo "${Array[0]}"
echo "${Array[1]}"

Source



Answer 6:

I've seen a couple of answers referencing the cut command, but they've all been deleted. It's a little odd that nobody has elaborated on that, because I think it's one of the more useful commands for doing this type of thing, especially for parsing delimited log files.

In the case of splitting this specific example into a bash script array, tr is probably more efficient, but cut can be used, and is more effective if you want to pull specific fields from the middle.

Example:

$ echo "bla@some.com;john@home.com" | cut -d ";" -f 1
bla@some.com
$ echo "bla@some.com;john@home.com" | cut -d ";" -f 2
john@home.com

You can obviously put that into a loop, and iterate the -f parameter to pull each field independently.
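A sketch of that loop, assuming Bash. To know how far to iterate, it counts the delimiters by deleting every other character from the string; nfields and f are illustrative names:

```bash
#!/usr/bin/env bash
IN="bla@some.com;john@home.com;Full Name"

# Count the fields: keep only the ';' characters, then add one.
delims=${IN//[^;]}
nfields=$(( ${#delims} + 1 ))

# Pull each field with its own -f value.
for (( f = 1; f <= nfields; f++ )); do
    echo "field $f: $(printf '%s\n' "$IN" | cut -d ';' -f "$f")"
done
```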

This gets more useful when you have a delimited log file with rows like this:

2015-04-27|12345|some action|an attribute|meta data 

cut is very handy here: you can run it over this file and select a particular field for further processing.



Answer 7:

echo "bla@some.com;john@home.com" | sed -e 's/;/\n/g'
bla@some.com
john@home.com

(This relies on GNU sed, which interprets \n in the replacement; BSD/macOS sed does not.)


Answer 8:

This also works:

IN="bla@some.com;john@home.com"
echo ADD1=`echo $IN | cut -d \; -f 1`
echo ADD2=`echo $IN | cut -d \; -f 2`

Be careful: this solution is not always correct. If you pass only "bla@some.com", it will be assigned to both ADD1 and ADD2.
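The reason is that cut prints a line containing no delimiter unchanged, whatever -f says. The POSIX -s option suppresses such lines instead. A sketch, assuming Bash for the test syntax; note the trade-off that with -s, -f 1 is suppressed too, so a lone address ends up in neither variable:

```bash
#!/usr/bin/env bash
IN="bla@some.com"    # a single address, no ';' at all

# Without -s: the line has no ';', so cut prints it whole for any -f.
ADD2=$(printf '%s\n' "$IN" | cut -d ';' -f 2)

# With -s: lines without the delimiter are dropped, so the result is empty.
ADD2_S=$(printf '%s\n' "$IN" | cut -s -d ';' -f 2)

echo "without -s: [$ADD2]"
echo "with -s:    [$ADD2_S]"
```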



Answer 9:

This worked for me:

string="1;2"
echo $string | cut -d';' -f1 # output is 1
echo $string | cut -d';' -f2 # output is 2


Answer 10:

I think AWK is the best and most efficient command to solve your problem. AWK is not part of Bash itself, but it is installed by default on almost every Linux distribution.

echo "bla@some.com;john@home.com" | awk -F';' '{print $1,$2}'

will give

bla@some.com john@home.com

Of course you can store each email address by redefining the awk print field.
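To get the fields back into shell variables rather than printing them, one option is to let read consume awk's output. A sketch, assuming Bash (here-string and process substitution) and addresses without embedded whitespace:

```bash
#!/usr/bin/env bash
IN="bla@some.com;john@home.com"

# awk prints the two fields separated by a space; read splits that line
# on whitespace and assigns the words to ADDR1 and ADDR2.
read -r ADDR1 ADDR2 < <(awk -F';' '{print $1, $2}' <<< "$IN")

echo "ADDR1=$ADDR1"
echo "ADDR2=$ADDR2"
```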



Answer 11:

A different take on Darron's answer, this is how I do it:

IN="bla@some.com;john@home.com"
read ADDR1 ADDR2 <<<$(IFS=";"; echo $IN)


Answer 12:

In Bash, a bulletproof way that works even if your variable contains newlines:

IFS=';' read -d '' -ra array < <(printf '%s;\0' "$in")

Look:

$ in=$'one;two three;*;there is\na newline\nin this field'
$ IFS=';' read -d '' -ra array < <(printf '%s;\0' "$in")
$ declare -p array

The trick for this to work is to use the -d option of read (delimiter) with an empty delimiter, so that read is forced to read everything it's fed. And we feed read with exactly the content of the variable in, with no trailing newline thanks to printf. Note that we're also putting the delimiter in printf to ensure that the string passed to read has a trailing delimiter. Without it, read would trim potential trailing empty fields:

$ in='one;two;three;'    # there's an empty field
$ IFS=';' read -d '' -ra array < <(printf '%s;\0' "$in")
$ declare -p array

The trailing empty field is preserved.


Update for Bash≥4.4

Since Bash 4.4, the builtin mapfile (aka readarray) supports the -d option to specify a delimiter. Hence another canonical way is:

mapfile -d ';' -t array < <(printf '%s;' "$in")
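A sketch running both variants on the same awkward input (an embedded newline and a trailing empty field), assuming Bash 4.4 or later for mapfile -d. The -r flag on read is an addition here so that backslashes are kept literally:

```bash
#!/usr/bin/env bash
in=$'one;two three;line1\nline2;'    # embedded newline, trailing empty field

# read: -d '' reads up to the NUL, so newlines are ordinary characters.
IFS=';' read -r -d '' -a arr_read < <(printf '%s;\0' "$in")

# mapfile (Bash >= 4.4): -d ';' splits on ';', -t strips it from each element.
mapfile -d ';' -t arr_map < <(printf '%s;' "$in")

echo "read:    ${#arr_read[@]} elements"
echo "mapfile: ${#arr_map[@]} elements"
```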


Answer 13:

How about this one-liner, if you're not using arrays:

IFS=';' read ADDR1 ADDR2 <<< "$IN"


Answer 14:

Here is a clean 3-liner:

in="foo@bar;bizz@buzz;fizz@buzz;buzz@woof"
IFS=';' list=($in)
for item in "${list[@]}"; do echo $item; done

where IFS delimits words based on the separator and () is used to create an array. Then [@] is used to return each item as a separate word.

If you have any code after that, you also need to restore $IFS, e.g. with unset IFS.
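Another way to avoid restoring IFS by hand is to do the split inside a function with local IFS, which is undone when the function returns. A sketch, assuming Bash; split_semicolons and the global array list are hypothetical names, and the set -f / set +f pair (no filename globbing) is an addition so that a field such as * stays literal:

```bash
#!/usr/bin/env bash
# Hypothetical helper: splits $1 on ';' into the global array "list".
split_semicolons() {
    local IFS=';'        # only visible inside this call
    set -f               # disable globbing while the unquoted $1 is split
    list=( $1 )
    set +f               # re-enable globbing
}

in="foo@bar;bizz@buzz;*;buzz@woof"
split_semicolons "$in"
for item in "${list[@]}"; do echo "$item"; done
```

Note that set +f turns globbing back on unconditionally; if the caller may already have it off, Bash 4.4's local - (which makes shell options local to the function) is an alternative.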



Answer 15:

Without setting the IFS

If you have just one colon, you can do this:

a="foo:bar"
b=${a%:*}
c=${a##*:}

you will get:

b = foo
c = bar


Answer 16:

There is a simple and smart way like this:

echo "add:sfff" | xargs -d: -i  echo {}

But you must use GNU xargs; BSD xargs doesn't support -d delim. If you use an Apple Mac like me, you can install GNU xargs:

brew install findutils 

then

echo "add:sfff" | gxargs -d: -i  echo {} 


Answer 17:

This is the simplest way to do it.

spo='one;two;three'
OIFS=$IFS
IFS=';'
spo_array=($spo)
IFS=$OIFS
echo ${spo_array[*]}


Answer 18:

IN="bla@some.com;john@home.com"
IFS=';' read -a IN_arr <<< "${IN}"
echo ${IN_arr[*]}

Output

bla@some.com john@home.com 

System : Ubuntu 12.04.1



Answer 19:

The following Bash/zsh function splits its first argument on the delimiter given by the second argument:

split() {
    local string="$1"
    local delimiter="$2"
    if [ -n "$string" ]; then
        local part
        while read -d "$delimiter" part; do
            echo "$part"
        done <<< "$string"
        echo "$part"    # the last field has no trailing delimiter
    fi
}

For instance, the command

$ split 'a;b;c' ';'

yields

a
b
c

This output may, for instance, be piped to other commands. Example:

$ split 'a;b;c' ';' | cat -n
     1  a
     2  b
     3  c

Compared to the other solutions given, this one has the following advantages:

  • IFS is not overridden: Due to dynamic scoping of even local variables, overriding IFS over a loop causes the new value to leak into function calls performed from within the loop.

  • Arrays are not used: Reading a string into an array using read requires the flag -a in Bash and -A in zsh.
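The first point can be seen with a small sketch, assuming Bash; report_ifs and loop_with_ifs are hypothetical names. A function called from inside the loop sees the caller's local IFS:

```bash
#!/usr/bin/env bash
# Hypothetical helper: reports whether it sees IFS set to ';'.
report_ifs() {
    if [[ $IFS == ';' ]]; then echo "sees ';'"; else echo "sees default"; fi
}

loop_with_ifs() {
    local IFS=';' part
    for part in $1; do
        printf '%s: ' "$part"
        report_ifs       # dynamic scoping: inherits IFS=';' from the caller
    done
}

loop_with_ifs 'a;b'
report_ifs               # top level: IFS still has its default value
```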

If desired, the function may be put into a script as follows:

#!/usr/bin/env bash

split() {
    # ...
}

split "$@"


Answer 20:

If there are no spaces, why not this?

IN="bla@some.com;john@home.com"
arr=(`echo $IN | tr ';' ' '`)

echo ${arr[0]}
echo ${arr[1]}


Answer 21:

There are some cool answers here (errator esp.), but for something analogous to split in other languages (which is what I took the original question to mean), I settled on this:

IN="bla@some.com;john@home.com"
declare -a a="(${IN//;/ })";

Now ${a[0]}, ${a[1]}, etc, are as you would expect. Use ${#a[*]} for number of terms. Or to iterate, of course:

for i in ${a[*]}; do echo $i; done 

IMPORTANT NOTE:

This works in cases where there are no spaces to worry about, which solved my problem, but may not solve yours. Go with the $IFS solution(s) in that case.



Answer 22:

Use the set built-in to load up the $@ array:

IN="bla@some.com;john@home.com"
IFS=';'; set $IN; IFS=$' \t\n'

Then, let the party begin:

echo $#
for a; do echo $a; done
ADDR1=$1 ADDR2=$2


Answer 23:

Two Bourne-ish alternatives, neither of which requires bash arrays:

Case 1: Keep it nice and simple: use a newline as the record separator, e.g.

IN="bla@some.com
john@home.com"

while read i; do
  # process "$i" ... eg.
    echo "[email:$i]"
done <<< "$IN"

Note: in this first case no sub-process is forked to assist with list manipulation.

Idea: Maybe it is worth using NL extensively internally, and only converting to a different RS when generating the final result externally.

Case 2: Using ";" as a record separator, e.g.

NL="
"
IRS=";"
ORS=";"

conv_IRS() {
  exec tr "$1" "$NL"
}

conv_ORS() {
  exec tr "$NL" "$1"
}

IN="bla@some.com;john@home.com"
IN="$(conv_IRS ";" <<< "$IN")"

In both cases a sub-list can be composed within the loop and persists after the loop has completed. This is useful when manipulating lists in memory instead of storing lists in files. {p.s. keep calm and carry on B-) }



Answer 24:

Apart from the fantastic answers that were already provided, if it is just a matter of printing out the data you may consider using awk:

awk -F";" '{for (i=1;i<=NF;i++) printf("> [%s]\n", $i)}' <<< "$IN"

This sets the field separator to ;, so that it can loop through the fields with a for loop and print accordingly.

Test

$ IN="bla@some.com;john@home.com"
$ awk -F";" '{for (i=1;i<=NF;i++) printf("> [%s]\n", $i)}' <<< "$IN"
> [bla@some.com]
> [john@home.com]

With another input:

$ awk -F";" '{for (i=1;i<=NF;i++) printf("> [%s]\n", $i)}' <<< "a;b;c   d;e_;f"
> [a]
> [b]
> [c   d]
> [e_]
> [f]


Answer 25:

In Android shell, most of the proposed methods just do not work:

$ IFS=':' read -ra ADDR <<<"$PATH"

What does work is:

$ for i in ${PATH//:/ }; do echo $i; done
/sbin
/vendor/bin
/system/sbin
/system/bin
/system/xbin

where // means global replacement.



Answer 26:

A one-liner to split a string separated by ';' into an array is:

IN="bla@some.com;john@home.com"
ADDRS=( $(IFS=";"; echo $IN) )
echo ${ADDRS[0]}
echo ${ADDRS[1]}

Because the command substitution runs in a subshell, IFS is only changed there, so you don't have to worry about saving and restoring its value.



Answer 27:

IN='bla@some.com;john@home.com;Charlie Brown'
OIFS=$IFS       # save the original IFS
IFS=';'
arrIN=($IN)
IFS=$OIFS       # restore it
for i in "${arrIN[@]}"; do
    echo "$i"
done

Output:

bla@some.com
john@home.com
Charlie Brown

Explanation: A simple assignment using parentheses () converts the semicolon-separated list into an array, provided IFS is set correctly at that moment. A standard for loop handles the individual items in that array as usual. Notice that the list given for the IN variable must be "hard" quoted, that is, with single quotes.

IFS must be saved and restored because Bash does not treat an assignment the same way as a command. An alternate workaround is to wrap the assignment inside a function and call that function with a modified IFS. In that case, separate saving/restoring of IFS is not needed. Thanks to "Bize" for pointing that out.



Answer 28:

Maybe not the most elegant solution, but it works with * and spaces:

IN="bla@so me.com;*;john@home.com"
for i in $(delims=${IN//[^;]}; seq 1 $((${#delims} + 1)))
do
    echo "> [$(echo "$IN" | cut -d';' -f"$i")]"
done

Outputs

> [bla@so me.com]
> [*]
> [john@home.com]

Other example (delimiters at beginning and end):

IN=";bla@so me.com;*;john@home.com;"
> []
> [bla@so me.com]
> [*]
> [john@home.com]
> []

Basically, it removes every character other than ;, making delims e.g. ;;;. Then it runs a for loop from 1 to the number of delimiters plus one (counted with ${#delims}). The final step is to safely get the i-th part using cut.



Answer 29:

Okay guys!

Here's my answer!

DELIMITER_VAL='='  read -d '' F_ABOUT_DISTRO_R 

Why is this approach "the best" for me?

For two reasons:

  1. You do not need to escape the delimiter;
  2. You will not have problem with blank spaces. The value will be properly separated in the array!

[]'s



Answer 30:

You can apply awk to many situations:

echo "bla@some.com;john@home.com"|awk -F';' '{printf "%s\n%s\n", $1, $2}'

You can also use this:

echo "bla@some.com;john@home.com"|awk -F';' '{print $1,$2}' OFS="\n" 

