An efficient way to transpose a file in Bash

时光说笑 2020-11-22 03:30

I have a huge tab-separated file formatted like this:

X column1 column2 column3
row1 0 1 2
row2 3 4 5
row3 6 7 8
row4 9 10 11

I would like t

29 answers
  • Not very elegant, but this "single-line" command solves the problem quickly:

    cols=4; for((i=1;i<=$cols;i++)); do \
                awk '{print $'$i'}' input | tr '\n' ' '; echo; \
            done
    

    Here cols is the number of columns; instead of hard-coding 4 you can use $(head -n 1 input | wc -w). Note that each output line ends with a trailing space.
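
    As a rough sketch of that substitution, the loop can be wrapped in a function that counts the columns itself. The function name transpose_by_column is hypothetical, and the sketch swaps tr for paste so that no trailing space is produced; it still runs one awk process per column, so it remains slow on very wide files.

```shell
#!/bin/bash
# Sketch: transpose a whitespace-separated file with one awk call per column.
# transpose_by_column is a hypothetical helper name, not part of the answer above.
transpose_by_column() {
    local input=$1 cols i
    # Count the fields on the first line instead of hard-coding cols=4.
    cols=$(head -n 1 "$input" | wc -w)
    for ((i = 1; i <= cols; i++)); do
        # -v passes the column number in without splicing it into the awk program.
        # paste -s joins the column's values onto one line, separated by spaces.
        awk -v c="$i" '{ print $c }' "$input" | paste -s -d ' ' -
    done
}
```

    Usage would look like: transpose_by_column input > output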

  • 2020-11-22 03:47

    I used fgm's solution (thanks fgm!), but I needed to eliminate the tab characters at the end of each row, so I modified the script as follows:

    #!/bin/bash 
    declare -a array=( )                      # we build a 1-D-array
    
    read -a line < "$1"                       # read the headline
    
    COLS=${#line[@]}                          # save number of columns
    
    index=0
    while read -a line; do
        for (( COUNTER=0; COUNTER<${#line[@]}; COUNTER++ )); do
            array[$index]=${line[$COUNTER]}
            ((index++))
        done
    done < "$1"
    
    for (( ROW = 0; ROW < COLS; ROW++ )); do
      for (( COUNTER = ROW; COUNTER < ${#array[@]}; COUNTER += COLS )); do
        printf "%s" "${array[$COUNTER]}"    # quoted so a cell such as * is not glob-expanded
        if [ $COUNTER -lt $(( ${#array[@]} - $COLS )) ]
        then
            printf "\t"
        fi
      done
      printf "\n" 
    done
    
  • 2020-11-22 03:48

    A simple four-line answer that stays readable:

    col="$(head -n 1 file.txt | wc -w)"
    for i in $(seq 1 $col); do
        awk '{ print $'$i' }' file.txt | paste -s -d "\t" -   # "-" reads stdin; BSD paste requires it
    done
    
  • 2020-11-22 03:49

    Another option is to use rs:

    rs -c' ' -C' ' -T
    

    -c changes the input column separator, -C changes the output column separator, and -T transposes rows and columns. Do not use -t instead of -T, because it uses an automatically calculated number of rows and columns that is not usually correct. rs, which is named after the reshape function in APL, comes with BSDs and OS X, but it should be available from package managers on other platforms.

    A second option is to use Ruby:

    ruby -e'puts readlines.map(&:split).transpose.map{|x|x*" "}'
    

    A third option is to use jq:

    jq -R .|jq -sr 'map(./" ")|transpose|map(join(" "))[]'
    

    jq -R . prints each input line as a JSON string literal, -s (--slurp) creates an array for the input lines after parsing each line as JSON, and -r (--raw-output) outputs the contents of strings instead of JSON string literals. The / operator is overloaded to split strings.

  • 2020-11-22 03:49

    The only improvement I can see to your own example is to use awk alone, which reduces the number of processes that are run and the amount of data piped between them:

    /bin/rm output 2> /dev/null
    
    cols=`head -n 1 input | wc -w` 
    for (( i=1; i <= $cols; i++))
    do
      awk '{printf ("%s%s", tab, $'$i'); tab="\t"} END {print ""}' input
    done >> output
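
    Taking the same idea one step further, a single awk process can do the whole transpose in one pass. This is a different technique from the loop above, shown only as a sketch: transpose_tsv is a hypothetical name, the input is assumed to be tab-separated, and the whole file is assumed to fit in memory, which may not hold for a truly huge file.

```shell
#!/bin/bash
# Sketch: single-pass transpose for tab-separated input.
# transpose_tsv is a hypothetical helper name; it assumes the file fits in memory.
transpose_tsv() {
    awk -F'\t' '
    {
        # Store every field under its (row, column) pair.
        for (i = 1; i <= NF; i++) cell[NR, i] = $i
        # Remember the widest line seen so far.
        if (NF > maxnf) maxnf = NF
    }
    END {
        # Each input column becomes one output row.
        for (i = 1; i <= maxnf; i++) {
            line = cell[1, i]
            for (j = 2; j <= NR; j++) line = line "\t" cell[j, i]
            print line
        }
    }' "$1"
}
```

    Usage would look like: transpose_tsv input > output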
    
  • 2020-11-22 03:49

    I've used the two scripts below for similar operations before. The first is in awk and is a lot faster than the second, which is in "pure" bash. You might be able to adapt them to your own application.

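    awk '
    {
        for (i = 1; i <= NF; i++) {
            # test for membership, not truthiness, so a leading cell of 0 is not lost
            s[i] = (i in s) ? s[i] FS $i : $i
        }
        if (NF > n) n = NF    # remember the widest line
    }
    END {
        # a numeric loop keeps the rows in column order;
        # "for (i in s)" visits keys in an unspecified order
        for (i = 1; i <= n; i++) {
            print s[i]
        }
    }' file.txt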
    
    declare -a arr
    
    while IFS= read -r line
    do
        i=0
        for word in $line
        do
            [[ ${arr[$i]} ]] && arr[$i]="${arr[$i]} $word" || arr[$i]=$word
            ((i++))
        done
    done < file.txt
    
    for ((i=0; i < ${#arr[@]}; i++))
    do
        printf '%s\n' "${arr[i]}"    # quoted so the row is printed verbatim
    done
    