I have a huge tab-separated file formatted like this
X column1 column2 column3
row1 0 1 2
row2 3 4 5
row3 6 7 8
row4 9 10 11
I would like t
Not very elegant, but this "single-line" command solves the problem quickly:
cols=4; for((i=1;i<=$cols;i++)); do \
awk '{print $'$i'}' input | tr '\n' ' '; echo; \
done
Here cols is the number of columns, where you can replace 4 by head -n 1 input | wc -w
.
I used fgm's solution (thanks fgm!), but needed to eliminate the tab characters at the end of each row, so modified the script thus:
#!/bin/bash
declare -a array=( ) # we build a 1-D-array
read -a line < "$1" # read the headline
COLS=${#line[@]} # save number of columns
index=0
while read -a line; do
for (( COUNTER=0; COUNTER<${#line[@]}; COUNTER++ )); do
array[$index]=${line[$COUNTER]}
((index++))
done
done < "$1"
for (( ROW = 0; ROW < COLS; ROW++ )); do
for (( COUNTER = ROW; COUNTER < ${#array[@]}; COUNTER += COLS )); do
printf "%s" ${array[$COUNTER]}
if [ $COUNTER -lt $(( ${#array[@]} - $COLS )) ]
then
printf "\t"
fi
done
printf "\n"
done
Simple 4 line answer, keep it readable.
col="$(head -1 file.txt | wc -w)"
for i in $(seq 1 $col); do
awk '{ print $'$i' }' file.txt | paste -s -d "\t"
done
Another option is to use rs
:
rs -c' ' -C' ' -T
-c
changes the input column separator, -C
changes the output column separator, and -T
transposes rows and columns. Do not use -t
instead of -T
, because it uses an automatically calculated number of rows and columns that is not usually correct. rs
, which is named after the reshape function in APL, comes with BSDs and OS X, but it should be available from package managers on other platforms.
A second option is to use Ruby:
ruby -e'puts readlines.map(&:split).transpose.map{|x|x*" "}'
A third option is to use jq
:
jq -R .|jq -sr 'map(./" ")|transpose|map(join(" "))[]'
jq -R .
prints each input line as a JSON string literal, -s
(--slurp
) creates an array for the input lines after parsing each line as JSON, and -r
(--raw-output
) outputs the contents of strings instead of JSON string literals. The /
operator is overloaded to split strings.
The only improvement I can see to your own example is using awk which will reduce the number of processes that are run and the amount of data that is piped between them:
/bin/rm output 2> /dev/null
cols=`head -n 1 input | wc -w`
for (( i=1; i <= $cols; i++))
do
awk '{printf ("%s%s", tab, $'$i'); tab="\t"} END {print ""}' input
done >> output
I've used below two scripts to do similar operations before. The first is in awk which is a lot faster than the second which is in "pure" bash. You might be able to adapt it to your own application.
awk '
{
for (i = 1; i <= NF; i++) {
s[i] = s[i]?s[i] FS $i:$i
}
}
END {
for (i in s) {
print s[i]
}
}' file.txt
declare -a arr
while IFS= read -r line
do
i=0
for word in $line
do
[[ ${arr[$i]} ]] && arr[$i]="${arr[$i]} $word" || arr[$i]=$word
((i++))
done
done < file.txt
for ((i=0; i < ${#arr[@]}; i++))
do
echo ${arr[i]}
done