An efficient way to transpose a file in Bash

时光说笑 2020-11-22 03:30

I have a huge tab-separated file formatted like this:

X column1 column2 column3
row1 0 1 2
row2 3 4 5
row3 6 7 8
row4 9 10 11

I would like t

29 answers
  • 2020-11-22 04:03

    Assuming all your rows have the same number of fields, this awk program solves the problem:

    {for (f=1;f<=NF;f++) col[f] = (NR==1 ? $f : col[f]":"$f)} END {for (f=1;f<=NF;f++) print col[f]}
    

    In words: as you loop over the rows, for every field f you grow a ':'-separated string col[f] containing the elements of that column. Once all rows have been read, each of those strings is printed on its own line. You can then replace ':' with the separator you want (say, a space) by piping the output through tr ':' ' '. This only works if ':' never occurs in the data, and the whole file is held in memory.

    Example:

    $ printf '1 2 3\n4 5 6\n'
    1 2 3
    4 5 6
    
    $ printf '1 2 3\n4 5 6\n' | awk '{for (f=1;f<=NF;f++) col[f] = (NR==1 ? $f : col[f]":"$f)} END {for (f=1;f<=NF;f++) print col[f]}' | tr ':' ' '
    1 4
    2 5
    3 6
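
    For the tab-separated input in the question, the same idea can be sketched with an explicit field separator and an array indexed by row and column, which avoids the ':' placeholder. The function name transpose_tsv is hypothetical. Like the one-liner, this sketch keeps the whole table in memory; a row with fewer fields than the widest row simply contributes empty cells.

    ```shell
    # transpose_tsv: hypothetical helper. Reads a tab-separated table on stdin
    # and writes its transpose on stdout. The whole table is kept in memory.
    transpose_tsv() {
        awk '
        BEGIN { FS = OFS = "\t" }
        {
            # Remember every cell by (row, column) and track the widest row.
            for (c = 1; c <= NF; c++) cell[NR, c] = $c
            if (NF > ncols) ncols = NF
        }
        END {
            # Each input column becomes one output line.
            for (c = 1; c <= ncols; c++) {
                line = cell[1, c]
                for (r = 2; r <= NR; r++) line = line OFS cell[r, c]
                print line
            }
        }'
    }
    ```

    It could be used as transpose_tsv < yourfile > transposed.tsv, where both file names are placeholders.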
    
  • 2020-11-22 04:03

    There is a purpose-built utility for this, GNU datamash:

    apt install datamash  
    
    datamash transpose < yourfile
    

    Sources: https://www.gnu.org/software/datamash/ and http://www.thelinuxrain.com/articles/transposing-rows-and-columns-3-methods
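
    datamash expects tab-separated fields by default, which matches the input in the question. On machines where it may not be installed, a script could fall back to awk. The wrapper below is only a sketch of that pattern: transpose_any is a hypothetical name, not part of datamash, and the fallback assumes a rectangular tab-separated table.

    ```shell
    # transpose_any: hypothetical wrapper, not part of datamash.
    # Uses datamash when it is on PATH, otherwise a plain awk fallback.
    transpose_any() {
        if command -v datamash >/dev/null 2>&1; then
            datamash transpose
        else
            awk 'BEGIN { FS = OFS = "\t" }
                 { for (c = 1; c <= NF; c++) cell[NR, c] = $c; if (NF > n) n = NF }
                 END { for (c = 1; c <= n; c++) {
                           line = cell[1, c]
                           for (r = 2; r <= NR; r++) line = line OFS cell[r, c]
                           print line } }'
        fi
    }
    ```

    Usage is the same as for datamash itself: transpose_any < yourfile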

  • 2020-11-22 04:04

    Have a look at GNU datamash, which can be used as datamash transpose. A future version will also support cross tabulation (pivot tables).

  • 2020-11-22 04:04

    Here's a Haskell solution. When compiled with -O2, it runs slightly faster than ghostdog's awk and slightly slower than Stephan's thinly wrapped C Python on my machine for repeated "Hello world" input lines. Unfortunately, as far as I can tell, GHC has no support for passing code on the command line, so you will have to write it to a file yourself. It truncates the rows to the length of the shortest row.

    transpose :: [[a]] -> [[a]]
    transpose = foldr (zipWith (:)) (repeat [])
    
    main :: IO ()
    main = interact $ unlines . map unwords . transpose . map words . lines
    
  • 2020-11-22 04:04

    A one-liner using R:

      cat file | Rscript -e "d <- read.table(file('stdin'), sep=' ', row.names=1, header=T); write.table(t(d), file=stdout(), quote=F, col.names=NA) "
    