Convert column to matrix format using awk

有刺的猬 2021-02-15 16:36

I have a gridded data file in column format as:

ifile.txt
x     y     value
20.5  20.5  -4.1
21.5  20.5  -6.2
22.5  20.5   0.0
20.5  21.5   1.2
21.5  21.5   4.3
         


        
4 answers
  • 2021-02-15 17:04

    The following awk script handles:

    • a matrix of any size
    • row and column indices that are unrelated to each other, since it tracks them separately
    • missing entries: if a row/column combination does not appear, its value defaults to zero

    The script:

    awk '
    BEGIN{PROCINFO["sorted_in"] = "@ind_num_asc"}
    (NR==1){next}
    {row[$1]=1;col[$2]=1;val[$1" "$2]=$3}
    END { printf "%8s",""; for (j in col) { printf "%8.3f",j }; printf "\n"
          for (i in row) {
            printf "%8.3f",i; for (j in col) { printf "%8.3f",val[i" "j] }; printf "\n"
          }
        }' <file>
    

    How does it work:

    • PROCINFO["sorted_in"] = "@ind_num_asc" makes every for (... in ...) loop traverse array indices in ascending numeric order.
    • (NR==1){next} skips the header line.
    • {row[$1]=1;col[$2]=1;val[$1" "$2]=$3} records the row index, the column index, and the value stored under the combined key.
    • The END block does all the printing: first the header of column indices, then one line per row index.

    This outputs:

              20.500  21.500  22.500
      20.500  -4.100   1.200   7.000
      21.500  -6.200   4.300  10.400
      22.500   0.000   6.000  16.700
    

    Note: PROCINFO["sorted_in"] is a gawk extension, so this script requires GNU awk.
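    For an awk without PROCINFO, the same idea can be sketched in POSIX awk by keeping the indices in ordinary numbered arrays and sorting them by hand. This is only a minimal sketch: the function names to_matrix and isort are made up for illustration, and it assumes a one-line header and no blank lines in the input.

```shell
#!/bin/sh
# to_matrix FILE: print the x/y/value triples in FILE as a matrix.
# Hypothetical helper name; rows are x and columns are y, as in the gawk script.
to_matrix() {
    awk '
    # Insertion sort of arr[1..n] in ascending numeric order
    # (POSIX awk has no built-in sort).
    function isort(arr, n,    i, j, t) {
        for (i = 2; i <= n; i++) {
            t = arr[i]
            for (j = i - 1; j >= 1 && arr[j] + 0 > t + 0; j--) arr[j + 1] = arr[j]
            arr[j + 1] = t
        }
    }
    NR == 1 { next }                      # skip the header line
    {
        # remember each distinct row and column index once
        if (!($1 in seenr)) { seenr[$1] = 1; rows[++nr] = $1 }
        if (!($2 in seenc)) { seenc[$2] = 1; cols[++nc] = $2 }
        val[$1 " " $2] = $3
    }
    END {
        isort(rows, nr); isort(cols, nc)
        printf "%8s", ""
        for (j = 1; j <= nc; j++) printf "%8.3f", cols[j]
        printf "\n"
        for (i = 1; i <= nr; i++) {
            printf "%8.3f", rows[i]
            # a missing combination is an empty string, printed as 0.000
            for (j = 1; j <= nc; j++) printf "%8.3f", val[rows[i] " " cols[j]]
            printf "\n"
        }
    }' "$1"
}
```

    Usage: to_matrix ifile.txt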

    However, under a few assumptions it can be done much more briefly:

    • the file contains every possible entry, with no missing values
    • you do not want the row and column indices printed
    • the entries are sorted in column-major order

    Then you can use the following short versions:

    sort -g <file> | awk '($1+0!=$1){next}
                          (o!="")&&($1!=o){printf "\n"}
                          {printf "%8.3f",$3; o=$1 }
                          END{printf "\n"}'
    

    which outputs

      -4.100   1.200   7.000
      -6.200   4.300  10.400
       0.000   6.000  16.700
    

    or for the transposed:

    awk '(NR==1){next}
         ($2!=o)&&(NR!=2){printf "\n"}
         {printf "%8.3f",$3; o=$2 }' <file>
    

    This outputs

      -4.100  -6.200   0.000
       1.200   4.300   6.000
       7.000  10.400  16.700
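    Both short versions rely on the file being a complete grid. One way to check that assumption before using them is sketched below. The name grid_is_complete is made up for illustration, and the check covers only completeness and duplicates, not the ordering of the lines.

```shell
#!/bin/sh
# grid_is_complete FILE: succeed only if every (x, y) combination
# occurs exactly once. Hypothetical helper; assumes a one-line header.
grid_is_complete() {
    awk '
    NR == 1 { next }                          # skip the header line
    {
        if (!($1 in xs)) { xs[$1] = 1; nx++ } # count distinct x values
        if (!($2 in ys)) { ys[$2] = 1; ny++ } # count distinct y values
        if (($1, $2) in seen) dup = 1         # the same cell listed twice
        seen[$1, $2] = 1
        n++
    }
    # complete grid: no duplicates and exactly nx * ny data lines
    END { exit ((dup || n != nx * ny) ? 1 : 0) }' "$1"
}
```

    Usage: grid_is_complete ifile.txt && echo "complete grid"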
    
  • 2021-02-15 17:06

    awk solution:

    sort -n ifile.txt | awk '
    BEGIN{header="\t"}
    NR>1{
        if((NR-1)%3==1){
            header=header sprintf("%4.1f\t",$1)
            matrix=matrix sprintf("%4.1f\t",$1)
        }
        matrix= matrix sprintf("%4.1f\t",$3)
        if((NR-1)%3==0 && NR!=10)matrix=matrix "\n"
    }
    END{print header; print matrix}'

    Output:

            20.5    21.5    22.5
    20.5    -4.1     1.2     7.0
    21.5    -6.2     4.3    10.4
    22.5     0.0     6.0    16.7
    

    Explanations:

    • sort -n ifile.txt sorts the file numerically.
    • header holds the header line. It starts as header="\t", and header=header sprintf("%4.1f\t",$1) appends an index on each line where (NR-1)%3==1.
    • matrix is built the same way: matrix=matrix sprintf("%4.1f\t",$1) adds the first column of each row, matrix= matrix sprintf("%4.1f\t",$3) appends each value, and if((NR-1)%3==0 && NR!=10)matrix=matrix "\n" ends each row except the last.
    • The constants 3 and 10 are hardcoded for a 3x3 grid (a header plus 9 data lines).
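    The hardcoded 3 and 10 can be avoided by starting a new row whenever x changes. The following is only a sketch of that idea, not the answer's own method: the name matrix_any_size is made up, the column labels are taken from y ($2) rather than from $1, and it assumes a complete grid with a one-line header and no blank lines.

```shell
#!/bin/sh
# matrix_any_size FILE: build the tab-separated matrix without
# hardcoding the grid size. Hypothetical helper name.
matrix_any_size() {
    # Drop the header, then sort numerically by x and then by y.
    tail -n +2 "$1" | LC_ALL=C sort -k1,1n -k2,2n | awk '
    NR == 1 || $1 != prev {               # x changed: start a new matrix row
        if (NR > 1) matrix = matrix "\n"
        matrix = matrix sprintf("%4.1f", $1)
        prev = $1
        first = (NR == 1)                 # true only while reading the first row
    }
    first { header = header sprintf("\t%4.1f", $2) }  # column labels come from y
    { matrix = matrix sprintf("\t%4.1f", $3) }
    END { print header; print matrix }'
}
```

    Usage: matrix_any_size ifile.txt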
  • 2021-02-15 17:11

    Perl solution:

    #!/usr/bin/perl -an
    $h{ $F[0] }{ $F[1] } = $F[2] unless 1 == $.;
    END {
        @s = sort { $a <=> $b } keys %h;
        print ' ' x 5;
        printf '%5.1f' x @s, @s;
        print "\n";
        for my $u (@s) {
            print "$u ";
            printf '%5.1f', $h{$u}{$_} for @s;
            print "\n";
        }
    }
    
    • -n reads the input line by line
    • -a splits each line on whitespace into the @F array
    • See the perldoc entries for sort, print, printf, and keys.
  • 2021-02-15 17:19

    Adjusted my old GNU awk solution for your current input data:

    matrixize.awk script:

    #!/bin/awk -f
    BEGIN { PROCINFO["sorted_in"]="@ind_num_asc"; OFS="\t" }
    NR==1{ next }
    {
        b[$1];               # accumulating unique indices
        ($1 != $2)? a[$1][$2] = $3 : a[$2][$1] = $3; # set `diagonal` relation between different indices 
    }
    END {
        h = "";
        for (i in b) {
            h = h OFS i     # form header columns
        } 
        print h;            # print header column values
        for (i in b) { 
            row = i;        # index column
            # iterating through the row values (for each intersection point)
            for (j in a[i]) {
                row = row OFS a[i][j]
            } 
            print row  
        }
    }
    

    Usage:

    awk -f matrixize.awk yourfile
    

    The output:

            20.5    21.5    22.5
    20.5    -4.1    1.2     7.0
    21.5    -6.2    4.3     10.4
    22.5    0.0     6.0     16.7
    