Convert column to matrix format using awk

I have a gridded data file in column format as:

ifile.txt
x     y     value
20.5  20.5  -4.1
21.5  20.5  -6.2
22.5  20.5   0.0
20.5  21.5   1.2
21.5  21.5   4.3
22.5  21.5   6.0
20.5  22.5   7.0
21.5  22.5  10.4
22.5  22.5  16.7


                      
4 Answers

Answer by -上瘾入骨i (2021-02-15 17:04)

The following awk script handles:

- matrices of any size;
- row and column indices that are unrelated to each other, since it keeps track of the two sets separately;
- missing cells: if a row/column combination does not appear in the input, its value defaults to zero.

This is how it is done:

awk '
BEGIN{PROCINFO["sorted_in"] = "@ind_num_asc"}   # gawk only: traverse indices in numeric order
(NR==1){next}                                   # skip the header line
{row[$1]=1;col[$2]=1;val[$1" "$2]=$3}           # record row index, column index and value
END { printf "%8s",""; for (j in col) { printf "%8.3f",j }; printf "\n"
      for (i in row) {
        printf "%8.3f",i; for (j in col) { printf "%8.3f",val[i" "j] }; printf "\n"
      }
    }' ifile.txt


How it works:

- PROCINFO["sorted_in"] = "@ind_num_asc" makes every for (i in array) loop visit the indices in ascending numeric order.
- (NR==1){next} skips the header line.
- {row[$1]=1;col[$2]=1;val[$1" "$2]=$3} processes each data line: it records the row index, the column index and the value belonging to that pair.
- The END block does all the printing: first the line of column indices, then one line per row index.


This outputs:

          20.500  21.500  22.500
  20.500  -4.100   1.200   7.000
  21.500  -6.200   4.300  10.400
  22.500   0.000   6.000  16.700
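
The zero default for missing cells comes from awk itself: an array element that was never assigned evaluates to 0 in a numeric context, so %8.3f prints it as 0.000. A minimal sketch of just that behaviour, using a made-up key ("20.5 99") and a hypothetical shell variable missing_cell, neither of which is part of the answer:

```shell
# val is never filled here, so val["20.5 99"] is an unset element.
missing_cell=$(awk 'BEGIN { printf "%8.3f", val["20.5 99"] }')
echo "[$missing_cell]"    # prints [   0.000]
```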


Note: PROCINFO["sorted_in"] is a gawk (GNU awk) feature; other awk implementations do not order the for (i in array) loop this way.
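
For awks without PROCINFO["sorted_in"] (mawk or BusyBox awk, for instance), the same table can probably be produced by keeping the indices in ordinary arrays and sorting them by hand. What follows is only a sketch under stated assumptions: matrix_portable, sort_num and demo_grid.txt are names made up for this illustration, the sample rows are reconstructed from the outputs shown in this thread, and the insertion sort is a swapped-in replacement for gawk's ordering that is fine for small grids but not tuned for large ones.

```shell
# Hypothetical sample file; values reconstructed from the outputs in this thread.
cat > demo_grid.txt <<'EOF'
x     y     value
20.5  20.5  -4.1
21.5  20.5  -6.2
22.5  20.5   0.0
20.5  21.5   1.2
21.5  21.5   4.3
22.5  21.5   6.0
20.5  22.5   7.0
21.5  22.5  10.4
22.5  22.5  16.7
EOF

matrix_portable() {
    awk '
    # Insertion sort of arr[1..n] in ascending numeric order.
    function sort_num(arr, n,    i, j, tmp) {
        for (i = 2; i <= n; i++) {
            tmp = arr[i]
            for (j = i - 1; j >= 1 && arr[j] + 0 > tmp + 0; j--)
                arr[j + 1] = arr[j]
            arr[j + 1] = tmp
        }
    }
    NR == 1 { next }                       # skip the header line
    {
        # Remember each row and column index once, in order of appearance.
        if (!($1 in seen_row)) { seen_row[$1] = 1; rows[++nrow] = $1 }
        if (!($2 in seen_col)) { seen_col[$2] = 1; cols[++ncol] = $2 }
        val[$1 " " $2] = $3
    }
    END {
        sort_num(rows, nrow); sort_num(cols, ncol)
        printf "%8s", ""
        for (j = 1; j <= ncol; j++) printf "%8.3f", cols[j]
        printf "\n"
        for (i = 1; i <= nrow; i++) {
            printf "%8.3f", rows[i]
            # Unset cells print as 0.000, as in the gawk version.
            for (j = 1; j <= ncol; j++) printf "%8.3f", val[rows[i] " " cols[j]]
            printf "\n"
        }
    }' "$1"
}

matrix_portable demo_grid.txt
```

With the sample data this should reproduce the table printed by the gawk script.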

However, if you can make a few assumptions, it can be done much more briefly:

- the file contains all possible entries, with no missing values;
- you do not need the row and column indices printed;
- the indices are sorted in column-major order.

Then you can use the following short versions:

sort -g ifile.txt | awk '($1+0!=$1){next}
                         (n++)&&($1!=o){printf "\n"}
                         {printf "%8.3f",$3; o=$1 }
                         END{printf "\n"}'


which outputs

  -4.100   1.200   7.000
  -6.200   4.300  10.400
   0.000   6.000  16.700


or, for the transposed matrix:

awk '(NR==1){next}
     ($2!=o)&&(NR!=2){printf "\n"}
     {printf "%8.3f",$3; o=$2 }
     END{printf "\n"}' ifile.txt


This outputs

  -4.100  -6.200   0.000
   1.200   4.300   6.000
   7.000  10.400  16.700
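
Since the short versions silently produce a ragged matrix when a cell is missing, it may be worth checking the first assumption before using them. A minimal sketch, assuming a one-line header; grid_is_complete and demo_grid.txt are hypothetical names, and the sample rows are reconstructed from the outputs shown in this thread:

```shell
# Hypothetical sample file; values reconstructed from the outputs in this thread.
cat > demo_grid.txt <<'EOF'
x     y     value
20.5  20.5  -4.1
21.5  20.5  -6.2
22.5  20.5   0.0
20.5  21.5   1.2
21.5  21.5   4.3
22.5  21.5   6.0
20.5  22.5   7.0
21.5  22.5  10.4
22.5  22.5  16.7
EOF

# Succeeds (exit status 0) when every x/y combination occurs in the file.
grid_is_complete() {
    awk 'NR > 1 {
             if (!($1 in xs)) { xs[$1] = 1; nx++ }                 # distinct x values
             if (!($2 in ys)) { ys[$2] = 1; ny++ }                 # distinct y values
             if (!(($1, $2) in cell)) { cell[$1, $2] = 1; n++ }    # distinct cells
         }
         END { exit !(n > 0 && n == nx * ny) }' "$1"
}

if grid_is_complete demo_grid.txt; then
    echo "complete grid: the short versions apply"
else
    echo "missing cells: use the general script"
fi
```

This only checks for missing cells; it does not verify the sort order that the short versions also rely on.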

                                                                        
                                                        
            
            
              
                
Answer by 轮回少年 (2021-02-15 17:06)

awk solution:

sort -n ifile.txt | awk '
BEGIN { header = "\t" }
NR > 1 {
    if ((NR-1) % 3 == 1) {               # first cell of a row
        header = header sprintf("%4.1f\t", $1)
        matrix = matrix sprintf("%4.1f\t", $1)
    }
    matrix = matrix sprintf("%4.1f\t", $3)
    if ((NR-1) % 3 == 0 && NR != 10)     # end of a row, except the last
        matrix = matrix "\n"
}
END { print header; print matrix }'
        20.5    21.5    22.5
20.5    -4.1     1.2     7.0
21.5    -6.2     4.3    10.4
22.5     0.0     6.0    16.7


Explanation:

- sort -n ifile.txt sorts the file numerically, so the three lines for each x value end up next to each other.
- The header variable accumulates the header line. It starts as header="\t" and, on lines where (NR-1)%3==1, is extended with header=header sprintf("%4.1f\t",$1).
- The matrix variable is built the same way: matrix=matrix sprintf("%4.1f\t",$1) adds the first column (the row label) and matrix=matrix sprintf("%4.1f\t",$3) adds each value.
- if((NR-1)%3==0 && NR!=10)matrix=matrix "\n" ends each row with a newline, except the last one.
- The constants 3 and 10 are specific to this 3x3 grid (3 values per row, 10 lines including the header) and have to be adjusted for other sizes.
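
The same string-building idea can be sketched without the hard-coded 3 and 10 by starting a new row whenever the first column changes. This is an illustrative variant, not the answer's code: matrix_by_rows and demo_grid.txt are hypothetical names, the sample rows are reconstructed from the outputs in this thread, the header is stripped with tail before sorting, the column labels are taken from the y values of the first row, and a complete grid is assumed.

```shell
# Hypothetical sample file; values reconstructed from the outputs in this thread.
cat > demo_grid.txt <<'EOF'
x     y     value
20.5  20.5  -4.1
21.5  20.5  -6.2
22.5  20.5   0.0
20.5  21.5   1.2
21.5  21.5   4.3
22.5  21.5   6.0
20.5  22.5   7.0
21.5  22.5  10.4
22.5  22.5  16.7
EOF

matrix_by_rows() {
    # Drop the header, then sort numerically by x and, within x, by y.
    tail -n +2 "$1" | LC_ALL=C sort -k1,1n -k2,2n | awk '
    NR == 1 || $1 != prev {                  # first line or new x: start a row
        if (NR > 1) matrix = matrix "\n"
        matrix = matrix sprintf("%4.1f", $1) # row label
        prev = $1
        nrows++
    }
    nrows == 1 { header = header sprintf("\t%4.1f", $2) }   # column labels from row 1
    { matrix = matrix sprintf("\t%4.1f", $3) }
    END { print header; print matrix }'
}

matrix_by_rows demo_grid.txt
```

The row length is never stated explicitly; it follows from how many lines share the same x value, which is why the grid has to be complete for the columns to line up.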

                                                                        
                                                        
            
            
              
                
Answer by 鱼传尺愫 (2021-02-15 17:11)

Perl solution:

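#!/usr/bin/perl -an
# Store each value under its x and y; skip the header (line 1).
$h{ $F[0] }{ $F[1] } = $F[2] unless 1 == $.;
END {
    @s = sort { $a <=> $b } keys %h;    # x values in numeric order
    print ' ' x 5;
    printf '%5.1f' x @s, @s;            # header line
    print "\n";
    for my $u (@s) {
        print "$u ";                    # row label
        printf '%5.1f', $h{$u}{$_} for @s;
        print "\n";
    }
}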



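- -n reads the input line by line.
- -a splits each line on whitespace into the @F array.
- The same sorted key list @s is used for both rows and columns, so this assumes that x and y take the same set of values.

See the perldoc entries for sort, print, printf, and keys.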

                                                                        
                                                        
            
            
              
                
Answer by 没有蜡笔的小新 (2021-02-15 17:19)

Adjusted my old GNU awk solution for your current input data (it uses arrays of arrays, so it needs gawk 4.0 or later):

matrixize.awk script:

#!/bin/awk -f
BEGIN { PROCINFO["sorted_in"]="@ind_num_asc"; OFS="\t" }
NR==1{ next }
{
    b[$1];               # collect the unique row indices
    a[$1][$2] = $3;      # value at row $1, column $2
}
END {
    h = "";
    for (i in b) {
        h = h OFS i     # form header columns
    } 
    print h;            # print header column values
    for (i in b) { 
        row = i;        # index column
        # iterating through the row values (for each intersection point)
        for (j in a[i]) {
            row = row OFS a[i][j]
        } 
        print row  
    }
}




Usage:

awk -f matrixize.awk yourfile


The output:

    20.5    21.5    22.5
20.5  -4.1  1.2   7.0
21.5  -6.2  4.3   10.4
22.5  0.0   6.0   16.7

                                                                        
                                                        
            
            
              
                