How to Add Column with Percentage

前端未结

关注

 4  1658

I would like to calculate percentage of value in each line out of all lines and add it as another column. Input (delimiter is \\t):


                      
              相关标签:


      
      
        
          4条回答        

        
                         				            
            
           
            
                              
                
              
              
                
                  深忆病人        
                
              
                            
                2021-02-08 10:27
              
            
            
                                                                       
You can do it in a couple of passes

#!/bin/bash

total=$(awk '{total=total+$2}END{print total}' file)
awk -v total=$total '{ printf ("%s\t%s\t%.2f\n", $1, $2, ($2/total)*100)}' file

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  面向向阳花        
                
              
                            
                2021-02-08 10:32
              
            
            
                                                                       
Here you go, one pass step awk solution -

awk 'NR==FNR{a = a + $2;next} {c = ($2/a)*100;print $1,$2,c }' file file

[jaypal:~/Temp] cat file
1   10      
2   10
3   20
4   40
[jaypal:~/Temp] awk 'NR==FNR{a = a + $2;next} {c = ($2/a)*100;print $1,$2,c }' file file
1 10 12.5
2 10 12.5
3 20 25
4 40 50


Update: If tab is a required in output then just set the OFS variable to "\t". 

[jaypal:~/Temp] awk -v OFS="\t" 'NR==FNR{a = a + $2;next} {c = ($2/a)*100;print $1,$2,c }' file file
1   10  12.5
2   10  12.5
3   20  25
4   40  50


Breakout of pattern {action} statements:


The first pattern is NR==FNR. FNR is awk's in-built variable that keeps track of number of records (by default separated by a new line) in a given file. So FNR in our case would be 4. NR is similar to FNR but it does not get reset to 0. It continues to grow on. So NR in our case would be 8. 
This pattern will be true only for the first 4 records and thats exactly what we want. After perusing through the 4 records, we are assign the total to a variable a. Notice that we did not initialize it. In awk we don't have to. However, this would break if entire column 2 is 0. So you can handle it by putting an if statement in the second action statement i.e do the division only if a > 0 else say division by 0 or something. 
next is needed cause we don't really want second pattern {action} statement to execute. next tells awk to stop further actions and move to the next record. 
Once the four records are parsed, the next pattern{action} begins, which is pretty straight forward. Doing the percentage and print column 1 and 2 along with percentage next to them. 


Note: As @lhf mentioned in the comment, this one-liner will only work as long as you have the data set in a file. It won't work if you pass data through a pipe.

In the comments, there is a discussion going on ways to make this awk one-liner take input from a pipe instead of a file. Well the only way I could think of was to store the column values in array and then using for loop to spit each value out along with their percentage. 

Now arrays in awk are associative and are never in order, i.e pulling the values out of arrays will not be in the same order as they went in. So if that is ok then the following one-liner should work.  

[jaypal:~/Temp] cat file
1   10      
2   10
3   20
4   40

[jaypal:~/Temp] cat file | awk '{b[$1]=$2;sum=sum+$2} END{for (i in b) print i,b[i],(b[i]/sum)*100}'
2 10 12.5
3 20 25
4 40 50
1 10 12.5


To get them in order, you can pipe the result to sort. 

[jaypal:~/Temp] cat file | awk '{b[$1]=$2;sum=sum+$2} END{for (i in b) print i,b[i],(b[i]/sum)*100}' | sort -n
1 10 12.5
2 10 12.5
3 20 25
4 40 50

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  迷失自我        
                
              
                            
                2021-02-08 10:36
              
            
            
                                                                       
You need to escape it as %%. For instance:

printf("%s\t%s\t%s%%\n", $1, $2, $3)

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  借酒劲吻你        
                
              
                            
                2021-02-08 10:39
              
            
            
                                                                       
Perhaps there is better way but I would pass file twice.

Content of 'infile':

1       10 
2       10
3       20
4       40


Content of 'script.awk':

BEGIN {
        ## Tab as field separator.
        FS = "\t";
}

## First pass of input file. Get total from second field.
ARGIND == 1 {
        total += $2;
        next;
}

## Second pass of input file. Print each original line and percentage as third field.
{
        printf( "%s\t%2.2f\n", $0, $2 * 100 / total );
}


Run the script in my linux box:

gawk -f script.awk infile infile


And result:

1       10      12.50
2       10      12.50
3       20      25.00
4       40      50.00

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
                             
        
        
          
            
            
              
              
            
    


                                 
              
            
                          
    

        
         
                验证码
                
                  
                
                
                   看不清?
                
              
                                  
                    
   
                 
             
              提交回复