How to Add Column with Percentage

半阙折子戏 2021-02-08 09:53

I would like to calculate each line's value as a percentage of the total over all lines and add it as another column. Input (the delimiter is \t):

1   10      
2   10
3   20
4   40


        
4 Answers
  • 2021-02-08 10:27

    You can do it in two passes over the file:

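    #!/bin/bash

    # First pass: sum column 2.
    total=$(awk '{ total += $2 } END { print total }' file)
    # Second pass: print both columns plus each value's share of the total.
    awk -v total="$total" '{ printf("%s\t%s\t%.2f\n", $1, $2, ($2 / total) * 100) }' file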
    
  • 2021-02-08 10:32

    Here you go, an awk one-liner that reads the file twice:

    awk 'NR==FNR{a = a + $2;next} {c = ($2/a)*100;print $1,$2,c }' file file

    [jaypal:~/Temp] cat file
    1   10      
    2   10
    3   20
    4   40
    [jaypal:~/Temp] awk 'NR==FNR{a = a + $2;next} {c = ($2/a)*100;print $1,$2,c }' file file
    1 10 12.5
    2 10 12.5
    3 20 25
    4 40 50
    

    Update: If tabs are required in the output, just set the OFS variable to "\t".

    [jaypal:~/Temp] awk -v OFS="\t" 'NR==FNR{a = a + $2;next} {c = ($2/a)*100;print $1,$2,c }' file file
    1   10  12.5
    2   10  12.5
    3   20  25
    4   40  50
    

    Breakdown of the pattern {action} statements:

    • The first pattern is NR==FNR. FNR is awk's built-in variable that counts the records (by default, lines) read so far from the current file, and it restarts for each input file. Since we pass the file twice, FNR runs from 1 to 4 on each pass. NR is similar but never resets: it keeps growing across files, so it reaches 8 by the end.

    • This pattern is true only for the first 4 records, which is exactly what we want. While reading those 4 records, we accumulate the total in the variable a. Notice that we did not initialize it; in awk we don't have to, because an unset variable acts as 0 in arithmetic. However, this would break if the whole of column 2 sums to 0. You can handle that with an if statement in the second action, i.e. do the division only if a is nonzero, otherwise print a division-by-zero message or similar.

    • next is needed because we don't want the second pattern {action} statement to run during the first pass. next tells awk to skip the remaining rules and move on to the next record.

    • Once the first four records are consumed, NR==FNR is no longer true and the second {action} takes over, which is straightforward: compute the percentage and print columns 1 and 2 with the percentage next to them.
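To make the NR/FNR behaviour described in the bullets above concrete, here is a small sketch that traces both counters over the two passes and then runs the same one-liner with a zero-total guard. The temporary file and the variable names `tmp`, `trace` and `guarded` are only illustrative, and printing "n/a" for a zero total is an assumption, not something the answer prescribes.

```shell
# Recreate the sample data (tab-delimited) in a temporary file.
tmp=$(mktemp)
printf '1\t10\n2\t10\n3\t20\n4\t40\n' > "$tmp"

# Trace NR and FNR over both passes; FNR restarts at 1 for the second file.
trace=$(awk '{ print NR, FNR, (NR == FNR ? "pass 1" : "pass 2") }' "$tmp" "$tmp")
printf '%s\n' "$trace"

# The same two-pass one-liner, guarded against a zero total.
guarded=$(awk 'NR == FNR { a += $2; next }
               { print $1, $2, (a != 0 ? ($2 / a) * 100 : "n/a") }' "$tmp" "$tmp")
printf '%s\n' "$guarded"

rm -f "$tmp"
```

In the trace, the first four lines have equal NR and FNR, and from the fifth line on NR keeps counting while FNR starts again at 1.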

    Note: As @lhf mentioned in the comments, this one-liner only works when the data is in a file, because the file has to be read twice. It won't work if you pass the data through a pipe.

    The comments discuss ways to make this awk one-liner take its input from a pipe instead of a file. The only way I could think of was to store the column values in an array and then use a for loop in the END block to print each value along with its percentage.

    Arrays in awk are associative, and the order in which for (i in b) visits the keys is unspecified, i.e. the values may not come out in the order they went in. If that is OK, the following one-liner should work.

    [jaypal:~/Temp] cat file
    1   10      
    2   10
    3   20
    4   40
    
    [jaypal:~/Temp] cat file | awk '{b[$1]=$2;sum=sum+$2} END{for (i in b) print i,b[i],(b[i]/sum)*100}'
    2 10 12.5
    3 20 25
    4 40 50
    1 10 12.5
    

    To get them in order, you can pipe the result to sort.

    [jaypal:~/Temp] cat file | awk '{b[$1]=$2;sum=sum+$2} END{for (i in b) print i,b[i],(b[i]/sum)*100}' | sort -n
    1 10 12.5
    2 10 12.5
    3 20 25
    4 40 50
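If sorting afterwards is not an option (for example, when column 1 is not numeric), one possible variation is to index the arrays by the record number NR and replay them with a counting loop, which keeps the input order without sort. This is only a sketch: the array names `key` and `val` and the shell variable `result` are made up for illustration, and the whole input is still held in memory.

```shell
# Sample data arrives on stdin, so the file cannot be read twice.
result=$(printf '1\t10\n2\t10\n3\t20\n4\t40\n' |
  awk '{ key[NR] = $1; val[NR] = $2; sum += $2 }   # buffer rows by record number
       END {
         for (i = 1; i <= NR; i++)                 # replay in input order
           print key[i], val[i], (sum != 0 ? (val[i] / sum) * 100 : 0)
       }')
printf '%s\n' "$result"
```

Indexing by NR instead of by $1 also avoids losing rows when the same key appears twice in column 1, since b[$1]=$2 would overwrite the earlier value.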
    
  • 2021-02-08 10:36

    To print a literal percent sign with printf, you need to escape it as %%. For instance:

    printf("%s\t%s\t%s%%\n", $1, $2, $3)
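As a rough illustration of how the %% escape could be combined with the percentage calculation itself, the sketch below computes the share of column 2 and appends a percent sign. The two-pass NR == FNR structure, the temporary file and the variable names `tmp` and `with_sign` are assumptions borrowed for the example, not part of this answer.

```shell
# Recreate the sample data (tab-delimited) in a temporary file.
tmp=$(mktemp)
printf '1\t10\n2\t10\n3\t20\n4\t40\n' > "$tmp"

# %.2f formats the number; %% emits a single literal percent sign.
with_sign=$(awk 'NR == FNR { total += $2; next }
                 { printf("%s\t%s\t%.2f%%\n", $1, $2, total ? $2 * 100 / total : 0) }' "$tmp" "$tmp")
printf '%s\n' "$with_sign"

rm -f "$tmp"
```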
    
  • 2021-02-08 10:39

    Perhaps there is a better way, but I would read the file twice.

    Content of 'infile':

    1       10 
    2       10
    3       20
    4       40
    

    Content of 'script.awk':

    BEGIN {
            ## Tab as field separator.
            FS = "\t";
    }
    
    ## First pass of input file. Get total from second field.
    ARGIND == 1 {
            total += $2;
            next;
    }
    
    ## Second pass of input file. Print each original line and percentage as third field.
    {
            printf( "%s\t%2.2f\n", $0, $2 * 100 / total );
    }
    

    Run the script on my Linux box (gawk is needed because ARGIND is a gawk extension):

    gawk -f script.awk infile infile
    

    And result:

    1       10      12.50
    2       10      12.50
    3       20      25.00
    4       40      50.00
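For awk implementations without ARGIND, one possible portable variant uses variable assignments placed between the file operands; awk applies each assignment just before it opens the next file. The variable name `pass`, the temporary file and the zero-total fallback of 0 are assumptions made for this sketch.

```shell
# Recreate the sample data (tab-delimited) in a temporary file.
tmp=$(mktemp)
printf '1\t10\n2\t10\n3\t20\n4\t40\n' > "$tmp"

# "pass=1" and "pass=2" are operands, not files: awk assigns them just
# before it opens the file that follows each one.
portable=$(awk -F '\t' '
  pass == 1 { total += $2; next }                             # first read: sum column 2
  { printf("%s\t%.2f\n", $0, total ? $2 * 100 / total : 0) }  # second read: add column
' pass=1 "$tmp" pass=2 "$tmp")
printf '%s\n' "$portable"

rm -f "$tmp"
```

Unlike the NR==FNR test, the pass variable still tells the two reads apart when the first file happens to be empty.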
    