gawk

awk: keep records with the highest value that share a field, while ignoring other fields

人盡茶涼 · Submitted on 2019-12-11 10:55:42
Question: Imagine that you want to keep the records with the highest value in a given field of a table, comparing only within the categories defined by another field (and ignoring the contents of the others). So, given the input nye.txt:

X A 10.00
X A 1.50
X B 0.01
X B 4.00
Y C 1.00
Y C 2.43

You'd expect this output:

X A 10.00
Y C 2.43

This is an offshoot of this previous, related thread: awk: keep records with the highest value, comparing those that share other fields. I already have a solution (see
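For this single-key variant, a minimal sketch (the filename nye.txt comes from the question; the idiom itself is a common pattern, not necessarily the OP's eventual solution) keeps a running maximum per value of field 1 and prints the remembered records at END:

```shell
# For each distinct value of field 1, keep the record with the largest field 3.
awk '!($1 in max) || $3 + 0 > max[$1] {   # first record of a group, or a new maximum
         max[$1] = $3 + 0                 # best numeric value seen so far
         line[$1] = $0                    # the whole record that carried it
     }
     END { for (k in line) print line[k] }' nye.txt
```

Note that `for (k in line)` traverses in unspecified order; pipe through sort (or use gawk's PROCINFO["sorted_in"]) if a deterministic order matters.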

Selecting columns using specific patterns then finding sum and ratio

谁说我不能喝 · Submitted on 2019-12-11 10:08:50
Question: I want to calculate the sum and ratio values from the data below. (The actual data contains more than 200,000 columns and 45,000 rows (lines).) For clarity I have given only a simple data format.

#Frame BMR_42@O22 BMR_49@O13 BMR_59@O13 BMR_23@O26 BMR_10@O13 BMR_61@O26 BMR_23@O25
1 1 1 0 1 1 1 1
2 0 1 0 0 1 1 0
3 1 1 1 0 0 1 1
4 1 1 0 0 1 0 1
5 0 0 0 0 0 0 0
6 1 0 1 1 0 1 0
7 1 1 1 1 0 0 0
8 1 1 1 0 0 0 0
9 1 1 1 1 1 1 1
10 0 0 0 0 0 0 0

The columns need to be selected with certain criteria.
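The actual criteria are cut off above, so as a stand-in sketch assume the hypothetical criterion "headers containing @O13": mark those columns while reading the header line, then print each frame's sum over them and the ratio sum/(number of selected columns). data.txt is a placeholder name.

```shell
# Hypothetical criterion: select columns whose header contains "@O13",
# then report each frame's sum over those columns and sum/(columns selected).
awk 'NR == 1 {                                  # header: mark matching columns
         for (i = 2; i <= NF; i++)
             if ($i ~ /@O13/) { sel[i] = 1; nsel++ }
         next
     }
     {
         s = 0
         for (i in sel) s += $i                 # sum only the selected columns
         printf "%s sum=%d ratio=%.2f\n", $1, s, s / nsel
     }' data.txt
```

Marking columns once on the header keeps the per-row work proportional to the selected columns, which matters at 200,000 columns.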

Delete the row if it contains more than a specific number of non-numeric values

假装没事ソ · Submitted on 2019-12-11 05:46:37
Question: I have a large (2 GB) comma-separated text file containing data from sensors. Sometimes the sensors are off and there is no data. I want to delete a row if it contains more than a specified number of "No Data", "Off", or other non-numeric values, excluding the header. I am only interested in counting from the 3rd column onwards. For example, my data looks like:

Tag, Description,2015/01/01,2015/01/01 00:01:00,2015/01/01 00:02:00, 2015/01/01 00:02:00
1827XYZR/KB.SAT,Data from Process Value
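One hedged sketch of the filtering step (the threshold value, the file name, and the exact numeric test are assumptions, since the question text is cut off): print the header unconditionally, count non-numeric fields from column 3 on, and keep the row only if the count stays within the limit.

```shell
# Keep the header; drop rows with more than "max" non-numeric values
# from the 3rd column onwards ("No Data", "Off", empty fields, ...).
# The threshold and the file name are placeholders.
awk -F, -v max=1 '
    NR == 1 { print; next }
    {
        bad = 0
        for (i = 3; i <= NF; i++)
            if ($i !~ /^[ \t]*[-+]?([0-9]+\.?[0-9]*|\.[0-9]+)([eE][-+]?[0-9]+)?[ \t]*$/)
                bad++
        if (bad <= max) print
    }' sensors.csv
```

Because awk streams line by line, this handles the 2 GB file without loading it into memory.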

awk: keep records with the highest value, comparing those that share other fields

坚强是说给别人听的谎言 · Submitted on 2019-12-11 03:09:57
Question: I'm trying to write an awk script that keeps the records with the highest value in a given field, but only comparing records that share two other fields. I'd better give an example -- this is the input.txt:

X A 10.00
X A 1.50
X B 0.01
X B 4.00
Y C 1.00
Y C 2.43

I want to compare all the records sharing the same value in the 1st and 2nd fields (X A, X B or Y C) and pick the one with the highest numerical value in the 3rd field. So, I expect this output:

X A 10.00
X B 4.00
Y C 2.43

With this
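A sketch of one standard idiom for this (input.txt is the question's file; the pattern is a common one, not necessarily the accepted answer): build a compound key from fields 1 and 2 and keep a per-key maximum.

```shell
# Group on fields 1+2; keep, per group, the record with the largest field 3.
awk '{ key = $1 FS $2 }
     !(key in max) || $3 + 0 > max[key] {
         max[key] = $3 + 0
         line[key] = $0
     }
     END { for (k in line) print line[k] }' input.txt
```

As with any for-in loop, the output order is unspecified; append a sort if the order matters.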

Need to calculate standard deviation from an array using bash and awk?

别等时光非礼了梦想. · Submitted on 2019-12-10 22:15:20
Question: Guys, I'm new to awk and I'm struggling with an awk command to find the standard deviation. I have got the mean using the following:

echo ${GfieldList[@]} | awk 'NF {sum=0; for (i=1;i<=NF;i++) sum+=$i; print "Mean= " sum / NF}'

The standard deviation formula is: sqrt((1/N) * (sum of (value - mean)^2)). I have found the mean using the above formula. Can you guys help me with the awk command for this one?

Answer 1: Once you know the mean:

awk '{ for (i = 1; i <= NF; i++) { sum += $i }; print sum / NF }' # for 2,
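The population formula above can also be computed in a single pass by accumulating the sum of squares, since (1/N)*sum((x - mean)^2) = mean(x^2) - mean^2. A sketch with literal numbers standing in for ${GfieldList[@]}:

```shell
# One-pass population standard deviation: sqrt(mean of squares - mean^2).
echo "2 4 4 4 5 5 7 9" | awk 'NF {
    for (i = 1; i <= NF; i++) { sum += $i; sumsq += $i * $i }
    mean = sum / NF
    print "Mean= " mean
    print "SD= " sqrt(sumsq / NF - mean * mean)
}'
# for this input: Mean= 5, SD= 2
```

For very large magnitudes the mean-of-squares form can lose precision; a two-pass loop over (value - mean)^2, as in the question's formula, is the numerically safer variant.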

How can I check if a GNU awk coprocess is open, or force it to open without writing to it?

ε祈祈猫儿з · Submitted on 2019-12-10 18:59:50
Question: I have a gawk program that uses a coprocess. However, sometimes I don't have any data to write to the coprocess, and my original script hangs while waiting for the output of the coprocess. The code below reads from STDIN, writes each line to a "cat" program running as a coprocess, then reads the coprocess output back in and writes it to STDOUT. If we change the if condition to 1==0, nothing gets written to the coprocess, and the program hangs at the while loop. From the manual, it

How to check if a variable is an array?

巧了我就是萌 · Submitted on 2019-12-10 17:10:17
Question: I was playing with PROCINFO and its sorted_in index to be able to control array traversal. Then I wondered what the contents of PROCINFO are, so I decided to go through it and print its values:

$ awk 'BEGIN {for (i in PROCINFO) print i, PROCINFO[i]}'
ppid 7571
pgrpid 14581
api_major 1
api_minor 1
group1 545
gid 545
group2 1000
egid 545
group3 10004
awk: cmd. line:1: fatal: attempt to use array `PROCINFO["identifiers"]' in a scalar context

As you see, it breaks because there is -at

Subtracting N columns from two files with AWK

冷暖自知 · Submitted on 2019-12-10 14:18:53
Question: I have two files with N columns.

File1:
A 1 2 3 ....... Na1
B 2 3 4 ....... Nb1

File2:
A 2 2 4 ....... Na2
B 1 3 4 ....... Nb2

I want output where each column of File2 is subtracted from the corresponding column of File1, and so on through column N, as shown below:

A -1 0 -1 ........ (Na1-Na2)
B 1 0 0 ........ (Nb1-Nb2)

How can I do this in AWK, or with Perl scripting, in a Linux environment?

Answer 1: Something like this:

use strict;
use warnings;
my (@fh, @v);
for (@ARGV) {
    open (my $handle, "<", $
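For the awk half of the question, a sketch (the two sample files are written inline to mirror the example; real data would be passed directly): load File1 on the first pass (NR==FNR), then subtract File2's fields while printing.

```shell
# Column-wise File1 - File2, keyed on the first field.
printf 'A 1 2 3\nB 2 3 4\n' > file1
printf 'A 2 2 4\nB 1 3 4\n' > file2
awk 'NR == FNR { for (i = 2; i <= NF; i++) a[$1, i] = $i; next }
     {
         printf "%s", $1
         for (i = 2; i <= NF; i++) printf " %g", a[$1, i] - $i
         print ""
     }' file1 file2
# prints: A -1 0 -1
#         B 1 0 0
```

Keying on both the row label and the column index means this also works when the two files list the rows in different orders.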

Printing thousand separated floats with GAWK

梦想的初衷 · Submitted on 2019-12-10 11:53:06
Question: I must process some huge files with gawk. My main problem is that I have to print some floats using thousands separators. E.g. 10000 should appear as 10.000, and 10000,01 as 10.000,01, in the output. I (and Google) came up with this function:

function commas(n) {
    gsub(/,/, "", n)
    point = index(n, ".") - 1
    if (point < 0) point = length(n)
    while (point > 3) {
        point -= 3
        n = substr(n, 1, point) "." substr(n, point + 1)
    }
    sub(/-\./, "-", n)
    return d n
}

But it fails with floats. Now
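One way around the gsub problem is to never touch the decimal part at all: split on the decimal comma first, insert the "." separators only into the integer digits, and reassemble. The function name commafy and the sign handling are this sketch's own, not from the question.

```shell
# European-style grouping: "." every three integer digits, "," stays the
# decimal mark. Only the integer part is ever rewritten.
printf '10000\n10000,01\n-1234567,5\n' | awk '
    function commafy(n,    p, intpart, frac, sign, out) {
        split(n, p, ",")                       # comma is the decimal mark here
        intpart = p[1]
        frac = (2 in p) ? "," p[2] : ""
        sign = ""
        if (intpart ~ /^-/) { sign = "-"; intpart = substr(intpart, 2) }
        out = ""
        while (length(intpart) > 3) {          # peel off three digits at a time
            out = "." substr(intpart, length(intpart) - 2) out
            intpart = substr(intpart, 1, length(intpart) - 3)
        }
        return sign intpart out frac
    }
    { print commafy($1) }'
# prints: 10.000 / 10.000,01 / -1.234.567,5
```

For integers only, gawk can also group natively with printf "%'d" under a locale whose thousands separator is ".".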

awk: fatal: Invalid regular expression when setting multiple field separators

情到浓时终转凉″ · Submitted on 2019-12-09 01:44:25
Question: I was trying to solve "Grep regex to select only 10 character" using awk. The question consists of a string XXXXXX[YYYYY--ZZZZZ, and the OP wants to print the text in between the unique [ and -- strings within the text. If it were just one -, I would say use [-[] as the field separator (FS). This sets the FS to be either - or [:

$ echo "XXXXXXX[YYYYY-ZZZZ" | awk -F[-[] '{print $2}'
YYYYY

The tricky point is that [ also has a special meaning as a character class, so that to make it be correctly
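For the two-character -- delimiter, a bracket expression can't express it as a single unit, but regex alternation in a dynamic FS can. A sketch (quoting \\[ so the literal bracket doesn't open a character class):

```shell
# FS is the regex \[|-- : split on a literal "[" OR on the two-character "--".
echo 'XXXXXX[YYYYY--ZZZZZ' | awk -F'\\[|--' '{print $2}'
# prints: YYYYY
```

The double backslash is needed because a command-line FS goes through awk string processing once before it is used as a regular expression.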