gawk

awk: keep records with the highest value that share a field, while ignoring other fields

人盡茶涼 · Submitted on 2019-12-11 10:55:42
Question: Imagine that you want to keep the records with the highest value in a given field of a table, comparing only within the categories defined by another field (and ignoring the contents of the others). So, given the input nye.txt:

X A 10.00
X A 1.50
X B 0.01
X B 4.00
Y C 1.00
Y C 2.43

You'd expect this output:

X A 10.00
Y C 2.43

This is an offshoot of this previous, related thread: awk: keep records with the highest value, comparing those that share other fields. I already have a solution (see
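For this single-key variant, a minimal sketch (the filename nye.txt comes from the question; the idiom itself is a common pattern, not necessarily the OP's eventual solution) keeps a running maximum per value of field 1 and prints the remembered records at END:

```shell
# For each distinct value of field 1, keep the record with the largest field 3.
awk '!($1 in max) || $3 + 0 > max[$1] {   # first record of a group, or a new maximum
         max[$1] = $3 + 0                 # best numeric value seen so far
         line[$1] = $0                    # the whole record that carried it
     }
     END { for (k in line) print line[k] }' nye.txt
```

Note that `for (k in line)` traverses in unspecified order; pipe through sort (or use gawk's PROCINFO["sorted_in"]) if a deterministic order matters.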

Selecting columns using specific patterns then finding sum and ratio

谁说我不能喝 · Submitted on 2019-12-11 10:08:50
Question: I want to calculate the sum and ratio values from the data below. (The actual data contains more than 200,000 columns and 45,000 rows (lines).) For clarity I have given only a simple data format.

#Frame BMR_42@O22 BMR_49@O13 BMR_59@O13 BMR_23@O26 BMR_10@O13 BMR_61@O26 BMR_23@O25
1 1 1 0 1 1 1 1
2 0 1 0 0 1 1 0
3 1 1 1 0 0 1 1
4 1 1 0 0 1 0 1
5 0 0 0 0 0 0 0
6 1 0 1 1 0 1 0
7 1 1 1 1 0 0 0
8 1 1 1 0 0 0 0
9 1 1 1 1 1 1 1
10 0 0 0 0 0 0 0

The columns need to be selected with certain criteria.
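The actual criteria are cut off above, so as a stand-in sketch assume the hypothetical criterion "headers containing @O13": mark those columns while reading the header line, then print each frame's sum over them and the ratio sum/(number of selected columns). data.txt is a placeholder name.

```shell
# Hypothetical criterion: select columns whose header contains "@O13",
# then report each frame's sum over those columns and sum/(columns selected).
awk 'NR == 1 {                                  # header: mark matching columns
         for (i = 2; i <= NF; i++)
             if ($i ~ /@O13/) { sel[i] = 1; nsel++ }
         next
     }
     {
         s = 0
         for (i in sel) s += $i                 # sum only the selected columns
         printf "%s sum=%d ratio=%.2f\n", $1, s, s / nsel
     }' data.txt
```

Marking columns once on the header keeps the per-row work proportional to the selected columns, which matters at 200,000 columns.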

Delete the row if it contains more than a specific number of non-numeric values

假装没事ソ · Submitted on 2019-12-11 05:46:37
Question: I have a large (2 GB) comma-separated text file containing data from sensors. Sometimes the sensors are off and there is no data. I want to delete a row if it contains more than a specified number of "No Data", "Off", or other non-numeric values, excluding the header. I am only interested in counting from the 3rd column onwards. For example, my data looks like:

Tag, Description,2015/01/01,2015/01/01 00:01:00,2015/01/01 00:02:00, 2015/01/01 00:02:00
1827XYZR/KB.SAT,Data from Process Value
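One hedged sketch of the filtering step (the threshold value, the file name, and the exact numeric test are assumptions, since the question text is cut off): print the header unconditionally, count non-numeric fields from column 3 on, and keep the row only if the count stays within the limit.

```shell
# Keep the header; drop rows with more than "max" non-numeric values
# from the 3rd column onwards ("No Data", "Off", empty fields, ...).
# The threshold and the file name are placeholders.
awk -F, -v max=1 '
    NR == 1 { print; next }
    {
        bad = 0
        for (i = 3; i <= NF; i++)
            if ($i !~ /^[ \t]*[-+]?([0-9]+\.?[0-9]*|\.[0-9]+)([eE][-+]?[0-9]+)?[ \t]*$/)
                bad++
        if (bad <= max) print
    }' sensors.csv
```

Because awk streams line by line, this handles the 2 GB file without loading it into memory.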

awk: keep records with the highest value, comparing those that share other fields

坚强是说给别人听的谎言 · Submitted on 2019-12-11 03:09:57
Question: I'm trying to write an awk script that keeps the records with the highest value in a given field, but only comparing records that share two other fields. I'd better give an example -- this is the input.txt:

X A 10.00
X A 1.50
X B 0.01
X B 4.00
Y C 1.00
Y C 2.43

I want to compare all the records sharing the same value in the 1st and 2nd fields (X A, X B or Y C) and pick the one with the highest numerical value in the 3rd field. So, I expect this output:

X A 10.00
X B 4.00
Y C 2.43

With this
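A sketch of one standard idiom for this (input.txt is the question's file; the pattern is a common one, not necessarily the accepted answer): build a compound key from fields 1 and 2 and keep a per-key maximum.

```shell
# Group on fields 1+2; keep, per group, the record with the largest field 3.
awk '{ key = $1 FS $2 }
     !(key in max) || $3 + 0 > max[key] {
         max[key] = $3 + 0
         line[key] = $0
     }
     END { for (k in line) print line[k] }' input.txt
```

As with any for-in loop, the output order is unspecified; append a sort if the order matters.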

Need to calculate standard deviation from an array using bash and awk?

别等时光非礼了梦想. · Submitted on 2019-12-10 22:15:20
Question: Guys, I'm new to awk and I'm struggling with an awk command to find the standard deviation. I have got the mean using the following:

echo ${GfieldList[@]} | awk 'NF {sum=0; for (i=1;i<=NF;i++) sum+=$i; print "Mean= " sum / NF}'

The standard deviation formula is: sqrt((1/N) * (sum of (value - mean)^2)). I have found the mean using the above formula. Can you guys help me with the awk command for this one?

Answer 1: Once you know the mean:

awk '{ for (i = 1; i <= NF; i++) { sum += $i }; print sum / NF }' # for 2,
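The population formula above can also be computed in a single pass by accumulating the sum of squares, since (1/N)*sum((x - mean)^2) = mean(x^2) - mean^2. A sketch with literal numbers standing in for ${GfieldList[@]}:

```shell
# One-pass population standard deviation: sqrt(mean of squares - mean^2).
echo "2 4 4 4 5 5 7 9" | awk 'NF {
    for (i = 1; i <= NF; i++) { sum += $i; sumsq += $i * $i }
    mean = sum / NF
    print "Mean= " mean
    print "SD= " sqrt(sumsq / NF - mean * mean)
}'
# for this input: Mean= 5, SD= 2
```

For very large magnitudes the mean-of-squares form can lose precision; a two-pass loop over (value - mean)^2, as in the question's formula, is the numerically safer variant.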

How can I check if a GNU awk coprocess is open, or force it to open without writing to it?

ε祈祈猫儿з · Submitted on 2019-12-10 18:59:50
Question: I have a gawk program that uses a coprocess. However, sometimes I don't have any data to write to the coprocess, and my original script hangs while waiting for the output of the coprocess. The code below reads from STDIN, writes each line to a "cat" program running as a coprocess, then reads the coprocess output back in and writes it to STDOUT. If we change the if condition to 1==0, nothing gets written to the coprocess, and the program hangs at the while loop. From the manual, it

How to check if a variable is an array?

巧了我就是萌 · Submitted on 2019-12-10 17:10:17
Question: I was playing with PROCINFO and its sorted_in index to be able to control array traversal. Then I wondered what the contents of PROCINFO are, so I decided to go through it and print its values:

$ awk 'BEGIN {for (i in PROCINFO) print i, PROCINFO[i]}'
ppid 7571
pgrpid 14581
api_major 1
api_minor 1
group1 545
gid 545
group2 1000
egid 545
group3 10004
awk: cmd. line:1: fatal: attempt to use array `PROCINFO["identifiers"]' in a scalar context

As you see, it breaks because there is -at

Subtracting N columns from two files with AWK

冷暖自知 · Submitted on 2019-12-10 14:18:53
Question: I have two files with N columns.

File1:
A 1 2 3 ....... Na1
B 2 3 4 ....... Nb1

File2:
A 2 2 4 ....... Na2
B 1 3 4 ....... Nb2

I want output where each column of File2 is subtracted from the corresponding column of File1, and so on through column N, as shown below:

A -1 0 -1 ........ (Na1-Na2)
B 1 0 0 ........ (Nb1-Nb2)

How can I do this in AWK, or with Perl scripting, in a Linux environment?

Answer 1: Something like this:

use strict;
use warnings;
my (@fh, @v);
for (@ARGV) {
    open (my $handle, "<", $
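For the awk half of the question, a sketch (the two sample files are written inline to mirror the example; real data would be passed directly): load File1 on the first pass (NR==FNR), then subtract File2's fields while printing.

```shell
# Column-wise File1 - File2, keyed on the first field.
printf 'A 1 2 3\nB 2 3 4\n' > file1
printf 'A 2 2 4\nB 1 3 4\n' > file2
awk 'NR == FNR { for (i = 2; i <= NF; i++) a[$1, i] = $i; next }
     {
         printf "%s", $1
         for (i = 2; i <= NF; i++) printf " %g", a[$1, i] - $i
         print ""
     }' file1 file2
# prints: A -1 0 -1
#         B 1 0 0
```

Keying on both the row label and the column index means this also works when the two files list the rows in different orders.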

Printing thousand separated floats with GAWK

梦想的初衷 · Submitted on 2019-12-10 11:53:06
Question: I must process some huge files with gawk. My main problem is that I have to print some floats using thousands separators. E.g. 10000 should appear as 10.000, and 10000,01 as 10.000,01, in the output. I (and Google) came up with this function:

function commas(n) {
    gsub(/,/, "", n)
    point = index(n, ".") - 1
    if (point < 0) point = length(n)
    while (point > 3) {
        point -= 3
        n = substr(n, 1, point) "." substr(n, point + 1)
    }
    sub(/-\./, "-", n)
    return d n
}

But it fails with floats. Now
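One way around the gsub problem is to never touch the decimal part at all: split on the decimal comma first, insert the "." separators only into the integer digits, and reassemble. The function name commafy and the sign handling are this sketch's own, not from the question.

```shell
# European-style grouping: "." every three integer digits, "," stays the
# decimal mark. Only the integer part is ever rewritten.
printf '10000\n10000,01\n-1234567,5\n' | awk '
    function commafy(n,    p, intpart, frac, sign, out) {
        split(n, p, ",")                       # comma is the decimal mark here
        intpart = p[1]
        frac = (2 in p) ? "," p[2] : ""
        sign = ""
        if (intpart ~ /^-/) { sign = "-"; intpart = substr(intpart, 2) }
        out = ""
        while (length(intpart) > 3) {          # peel off three digits at a time
            out = "." substr(intpart, length(intpart) - 2) out
            intpart = substr(intpart, 1, length(intpart) - 3)
        }
        return sign intpart out frac
    }
    { print commafy($1) }'
# prints: 10.000 / 10.000,01 / -1.234.567,5
```

For integers only, gawk can also group natively with printf "%'d" under a locale whose thousands separator is ".".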

awk: fatal: Invalid regular expression when setting multiple field separators

情到浓时终转凉″ · Submitted on 2019-12-09 01:44:25
Question: I was trying to solve "Grep regex to select only 10 character" using awk. The question consists of a string XXXXXX[YYYYY--ZZZZZ, and the OP wants to print the text in between the unique [ and -- strings within the text. If it were just one -, I would say use [-[] as the field separator (FS). This sets the FS to be either - or [:

$ echo "XXXXXXX[YYYYY-ZZZZ" | awk -F[-[] '{print $2}'
YYYYY

The tricky point is that [ also has a special meaning as a character class, so that to make it be correctly
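For the two-character -- delimiter, a bracket expression can't express it as a single unit, but regex alternation in a dynamic FS can. A sketch (quoting \\[ so the literal bracket doesn't open a character class):

```shell
# FS is the regex \[|-- : split on a literal "[" OR on the two-character "--".
echo 'XXXXXX[YYYYY--ZZZZZ' | awk -F'\\[|--' '{print $2}'
# prints: YYYYY
```

The double backslash is needed because a command-line FS goes through awk string processing once before it is used as a regular expression.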