gawk

How to handle 3 files with awk?

夙愿已清 submitted on 2019-12-03 08:46:18

OK, so after spending two days I am not able to solve it, and I am almost out of time now. It might be a very silly question, so please bear with me. My awk script does something like this:

    BEGIN { n = 50; i = n }
    FNR == NR {
        # Read file-1, which has just 1 column
        ids[$1] = int(i++ / n)
        next
    }
    {
        # Read file-2, which has 4 columns
        # Do something
        next
    }
    END { ... }

It works fine. But now I want to extend it to read 3 files. Let's say that instead of hard-coding the value of "n", I need to read a properties file and set the value of "n" from that. I found this question and have tried something like this:

    BEGIN { n = 0; i = 0;

How to use sed, awk, or gawk to print only what is matched?

大憨熊 submitted on 2019-12-02 14:46:36

I see lots of examples and man pages on how to do things like search-and-replace using sed, awk, or gawk. But in my case I have a regular expression that I want to run against a text file to extract a specific value; I don't want to do search-and-replace. This is being called from bash. Let's use an example.

Example regular expression:

    .*abc([0-9]+)xyz.*

Example input file:

    a
    b
    c
    abc12345xyz
    a
    b
    c

As simple as this sounds, I cannot figure out how to call sed/awk/gawk correctly. What I was hoping to do from within my bash script is:

    myvalue=$( sed <...something...> input.txt )

Things I
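One way to print only the captured group is sed's classic trick: substitute the whole matching line with the backreference and print only on a successful substitution. A sketch using the example above (the file name is an assumption):

```shell
printf 'a\nb\nc\nabc12345xyz\na\nb\nc\n' > input.txt

# -n suppresses default output; the trailing "p" prints only substituted lines.
myvalue=$(sed -n 's/.*abc\([0-9][0-9]*\)xyz.*/\1/p' input.txt)
echo "$myvalue"

# gawk alternative: match() with a capture array (a gawk extension):
# gawk 'match($0, /abc([0-9]+)xyz/, m) { print m[1] }' input.txt
```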

Calculate date difference between $2,$3 from file in awk

倖福魔咒の submitted on 2019-12-02 11:16:12

Question: I would need your help. A file containing only dates, file.txt:

    P1,2013/jul/9,2013/jul/14
    P2,2013/jul/14,2013/jul/6
    P3,2013/jul/7,2013/jul/5

Display the output like this:

    P1,2013/jul/9,2013/jul/14,5days
    P2,2013/jul/14,2013/jul/6,8days
    P3,2013/jul/7,2013/jul/5,2days

Answer 1:

    awk '
    BEGIN {
        months = "jan feb mar apr may jun jul aug sep oct nov dec"
        OFS = FS = ","
    }
    function date2time(date,    a, mon) {
        split(date, a, "/")
        mon = 1 + (index(months, a[2]) - 1) / 4
        return mktime(a[1] " " mon " " a[3] " 0 0 0")
    }
    function abs
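The answer above is cut off at its abs() helper, and its mktime() call is a gawk extension. A self-contained sketch of the same idea that works in any awk, using a Julian-day-style count instead (the constant offsets cancel when taking differences; the helper names are mine):

```shell
printf 'P1,2013/jul/9,2013/jul/14\nP2,2013/jul/14,2013/jul/6\nP3,2013/jul/7,2013/jul/5\n' > file.txt

awk '
BEGIN {
    months = "jan feb mar apr may jun jul aug sep oct nov dec"
    OFS = FS = ","
}
# Day count since a fixed (arbitrary) origin; valid for date differences.
function days(date,    a, y, m, d) {
    split(date, a, "/")
    y = a[1]; m = 1 + (index(months, a[2]) - 1) / 4; d = a[3]
    if (m <= 2) { y--; m += 12 }              # fold Jan/Feb into the previous year
    return int(365.25 * y) + int(30.6001 * (m + 1)) + d
}
function abs(x) { return x < 0 ? -x : x }
{ print $0, abs(days($2) - days($3)) "days" }
' file.txt
```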

AWK - Is it possible to Breakdown a log file by a distinct field && by hour

≯℡__Kan透↙ submitted on 2019-12-02 07:41:37

Question: I am trying to find out if it is possible, with awk alone, to pass in a log file and have awk output each distinct message with a breakdown by hour (00-23) and a count for that particular hour versus distinct message. Example output requested:

    Message1
    00 13
    01 30
    ...
    23 6
    Message2
    00 50
    01 10
    ...
    23 120

etc., etc. The input file would look a little something like the following:

    blah,blah 2016-06-24 00:30:54 blah Message1 7 rand rand2
    2016-06-24 00:40:12 blah Message2 35 rand rand2
    2016-06-24 00:42:15 blah Message2 12 rand rand2
    2016-06-24 00:58:01 blah Message1 5 rand rand2
    2016
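A sketch of the usual approach: accumulate a two-dimensional count keyed by message and hour, then print it out at the end. The field positions below (timestamp in $1-$2, message name in $4) are assumptions about the log layout:

```shell
cat > app.log <<'EOF'
2016-06-24 00:40:12 blah Message2 35 rand rand2
2016-06-24 00:42:15 blah Message2 12 rand rand2
2016-06-24 00:58:01 blah Message1 5 rand rand2
2016-06-24 01:03:44 blah Message1 9 rand rand2
EOF

awk '
{
    hour = substr($2, 1, 2)               # "00:40:12" -> "00"
    count[$4 SUBSEP hour]++
}
END {
    for (key in count) {
        split(key, k, SUBSEP)
        print k[1], k[2], count[key]      # message, hour, count
    }
}
' app.log | sort
```

The final `sort` only makes the for-in iteration order deterministic; producing the grouped "Message / hour count" layout from the example takes one extra loop over the collected message names.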

awk group by multiple columns and print max value with non-primary key

北战南征 submitted on 2019-12-02 02:25:49

Question: I'm new to this site and trying to learn awk. I'm trying to find the maximum value of field 3, grouping by field 1, and print all the fields of the row holding that maximum. Field 2 contains a time, which means that for each item1 there are 96 values of field 2, field 3 and field 4. Input file (comma separated):

    item1,00:15,10,30
    item2,00:45,20,45
    item2,12:15,30,45
    item1,00:30,20,56
    item3,23:00,40,44
    item1,12:45,50,55
    item3,11:15,30,45

Desired output:

    item1,12:45,50,55
    item2,12:15,30,45
    item3,11:15,30,45

What I tried so

awk group by multiple columns and print max value with non-primary key

半腔热情 submitted on 2019-12-02 00:06:51

I'm new to this site and trying to learn awk. I'm trying to find the maximum value of field 3, grouping by field 1, and print all the fields of the row holding that maximum. Field 2 contains a time, which means that for each item1 there are 96 values of field 2, field 3 and field 4. Input file (comma separated):

    item1,00:15,10,30
    item2,00:45,20,45
    item2,12:15,30,45
    item1,00:30,20,56
    item3,23:00,40,44
    item1,12:45,50,55
    item3,11:15,30,45

Desired output:

    item1,12:45,50,55
    item2,12:15,30,45
    item3,11:15,30,45

What I tried so far:

    BEGIN { FS = OFS = "," }
    {
        if (a[$1] < $3) {
            a[$1] = $3
        }
    }
    END {
        for (i in a) {
            print i, a[i]
        }
    }

but this only
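The attempt above keeps only the maximum $3, so the other fields are gone by the time END runs. A common fix is to store the whole record whenever a new maximum is seen. A sketch (note that for item3 this prints the 23:00 row, since its $3 of 40 is the larger one; the desired output above lists the other row):

```shell
cat > items.csv <<'EOF'
item1,00:15,10,30
item2,00:45,20,45
item2,12:15,30,45
item1,00:30,20,56
item3,23:00,40,44
item1,12:45,50,55
item3,11:15,30,45
EOF

awk -F, '
$3 > max[$1] { max[$1] = $3; row[$1] = $0 }   # remember the best row per group
END { for (i in row) print row[i] }
' items.csv | sort
```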

Quantifiers in a regular expression used with awk behave unexpectedly

﹥>﹥吖頭↗ submitted on 2019-12-01 23:34:34

I want to process this list (of course this is just an excerpt):

    1    S3 -> PC-8-Set
    2    S3 -> PC-850-Set
    3    S3 -> ANSI-Set
    4    S3 -> 7-Bit-NRC
    5    PC-8-Set -> S3
    6    PC-850-Set -> S3
    7    ANSI-Set -> S3

This is what I did:

    awk -F '[[:blank:]]+' '{printf ("%s ", $2)}' list

This is what I got:

    1 2 3 4 5 6 7

Now I thought the quantifier + was equivalent to {1,}, but when I changed the line to

    awk -F '[[:blank:]]{1,}' '{printf ("%s ", $2)}' list

I got just blanks, and the whole line was read into $1. Can someone explain this behaviour, please? I'm thankful for every answer!

Jotne: Try awk --re-interval -F '[[:blank

Bash: Parse CSV with quotes, commas and newlines

笑着哭i submitted on 2019-12-01 16:39:56

Question: Say I have the following CSV file:

    id,message,time
    123,"Sorry, This message has commas and newlines",2016-03-28T20:26:39
    456,"It makes the problem non-trivial",2016-03-28T20:26:41

I want to write a bash command that will return only the time column, i.e.:

    time
    2016-03-28T20:26:39
    2016-03-28T20:26:41

What is the most straightforward way to do this? You can assume the availability of standard unix utils such as awk, gawk, cut, grep, etc. Note the presence of "", which escapes , and newline

AWK: go through the file twice, doing different tasks

怎甘沉沦 submitted on 2019-12-01 16:02:00

I am processing a fairly big collection of Tweets, and I'd like to obtain, for each tweet, its mentions (other users' names, prefixed with an @), if the mentioned user is also in the file:

    users = new Dictionary()
    for each line in file:
        username = get_username(line)
        userid = get_userid(line)
        users.add(key = userid, value = username)

    for each line in file:
        mentioned_names = get_mentioned_names(line)
        mentioned_ids = mentioned_names.map(x => if x in users: users[x] else null)
        print "$line | $mentioned_ids"

I was already processing the file with GAWK, so instead of processing it again in Python or
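The standard awk idiom for two passes over one file is to name the file twice on the command line and use FNR==NR to detect the first pass. A minimal sketch of the pseudocode above (the file layout and the @-mention format are assumptions):

```shell
cat > tweets.txt <<'EOF'
alice 100 hello @bob
bob 200 hi @alice @carol
EOF

# Pass 1 (FNR==NR): record the known usernames.
# Pass 2: append each mention that belongs to a known user.
awk '
FNR == NR { known[$1] = 1; next }
{
    out = $0 " |"
    for (i = 1; i <= NF; i++)
        if (sub(/^@/, "", $i) && ($i in known))
            out = out " " $i
    print out
}
' tweets.txt tweets.txt
```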

Is there a way to completely delete fields in awk, so that extra delimiters do not print?

非 Y 不嫁゛ submitted on 2019-12-01 15:14:26

Consider the following command:

    gawk -F"\t" "BEGIN{OFS=\"\t\"}{$2=$3=\"\"; print $0}" Input.tsv

When I set $2 = $3 = "", the intended effect is the same as writing:

    print $1,$4,$5,...,$NF

However, what actually happens is that I get two empty fields, with the extra field delimiters still printing. Is it possible to actually delete $2 and $3?

Note: if this were on Linux in bash, the statement above would be written as follows, but Windows does not handle single quotes well in cmd.exe:

    gawk -F'\t' 'BEGIN{OFS="\t"}{$2=$3=""; print $0}' Input.tsv

This is an oldie but goodie. As
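One common fix (a sketch, not necessarily the truncated answer) is to shift the remaining fields left and shrink NF; decrementing NF rebuilds the record without the deleted columns:

```shell
printf 'a\tb\tc\td\te\n' > Input.tsv

awk -F'\t' '
BEGIN { OFS = "\t" }
{
    # Move $4..$NF down two slots, then drop the last two fields.
    for (i = 2; i <= NF - 2; i++) $i = $(i + 2)
    NF -= 2
    print
}
' Input.tsv
```

For the sample line `a b c d e` this prints `a d e` with single tabs between the surviving fields.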