gawk

Matching a specific substring with regular expressions using awk

I'm dealing with specific filenames and need to extract information from them. The structure of a filename is similar to: "20100613_M4_28007834.005_F_RANDOMSTR.raw.gz", with RANDOMSTR a string of at most 22 characters, which may (or may not) contain a substring of the format "-W[0-9].[0-9]{2}.[0-9]{3}". This substring also has the unique feature of starting with "-W". The information I need to extract is RANDOMSTR without this optional substring. I want to implement this in a bash script, and so far the best option I found is to use gawk with a regular expression. My best attempt …
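Since the attempt itself is cut off above, here is a minimal sketch of one way to do it with gawk's gensub (the shape of the fixed prefix is an assumption read off the example filename):

    echo "20100613_M4_28007834.005_F_RANDOMSTR-W1.23.456.raw.gz" |
    gawk '{
        # capture everything between the fixed "..._F_" prefix and ".raw.gz"
        name = gensub(/^[0-9]+_[^_]+_[^_]+_F_(.*)\.raw\.gz$/, "\\1", 1)
        sub(/-W[0-9]\.[0-9]{2}\.[0-9]{3}/, "", name)   # drop the optional "-W" part
        print name                                     # -> RANDOMSTR
    }'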

Change a string in a file with sed?

I have an input file with a template as shown below. I want to change the Version: field using sed.

    Package: somename
    Priority: extra
    Section: checkinstall
    Maintainer: joe@example.com
    Architecture: i386
    Version: 3.1.0.2-1
    Depends:
    Provides: somename
    Description: some description

Currently I am getting the current version using

    grep -m 1 Version inputfile | sed 's/[:_#a-zA-Z\s"]*//g'

and I am trying to replace the current version with

    sed 's/3.1.0.2-1/3.1.0.2/' inputfile

However, this does not seem to work, although when I try it on the command line using echo it works: echo 'Version: 3.0.9.1' | sed 's/3.0.9.1/3.2 …
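The excerpt is cut off, but the two most likely culprits are that sed writes to stdout unless told to edit in place, and that unescaped dots match any character. A hedged sketch of the fix:

    # -i (GNU sed) edits the file in place; escape the dots so they
    # only match literal dots
    sed -i 's/^Version: 3\.1\.0\.2-1$/Version: 3.1.0.2/' inputfile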

any way to access the matched groups in action? [duplicate]

This question already has an answer here: AWK: Access captured group from line pattern (6 answers). I often find myself doing the same match in the action as in the pattern, to access some part of the input record, e.g.

    /^Compiled from \"(.*)\"$/ {
        file_name = gensub("^Compiled from \"(.*)\"$", "\\1", "g");
        print file_name;
    }

So the regexp matching is done twice. Is there any way I can access \1 in the action without matching again? I am trying to reduce both the pattern matching and the extra code. Unfortunately, GAWK doesn't have the carry-forward feature of sed, which uses an empty //: sed '/\(patt\ …
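The usual answer (a sketch, assuming gawk 4.0 or later): pass a third array argument to match(), which fills the array with the capture groups, so the regex runs only once:

    gawk '
    match($0, /^Compiled from "(.*)"$/, m) {
        file_name = m[1]    # first capture group
        print file_name
    }' input.txt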

How to handle 3 files with awk?

Ok, so after spending 2 days, I am not able to solve it and I am almost out of time now. It might be a very silly question, so please bear with me. My awk script does something like this:

    BEGIN { n = 50; i = n; }
    FNR == NR {
        # Read file-1, which has just 1 column
        ids[$1] = int(i++/n);
        next
    }
    {
        # Read file-2, which has 4 columns
        # Do something
        next
    }
    END { ... }

It works fine. But now I want to extend it to read 3 files. Let's say, instead of hard-coding the value of "n", I need to read a properties file and …
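One common way to extend the FNR==NR idiom to three files (a sketch with assumed file names) is gawk's ARGIND variable, which holds the index of the file currently being read from the command line:

    gawk '
    ARGIND == 1 { n = $1; next }                  # properties file: read n
    ARGIND == 2 { ids[$1] = int(i++ / n); next }  # file-1: one column
    ARGIND == 3 { print $1, ids[$1] }             # file-2: the real work goes here
    ' props.txt file1 file2

ARGIND is gawk-specific; in POSIX awk a FNR == 1 { f++ } counter achieves the same thing.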

AWK: redirecting script output from script to another file with dynamic name

I know I can redirect awk's print output to another file from within a script, like this:

    awk '{ print $0 >> "anotherfile" }' 2procfile

(I know that's a dummy example, but it's just an example...) But what I need is to redirect the output to another file with a dynamic name, like this:

    awk -v MYVAR="somedynamicdata" '{ print $0 >> "MYVAR-SomeStaticText" }' 2procfile

and the output should be redirected to somedynamicdata-SomeStaticText. I know I can do it via:

    awk '{ print $0 }' 2procfile >> "$MYVAR-SomeStaticText"

But the problem is that it's a bigger awk script, and I have to output to several …
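The standard fix is to build the file name inside awk by string concatenation, since the redirection target can be any expression (names here are assumptions):

    awk -v myvar="somedynamicdata" '{
        outfile = myvar "-SomeStaticText"   # awk concatenates adjacent strings
        print $0 >> outfile
    }' 2procfile

Storing the name in a variable first also sidesteps awk's ambiguous parsing of print $0 >> expr1 expr2.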

gawk command in CMD with && operator not working

I'm issuing a gawk command from Windows CMD, but it just gets stuck there. The same command works perfectly fine in a Cygwin terminal. I am trying to find the first occurrence of a closing brace "}" in the first column of a file after line number 30. The command is:

    gawk 'NR > 30 && /^}$/ { print NR; exit }' Filename.c > Output.txt

I noticed another thing: when I issue the command from CMD, besides getting stuck, it creates a file whose name is the line number (if the above command is executed, a file named 30 is created).

Answer 1: The command …
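The truncated answer most likely points at a known CMD quirk: CMD does not treat single quotes as quoting characters, so the > in NR > 30 is parsed by the shell as a redirection to a file named 30, which is exactly the stray file the asker sees. A sketch of the usual fix:

    rem on Windows CMD, double-quote the awk program instead
    gawk "NR > 30 && /^}$/ { print NR; exit }" Filename.c > Output.txt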

One nearest neighbour using awk

This is what I am trying to do using the AWK language. I have a problem mainly with step 2. I have shown a sample dataset, but the original dataset consists of 100 fields and 2000 records.

Algorithm:
1) initialize accuracy = 0
2) for each record r, find the closest other record, o, in the dataset using the distance formula

To find the nearest neighbour for r0, I need to compare r0 with r1 to r9 and do the math as follows: square(abs(r0.c1 - r1.c1)) + square(abs(r0.c2 - r1.c2)) + ... + square(abs(r0.c5 - r1.c5 …
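A rough sketch of the pairwise scan in step 2 (assumptions: whitespace-separated numeric fields and data small enough to buffer in memory; abs() is unnecessary because the differences are squared anyway):

    gawk '
    { for (j = 1; j <= NF; j++) v[NR, j] = $j; nf = NF }
    END {
        for (r = 1; r <= NR; r++) {
            best = -1
            for (o = 1; o <= NR; o++) {
                if (o == r) continue
                d = 0
                for (j = 1; j <= nf; j++) { t = v[r, j] - v[o, j]; d += t * t }
                if (best < 0 || d < best) { best = d; nearest = o }
            }
            print "nearest neighbour of record " r " is record " nearest
        }
    }' data.txt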

Is there a way to completely delete fields in awk, so that extra delimiters do not print?

Consider the following command:

    gawk -F"\t" "BEGIN{OFS=\"\t\"}{$2=$3=\"\"; print $0}" Input.tsv

When I set $2 = $3 = "", the intent is to get the same effect as writing:

    print $1,$4,$5...$NF

However, what actually happens is that I get two empty fields, with the extra field delimiters still printing. Is it possible to actually delete $2 and $3? Note: if this were on Linux in bash, the correct statement above would be the following, but Windows does not handle single quotes well in cmd …
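One approach that genuinely removes the fields (a sketch, written with Linux-style single quotes for readability; shrinking NF to rebuild the record is gawk behaviour): shift the trailing fields left so their delimiters move with them:

    gawk -F"\t" 'BEGIN { OFS = "\t" } {
        for (i = 2; i <= NF - 2; i++) $i = $(i + 2)  # overwrite $2,$3 by shifting left
        NF -= 2                                      # drop the duplicated last two fields
        print
    }' Input.tsv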

AWK: go through the file twice, doing different tasks

I am processing a fairly big collection of Tweets and I'd like to obtain, for each tweet, its mentions (other users' names, prefixed with an @), if the mentioned user is also in the file:

    users = new Dictionary()
    for each line in file:
        username = get_username(line)
        userid = get_userid(line)
        users.add(key = userid, value = username)
    for each line in file:
        mentioned_names = get_mentioned_names(line)
        mentioned_ids = mentioned_names.map(x => if x in users: users[x] else null)
        print "$line | …
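The standard awk idiom for two passes is to list the same file twice on the command line; NR == FNR is true only while the first copy is being read. A sketch (the field positions stand in for the real get_username/get_mentioned_names parsing):

    gawk '
    NR == FNR { users["@" $1] = 1; next }   # pass 1: collect user names ($1 assumed)
    {                                       # pass 2: keep mentions of known users
        out = ""
        for (i = 1; i <= NF; i++)
            if ($i in users) out = out " " $i
        print $0 " |" out
    }' tweets.txt tweets.txt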

Using awk to interpolate data column based in a data file with date and time

The following file has multiple columns with date, time, and an incomplete data set, as shown in this simple file:

    # Matrix.txt
    13.09.2016:23:44:10;;4.0
    13.09.2016:23:44:20;10.0;
    13.09.2016:23:44:30;;
    13.09.2016:23:44:40;30.0;7.0

How can I do a linear interpolation on each column using awk to get the missing data?

    # Output.txt
    13.09.2016:23:44:10;0.0;4.0
    13.09.2016:23:44:20;10.0;5.0
    13.09.2016:23:44:30;20.0;6.0
    13.09.2016:23:44:40;30.0;7.0

Answer 1: Here is one solution in Gnu awk. It runs twice for …
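Since the quoted answer is cut off, here is a rough two-pass sketch in the same spirit (assumptions: ';' separator, rows evenly spaced in time so the row number can serve as the x coordinate, and at least two known values per column):

    gawk -F';' -v OFS=';' '
    NR == FNR {                 # pass 1: remember the known points per column
        for (c = 2; c <= NF; c++)
            if ($c != "") { nk[c]++; row[c, nk[c]] = FNR; val[c, nk[c]] = $c + 0 }
        next
    }
    {                           # pass 2: fill each gap from the two nearest known points
        for (c = 2; c <= NF; c++)
            if ($c == "") {
                i = 1
                while (i < nk[c] && row[c, i + 1] < FNR) i++
                a = i; b = (i < nk[c]) ? i + 1 : i - 1
                slope = (val[c, b] - val[c, a]) / (row[c, b] - row[c, a])
                $c = sprintf("%.1f", val[c, a] + slope * (FNR - row[c, a]))
            }
        print
    }' Matrix.txt Matrix.txt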