gawk

convert a fixed width file from text to csv

南笙酒味 submitted on 2020-02-17 05:44:28
Question: I have a large fixed-width data file in text format and I want to convert it to CSV by specifying each column's length.

Number of columns: 5
Column lengths: [4 2 5 1 1]

Sample observations:
aasdfh9013512
ajshdj 2445df

Expected output:
aasd,fh,90135,1,2
ajsh,dj, 2445,d,f

Answer 1: GNU awk (gawk) supports this directly with FIELDWIDTHS, e.g.:

gawk '$1=$1' FIELDWIDTHS='4 2 5 1 1' OFS=, infile

Output:
aasd,fh,90135,1,2
ajsh,dj, 2445,d,f

Answer 2: I would use sed and capture the groups with the given lengths: $ sed -r 's/^(.
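FIELDWIDTHS is gawk-only. A portable sketch for any POSIX awk slices each record with substr(); the sample data is piped in here in place of the question's input file:

```shell
# Portable alternative to FIELDWIDTHS: slice each record by widths 4,2,5,1,1
printf 'aasdfh9013512\najshdj 2445df\n' |
awk 'BEGIN { n = split("4 2 5 1 1", w, " ") }
     { pos = 1; out = ""
       for (i = 1; i <= n; i++) {
         out = out (i > 1 ? "," : "") substr($0, pos, w[i])
         pos += w[i]
       }
       print out }'
# -> aasd,fh,90135,1,2
# -> ajsh,dj, 2445,d,f
```

Note that substr() preserves embedded blanks inside a field (" 2445"), matching the expected output above.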

awk + bash: combining arbitrary number of files

扶醉桌前 submitted on 2020-01-16 13:17:23
Question: I have a script that takes a number of data files with identical layout but different data and combines a specified data column into a new file, like this:

gawk '{ names[$1]= 1; data[$1,ARGIND]= $2 }
END { for (i in names) print i"\t"data[i,1]"\t"data[i,2]"\t"data[i,3] }' $1 $2 $3 > combined_data.txt

... where the row IDs can be found in the first column, and the interesting data in the second column. This works nicely, but not for an arbitrary number of files. While I could simply add $4 $5

Is it possible to have different behavior for first and second input files to awk?

风格不统一 submitted on 2020-01-15 07:40:10
Question: For example, suppose I run the following command:

gawk -f AppendMapping.awk Reference.tsv TrueInput.tsv

Assume the names of the files WILL change. While iterating through the first file, I want to create a mapping:

map[$16]=$18

While iterating through the second file, I want to use the mapping:

print $1, map[$2]

What's the best way to achieve this behavior (i.e., different behavior for each input file)?

Answer 1: As you probably know, NR stores the current line number; as you may or may not know, it's
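The classic idiom the answer is heading toward is FNR==NR: NR counts lines across all files while FNR resets per file, so the condition is true only while reading the first file. A sketch using the question's filenames and column numbers:

```shell
# Build the map from the first file only (FNR==NR), then apply it to the rest.
awk 'FNR == NR { map[$16] = $18; next }
     { print $1, map[$2] }' Reference.tsv TrueInput.tsv
```

Because only relative file position matters, this keeps working when the file names change.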

How to filter logs easily with awk?

痴心易碎 submitted on 2020-01-09 08:04:46
Question: Suppose I have a log file mylog like this:

[01/Oct/2015:16:12:56 +0200] error number 1
[01/Oct/2015:17:12:56 +0200] error number 2
[01/Oct/2015:18:07:56 +0200] error number 3
[01/Oct/2015:18:12:56 +0200] error number 4
[02/Oct/2015:16:12:56 +0200] error number 5
[10/Oct/2015:16:12:58 +0200] error number 6
[10/Oct/2015:16:13:00 +0200] error number 7
[01/Nov/2015:00:10:00 +0200] error number 8
[01/Nov/2015:01:02:00 +0200] error number 9
[01/Jan/2016:01:02:00 +0200] error number 10

And I want to
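The question is cut off before the filter criterion, but a common version of this task is keeping entries within a date range. One hedged sketch: rebuild the bracketed timestamp into a sortable YYYYMMDDHHMMSS string and compare lexically (the October 2015 bounds are a stand-in for whatever range is wanted):

```shell
# Keep mylog entries from October 2015 by normalizing the timestamp.
awk 'BEGIN {
       split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", m, " ")
       for (i in m) mon[m[i]] = sprintf("%02d", i)
     }
     {
       # $1 is e.g. "[01/Oct/2015:16:12:56"; strip "[" and split on "/" or ":"
       split(substr($1, 2), t, "[/:]")
       ts = t[3] mon[t[2]] t[1] t[4] t[5] t[6]    # YYYYMMDDHHMMSS
       if (ts >= "20151001000000" && ts <= "20151031235959") print
     }' mylog
```

On the sample data this passes errors 1 through 7 and drops the November and January entries.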

how to create an empty array

断了今生、忘了曾经 submitted on 2020-01-07 07:09:33
Question: UPDATE: The original description below has many errors; gawk lint does not complain about uninitialized arrays used as the right-hand side of "in". For example, the following gives no errors or warnings. I am not deleting the question because the answer I am about to accept gives a good suggestion: use split with an empty string to create an empty array.

BEGIN {
    LINT = "fatal";
    # print x;  # LINT gives an error if this is uncommented
    thread = 0;
    if (thread in threads_start) {
        print "if";
    } else {
        print
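The accepted suggestion can be sketched like this: splitting the empty string yields zero fields, which leaves the named array in existence but empty, so a later `in` test is well-defined:

```shell
# split("", arr) "declares" arr as an empty array before it is used with `in`.
awk 'BEGIN {
       split("", threads_start)
       thread = 0
       if (thread in threads_start) print "if"; else print "else"
     }'
# -> else
```

The same trick also empties an existing array (portably; gawk additionally allows `delete arr` on the whole array).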

Awk: Sum up column values across multiple files with identical column layout

孤街浪徒 submitted on 2020-01-07 05:30:51
Question: I have a number of files with the same header: COL1, COL2, COL3, COL4. You can ignore COL1-COL3; COL4 contains a number. Each file contains about 200 rows. I am trying to sum up across the rows. For example:

File 1:
COL1 COL2 COL3 COL4
x y z 3
a b c 4

File 2:
COL1 COL2 COL3 COL4
x y z 5
a b c 10

Then a new file is returned:

COL1 COL2 COL3 COL4
x y z 8
a b c 14

Is there a simple way to do this without AWK? I will use AWK if need be; I just thought there might be a simple one-liner that I could
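In awk itself this is short: key each row on COL1-COL3, accumulate COL4, and print the header from the first file only. A sketch (file names assumed):

```shell
# Sum COL4 across files that share the header; rows keep first-seen order.
awk 'FNR == 1 { if (NR == 1) print; next }      # print header once, skip repeats
     {
       key = $1 FS $2 FS $3
       sum[key] += $4
       if (!(key in seen)) { seen[key] = 1; order[++n] = key }
     }
     END { for (i = 1; i <= n; i++) print order[i], sum[order[i]] }' file1 file2
```

Tracking insertion order in order[] avoids the unspecified ordering of `for (key in sum)`.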

Edit .csv file with AWK

冷暖自知 submitted on 2020-01-06 20:23:52
Question: I have a CSV file in which I have to make some changes, which you will see in the examples below. I think I can do it with arrays, but I do not know how to structure it. Any ideas?

Original file:
"1033reto";"V09B";"";"";"";"";"";"QVN";"V09B"
"1033reto";"V010";"";"";"";"";"";"QVN";"V010"
"1033reto";"V015";"";"";"";"";"";"QVN";"V015"
"1033reto";"V08C";"";"";"";"";"";"QVN";"V08C"
"1040reto";"V03D";"";"";"";"";"";"QVN";"V03D"
"1040reto";"V01C";"";"";"";"";"";"QVN";"V01C"
"1050reto";"V03D"
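The expected output is cut off above, so the intended transformation is unknown. As a hedged starting point for array-based edits on this file, setting FS to ';' splits each record into its quoted fields, and an array keyed on field 1 groups the records (the grouping itself is a hypothetical goal):

```shell
# Count records per first field of a ';'-separated file (illustrative only).
printf '"1033reto";"V09B";"";"";"";"";"";"QVN";"V09B"\n"1033reto";"V010";"";"";"";"";"";"QVN";"V010"\n"1040reto";"V03D";"";"";"";"";"";"QVN";"V03D"\n' |
awk -F';' '{ count[$1]++ } END { for (k in count) print k, count[k] }'
```

The quotes stay part of each field value; strip them with gsub(/"/, "", $1) if needed.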

Bash: Grab part of string from a command line output

|▌冷眼眸甩不掉的悲伤 submitted on 2020-01-06 19:38:41
Question: I am running a command in CentOS that gives me a string as output, and I want to grab a certain part of that output and set it to a variable. I run the command ebi-describe-env. My output is as follows:

ApplicationName | CNAME | DateCreated | DateUpdated | Description | EndpointURL | EnvironmentID | EnvironmentName | Health | Stack | Status | TemplateName | Version Label
--------------------------
Web App | domain.com | 2012-02-23 | 2012-08-31 | | anotherdomain.com | e-8sgkf3eqbj | Web-App
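Since the columns are '|'-separated, awk can pull one out directly. A sketch extracting the EnvironmentID (7th field); the sample line stands in for the real ebi-describe-env output, which would normally be piped in:

```shell
# Grab the 7th '|'-separated field and strip the padding spaces around it.
line='Web App | domain.com | 2012-02-23 | 2012-08-31 | | anotherdomain.com | e-8sgkf3eqbj | Web-App'
env_id=$(printf '%s\n' "$line" | awk -F'|' '{ gsub(/ /, "", $7); print $7 }')
echo "$env_id"
# -> e-8sgkf3eqbj
```

In practice this would be `env_id=$(ebi-describe-env | awk -F'|' '...')`, with a pattern added to select the data row rather than the header.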