Awk: treat a double-quoted string as one token and ignore the spaces inside it

Data file - data.txt:

```
ABC "I am ABC" 35 DESC
DEF "I am not ABC" 42 DESC
```

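```
cat data.txt | awk '{print $2}'
```

will print "I for both lines (just the opening quote and the first word) instead of the whole quoted string. How can I make awk ignore the spaces inside the double quotes and treat the quoted string as a single token?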

7 answers

挽巷

2020-12-15 18:03
The top answer for this question only works for lines with a single quoted field. When I found this question I needed something that could work for an arbitrary number of quoted fields.

Eventually I came upon an answer by Wintermute in another thread, and he provided a good generalized solution to this problem. I've just modified it to remove the quotes. Note that you need to invoke awk with -F\" when running the program below.
```
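BEGIN { OFS = "" } {
    # With -F\", the odd-numbered fields hold the text outside the quotes.
    for (i = 1; i <= NF; i += 2) {
        gsub(/[ \t]+/, ",", $i)
    }
    # Rebuild $0 with the empty OFS so the quotes are dropped,
    # even on lines where gsub made no substitution.
    $1 = $1
    print
}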
```
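This works because, when a line is split on the " character, every other field is the text inside a pair of quotes, while the odd-numbered fields ($1, $3, ...) lie outside them. The loop visits only the odd-numbered fields and replaces their runs of whitespace with commas; because OFS is the empty string, printing the rebuilt record leaves out the quote characters.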

You can then easily chain another instance of awk to do whatever processing you need (just use the field separator switch again, -F,).

Note that this might break if the first field is quoted - I haven't tested it. If it does, though, it should be easy to fix by adding an if statement to start at 2 rather than 1 if the first character of the line is a ".
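As a rough illustration of the chaining step, here is the program wrapped in a shell function (the name quoted_to_commas is hypothetical) and followed by a second awk that splits on commas. It is a sketch that assumes the data contains no literal commas of its own. The $1 = $1 line forces awk to rebuild the record with the empty OFS even on lines where gsub found nothing to replace.

```shell
# Sketch only: quoted_to_commas is a hypothetical name for the program above.
quoted_to_commas() {
  awk -F'"' 'BEGIN { OFS = "" } {
    # Split on the double quote: odd-numbered fields are outside the quotes,
    # so turn their runs of blanks into commas.
    for (i = 1; i <= NF; i += 2)
      gsub(/[ \t]+/, ",", $i)
    $1 = $1   # rebuild $0 with OFS = "" so the quote characters disappear
    print
  }' "$@"
}

# Second awk in the chain: the fields are now simply comma-separated.
printf '%s\n' 'ABC "I am ABC" 35 DESC' 'DEF "I am not ABC" 42 DESC' |
  quoted_to_commas |
  awk -F, '{ print $2 " is followed by " $3 }'
```

On the sample data this should print "I am ABC is followed by 35" and "I am not ABC is followed by 42". A line that starts with a quoted field, such as "X" Y, comes out as X,Y.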
误落风尘

2020-12-15 18:06
Okay, if you really want all three fields, you can get them, but it takes a lot of piping:
```
$ cat data.txt | awk -F\" '{print $1 "," $2 "," $3}' | awk -F' ,' '{print $1 "," $2}' | awk -F', ' '{print $1 "," $2}' | awk -F, '{print $1 "," $2 "," $3}'
ABC,I am ABC,35 DESC
DEF,I am not ABC,42 DESC
```
By the last pipe you've got all three fields (the text before the quoted string, the quoted string itself, and everything after it, so 35 DESC stays together) to do whatever you'd like with.
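For comparison, the same transformation can probably be done in a single awk pass. This is a sketch (quoted_to_csv is a hypothetical name) that splits on the double quote once and trims the spaces next to the quotes, which is what the middle stages of the pipeline do. It assumes each line contains exactly one quoted string, as in data.txt.

```shell
# Sketch: one awk pass instead of four; quoted_to_csv is a hypothetical name.
# Assumes exactly one quoted string per line, as in data.txt.
quoted_to_csv() {
  awk -F'"' '{
    sub(/ +$/, "", $1)   # drop the space(s) before the opening quote
    sub(/^ +/, "", $3)   # drop the space(s) after the closing quote
    print $1 "," $2 "," $3
  }' "$@"
}

printf '%s\n' 'ABC "I am ABC" 35 DESC' 'DEF "I am not ABC" 42 DESC' | quoted_to_csv
```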

时光说笑

2020-12-15 18:13

Here is roughly what I finally got working; it is more generic and suits my project better. Note that it doesn't use awk.

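```
# Bash script (uses local and the <<< here-string).
someText="ABC \"I am ABC\" 35 DESC '1 23' testing 456"

# Join all arguments into one string, one argument per line.
putItemsInLines() {
    local items=""
    local firstItem="true"
    while test $# -gt 0; do
        if [ "$firstItem" == "true" ]; then
            items="$1"
            firstItem="false"
        else
            items="$items
$1"
        fi
        shift
    done
    echo "$items"
}

count=0
# eval makes the shell re-parse $someText, so its quotes group words into
# single arguments. Never use it on untrusted text: eval also runs any
# command substitutions the text contains.
while read -r valueLine; do
    echo "$count: $valueLine"
    count=$(( $count + 1 ))
done <<< "$(eval putItemsInLines $someText)"
```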

Which outputs:

```
0: ABC
1: I am ABC
2: 35
3: DESC
4: 1 23
5: testing
6: 456
```
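If the eval is a concern, a possible eval-free variant relies on POSIX xargs, which already treats single- and double-quoted strings in its input as single arguments. This is a sketch: split_quoted is a hypothetical name, and xargs also interprets backslashes and rejects unbalanced quotes, so unusual input may need extra care.

```shell
# Sketch: let xargs do the quote-aware splitting instead of eval.
# split_quoted is a hypothetical helper name.
split_quoted() {
  # xargs groups '...' and "..." into single arguments; printf then prints
  # each argument on its own line. Nothing in the text itself is executed.
  printf '%s\n' "$1" | xargs printf '%s\n'
}

someText="ABC \"I am ABC\" 35 DESC '1 23' testing 456"
count=0
split_quoted "$someText" | while IFS= read -r valueLine; do
  echo "$count: $valueLine"
  count=$((count + 1))
done
```

Text such as $(echo hi) is passed through as plain words rather than being run, which is the main difference from the eval version.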

失恋的感觉

2020-12-15 18:14
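Another alternative is the FPAT variable, which defines a regular expression describing the contents of each field rather than the separators between fields. FPAT is a GNU awk (gawk 4.0 and later) extension, so this requires gawk.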

Save this AWK script as parse.awk:
```
#!/usr/bin/gawk -f

BEGIN {
  FPAT = "([^ ]+)|(\"[^\"]+\")"
}
{
  print $2
}
```
Make it executable with chmod +x ./parse.awk and run it on your data file as ./parse.awk data.txt, which prints (the quotes are kept as part of the field):
```
"I am ABC"
"I am not ABC"
```
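Because FPAT exists only in gawk, other awks such as mawk treat it as an ordinary variable and keep splitting on spaces. A rough portable sketch of the same idea loops over match() with a pattern for "a quoted string, or a run of characters that are neither blanks nor quotes". The helper name quoted_field and its field-number argument are hypothetical, and unlike FPAT it does not update $1, $2, ... or NF.

```shell
# Sketch: print field number $1 of each input line, where a field is either a
# double-quoted string or a run of characters that are neither blanks nor
# quotes. Uses only POSIX awk (match, RSTART, RLENGTH, substr), not FPAT.
# quoted_field is a hypothetical helper name.
quoted_field() {
  awk -v want="$1" '{
    s = $0
    n = 0
    while (match(s, /"[^"]*"|[^ \t"]+/)) {
      n++
      if (n == want) {
        print substr(s, RSTART, RLENGTH)
        break
      }
      s = substr(s, RSTART + RLENGTH)   # continue after this field
    }
  }'
}

printf '%s\n' 'ABC "I am ABC" 35 DESC' 'DEF "I am not ABC" 42 DESC' | quoted_field 2
```

The two alternatives in the pattern can never start at the same character (one needs a leading quote, the other forbids it), so the result does not depend on how a particular awk resolves alternation.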

不知归路

2020-12-15 18:16

I've put together a function that re-splits $0 into an array called B, in which spaces between double quotes do not act as field separators. It works with any number of fields, including a mix of quoted and unquoted ones. Here goes:

```
#!/usr/bin/gawk -f

# Resplit $0 into array B. Spaces between double quotes are not separators.
# Single quotes not handled. No escaping of double quotes.
function resplit(       a, l, i, j, b, k, BNF) # all are local variables
{
  l=split($0, a, "\"")
  BNF=0
  delete B
  for (i=1;i<=l;++i)
  {
    if (i % 2)
    {
      k=split(a[i], b)
      for (j=1;j<=k;++j)
        B[++BNF] = b[j]
    }
    else
    {
      B[++BNF] = "\""a[i]"\""
    }
  }
}

{
  resplit()

  for (i=1;i<=length(B);++i)
    print i ": " B[i]
}
```

Hope it helps.

臣服心动

2020-12-15 18:17
Try using the double quote as the field separator, so that the quoted string becomes $2:
```
$ cat data.txt | awk -F\" '{print $2}'
I am ABC
I am not ABC
```