Make awk treat a double-quoted string as one token and ignore the spaces inside it

没有蜡笔的小新 2020-12-15 17:26

Data file - data.txt:

ABC "I am ABC" 35 DESC
DEF "I am not ABC" 42 DESC

cat data.txt | awk '{print $2}'

will re

7 answers
  • 2020-12-15 18:03

    The top answer for this question only works for lines with a single quoted field. When I found this question I needed something that could work for an arbitrary number of quoted fields.

    Eventually I came upon an answer by Wintermute in another thread, and he provided a good generalized solution to this problem. I've just modified it to remove the quotes. Note that you need to invoke awk with -F\" when running the below program.

    BEGIN { OFS = "" } {
        for (i = 1; i <= NF; i += 2) {
            gsub(/[ \t]+/, ",", $i)
        }
        print
    }
    

    This works because, when the line is split on the " character, every other field lies inside quotes. The loop therefore touches only the fields outside the quotes and replaces the whitespace in them with commas.

    You can then easily chain another instance of awk to do whatever processing you need (just use the field separator switch again, -F,).

    Note that this might break if the first field is quoted - I haven't tested it. If it does, though, it should be easy to fix by adding an if statement to start at 2 rather than 1 if the first character of the line is a ".
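The chaining described above can be sketched as follows. The function name `normalize` and the sample lines are assumptions made for illustration. The second call is a quick check of the quoted-first-field case: with -F\" a line that begins with a quote produces an empty first field, so the odd-numbered fields should still be the ones outside the quotes.

```shell
# Stage 1: split on double quotes. Odd-numbered fields lie outside the quotes,
# so their whitespace runs become commas; OFS = "" drops the quotes on rebuild.
normalize() {
    awk -F\" 'BEGIN { OFS = "" } {
        for (i = 1; i <= NF; i += 2) gsub(/[ \t]+/, ",", $i)
        print
    }'
}

# Stage 2: the line is now comma-separated, so -F, selects fields directly.
second_field=$(printf '%s\n' 'ABC "I am ABC" 35 DESC' | normalize | awk -F, '{ print $2 }')
echo "$second_field"    # prints: I am ABC

# Hypothetical line whose first field is quoted.
quoted_first=$(printf '%s\n' '"I am ABC" 35 DESC' | normalize)
echo "$quoted_first"    # prints: I am ABC,35,DESC
```

One caveat of this approach: a quoted field that itself contains a comma would be split again in the second stage.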

  • 2020-12-15 18:06

    Okay, if you really want all three fields, you can get them, but it takes a lot of piping:

    $ cat data.txt | awk -F\" '{print $1 "," $2 "," $3}' | awk -F' ,' '{print $1 "," $2}' | awk -F', ' '{print $1 "," $2}' | awk -F, '{print $1 "," $2 "," $3}'
    ABC,I am ABC,35
    DEF,I am not ABC,42
    

    By the last pipe you've got all three fields to do whatever you'd like with.
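The same three fields can probably be produced in a single awk call. This is only a sketch, assuming exactly one quoted field per line as in data.txt; `three_fields` is a hypothetical name.

```shell
# Split on the double quote: $1 is the text before the quoted string,
# $2 is the quoted string itself, $3 is everything after it.
three_fields() {
    awk -F\" '{
        sub(/ +$/, "", $1)      # trim the space left before the opening quote
        split($3, rest, " ")    # rest[1] is the first word after the closing quote
        print $1 "," $2 "," rest[1]
    }' "$@"
}

out=$(printf '%s\n' 'ABC "I am ABC" 35 DESC' 'DEF "I am not ABC" 42 DESC' | three_fields)
echo "$out"
```

With the two sample lines this should print ABC,I am ABC,35 and DEF,I am not ABC,42. Passing a file name instead (three_fields data.txt) works as well, since the arguments are handed straight to awk.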

  • 2020-12-15 18:13

    Here is something like what I finally got working that is more generic for my project. Note it doesn't use awk.

    someText="ABC \"I am ABC\" 35 DESC '1 23' testing 456"
    putItemsInLines() {
        local items=""
        local firstItem="true"
        while test $# -gt 0; do
            if [ "$firstItem" == "true" ]; then
                items="$1"
                firstItem="false"
            else
                items="$items
    $1"
            fi
            shift
        done
        echo "$items"
    }
    
    count=0
    while read -r valueLine; do
        echo "$count: $valueLine"
        count=$(( $count + 1 ))
    done <<< "$(eval putItemsInLines $someText)"
    

    Which outputs:

    0: ABC
    1: I am ABC
    2: 35
    3: DESC
    4: 1 23
    5: testing
    6: 456
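A shorter variant of the same idea, as a sketch: let eval load the items into the positional parameters and print them one per line with printf. Like the script above, it relies on eval, so it is only suitable for trusted input.

```shell
someText="ABC \"I am ABC\" 35 DESC '1 23' testing 456"

# eval re-parses the string using shell quoting rules. Only do this with
# trusted input: a command substitution inside the string would be executed.
eval "set -- $someText"

# "$@" now holds one item per positional parameter; print one per line.
items=$(printf '%s\n' "$@")
echo "$items"
```

Each line of `items` can then be read with the same while read -r loop as above.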
    
  • 2020-12-15 18:14

    Another alternative is the FPAT variable, a gawk extension (gawk 4.0 and later) that defines a regular expression describing the contents of each field rather than the separator between fields.

    Save this AWK script as parse.awk:

    #!/usr/bin/gawk -f
    # FPAT needs gawk; adjust the interpreter path above for your system.
    
    BEGIN {
      FPAT = "([^ ]+)|(\"[^\"]+\")"
    }
    {
      print $2
    }
    

    Make it executable with chmod +x ./parse.awk and parse your data file as ./parse.awk data.txt:

    "I am ABC"
    "I am not ABC"
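Where only a POSIX awk is available, a match() loop can approximate what FPAT does. This is a sketch rather than a drop-in replacement: `second_quoted_field` is a hypothetical name, and the pattern is slightly stricter than the FPAT above because it never lets an unquoted token contain a double quote.

```shell
# Peel tokens off the front of the line one at a time. A token is either a
# double-quoted string or a run of characters that are neither space nor quote.
second_quoted_field() {
    awk '{
        n = 0
        rest = $0
        while (match(rest, /"[^"]*"|[^ "]+/)) {
            field[++n] = substr(rest, RSTART, RLENGTH)
            rest = substr(rest, RSTART + RLENGTH)
        }
        if (n >= 2) print field[2]
    }' "$@"
}

second=$(printf '%s\n' 'ABC "I am ABC" 35 DESC' | second_quoted_field)
echo "$second"    # prints: "I am ABC"
```

As with the FPAT version, the quotes stay part of the field.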
    
  • 2020-12-15 18:16

    I've put together a function that re-splits $0 into an array called B. Spaces between double quotes do not act as field separators. It works with any number of fields and any mix of quoted and unquoted ones. Here goes:

    #!/usr/bin/gawk -f
    
    # Resplit $0 into array B. Spaces between double quotes are not separators.
    # Single quotes not handled. No escaping of double quotes.
    function resplit(       a, l, i, j, b, k, BNF) # all are local variables
    {
      l=split($0, a, "\"")
      BNF=0
      delete B
      for (i=1;i<=l;++i)
      {
        if (i % 2)
        {
          k=split(a[i], b)
          for (j=1;j<=k;++j)
            B[++BNF] = b[j]
        }
        else
        {
          B[++BNF] = "\""a[i]"\""
        }
      }
    }
    
    {
      resplit()
    
      for (i=1;i<=length(B);++i)
        print i ": " B[i]
    }
    

    Hope it helps.

  • 2020-12-15 18:17

    Try this:

    $ cat data.txt | awk -F\" '{print $2}'
    I am ABC
    I am not ABC
    