gawk / awk: piping date to getline *sometimes* won't work

Submitted by 杀马特。学长 韩版系。学妹 on 2019-11-30 05:12:21

Question


I'm attempting to convert dates from one format to another: From e.g. "October 29, 2005" to 2005-10-29. I have a list of 625 dates. I use Awk.

The conversion works -- most of the time. However, sometimes the conversion won't happen at all, and the variable supposed to hold the (converted) date remains undefined.

This always happens with the exact same rows. Running date explicitly (from the Bash shell) on the dates in those weird rows works fine (the dates are converted properly), so it's not the textual content of those rows that matters.

Why this behavior, and how can I fix my script?
Here it is:

awk 'BEGIN { FS = "unused" } { 
  x = "undefined";
  "date \"+%Y-%m-%d\" -d " $1 | getline x ;
  print $1 " = " x
}' uBXr0r15.txt \
 > bug-out-3.txt

If you want to reproduce this problem:

  1. Download this file: uBXr0r15.txt.
  2. Run the Awk script.
  3. Search for "undefined" in bug-out-3.txt.
    ("undefined" found 122 times, on my computer.)

Then you could run the script again, and (on my computer) bug-out-3.txt remains unchanged -- exactly the same dates are left undefined.

(Gawk version 3.1.6, Ubuntu 9.10.)

Kind regards, Magnus


Answer 1:


Whenever you open a pipe or file for reading or writing in awk, the latter will first check (using an internal hash) whether it already has a pipe or file with the same name (still) open; if so, it will reuse the existing file descriptor instead of reopening the pipe or file.

In your case, all entries which end up as undefined are actually duplicates. The first time a date is encountered (i.e. when the corresponding command date "..." -d "..." is first issued), the proper result is read into x. On subsequent occurrences of the same date, the command string is identical, so getline tries to read a second, third, etc. line from the original date pipe. That pipe is already at end-of-file because date has exited, so getline returns 0 and leaves x unchanged, which is why x still holds "undefined".
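The reuse described above can be reproduced in isolation. This is a minimal sketch, assuming any POSIX-compatible awk; the function name demo_pipe_reuse and the echo command are made up for illustration and are not part of the original script.

```shell
# Hypothetical demo: the same command string is read three times.
demo_pipe_reuse() {
  awk 'BEGIN {
    cmd = "echo hello"
    a = "undefined"; cmd | getline a   # first use: the command runs, a is set
    b = "undefined"; cmd | getline b   # same string: pipe is at EOF, b is left alone
    close(cmd)                         # drop the pipe from the table of open streams
    c = "undefined"; cmd | getline c   # after close(): the command runs again
    print a, b, c
  }'
}

demo_pipe_reuse   # prints: hello undefined hello
```

The second read fails silently; checking the return value of getline (0 at end-of-file, -1 on error) is the only way to notice it inside the script.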

From the gawk man-page:

NOTE: If using a pipe, co-process, or socket to getline, or from print or printf within a loop, you must use close() to create new instances of the command or socket. AWK does not automatically close pipes, sockets, or co-processes when they return EOF.

You should explicitly close the pipe every time after you have read x:

close("date \"+%Y-%m-%d\" -d " $1)
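Put together, the corrected script might look like the sketch below. It keeps the command string in a variable (cmd, a name not in the original) so that the string passed to close() is guaranteed to match the one used to open the pipe. The files sample-dates.txt and fixed-out.txt are hypothetical stand-ins for uBXr0r15.txt and bug-out-3.txt, and the sample rows assume the input lines are double-quoted dates. GNU date is assumed, as in the question, because of the -d option.

```shell
# Hypothetical sample input with a deliberate duplicate on line 3.
printf '%s\n' '"October 29, 2005"' '"March 3, 2010"' '"October 29, 2005"' > sample-dates.txt

awk 'BEGIN { FS = "unused" } {
  x = "undefined"
  cmd = "date \"+%Y-%m-%d\" -d " $1   # build the command string once
  cmd | getline x                      # read the single line that date prints
  close(cmd)                           # close the pipe so a repeated date reruns the command
  print $1 " = " x
}' sample-dates.txt > fixed-out.txt

cat fixed-out.txt
```

With close() in place, the duplicate on the third line is converted like the first occurrence instead of staying "undefined".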

Incidentally, would it be OK to sort and uniq uBXr0r15.txt before piping into awk, or do you need the original ordering/duplication?




Answer 2:


Though I love awk, it is not necessary for this.

tr -d '"' < uBXr0r15.txt | date +%Y-%m-%d -f -
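As a quick check of what this one-liner produces, here is a small sketch on made-up input. It assumes GNU date (the -f option, with - meaning standard input, is GNU-specific) and that the input lines are double-quoted dates; sample-dates.txt and converted.txt are hypothetical file names.

```shell
# Hypothetical sample input; the real data is in uBXr0r15.txt.
printf '%s\n' '"October 29, 2005"' '"March 3, 2010"' > sample-dates.txt

# Strip the quotes, then let a single date process convert every line.
tr -d '"' < sample-dates.txt | date +%Y-%m-%d -f - > converted.txt

cat converted.txt
```

Note that the output contains only the converted dates, one per input line, without the original text that the awk script printed next to each result.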




Answer 3:


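gawk 'BEGIN{
       # Map each month name to its zero-padded number, e.g. months["October"] = "10".
       m=split("January|February|March|April|May|June|July|August|September|October|November|December",d,"|")
       for(o=1;o<=m;o++){
          months[d[o]]=sprintf("%02d",o)
       }
       # Split on single commas and spaces, so ", " yields an empty $3 and the year lands in $4.
       FS="[, ]"
    }
    {
      # Strip double quotes from the month and year fields.
      gsub(/["]/,"",$1)
      gsub(/["]/,"",$4)
      # mktime() and strftime() are gawk extensions; mktime() takes "YYYY MM DD HH MM SS".
      t=mktime($4" "months[$1]" "$2" 0 0 0")
      print strftime("%Y-%m-%d",t)
    }' uBXr0r15.txt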

Doing everything inside gawk will be faster than spawning an external date process for every line.



Source: https://stackoverflow.com/questions/2391272/gawk-awk-piping-date-to-getline-sometimes-wont-work
