Fill the missing dates using awk

被刻印的时光 ゝ posted on 2020-08-24 04:14:29

Question


I have a file in which some dates are missing, e.g.

$ cat ifile.txt

20060805
20060807
20060808
20060809
20060810
20060813
20060815
20060829
20060901
20060903
20060904
20060905
20070712
20070713
20070716
20070717

The dates are in the format YYYYMMDD. I want to fill in the missing dates between consecutive entries, but only where at most 5 days are missing, e.g.

20060805
20060806   ---- This was missed
20060807
20060808
20060809
20060810
20060811  ----- This was missed
20060812  ----- This was missed
20060813
20060814  ----- This was missed
20060815  
20060829
20060830 ------ This was missed
20060831 ------ This was missed
20060901  
20060902 ------ This was missed
20060903
20060904
20060905
20070712
20070713
20070714 ----- This was missed
20070715 ----- This was missed
20070716
20070717

Where the gap is more than 5 days, the missing dates should not be filled in. For example, I don't need the dates between 20060815 and 20060829, because the gap between them is more than 5 days.

I tried the following, but I don't get any output.

#!/bin/sh
awk BEGIN'{
          a[NR]=$1
          } {
          for(i=1; i<NR; i++)
          if ((a[NR+1]-a[NR]) <= 5)
             for (j=1; j<(a[NR+1]-a[NR]); j++)
             print a[j]
          }' ifile.txt

Desired output:

20060805
20060806 
20060807
20060808
20060809
20060810
20060811 
20060812 
20060813
20060814 
20060815  
20060829
20060830 
20060831 
20060901  
20060902 
20060903
20060904
20060905
20070712
20070713
20070714 
20070715 
20070716
20070717

Answer 1:


Could you please try the following, written and tested with the shown samples in GNU awk.

awk '
FNR==1{
  # First line: print it and remember its timestamp.
  print
  prev=mktime(substr($0,1,4)" "substr($0,5,2)" "substr($0,7,2) " 00 00 00")
  next
}
{
  found=i=diff=""
  curr_time=mktime(substr($0,1,4)" "substr($0,5,2)" "substr($0,7,2) " 00 00 00")
  # Difference from the previous date, in days.
  diff=(curr_time-prev)/86400
  # Fill only gaps of 2 to 5 days; longer gaps are left alone.
  if(diff>1 && diff<=5){
    # The loop also prints the current date (i==diff), so found suppresses the default print.
    while(++i<=diff){ print strftime("%Y%m%d", prev+86400*i) }
    found=1
  }
  prev=curr_time
}
!found
'  Input_file



Answer 2:


The following seems to work (it is written for bash and relies on GNU date):

stringtodate() {
    echo "${1:0:4}-${1:4:2}-${1:6:2} 12:00:00"
}
datetoseconds() {
    LC_ALL=C date -d "$(stringtodate "$1")" +%s
}
secondstodate() {
    LC_ALL=C date -d "@$1" +%Y%m%d
}
outputdatesbetween() {
    local start=$1
    local stop=$2
    local i
    for ((i = start; i < stop; i += 3600*24)); do
        secondstodate "$i"
    done
}
prev=
while IFS= read -r line; do
    now=$(datetoseconds "$line")
    if [[ -n "$prev" ]] &&
        ((
            now - prev > 3600 * 24 && 
            now - prev < 3600 * 24 * 5
        ))
    then
        outputdatesbetween "$((prev + 3600 * 24))" "$now"
    fi
    echo "$line"
    prev="$now"
done < ifile.txt

Tested on repl




Answer 3:


Here is a quick GNU awk script. We use GNU awk to make use of the time-functions mktime and strftime:

awk -v n=5 'BEGIN{FIELDWIDTHS="4 2 2"}
            {t=mktime($1 " " $2 " " $3 " 0 0 0",1) }
            (t-p < n*86400) { for(i=p+86400;i<t;i+=86400) print strftime("%Y%m%d",i,1) }
            {print; p=t}' file

mktime converts each date into seconds since the epoch (1970-01-01 00:00:00 UTC), and strftime converts a timestamp back to the desired format. Be aware that we enable the UTC flag in both functions so that we do not end up with surprises around Daylight Saving Time. Since we already depend on GNU awk, we can also use FIELDWIDTHS to split each line into year, month, and day fields. The variable n sets the gap threshold in days: missing dates are printed only when the difference between consecutive dates is less than n days.

Note: If your version of GNU awk does not support the UTC flag in mktime and strftime, you can run the following instead:

TZ=UTC awk -v n=5 'BEGIN{FIELDWIDTHS="4 2 2"}
                  {t=mktime($1 " " $2 " " $3 " 0 0 0") }
                  (t-p < n*86400) { for(i=p+86400;i<t;i+=86400) print strftime("%Y%m%d",i) }
                  {print; p=t}' file
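
The scripts above all depend on GNU awk for mktime and strftime. Where only a POSIX awk is available, one possible workaround is to do the calendar arithmetic by hand. The following is a minimal sketch, not a tested replacement: the names fill_dates, to_days and to_date are made up for this example, the conversion uses the widely known "days from civil date" integer arithmetic for the Gregorian calendar, and it assumes valid dates with four-digit years. It treats the gap as the difference in days between consecutive dates and fills when that difference is at most n (default 5), following the question's wording; change `<=` to `<` to match the stricter test used in the GNU awk script above.

```shell
#!/bin/sh
# fill_dates FILE [MAXGAP]: print FILE with short gaps between YYYYMMDD dates filled in.
fill_dates() {
  awk -v n="${2:-5}" '
  # Days since 1970-01-01 for a Gregorian date (assumes year >= 1).
  function to_days(y, m, d,    era, yoe, doy, doe) {
    if (m <= 2) y--                          # count years from March
    era = int(y / 400)
    yoe = y - era * 400
    doy = int((153 * (m > 2 ? m - 3 : m + 9) + 2) / 5) + d - 1
    doe = yoe * 365 + int(yoe / 4) - int(yoe / 100) + doy
    return era * 146097 + doe - 719468
  }
  # Inverse of to_days: day number back to a YYYYMMDD string.
  function to_date(z,    era, doe, yoe, y, doy, mp, d, m) {
    z += 719468
    era = int(z / 146097)
    doe = z - era * 146097
    yoe = int((doe - int(doe / 1460) + int(doe / 36524) - int(doe / 146096)) / 365)
    y = yoe + era * 400
    doy = doe - (365 * yoe + int(yoe / 4) - int(yoe / 100))
    mp = int((5 * doy + 2) / 153)
    d = doy - int((153 * mp + 2) / 5) + 1
    m = (mp < 10 ? mp + 3 : mp - 9)
    if (m <= 2) y++
    return sprintf("%04d%02d%02d", y, m, d)
  }
  {
    cur = to_days(substr($1, 1, 4) + 0, substr($1, 5, 2) + 0, substr($1, 7, 2) + 0)
    # Fill only when there is a previous date and the gap is at most n days.
    if (NR > 1 && cur - prev <= n)
      for (i = prev + 1; i < cur; i++) print to_date(i)
    print $1
    prev = cur
  }' "$1"
}
```

With the question's file this would be called as `fill_dates ifile.txt`, or as `fill_dates ifile.txt 3` to use a smaller threshold. Because it works on whole day numbers rather than seconds, time zones and Daylight Saving Time do not come into play.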


Source: https://stackoverflow.com/questions/62752730/fill-the-missing-dates-using-awk
