问题
I have some missing dates in a file. e.g.
$cat ifile.txt
20060805
20060807
20060808
20060809
20060810
20060813
20060815
20060829
20060901
20060903
20060904
20060905
20070712
20070713
20070716
20070717
The dates are in the format YYYYMMDD. My intention is fill the missing dates in between the dates if they are missing maximum for 5 day e.g.
20060805
20060806 ---- This was missed
20060807
20060808
20060809
20060810
20060811 ----- This was missed
20060812 ----- This was missed
20060813
20060814 ----- This was missed
20060815
20060829
20060830 ------ This was missed
20060831 ------ This was missed
20060901
20060902 ------ This was missed
20060903
20060904
20060905
20070712
20070713
20070714 ----- This was missed
20070715 ----- This was missed
20070716
20070717
Other dates are not needed where there is a gap of more than 5 days. For example, I don't need to fill the dates between 20060815 and 20060829, because the gap between them is more than 5 days.
I am doing it in following ways, but don't get anything.
#!/bin/sh
awk BEGIN'{
a[NR]=$1
} {
for(i=1; i<NR; i++)
if ((a[NR+1]-a[NR]) <= 5)
for (j=1; j<(a[NR+1]-a[NR]); j++)
print a[j]
}' ifile.txt
Desired output:
20060805
20060806
20060807
20060808
20060809
20060810
20060811
20060812
20060813
20060814
20060815
20060829
20060830
20060831
20060901
20060902
20060903
20060904
20060905
20070712
20070713
20070714
20070715
20070716
20070717
回答1:
Could you please try following, written and tested with shown samples in GNU awk
.
awk '
FNR==1{
print
prev=mktime(substr($0,1,4)" "substr($0,5,2)" "substr($0,7,2) " 00 00 00")
next
}
{
found=i=diff=""
curr_time=mktime(substr($0,1,4)" "substr($0,5,2)" "substr($0,7,2) " 00 00 00")
diff=(curr_time-prev)/86400
if(diff>1){
while(++i<=diff){ print strftime("%Y%m%d", prev+86400*i) }
found=1
}
prev=mktime(substr($0,1,4)" "substr($0,5,2)" "substr($0,7,2) " 00 00 00")
}
!found
' Input_file
回答2:
The following seems to work:
stringtodate() {
echo "${1:0:4}-${1:4:2}-${1:6:2} 12:00:00"
}
datetoseconds() {
LC_ALL=C date -d "$(stringtodate "$1")" +%s
}
secondstodate() {
LC_ALL=C date -d "@$1" +%Y%m%d
}
outputdatesbetween() {
local start=$1
local stop=$2
for ((i = $1; i < $2; i += 3600*24)); do
secondstodate "$i"
done
}
prev=
while IFS= read -r line; do
now=$(datetoseconds "$line")
if [[ -n "$prev" ]] &&
((
now - prev > 3600 * 24 &&
now - prev < 3600 * 24 * 5
))
then
outputdatesbetween "$((prev + 3600 * 24))" "$now"
fi
echo "$line"
prev="$now"
done < 1
Tested on repl
回答3:
Here is a quick GNU awk script. We use GNU awk to make use of the time-functions mktime
and strftime
:
awk -v n=5 'BEGIN{FIELDWIDTHS="4 2 2"}
{t=mktime($1 " " $2 " " $3 " 0 0 0",1) }
(t-p < n*86400) { for(i=p+86400;i<t;i+=86400) print strftime("%Y%m%d",i,1) }
{print; p=t}' file
Using mktime
we convert the time into the total seconds since 1970. The function strftime
converts it back to the desired format. Be aware that we enable the UTC-flag in both functions to ensure that we do not end up with surprises around Daylight-Saving-Time. Furthermore, since we already make use of GNU awk, we can further use the FIELDWIDTHS
to determine the field lengths.
note: If your awk does not support the UTC-flag in mktime
and strftime
, you can run the following:
TZ=UTC awk -v n=5 'BEGIN{FIELDWIDTHS="4 2 2"}
{t=mktime($1 " " $2 " " $3 " 0 0 0") }
(t-p < n*86400) { for(i=p+86400;i<t;i+=86400) print strftime("%Y%m%d",i) }
{print; p=t}' file
来源:https://stackoverflow.com/questions/62752730/fill-the-missing-dates-using-awk