Question
This question grew out of another one, "gnuplot why warning: Bad time format in string", where it turned out that the week numbers gnuplot produces with the time specifiers %W and %U are wrong in some cases.
There are several definitions of week numbers, and also several definitions of when a week starts, e.g. on Sunday or on Monday. One commonly used definition of week numbers (although not in the US and a few other countries) is the one in ISO 8601.
Code: (to illustrate wrong week numbers)
### wrong week numbering in gnuplot with %W and %U
reset session
StartDate = "24.12.2020"
myTimeFmt = "%d.%m.%Y"
SecondsPerDay = 3600*24
print " date %a %w %d %j %W %U"
print "===================================="
do for [i=0:20] {
    t = strptime(myTimeFmt,StartDate) + i*SecondsPerDay
    myDate = strftime(myTimeFmt." %a %w %d %j %W %U", t)
    print sprintf("%s", myDate)
}
### end of code
gnuplot time specifiers:
%a abbreviated name of day of the week
%w day of the week, 0–6 (Sunday = 0)
%d day of the month, 01–31
%j day of the year, 1–366
%W week of the year (week starts on Monday)
%U week of the year (week starts on Sunday)
Result:
date %a %w %d %j %W %U
====================================
24.12.2020 Thu 04 24 359 52 52
25.12.2020 Fri 05 25 360 52 52
26.12.2020 Sat 06 26 361 52 52
27.12.2020 Sun 00 27 362 52 53
28.12.2020 Mon 01 28 363 53 53
29.12.2020 Tue 02 29 364 53 53
30.12.2020 Wed 03 30 365 53 53
31.12.2020 Thu 04 31 366 53 53
01.01.2021 Fri 05 01 001 01 01 ???
02.01.2021 Sat 06 02 002 01 01 ???
03.01.2021 Sun 00 03 003 00 01 ???
04.01.2021 Mon 01 04 004 01 01
05.01.2021 Tue 02 05 005 01 01
06.01.2021 Wed 03 06 006 01 01
07.01.2021 Thu 04 07 007 01 01
08.01.2021 Fri 05 08 008 01 01
09.01.2021 Sat 06 09 009 01 01
10.01.2021 Sun 00 10 010 01 02
11.01.2021 Mon 01 11 011 02 02
12.01.2021 Tue 02 12 012 02 02
13.01.2021 Wed 03 13 013 02 02
Question: Is there a workaround to fix this?
Answer 1:
Based on the description here: https://en.wikipedia.org/wiki/ISO_week_date, I guess the essence of the ISO 8601 definition is:
- a week starts on Monday
- week 01 is the week with the first Thursday of the year
- a week belongs to the year that contains the majority of its days
- years starting or ending on a Thursday have 53 weeks, all others have 52 weeks
Code:
### correct week number according to ISO 8601
reset session
dow(t)      = int(tm_wday(t)) ? tm_wday(t) : 7    # day of week: 1=Mon, ..., 7=Sun
week(t)     = int((11 + tm_yday(t) - dow(t))/7)   # "raw" week of year (0..53)
wday(d,m,y) = tm_wday(strptime("%d.%m.%Y",sprintf("%02d.%02d.%04d",d,m,y)))   # weekday of a given date
wpy(y)      = wday(1,1,y)==4 || wday(31,12,y)==4 ? 53 : 52   # weeks per year
woy(t)      = week(t) < 1 ? wpy(tm_year(t)-1) : \
              week(t) > wpy(tm_year(t)) ? 1 : week(t)   # week of year
yow(t)      = int(week(t) < 1 ? tm_year(t)-1 : week(t) > wpy(tm_year(t)) ? \
              tm_year(t)+1 : tm_year(t))   # year of week (previous, current or next)
StartDate = "24.12.2020"
myTimeFmt = "%d.%m.%Y"
SecondsPerDay = 3600*24
print " date %a DoW %d %j YoW WoY"
print "======================================"
do for [i=0:20] {
    t = strptime(myTimeFmt,StartDate) + i*SecondsPerDay
    myDate = strftime(myTimeFmt." %a", t)
    myDate2 = strftime("%d %j", t)
    print sprintf("%s %02d %s %04d-W%02d", myDate, dow(t), myDate2, yow(t), woy(t))
}
### end of code
Result:
date %a DoW %d %j YoW WoY
======================================
24.12.2020 Thu 04 24 359 2020-W52
25.12.2020 Fri 05 25 360 2020-W52
26.12.2020 Sat 06 26 361 2020-W52
27.12.2020 Sun 07 27 362 2020-W52
28.12.2020 Mon 01 28 363 2020-W53
29.12.2020 Tue 02 29 364 2020-W53
30.12.2020 Wed 03 30 365 2020-W53
31.12.2020 Thu 04 31 366 2020-W53
01.01.2021 Fri 05 01 001 2020-W53
02.01.2021 Sat 06 02 002 2020-W53
03.01.2021 Sun 07 03 003 2020-W53
04.01.2021 Mon 01 04 004 2021-W01
05.01.2021 Tue 02 05 005 2021-W01
06.01.2021 Wed 03 06 006 2021-W01
07.01.2021 Thu 04 07 007 2021-W01
08.01.2021 Fri 05 08 008 2021-W01
09.01.2021 Sat 06 09 009 2021-W01
10.01.2021 Sun 07 10 010 2021-W01
11.01.2021 Mon 01 11 011 2021-W02
12.01.2021 Tue 02 12 012 2021-W02
13.01.2021 Wed 03 13 013 2021-W02
In order to use the week numbers, e.g. as time axis labels, it would be ideal to have this implemented for %W. Coincidentally, there was a recent bug report about this on SourceForge, so I assume it will be fixed in one of the next versions.
Answer 2:
Given the ongoing pandemic and the consequent interest in plotting epidemiological data from all sources, it seemed expedient to clean up and extend gnuplot's support for week-date formats. The "New Features" section of the gnuplot documentation now lists:
• Time specifier format %W has been brought into accord with the ISO 8601 week date standard.
• Time specifier format %U has been brought into accord with the CDC/MMWR week date standard.
• New function tm_week(time, std) returns ISO or CDC standard week of year.
• New function weekdate_iso(year, week, day) converts ISO standard week date to calendar time.
• New function weekdate_cdc(year, week, day) converts CDC standard week date to calendar time.
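As a quick check of these functions, the sketch below prints the ISO and CDC week numbers for the days around the 2020/2021 year boundary from the question and converts a week date back to a calendar date. It assumes a gnuplot build that already contains these functions (5.4.2 or later, as far as I know). The variable names are only for illustration.

```gnuplot
### week numbers via tm_week() and week dates via weekdate_iso()/weekdate_cdc()
# Assumption: a gnuplot version that provides these functions.
reset session

myTimeFmt     = "%d.%m.%Y"
SecondsPerDay = 3600*24
t0 = strptime(myTimeFmt, "31.12.2020")

print "date        %a  ISO  CDC"
do for [i=0:4] {
    t = t0 + i*SecondsPerDay
    # second argument of tm_week: 0 = ISO 8601, 1 = CDC
    print sprintf("%s  W%02d  W%02d", strftime(myTimeFmt." %a", t), tm_week(t,0), tm_week(t,1))
}

# week date -> calendar date (day 1 is Monday for ISO, Sunday for CDC)
print strftime(myTimeFmt." %a", weekdate_iso(2021, 1, 1))
print strftime(myTimeFmt." %a", weekdate_cdc(2021, 1, 1))
### end of code
```

The two standards should differ for 03.01.2021: that Sunday is the last day of ISO week 2020-W53, but already the first day of CDC week 1 of 2021, because CDC weeks start on Sunday.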
Here is an example (from the online demo set) of converting data given in ISO 8601 week-date format into standard calendar dates for plotting along a gnuplot time axis.
# Epidemiological data
#
# Plot from data file that encodes date as an ISO 8601 "week date".
# Example: week date 2004-W01-1 is calendar date 29 December 2003
# The data is from the European Centre for Disease Prevention and Control
# https://www.ecdc.europa.eu/
# The ECDC data file uses fields containing week date as "YYYY-WW".
# First we define a function that extracts the integer year and week
# from this string and converts it to standard time representation.
calendar(date) = weekdate_iso( int(date[1:4]), int(date[6:7]) )
set datafile separator comma
set style data lines
set key Left left reverse box samplen 2 width 2
set grid x lt 1 lw .75 lc "gray"
set tics nomirror
set border 3
set xtics time format "%b\n%Y"
set ytics format " %4.0f"
data1 = '< grep "Denmark.*cases" ECDC-weekly-national-COVID.csv'
data2 = '< grep "Sweden.*cases" ECDC-weekly-national-COVID.csv'
data3 = '< grep "Norway.*cases" ECDC-weekly-national-COVID.csv'
data4 = '< grep "Finland.*cases" ECDC-weekly-national-COVID.csv'
data5 = '< grep "Iceland.*cases" ECDC-weekly-national-COVID.csv'
set title "weekly COVID-19 cases per 100,000 people" font "/Bold,15"
plot data1 using (calendar(strcol(7))) : (1.e5*$6/$4) lw 2 title "Denmark", \
data2 using (calendar(strcol(7))) : (1.e5*$6/$4) lw 2 title "Sweden", \
data3 using (calendar(strcol(7))) : (1.e5*$6/$4) lw 2 title "Norway", \
data4 using (calendar(strcol(7))) : (1.e5*$6/$4) lw 2 title "Finland", \
data5 using (calendar(strcol(7))) : (1.e5*$6/$4) lw 2 lt 6 title "Iceland"
Source: https://stackoverflow.com/questions/65577669/gnuplot-how-to-get-correct-week-numbers