问题
The following file has multiple columns with date, time and incomplete data set as shown using a simple file
# Matrix.txt
13.09.2016:23:44:10;;4.0
13.09.2016:23:44:20;10.0;
13.09.2016:23:44:30;;
13.09.2016:23:44:40;30.0;7.0
How can I do an linear interpolation on each column using awk to get the missing data:
# Output.txt
13.09.2016:23:44:10;0.0;4.0
13.09.2016:23:44:20;10.0;5.0
13.09.2016:23:44:30;20.0;6.0
13.09.2016:23:44:40;30.0;7.0
回答1:
Here is one solution in Gnu awk. It runs twice for the first given data file, remembers first and last data points (y1, y2) and their timestamps (x2, x2), computes slopes of the points (k=(y2-y1)/(x2-x1)) and inter- and extrapolates values for empty elements ((y=(x1-x)+y1).
It's not fool proof, it doesn't check for division by zeroes or if there are two points for the slopes or any other checks whatsoever.
$ cat inexpolator.awk
BEGIN {
FS=OFS=";"
ARGC=3; ARGV[2]=ARGV[1] # run it twice for first file
}
BEGINFILE { # on the second round
for(i in p) # compute the slopes
k[i]=(y2[i]-y1[i])/(x2[i]-x1[i])
}
{
split($1,a,"[:.]") # reformat the timestamp
ts=mktime(a[3] " " a[2] " " a[1] " " a[4] " " a[5] " " a[6])
}
NR==FNR { # remember first and last points for slopes
for(i=2;i<=NF;i++) {
p[i]
if(y1[i]=="") { y1[i]=$i; x1[i]=ts }
if($i!="") { y2[i]=$i; x2[i]=ts }
}
next # only on the first round
}
{ # reformat ts again for output
printf "%s", strftime("%d.%m.%Y:%H:%M:%S",ts) OFS # print ts
for(i=2;i<=NF;i++) {
if($i=="") $i=k[i]*(ts-x1[i])+y1[i] # compute missing points
printf "%.1f%s", $i, (i<NF?OFS:ORS) # print points
}
}
Run it:
$ awk -f inexpolator.awk Matrix.txt
13.09.2016:23:44:10;0.0;4.0
13.09.2016:23:44:20;10.0;5.0
13.09.2016:23:44:30;20.0;6.0
13.09.2016:23:44:40;30.0;7.0
来源:https://stackoverflow.com/questions/39792172/using-awk-to-interpolate-data-column-based-in-a-data-file-with-date-and-time