Using awk to interpolate data column based in a data file with date and time

余生长醉 提交于 2019-12-04 02:58:16

问题


The following file has multiple columns with date, time and incomplete data set as shown using a simple file

# Matrix.txt
13.09.2016:23:44:10;;4.0
13.09.2016:23:44:20;10.0;
13.09.2016:23:44:30;;
13.09.2016:23:44:40;30.0;7.0

How can I do an linear interpolation on each column using awk to get the missing data:

# Output.txt
13.09.2016:23:44:10;0.0;4.0
13.09.2016:23:44:20;10.0;5.0
13.09.2016:23:44:30;20.0;6.0
13.09.2016:23:44:40;30.0;7.0

回答1:


Here is one solution in Gnu awk. It runs twice for the first given data file, remembers first and last data points (y1, y2) and their timestamps (x2, x2), computes slopes of the points (k=(y2-y1)/(x2-x1)) and inter- and extrapolates values for empty elements ((y=(x1-x)+y1).

It's not fool proof, it doesn't check for division by zeroes or if there are two points for the slopes or any other checks whatsoever.

$ cat inexpolator.awk
BEGIN {
    FS=OFS=";"
    ARGC=3; ARGV[2]=ARGV[1]        # run it twice for first file
}
BEGINFILE {                        # on the second round
        for(i in p)                # compute the slopes
            k[i]=(y2[i]-y1[i])/(x2[i]-x1[i])
}
{
    split($1,a,"[:.]")             # reformat the timestamp
    ts=mktime(a[3] " " a[2] " " a[1] " " a[4] " " a[5] " " a[6])
}
NR==FNR {                          # remember first and last points for slopes
    for(i=2;i<=NF;i++) {
        p[i]
        if(y1[i]=="") { y1[i]=$i; x1[i]=ts }
        if($i!="") { y2[i]=$i; x2[i]=ts }
    }
    next                           # only on the first round
}
{                                  # reformat ts again for output
    printf "%s", strftime("%d.%m.%Y:%H:%M:%S",ts) OFS  # print ts
    for(i=2;i<=NF;i++) {
        if($i=="") $i=k[i]*(ts-x1[i])+y1[i]            # compute missing points
        printf "%.1f%s", $i, (i<NF?OFS:ORS)            # print points
    }
}

Run it:

$ awk -f inexpolator.awk Matrix.txt
13.09.2016:23:44:10;0.0;4.0
13.09.2016:23:44:20;10.0;5.0
13.09.2016:23:44:30;20.0;6.0
13.09.2016:23:44:40;30.0;7.0


来源:https://stackoverflow.com/questions/39792172/using-awk-to-interpolate-data-column-based-in-a-data-file-with-date-and-time

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!