Question
Assume a text file named file that contains multiple discrete number ranges, one per line. Each range is preceded by a string (i.e., the range name). The lower and upper bounds of each range are separated by a dash. Each number range is followed by a semicolon. The individual ranges are sorted (i.e., range 101-297 comes before 1299-1301) and do not overlap.
$ cat file
foo 101-297;
bar 1299-1301;
baz 1314-5266;
Please note that in the example above the three ranges do not form a continuous range that starts at integer 1.
I believe that awk is the appropriate tool to fill the missing number ranges such that all ranges taken together form a continuous range from {1} to {upper bound of the last range}. If so, what awk command/function would you use to perform the task?
$ cat file | sought_awk_command
new1 1-100;
foo 101-297;
new2 298-1298;
bar 1299-1301;
new3 1302-1313;
baz 1314-5266;
--
Edit 1: On closer evaluation, the code suggested below fails on another simple example.
$ cat example2
foo 101-297;
bar 1299-1301;
baz 1302-1314; # Notice that ranges "bar" and "baz" are contiguous
qux 1399-5266;
$ awk -F'[ -]' '$3-Q>1{print "new"++o,Q+1"-"$3-1";";Q=$4} 1' example2
new1 1-100;
foo 101-297;
new2 298-1298;
bar 1299-1301;
baz 1302-1314;
new3 1302-1398; # ERROR HERE: Notice that the lower bound of range "new3" is derived from the upper bound of "bar" (1301 + 1), not from that of "baz".
qux 1399-5266;
--
Edit 2: Many thanks to RavinderSingh13 for assistance with solving this question. However, the suggested code still generates output inconsistent with the given objective.
$ cat example3
foo 35025-35144;
bar 35259-35375;
baz 35376-35624;
qux 37911-39434;
$ awk -F'[ -]' '$3-Q+0>=1{print "new"++o,Q+1"-"$3-1";";Q=$4} {Q=$4;print}' example3
new1 1-35024;
foo 35025-35144;
new2 35145-35258;
bar 35259-35375;
new3 35376-35375; # ERROR HERE: Notice that range "new3" has been added, even though ranges "bar" and "baz" are contiguous.
baz 35376-35624;
new4 35625-37910;
qux 37911-39434;
Answer 1:
This has no problem with ranges that can overlap, as you showed in your original example2, where bar 1299-1301; and baz 1301-1314; overlapped at 1301.
$ cat tst.awk
{ split($2,curr,/[-;]/); currStart=curr[1]; currEnd=curr[2] }
currStart > (prevEnd+1) { print "new"++cnt, prevEnd+1 "-" currStart-1 ";" }
{ print; prevEnd=currEnd }
$ awk -f tst.awk file
new1 1-100;
foo 101-297;
new2 298-1298;
bar 1299-1301;
new3 1302-1313;
baz 1314-5266;
$ awk -f tst.awk example2
new1 1-100;
foo 101-297;
new2 298-1298;
bar 1299-1301;
baz 1301-1314;
new3 1315-1398;
qux 1399-5266;
$ awk -f tst.awk example3
new1 1-35024;
foo 35025-35144;
new2 35145-35258;
bar 35259-35375;
baz 35376-35624;
new3 35625-37910;
qux 37911-39434;
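Whether a result meets the stated objective (every integer from 1 up to the last upper bound is covered) can be checked mechanically. The following is only a sketch under assumptions: the function name check_continuous is hypothetical, the input is assumed to use the same "name lower-upper;" layout as the examples, and overlapping ranges are tolerated rather than reported.

```shell
# check_continuous is an illustrative helper name, not part of the original answer.
# It reports every uncovered stretch of integers and exits non-zero if any exists.
check_continuous() {
  awk '
    {
      split($2, r, /[-;]/)               # r[1] = lower bound, r[2] = upper bound
      if (r[1] + 0 > prevEnd + 1) {      # integers prevEnd+1 .. r[1]-1 are not covered
        printf "gap before line %d: %d-%d\n", NR, prevEnd + 1, r[1] - 1
        bad = 1
      }
      if (r[2] + 0 > prevEnd)            # remember the highest upper bound seen so far
        prevEnd = r[2] + 0
    }
    END { exit bad + 0 }                 # exit status 1 when at least one gap was found
  ' "$@"
}

# The unfilled input from the question: three gaps are reported.
if check_continuous <<'EOF'
foo 101-297;
bar 1299-1301;
baz 1314-5266;
EOF
then echo "continuous"; else echo "gaps found"; fi
```

Piping the output of awk -f tst.awk file into check_continuous should print nothing and exit with status 0.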
Answer 2:
Try:
awk -F'[ -]' '$3-Q>1{print "new"++o,Q+1"-"$3-1";";Q=$4} 1' Input_file
EDIT: Adding a non-one-liner version of the same solution, with an explanation.
awk -F'[ -]' '                    ### Set the field separator to a space or a dash.
$3-Q>1{                           ### If the 3rd field minus variable Q is greater than 1, run the following block.
  print "new"++o,Q+1"-"$3-1";";   ### Print the string "new" with the incremented counter o, then Q+1, a dash, $3-1 and a semicolon.
  Q=$4                            ### Assign the 4th field of the current line to variable Q.
}
1                                 ### Print the current line.
' Input_file                      ### Name of the input file.
EDIT2: Adding one more variant, as per the OP's additional condition.
awk -F'[ -]' '$3-Q+0>=1{print "new"++o,Q+1"-"$3-1";";Q=$4} {Q=$4;print}' Input_file
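As Edit 2 of the question shows, the EDIT2 one-liner still prints an empty range when two ranges are contiguous (35375 followed by 35376), because the >=1 test is also true when no integer is missing. A minimal sketch of one possible correction follows. Two things in it are assumptions and not part of the original answer: the function name fill_gaps is only illustrative, and the separator is written as [ -]+ so that the bounds land in $2 and $3 however many spaces separate the name from the range (the one-liners above address them as $3 and $4, which presumably reflects two separator characters in the original input).

```shell
# fill_gaps is an illustrative name; the awk body is a sketch, not the answerer's code.
fill_gaps() {
  awk -F'[ -]+' '
    $2 - prev > 1 {                      # at least one integer lies between prev and $2
      print "new" ++n, (prev + 1) "-" ($2 - 1) ";"
    }
    {
      prev = $3 + 0                      # "35144;" + 0 yields 35144; updated on every line
      print                              # always echo the original record
    }
  ' "$@"
}

# example3 from the question, fed on standard input.
fill_gaps <<'EOF'
foo 35025-35144;
bar 35259-35375;
baz 35376-35624;
qux 37911-39434;
EOF
```

Updating prev unconditionally addresses the stale-bound problem from Edit 1, and the strict > 1 test addresses the empty range from Edit 2. On example3 this should print the same seven lines as the awk -f tst.awk example3 run shown earlier.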
Answer 3:
$ cat file1
foo 2-100
bar 102-200
$ awk -F' +|[-;]' 'p+1<$2{print "new" ++q, p+1 "-" $2-1 ";"}p=$3' file1
new1 1-1;
foo 2-100
new2 101-101;
bar 102-200
$ cat file2
foo 101-297;
bar 1299-1301;
baz 1314-5266;
$ awk -F' +|[-;]' 'p+1<$2{print "new" ++q, p+1 "-" $2-1 ";"}p=$3' file2
new1 1-100;
foo 101-297;
new2 298-1298;
bar 1299-1301;
new3 1302-1313;
baz 1314-5266;
Explained:
$ awk -F' +|[-;]' ' # FS is ; - or a bunch of spaces
p+1 < $2 { # if p revious $3+1 is still less than new $2
print "new"++q,p+1 "-" $2-1 ";" # print a "new" line
}
p=$3 # set future p and implicit print of record *
' file2 # * as all values are above 0
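The p=$3 pattern serves both as the assignment and as the print condition, so a record whose third field is 0 or empty would be silently dropped. A hedged variant that separates the two steps is sketched below; the function name fill_gaps_explicit is hypothetical, and the input is assumed to have the same layout as file2.

```shell
# fill_gaps_explicit is an illustrative name, not part of the original answer.
fill_gaps_explicit() {
  awk -F' +|[-;]' '
    p + 1 < $2 {                         # a gap exists between the previous upper bound and $2
      print "new" ++q, (p + 1) "-" ($2 - 1) ";"
    }
    {
      p = $3                             # remember the current upper bound
      print                              # print unconditionally, whatever the value of $3
    }
  ' "$@"
}

# file2 from above, fed on standard input.
fill_gaps_explicit <<'EOF'
foo 101-297;
bar 1299-1301;
baz 1314-5266;
EOF
```

For file2 the output should be identical to that of the shorter command; the difference only matters for records where $3 evaluates to 0 or is missing.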
Source: https://stackoverflow.com/questions/44763337/discrete-to-continuous-number-ranges-via-awk