How can I detect a sequence of “hollows” (holes, lines not matching a pattern) bigger than n in a text file?

Submitted by 旧城冷巷雨未停 on 2019-12-24 03:12:18

Question


Case scenario:

$ cat Status.txt
1,connected
2,connected
3,connected
4,connected
5,connected
6,connected
7,disconnected
8,disconnected
9,disconnected
10,disconnected
11,disconnected
12,disconnected
13,disconnected
14,connected
15,connected
16,connected
17,disconnected
18,connected
19,connected
20,connected
21,disconnected
22,disconnected
23,disconnected
24,disconnected
25,disconnected
26,disconnected
27,disconnected
28,disconnected
29,disconnected
30,connected

As you can see, there are "hollows": runs of lines with the "disconnected" value inside the sequence file.

I want to detect these "holes", but it would be useful if I could set a minimum number n of missing entries in the sequence.
For example, for n=5 a detectable hole would be the 7...13 part, as there are at least 5 "disconnected" lines in a row in the sequence. However, the missing 17 should not be considered detectable in this case. Again, at line 21 we get a valid disconnection.

Something like:

$ detector Status.txt -n 5 --pattern connected
7
21

... that could be interpreted like:

- Missing more than 5 "connected" starting at 7.
- Missing more than 5 "connected" starting at 21.

I need to script this in a Linux shell, so I was thinking about programming some loop, parsing strings and so on, but I feel this could be done with Linux shell tools and maybe some simpler programming. Is there a way?

Even though small programs like csvtool are a valid solution, more common Linux commands (grep, cut, awk, sed, wc, etc.) would be more useful to me when working with embedded devices.


Answer 1:


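#!/usr/bin/env bash
last_connected=0
last_num=0 last_state=connected  # last line actually read; used by the EOF check below
min_hole_size=${1:-5}  # default to 5, or take an argument from the command line
while IFS=, read -r num state; do
  if [[ $state = connected ]]; then
    # The hole spans last_connected+1 .. num-1, so it holds num-last_connected-1 lines.
    if (( (num-last_connected) > (min_hole_size+1) )); then
      echo "Found a hole running from $((last_connected + 1)) to $((num - 1))"
    fi
    last_connected=$num
  fi
  last_num=$num last_state=$state
done

# Special case: Need to also handle a hole that's still open at EOF.
# read empties num and state when it fails at EOF, so test the saved copies instead.
if [[ $last_state != connected ]] && (( last_num - last_connected > min_hole_size )); then
  echo "Found a hole running from $((last_connected + 1)) to $last_num"
fi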

...emits, given your file on stdin (./detect-holes <in.txt):

Found a hole running from 7 to 13
Found a hole running from 21 to 29

See:

  • BashFAQ #1 - How can I read a file (data stream, variable) line-by-line (and/or field-by-field)?
  • The conditional expression -- the [[ ]] syntax used to make it safe to do string comparisons without quoting expansions.
  • Arithmetic comparison syntax -- valid in $(( )) in all POSIX-compliant shells; also available without the expansion side effects as (( )) as a bash extension.
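
Since the question mentions embedded devices, where bash may not be installed, the same loop can be sketched in POSIX sh using only `[ ]` and `$(( ))`. This is an illustrative port, not part of the original answer: the function name `detect_holes` and the helper variables `last_num` and `last_state` are assumptions introduced here. The helpers keep a copy of the last line read, so a hole that is still open at EOF can be checked after the loop ends.

```shell
#!/bin/sh
# detect_holes [MIN]: read "num,state" lines from stdin and report every run
# of lines whose state is not "connected" that is longer than MIN (default 5).
# Hypothetical helper; assumes the first field is a line counter as in Status.txt.
detect_holes() {
  min_hole_size=${1:-5}
  last_connected=0
  last_num=0
  last_state=connected
  while IFS=, read -r num state; do
    [ -n "$num" ] || continue          # ignore blank lines
    if [ "$state" = connected ]; then
      # The hole between two connected lines holds num-last_connected-1 lines.
      if [ $(( num - last_connected - 1 )) -gt "$min_hole_size" ]; then
        echo "Found a hole running from $(( last_connected + 1 )) to $(( num - 1 ))"
      fi
      last_connected=$num
    fi
    last_num=$num                      # remember the last line actually read
    last_state=$state
  done
  # A hole that is still open when the input ends.
  if [ "$last_state" != connected ] &&
     [ $(( last_num - last_connected )) -gt "$min_hole_size" ]; then
    echo "Found a hole running from $(( last_connected + 1 )) to $last_num"
  fi
}
```

Saved in a script that ends with `detect_holes "$@"` (the file name `detect-holes.sh` is only an example), it could be run as `sh detect-holes.sh 5 < Status.txt`.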



Answer 2:


This is the perfect use case for awk, since the machinery of line reading, column splitting, and matching is all built in. The only tricky bit is getting the command line argument to your script, but it's not too bad:

#!/usr/bin/env bash
awk -v window="$1" -F, '
BEGIN { if (window=="") {window = 1} }

$2=="disconnected"{if (consecutive==0){start=NR}; consecutive++}
$2!="disconnected"{if (consecutive>window){print start}; consecutive=0}

END {if (consecutive>window){print start}}'

The window value is supplied as the first command line argument; if omitted, it defaults to 1, which means "display the start of gaps with at least two consecutive disconnections". (The variable could probably have a better name.) You can give it 0 to include single disconnections. Sample output below. (Note that I added a series of 2 disconnections at the end to test the failure that Charles mentions.)

njv@organon:~/tmp$ ./tst.sh 0 < status.txt # any number of disconnections
7
17
21
31
njv@organon:~/tmp$ ./tst.sh < status.txt # at least 2 disconnections
7
21
31
njv@organon:~/tmp$ ./tst.sh 8 < status.txt # at least 9 disconnections
21
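
For readers who want something closer to the `detector Status.txt -n 5 --pattern connected` call from the question, here is a minimal sketch of a wrapper around awk. It is not part of this answer: the function name, the option parsing, and the choice to treat `-n` as "at least n lines in a row" (following the question's wording for n=5) are all assumptions. It prints the first field of the line where each run starts, rather than NR, so it does not rely on the counter matching the line number.

```shell
#!/bin/sh
# detector FILE [-n N] [--pattern WORD]
# Print the first field of each run of at least N consecutive lines whose
# second comma-separated field is not WORD. Hypothetical interface.
detector() {
  file=$1
  shift
  n=5
  pattern=connected
  while [ $# -gt 0 ]; do
    case $1 in
      -n)        n=$2; shift 2 ;;
      --pattern) pattern=$2; shift 2 ;;
      *)         echo "usage: detector FILE [-n N] [--pattern WORD]" >&2
                 return 2 ;;
    esac
  done
  awk -F, -v n="$n" -v pat="$pattern" '
    NF == 0   { next }                                   # ignore blank lines
    $2 != pat { if (run == 0) start = $1; run++; next }  # extend or open a run
              { if (run >= n) print start; run = 0 }     # matching line ends it
    END       { if (run >= n) print start }              # run still open at EOF
  ' "$file"
}
```

With the sample file from the question, `detector Status.txt -n 5 --pattern connected` would print 7 and 21, each on its own line.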



Answer 3:


Awk solution:

detector.awk script:

#!/bin/awk -f

BEGIN { FS="," }
$2 == "disconnected"{ 
    if (f && NR-c==nr) c++; 
    else { f=1; c++; nr=NR } 
}
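$2 == "connected"{
    if (f) {
        if (c > n) {
            printf "- Missing more than %d \042connected\042 starting at %d.\n", n, nr
        }
        f=c=0
    }
}
# Also report a hole that is still open when the input ends.
END {
    if (f && c > n) {
        printf "- Missing more than %d \042connected\042 starting at %d.\n", n, nr
    }
}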

Usage:

awk -f detector.awk -v n=5 status.txt

The output:

- Missing more than 5 "connected" starting at 7.
- Missing more than 5 "connected" starting at 21.


Source: https://stackoverflow.com/questions/48490357/how-can-i-detect-a-sequence-of-hollows-holes-lines-not-matching-a-pattern-b
