Piping sometimes does not lead to immediate output

Submitted by 落爺英雄遲暮 on 2019-12-02 07:41:00

Every layer in a series of pipes can involve buffering; by default, tools that don't specify buffering behavior for stdout will use line buffering when outputting to a terminal, and block buffering when outputting anywhere else (including piping to another program or a file). In a chained pipe, all but the last stage will see their output as not going to the terminal, and will block buffer.
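To see why every stage but the last ends up block buffered, it helps to check what a process actually sees on its stdout. The sketch below uses the shell's `[ -t 1 ]` test, which asks the same "is this a terminal?" question that C stdio asks when it picks a default buffering mode. The function name `stdout_kind` is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: report what kind of destination stdout currently is.
stdout_kind() {
    if [ -t 1 ]; then
        echo "terminal"        # stdio would default to line buffering here
    else
        echo "pipe-or-file"    # stdio would default to block buffering here
    fi
}

# Run from an interactive terminal, this call sees the terminal itself.
stdout_kind

# Here stdout is a pipe to cat, so the function always reports "pipe-or-file".
stdout_kind | cat
```

Every stage in the original pipeline except awk is in the second situation, which is why their output sits in a buffer.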

So in your case, tcpflow might be producing output constantly, and if it's doing so, tee should be producing data almost at the same rate. But grep is going to limit that flow to a trickle, and won't produce output until that trickle exceeds the size of the output buffer. It's already performed the filtering and called fwrite or puts or printf, but the data is waiting for enough bytes to build up behind it before sending it along to awk, to reduce the number of (expensive) system calls.

cat second.flow produces output immediately because as soon as cat finishes producing output, it exits, flushing and closing its stdout in the process. That cascades down the pipeline: when each step finds its stdin at EOF, it exits too, flushing and closing its own stdout. tcpflow isn't exiting, so the cascade of EOFs and flushes never happens.

For some programs, in the general case, you can change the buffering behavior by using stdbuf (or unbuffer, though that can't do line buffering to balance efficiency, and has issues with piped input). If the program is using internal buffering, this still might not work, but it's worth a shot.

In your specific case, though, grep is the likely culprit: it only produces a trickle of output, which sticks in its buffer, whereas tcpflow and tee produce a torrent, and awk writes to the terminal and is therefore line buffered by default. So you can just adjust your command line to:

tcpflow -ec -i any port 8340 | tee second.flow | grep -i --line-buffered "\(</Manufacturer>\)\|\(</SerialNumber>\)" | awk -F'[<>]' '{print $3}'

At least for GNU grep as shipped on Linux (the switch is an extension, not a POSIX option), that makes grep explicitly change its own output buffering to line-oriented buffering, which should remove the delay. If tcpflow itself is not producing enough output to flush regularly (you implied it does, but that may not hold), you'd use stdbuf on it to make it line buffered. Don't bother applying stdbuf to tee: per the stdbuf man page notes, tee adjusts its own buffering, so stdbuf has no effect on it.

stdbuf -oL tcpflow -ec -i any port 8340 | tee second.flow | grep -i --line-buffered "\(</Manufacturer>\)\|\(</SerialNumber>\)" | awk -F'[<>]' '{print $3}'
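If you want to watch the difference without capturing traffic, here is a minimal sketch. `slow_producer` is a hypothetical stand-in for tcpflow that never sends enough data to fill grep's buffer, and the trailing `cat` only guarantees that grep's stdout is a pipe. It assumes a grep that supports `--line-buffered`.

```shell
#!/bin/sh
# Hypothetical stand-in for a long-running, low-volume producer.
slow_producer() {
    printf 'match one\n'
    sleep 1
    printf 'match two\n'
}

# Default buffering: grep writes into a pipe, so both lines typically show up
# together, only once slow_producer exits and grep flushes at EOF.
slow_producer | grep match | cat

# With --line-buffered, "match one" should show up right away and
# "match two" about a second later.
slow_producer | grep --line-buffered match | cat
```

The final output is identical in both runs; only the timing changes, which is exactly the symptom in the question.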

Update from comments: It looks like some flavors of awk block buffer prints to stdout, even when connected to a terminal. For mawk (the default on many Debian-based distros), you can disable that, non-portably, by passing the -Winteractive switch at invocation. Alternatively, you can call system("") after each print, which forces an output flush on every awk implementation. Sadly, the more obvious fflush() is not available in older implementations of awk, but if you only care about modern awk, use fflush(): it is clearer and mostly portable.
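As a rough sketch of both flushing variants, using the same field separator as the command above: the sample line is invented test data standing in for what grep passes along, and the serial number is made up.

```shell
#!/bin/sh
# Invented sample input standing in for one line of the filtered stream.
sample='<SerialNumber>12345</SerialNumber>'

# Variant 1: system("") after each print forces a flush.
printf '%s\n' "$sample" | awk -F'[<>]' '{ print $3; system("") }'   # prints 12345

# Variant 2: fflush() on modern awks (gawk, mawk, BSD awk).
printf '%s\n' "$sample" | awk -F'[<>]' '{ print $3; fflush() }'     # prints 12345
```

With `[<>]` as the separator, the text before the first `<` is an empty `$1`, the tag name is `$2`, and the value between the tags is `$3`.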

Reduce Buffering

Each application in the pipeline can do its own buffering. You may want to see if you can reduce buffering in tcpflow, as your other commands are line-oriented and unlikely to be the source of your buffering issue. I didn't see any specific options for buffer control in tcpflow, though the -b flag for max_bytes may help in circumstances where the text you want to work with is near the front of the flow.

You can also try modifying the buffering of tcpflow using stdbuf from GNU coreutils. This may help to reduce latency in your pipeline, but the man page provides the following caveats:

NOTE: If COMMAND adjusts the buffering of its standard streams ('tee' does for example) then that will override corresponding changes by 'stdbuf'. Also some filters (like 'dd' and 'cat' etc.) don't use streams for I/O, and are thus unaffected by 'stdbuf' settings.

As an example, the following may reduce output buffering of tcpflow:

  • stdbuf --output=0 tcpflow -ec -i any port 8340 # unbuffered output
  • stdbuf --output=L tcpflow -ec -i any port 8340 # line-buffered output

unless one of the caveats above applies. Your mileage may vary.
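For a filter that has no buffering switch of its own, the same stdbuf pattern can be tried out on something harmless first. This sketch follows the `stdbuf -oL cut` usage shown in the stdbuf man page; `slow_lines` is a hypothetical producer, and the example assumes GNU coreutils is installed.

```shell
#!/bin/sh
# Hypothetical slow producer emitting two space-separated records.
slow_lines() {
    printf 'alpha one\n'
    sleep 1
    printf 'beta two\n'
}

# cut writes through stdio, so stdbuf -oL can switch it to line buffering:
# "alpha" should appear immediately instead of waiting for EOF.
slow_lines | stdbuf -oL cut -d ' ' -f1 | cat
```

If the first line still arrives late with a different command in place of cut, that command is probably managing its own buffering, which is the first caveat from the man page.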
