Bash while loop + cut slow

Submitted by 淺唱寂寞╮ on 2019-12-02 08:41:16

Question


I am trying to process a 1.5 GB file with a bash loop that iterates over each line. I used cut for its relative simplicity and ended up with:

while read line
do
    echo "$(echo $line | cut -d' ' -f 2-3)" "$(echo $line | cut -d'"' -f 20)"
done < TEST.log > IDS.log

This is very slow: it only processes about 2 KB/sec. I need something that runs a lot faster.

Also, what is the bottleneck here?


Answer 1:


The bottleneck is likely that you spawn several processes for every line of data. As for a replacement, this awk should be equivalent:

awk '{ split($0, a, "\""); print $2, $3, a[20] }' TEST.log > IDS.log
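One way to gain some confidence in that equivalence is to run both versions against a single line. The sketch below assumes a made-up log line (three space-separated words followed by ten key="value" pairs, so the 20th quote-delimited field is v10); the real layout of TEST.log is not shown in the question, so the sample is purely hypothetical.

```shell
#!/bin/sh
# Hypothetical sample line: three words, then ten key="value" pairs.
line='2015-02-24 id-A id-B'
for i in 1 2 3 4 5 6 7 8 9 10; do
    line="$line k$i=\"v$i\""
done

# Approach from the question: two command substitutions, each with a pipeline,
# so several processes are started for every input line.
slow="$(echo $line | cut -d' ' -f 2-3) $(echo $line | cut -d'"' -f 20)"

# awk approach: a single process handles the whole input.
fast=$(printf '%s\n' "$line" | awk '{ split($0, a, "\""); print $2, $3, a[20] }')

printf 'cut: %s\nawk: %s\n' "$slow" "$fast"
```

Both variables should hold the same string for this sample. awk's split() fills the array starting at index 1, which is why a[20] lines up with cut's -f 20.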



Answer 2:


Perl is usually very fast:

perl -nE 'say join " ", (split " ")[1,2], (split /"/)[19]' TEST.log > IDS.log

Perl arrays are indexed starting with 0.




Answer 3:


The biggest bottleneck here is spinning off the subprocesses for your pipelines. You can get a substantial (read: orders-of-magnitude) performance improvement just by getting rid of the command substitutions and pipelines.

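# FD 3 carries TEST.log with spaces turned into \x01; stdin carries the raw lines.
# Bash arrays are zero-based, so the field that cut -f 20 selects is index 19.
while IFS=$'\x01' read -r ss1 ss2 ss3 _ <&3 && \
      IFS='"' read -r -a quote_separated_fields; do
    printf '%s\n' "${ss2} ${ss3} ${quote_separated_fields[19]}"
done < TEST.log 3< <(tr ' ' $'\x01' <TEST.log) > IDS.log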

How does this work?

  • tr ' ' $'\x01' changes spaces in the input to a low-ASCII control character. read coalesces runs of whitespace delimiters into one, but it treats every occurrence of a non-whitespace delimiter as a field boundary, so this avoids the special-case handling. Wrapping the command in 3< <(...) puts its output on file descriptor 3.
  • IFS=$'\x01' read -r ss1 ss2 ss3 _ <&3 splits a line on those characters, putting the first field into ss1 (which we don't care about), the second into ss2, the third into ss3, and the remainder of the line into _. The <&3 causes this read to take its input from file descriptor 3.
  • IFS='"' read -r -a quote_separated_fields splits the line read from stdin (FD 0) on " characters into an array called quote_separated_fields. Bash arrays are zero-based, so the field that cut -d'"' -f 20 selects is element 19.
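The splitting rules described above can be tried in isolation. The following is a minimal sketch, not the answer's exact code: it uses ':' as a stand-in for the \x01 delimiter to keep the sample readable, and the log line is hypothetical (three words followed by ten key="value" pairs).

```shell
#!/usr/bin/env bash
# Hypothetical sample line: three words, then ten key="value" pairs.
line='2015-02-24 id-A id-B'
for i in 1 2 3 4 5 6 7 8 9 10; do
    line+=" k$i=\"v$i\""
done

# Default IFS: a run of spaces counts as one separator.
read -r a b c <<< 'x   y z'

# Non-whitespace IFS (':' here, \x01 in the answer): every delimiter counts,
# so the empty field between the two colons is preserved.
IFS=: read -r d e f <<< 'x::z'

# Split on double quotes into a zero-based array.
IFS='"' read -r -a quoted <<< "$line"

printf 'default IFS: [%s] [%s] [%s]\n' "$a" "$b" "$c"
printf 'colon IFS:   [%s] [%s] [%s]\n' "$d" "$e" "$f"
printf 'quoted[19]:  %s\n' "${quoted[19]}"
```

Because each IFS assignment is a prefix to a single read command, it applies only to that command and the shell's normal IFS is untouched afterwards.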


Source: https://stackoverflow.com/questions/28696604/bash-while-loop-cut-slow
