Parameter expansion slow for large data sets

说谎 2021-01-16 02:14

If I take the first 1,000 bytes from a file, Bash can replace some characters pretty quickly:

$ cut -b-1000 get_video_i         
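
A rough sketch of the kind of comparison being made, assuming the input file is one long line (the file name data.txt and the x-to-y replacement are illustrative placeholders, not from the original post):

    # Time ${var//pattern/replacement} on a small prefix and a large prefix.
    # data.txt is a hypothetical stand-in for the file in the question; since
    # cut -b-N takes the first N bytes of each line, the input is assumed to
    # be a single long line.
    a=$(cut -b-1000 data.txt)      # first 1,000 bytes: the expansion is quick
    time r="${a//x/y}"
    b=$(cut -b-100000 data.txt)    # first 100,000 bytes: far more than 100x slower
    time r="${b//x/y}"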


        
2 Answers
  •  被撕碎了的回忆
    2021-01-16 02:39

    For the why, you can see the implementation of this code in pat_subst in subst.c in the bash source code.

    For each match in the string, the length of the string is counted numerous times (in pat_subst, match_pattern and match_upattern), both as a C string and, more expensively, as a multibyte string. This makes the function both slower than necessary and, more importantly, quadratic in complexity.
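
    You can watch the quadratic growth directly (an illustrative sketch, not from the original answer): each doubling of the number of matches should roughly quadruple the replacement time.

        # Build a string of n matches ("axax...ax", length 2*n) and time the
        # ${var//x/y} expansion; with quadratic behavior, doubling n roughly
        # quadruples the elapsed time.
        for n in 5000 10000 20000 40000; do
            s=$(yes ax | head -n "$n" | tr -d '\n')
            echo "string length ${#s}:"
            time r="${s//x/y}"
        done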

    This is why it's slow for larger input, and here's a pretty graph:

    [Figure: quadratic runtime in shell replacements]

    As for workarounds, just use sed. It's more likely to be optimized for string replacement operations (though you should be aware that POSIX only guarantees 8192 bytes per line, even though GNU sed handles arbitrarily long lines).
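
    For example (an illustrative sketch, reusing the same synthetic string as the timing loop above):

        # Replace every x with y using sed instead of ${big//x/y}; sed
        # processes the data as a stream and scales roughly linearly here.
        big=$(yes ax | head -n 40000 | tr -d '\n')
        r=$(printf '%s' "$big" | sed 's/x/y/g')

    The same replacement that takes seconds via parameter expansion should complete almost instantly this way.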
