When does command substitution spawn more subshells than the same commands in isolation?

挽巷 2020-12-01 03:53

Yesterday it was suggested to me that using command substitution in bash causes an unnecessary subshell to be spawned. The advice was specific to this use case:



        
2 Answers
  • 2020-12-01 04:29

    Update and caveat:

    This answer has a troubled past in that I confidently claimed things that turned out not to be true. I believe it has value in its current form, but please help me eliminate other inaccuracies (or convince me that it should be deleted altogether).

    I've substantially revised - and mostly gutted - this answer after @kojiro pointed out that my testing methods were flawed (I originally used ps to look for child processes, but that's too slow to always detect them); a new testing method is described below.

    I originally claimed that not all bash subshells run in their own child process, but that turns out not to be true.

    As @kojiro states in his answer, some shells - other than bash - DO sometimes avoid creation of child processes for subshells, so, generally speaking in the world of shells, one should not assume that a subshell implies a child process.

    As for the OP's cases in bash (assumes that command{n} instances are simple commands):

    # Case #1
    command1         # NO subshell
    var=$(command1)  # 1 subshell (command substitution)
    
    # Case #2
    command1 | command2         # 2 subshells (1 for each pipeline segment)
    var=$(command1 | command2)  # 3 subshells: + 1 for command subst.
    
    # Case #3
    command1 | command2 ; var=$?         # 2 subshells (due to the pipeline)
    var=$(command1 | command2 ; echo $?) # 3 subshells: + 1 for command subst.;
                                         #   note that the extra command doesn't add 
                                         #   one
    

    It looks like using command substitution ($(...)) always adds an extra subshell in bash - as does enclosing any command in (...).
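
    One observable consequence of that extra subshell is that variable changes made inside $(...) or (...) never reach the calling shell, whereas a { ...; } group runs in the current shell. The following is only a minimal sketch of that contrast; the variable names are made up for illustration:

```shell
#!/usr/bin/env bash
# Illustrative names only: x, out, after_* are not from the original post.

x=outer
( x=changed-in-parens )              # (...) runs in a subshell
after_parens=$x                      # the parent's x is untouched

out=$(x=changed-in-subst; echo "$x") # $(...) runs in a subshell too
after_subst=$x                       # still untouched; only stdout came back

{ x=changed-in-braces; }             # { ...; } is grouping only, no subshell
after_braces=$x                      # the change is visible here

echo "after (...):    $after_parens"
echo "after \$(...):   $after_subst (captured: $out)"
echo "after { ...; }: $after_braces"
```

    This says nothing about how many processes were forked; it only shows the isolation that a subshell implies.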

    I believe, but am not certain, that these results are correct; here's how I tested (bash 3.2.51 on OS X 10.9.1) - please tell me if this approach is flawed:

    • Made sure only 2 interactive bash shells were running: one to run the commands, the other to monitor.
    • In the 2nd shell I monitored the fork() calls in the 1st with sudo dtruss -t fork -f -p {pidOfShell1} (the -f is necessary to also trace fork() calls "transitively", i.e. to include those created by subshells themselves).
    • Used only the builtin : (no-op) in the test commands (to avoid muddling the picture with additional fork() calls for external executables); specifically:

      • :
      • $(:)
      • : | :
      • $(: | :)
      • : | :; :
      • $(: | :; :)
    • Only counted those dtruss output lines that contained a non-zero PID (as each child process also reports the fork() call that created it, but with PID 0).

    • Subtracted 1 from the resulting number, as running even just a builtin from an interactive shell apparently involves at least 1 fork().
    • Finally, assumed that the resulting count represents the number of subshells created.

    Below is what I still believe to be correct from my original post: when bash creates subshells.


    bash creates subshells in the following situations:

    • for an expression surrounded by parentheses ( (...) )
      • except directly inside [[ ... ]], where parentheses are only used for logical grouping.
    • for every segment of a pipeline (|), including the first one
      • Every subshell involved starts out as a copy of the original shell's state, even though, process-wise, one subshell may be forked from another before any command runs.
        Thus, changes made in an earlier pipeline segment do not affect later segments.
        (By design, the commands in a pipeline are launched simultaneously; sequencing happens only through their connected stdin/stdout pipes.)
      • bash 4.2+ has the shell option lastpipe (OFF by default), which causes the last pipeline segment NOT to run in a subshell; it only takes effect when job control is inactive, as is the case in scripts.
    • for command substitution ($(...))

    • for process substitution (<(...) and >(...))

      • typically creates 2 subshells; in the case of a simple command, @konsolebox came up with a technique to only create 1: prepend the simple command with exec (<(exec ...)).
    • for background execution (&)

    Combining these constructs will result in more than one subshell.
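
    The pipeline, lastpipe, process-substitution and background items above can be observed from a script through their side effect on variables. This is a rough sketch, assuming bash 4.2+ and a non-interactive script (so job control is off and lastpipe can take effect); all variable names are hypothetical:

```shell
#!/usr/bin/env bash
# Assumes bash 4.2+ run as a script (job control off). Names are illustrative.

# Default: the last pipeline segment runs in a subshell, so n is lost.
n=0
printf '1\n2\n3\n' | while read -r line; do n=$((n + 1)); done
default_n=$n

# With lastpipe the last segment runs in the current shell, so n survives.
shopt -s lastpipe
n=0
printf '1\n2\n3\n' | while read -r line; do n=$((n + 1)); done
lastpipe_n=$n
shopt -u lastpipe

# Process substitution: the commands inside <(...) run in a subshell.
n=0
read -r from_procsub < <(n=42; echo "$n")
procsub_n=$n

# Background execution: the group runs in a subshell.
n=0
{ n=99; } &
wait
background_n=$n

echo "default pipeline:     n=$default_n"
echo "with lastpipe:        n=$lastpipe_n"
echo "process substitution: n=$procsub_n (read: $from_procsub)"
echo "background group:     n=$background_n"
```

    Only the lastpipe run should report a non-zero n; in every other case the assignment happened in a subshell and was discarded with it.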

  • 2020-12-01 04:40

    In Bash, a subshell always executes in a new process. You can verify this fairly trivially in Bash 4, which adds the BASHPID variable alongside the special parameter $$ (both are shell variables, not environment variables):

    • $$ Expands to the process ID of the shell. In a () subshell, it expands to the process ID of the current shell, not the subshell.
    • BASHPID Expands to the process ID of the current bash process. This differs from $$ under certain circumstances, such as subshells that do not require bash to be re-initialized.

    In practice:

    $ type echo
    echo is a shell builtin
    $ echo $$-$BASHPID
    4671-4671
    $ ( echo $$-$BASHPID )
    4671-4929
    $ echo $( echo $$-$BASHPID )
    4671-4930
    $ echo $$-$BASHPID | { read; echo $REPLY:$$-$BASHPID; }
    4671-5086:4671-5087
    $ var=$(echo $$-$BASHPID ); echo $var
    4671-5006
    

    About the only case where the shell can elide an extra subshell is when you pipe to an explicit subshell:

    $ echo $$-$BASHPID | ( read; echo $REPLY:$$-$BASHPID; )
    4671-5118:4671-5119
    

    Here, the subshell implied by the pipe is explicitly applied, but not duplicated.
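
    The same $$ versus BASHPID comparison can be wrapped in a small helper for use in scripts. The function name in_subshell is hypothetical, and the sketch assumes Bash 4 or later (for BASHPID) with lastpipe left at its default of off:

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds when the calling code runs in a process
# other than the main shell, i.e. when $$ and BASHPID differ.
in_subshell() { [[ $$ -ne $BASHPID ]]; }

in_subshell       && top=yes        || top=no         # expected "no" in a plain script
( in_subshell )   && paren=yes      || paren=no       # explicit subshell
: | in_subshell   && pipe=yes       || pipe=no        # last pipeline segment
: | ( in_subshell ) && pipe_paren=yes || pipe_paren=no # pipe into explicit subshell
subst=$(in_subshell && echo yes || echo no)           # command substitution

echo "top level:            $top"
echo "( ... ):              $paren"
echo "pipeline segment:     $pipe"
echo "pipeline | ( ... ):   $pipe_paren"
echo "\$( ... ):             $subst"
```

    The helper only answers whether the code runs outside the main shell process; it cannot tell how many nested subshells are involved.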

    This differs from some other shells, which try very hard to avoid forking. Therefore, while I feel the argument made in js-shell-parse is misleading, it is true that not all shells always fork for all subshells.
