Question
I want to run multiple shell processes, but when I try to run more than 63, they hang. When I reduce max_threads in the thread pool to n, it hangs after running the nth shell command.
As you can see in the code below, the problem is not in start blocks per se, but in start blocks that contain the shell command:
#!/bin/env perl6
my $*SCHEDULER = ThreadPoolScheduler.new( max_threads => 2 );

my @processes;

# The Promises generated by this loop work as expected when awaited
for @*ARGS -> $item {
    @processes.append(
        start { say "Planning on processing $item" }
    );
}

# The nth Promise generated by the following loop hangs when awaited (where n = max_threads)
for @*ARGS -> $item {
    @processes.append(
        start { shell "echo 'processing $item'" }
    );
}

await(@processes);
Running ./process_items foo bar baz gives the following output, hanging after processing bar, which is just after the nth (here 2nd) thread has run using shell:
Planning on processing foo
Planning on processing bar
Planning on processing baz
processing foo
processing bar
What am I doing wrong? Or is this a bug?
Perl 6 distributions tested on CentOS 7:
Rakudo Star 2018.06
Rakudo Star 2018.10
Rakudo Star 2019.03-RC2
Rakudo Star 2019.03
With Rakudo Star 2019.03-RC2, use v6.c versus use v6.d did not make any difference.
Answer 1:
The shell and run subs use Proc, which is implemented in terms of Proc::Async. This uses the thread pool internally. By filling up the pool with blocking calls to shell, the thread pool becomes exhausted, and so cannot process events, resulting in the hang.
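To illustrate (a minimal sketch, not from the original answer): starting each child process via Proc::Async directly returns a Promise without parking a pool thread for the lifetime of the process, so even a two-thread pool does not hang:

```raku
#!/bin/env perl6
# Each Proc::Async.start returns a Promise that is kept when the
# child process exits; no pool thread blocks while the process runs.
my @processes = @*ARGS.map: -> $item {
    Proc::Async.new('echo', "processing $item").start
};
await @processes;
```

Note this launches all the processes at once; the code later in this answer shows how to cap how many run concurrently.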
It would be far better to use Proc::Async directly for this task. The approach of using shell and a load of real threads won't scale well; every OS thread has memory overhead, GC overhead, and so forth. Since spawning a bunch of child processes is not CPU-bound, this is rather wasteful; in reality, just one or two real threads are needed. So, in this case, perhaps the implementation pushing back on you when doing something inefficient isn't the worst thing.
I notice that one of the reasons for using shell and the thread pool is to try and limit the number of concurrent processes. But this isn't a very reliable way to do it; just because the current thread pool implementation sets a default maximum of 64 threads does not mean it always will do so.
Here's an example of a parallel test runner that runs up to 4 processes at once, collects their output, and envelopes it. It's a little more than you perhaps need, but it nicely illustrates the shape of the overall solution:
my $degree = 4;
my @tests = dir('t').grep(/\.t$/);
react {
    sub run-one {
        my $test = @tests.shift // return;
        my $proc = Proc::Async.new('perl6', '-Ilib', $test);
        my @output = "FILE: $test";
        whenever $proc.stdout.lines {
            push @output, "OUT: $_";
        }
        whenever $proc.stderr.lines {
            push @output, "ERR: $_";
        }
        my $finished = $proc.start;
        whenever $finished {
            push @output, "EXIT: {.exitcode}";
            say @output.join("\n");
            run-one();
        }
    }
    run-one for 1..$degree;
}
The key thing here is the call to run-one when a process ends, which means that you always replace an exited process with a new one, maintaining - so long as there are things to do - up to 4 processes running at a time. The react block naturally ends when all processes have completed, due to the fact that the number of events subscribed to drops to zero.
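As a more compact alternative (a sketch not in the original answer, using a hard-coded item list in place of the test files above), Supply.throttle can cap the number of in-flight promises:

```raku
my $degree = 4;
my @items = <foo bar baz qux quux>;
react {
    # throttle runs the block on at most $degree items at a time;
    # a Promise returned from the block counts as "in flight"
    # until it is kept, so at most $degree processes run at once.
    whenever Supply.from-list(@items).throttle($degree, -> $item {
        Proc::Async.new('echo', "processing $item").start
    }) { }
}
```

This trades the explicit run-one bookkeeping for throttle's built-in back-pressure, at the cost of less control over how each process's output is collected.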
Source: https://stackoverflow.com/questions/55265176/why-dont-all-the-shell-processes-in-my-promises-start-blocks-run-is-this-a