Using a semaphore inside a nested Java 8 parallel stream action may DEADLOCK. Is this a bug?

名媛妹妹 2020-12-12 23:21

Consider the following situation: We are using a Java 8 parallel stream to perform a parallel forEach loop, e.g.,

IntStream.range(0,20).parallel().forEach(i -> { /* acquire a semaphore permit, run a nested parallel loop, release */ });
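
A minimal, runnable reconstruction of the scenario (the permit count and the loop bodies are illustrative, since the snippet above is truncated):

    import java.util.concurrent.Semaphore;
    import java.util.stream.IntStream;

    public class NestedParallelStreamDemo {

        public static void main(String[] args) {
            // the permit count is illustrative; any value smaller than the
            // number of outer iterations can exhibit the problem
            final Semaphore concurrentExecutions = new Semaphore(4);

            IntStream.range(0, 20).parallel().forEach(i -> {
                concurrentExecutions.acquireUninterruptibly();
                try {
                    // a nested parallel stream inside the outer action
                    IntStream.range(0, 20).parallel().forEach(j -> {
                        // do some work
                    });
                } finally {
                    concurrentExecutions.release();
                }
            });
        }
    }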


        
3 Answers
  • 2020-12-13 00:01

    After a bit of investigation of the source code of ForkJoinPool and ForkJoinTask, I believe that I have found an answer:

    It is a bug (in my opinion), and the bug is in doInvoke() of ForkJoinTask. The problem is actually related to the nesting of the two loops and presumably not to the use of the Semaphore; however, one needs the Semaphore (or something blocking in the outer loop) to make the problem apparent and result in a deadlock (but I can imagine there are other issues implied by this bug - see Nested Java 8 parallel forEach loop perform poor. Is this behavior expected?).

    The implementation of the doInvoke() method currently looks as follows:

    /**
     * Implementation for invoke, quietlyInvoke.
     *
     * @return status upon completion
     */
    private int doInvoke() {
        int s; Thread t; ForkJoinWorkerThread wt;
        return (s = doExec()) < 0 ? s :
            ((t = Thread.currentThread()) instanceof ForkJoinWorkerThread) ?
            (wt = (ForkJoinWorkerThread)t).pool.awaitJoin(wt.workQueue, this) :
            externalAwaitDone();
    }
    

    (and maybe also in doJoin(), which looks similar). In the line

            ((t = Thread.currentThread()) instanceof ForkJoinWorkerThread) ?
    

    it is tested whether Thread.currentThread() is an instance of ForkJoinWorkerThread. The purpose of this test is to check whether the ForkJoinTask is running on a worker thread of the pool or on the main thread. I believe this line is OK for a non-nested parallel for, where it allows the code to distinguish whether the current task runs on the main thread or on a pool worker. However, for tasks of the inner loop this test is problematic: let us call the thread that runs the parallel().forEach the creator thread. For the outer loop the creator thread is the main thread, and it is not an instanceof ForkJoinWorkerThread. However, for inner loops launched from a ForkJoinWorkerThread, the creator thread is an instanceof ForkJoinWorkerThread too. Hence, in this situation, the test ((t = Thread.currentThread()) instanceof ForkJoinWorkerThread) IS ALWAYS TRUE!

    Hence, we always call pool.awaitJoin(wt.workQueue, this).
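
    This can be observed with a quick check along the following lines (note that the main thread may execute some chunks itself and print false for those):

    import java.util.concurrent.ForkJoinWorkerThread;
    import java.util.stream.IntStream;

    public class WorkerThreadCheck {
        public static void main(String[] args) {
            IntStream.range(0, 4).parallel().forEach(i -> {
                // chunks executed by common-pool workers print true
                System.out.println(Thread.currentThread() instanceof ForkJoinWorkerThread);
            });
        }
    }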

    Now, note that we call awaitJoin on the FULL workQueue of that thread (I believe that this is an additional flaw). It appears as if we are not only joining the inner loop's tasks, but also the task(s) of the outer loop, and we JOIN ALL THOSE tasks. Unfortunately, the outer task holds that Semaphore.

    To verify that the bug is related to this, we may check a very simple workaround: I create a t = new Thread() which runs the inner loop, then perform t.start(); t.join(); (see the sketch below). Note that this will not introduce any additional parallelism (I am immediately joining). However, it will change the result of the instanceof ForkJoinWorkerThread test for the creator thread. (Note that tasks will still be submitted to the common pool.) If that wrapper thread is created, the problem does not occur anymore - at least in my current test situation.
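
    In code, the workaround amounts to something like this (the loop bounds and the loop body are placeholders):

    Thread t = new Thread(() ->
        // the inner parallel loop now has a plain Thread as its creator, so
        // the instanceof ForkJoinWorkerThread test evaluates to false
        IntStream.range(0, 20).parallel().forEach(j -> {
            // do some work
        })
    );
    t.start();
    try {
        // no additional parallelism: block until the inner loop has finished
        t.join();
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
    }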

    I posted a full demo at http://svn.finmath.net/finmath%20experiments/trunk/src/net/finmath/experiments/concurrency/ForkJoinPoolTest.java

    In this test code the combination

    final boolean isUseSemaphore        = true;
    final boolean isUseInnerStream      = true;
    final boolean isWrappedInnerLoopThread  = false;
    

    will result in a deadlock, while the combination

    final boolean isUseSemaphore        = true;
    final boolean isUseInnerStream      = true;
    final boolean isWrappedInnerLoopThread  = true;
    

    (and actually all other combinations) will not.

    Update: Since many are pointing out that the use of the Semaphore is dangerous, I tried to create a demo of the problem without the Semaphore. Now there is no longer a deadlock, but an - in my opinion - unexpected performance issue. I created a new post for that at Nested Java 8 parallel forEach loop perform poor. Is this behavior expected?. The demo code is here: http://svn.finmath.net/finmath%20experiments/trunk/src/net/finmath/experiments/concurrency/NestedParallelForEachTest.java

  • 2020-12-13 00:02

    I ran your test in a profiler (VisualVM) and I agree: threads are waiting for the semaphore and in awaitJoin() in the F/J pool.

    This framework has serious problems where join() is concerned. I’ve been writing a critique about this framework for four years now. The basic join problem starts here.

    awaitJoin() has similar problems. You can peruse the code yourself. When the framework gets to the bottom of the work deque, it issues a wait(). What it all comes down to is that this framework has no way of doing a context switch.

    There is a way of getting this framework to create compensation threads for the threads that are stalled: you need to implement the ForkJoinPool.ManagedBlocker interface. How you can do this in your case, I have no idea - you're running a basic API with streams; you're not implementing the Streams API and writing your own code. (A sketch of the general shape follows below.)
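
    For the semaphore acquisition itself, such a blocker might look something like this sketch (the class and field names are illustrative, not from any library):

    import java.util.concurrent.ForkJoinPool;
    import java.util.concurrent.Semaphore;

    // Hypothetical helper: blocks on a semaphore in a way that lets the
    // fork/join pool create a compensation thread for the stalled worker.
    final class SemaphoreBlocker implements ForkJoinPool.ManagedBlocker {
        private final Semaphore semaphore;
        private boolean acquired = false;

        SemaphoreBlocker(Semaphore semaphore) { this.semaphore = semaphore; }

        @Override
        public boolean block() throws InterruptedException {
            if (!acquired) {
                semaphore.acquire();
                acquired = true;
            }
            return true; // no further blocking is necessary
        }

        @Override
        public boolean isReleasable() {
            // try to take a permit without blocking
            return acquired || (acquired = semaphore.tryAcquire());
        }
    }

    A stream action would then call ForkJoinPool.managedBlock(new SemaphoreBlocker(semaphore)) instead of acquiring the semaphore directly (managedBlock throws InterruptedException, which needs handling).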

    I stick to my comment above: once you turn over the parallelism to the API, you relinquish your ability to control the inner workings of that parallel mechanism. There is no bug in the API (other than that it is using a faulty framework for parallel operations). The problem is that semaphores, or any other method of controlling parallelism within the API, are hazardous ideas.

  • 2020-12-13 00:22

    Any time you decompose a problem into tasks, where those tasks could be blocked on other tasks, and try to execute them in a finite thread pool, you are at risk of pool-induced deadlock. See Java Concurrency in Practice, section 8.1.
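
    A minimal illustration of pool-induced deadlock (a sketch, not tied to streams): a single-thread pool whose only task blocks on a second task that can never start.

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;

    public class PoolInducedDeadlock {
        public static void main(String[] args) throws Exception {
            ExecutorService pool = Executors.newSingleThreadExecutor();
            Future<?> outer = pool.submit(() -> {
                // the inner task is queued behind this task, which occupies
                // the pool's only thread while waiting for the inner result
                Future<String> inner = pool.submit(() -> "done");
                return inner.get(); // blocks forever: pool-induced deadlock
            });
            outer.get(); // never returns
        }
    }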

    This is unquestionably a bug - in your code. You're filling up the FJ pool with tasks that are going to block waiting for the results of other tasks in the same pool. Sometimes you get lucky and things manage not to deadlock (just as not all lock-ordering errors result in deadlock all the time), but fundamentally you're skating on very thin ice here.
