SelectorImpl is BLOCKED

Question

I have many clients sending requests to a server, about 1000 requests per second per client. The server's CPU usage quickly rose to 600% (8 cores) and stayed there. When I printed the process's thread dump with jstack, I found a thread inside SelectorImpl in the BLOCKED state. The dump is as follows:

    "nioEventLoopGroup-4-1" prio=10 tid=0x00007fef28001800 nid=0x1dbf waiting for monitor entry [0x00007fef9eec7000]
       java.lang.Thread.State: BLOCKED (on object monitor)
        at sun.nio.ch.EPollSelectorImpl.doSelect(Unknown Source)
        - waiting to lock <0x00000000c01f1af8> (a java.lang.Object)
        at sun.nio.ch.SelectorImpl.lockAndDoSelect(Unknown Source)
        - locked <0x00000000c01d9420> (a io.netty.channel.nio.SelectedSelectionKeySet)
        - locked <0x00000000c01f1948> (a java.util.Collections$UnmodifiableSet)
        - locked <0x00000000c01d92c0> (a sun.nio.ch.EPollSelectorImpl)
        at sun.nio.ch.SelectorImpl.select(Unknown Source)
        at io.netty.channel.nio.NioEventLoop.select(NioEventLoop.java:635)
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:319)
        at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:101)
        at java.lang.Thread.run(Unknown Source)

Does the high CPU usage have something to do with this? Another problem is that when I connect a lot of clients, some of them fail to connect. The error is as follows:

    "nioEventLoopGroup-4-1" prio=10 tid=0x00007fef28001800 nid=0x1dbf waiting for monitor entry [0x00007fef9eec7000]
       java.lang.Thread.State: BLOCKED (on object monitor)
        at sun.nio.ch.EPollSelectorImpl.doSelect(Unknown Source)
        - waiting to lock <0x00000000c01f1af8> (a java.lang.Object)
        at sun.nio.ch.SelectorImpl.lockAndDoSelect(Unknown Source)
        - locked <0x00000000c01d9420> (a io.netty.channel.nio.SelectedSelectionKeySet)
        - locked <0x00000000c01f1948> (a java.util.Collections$UnmodifiableSet)
        - locked <0x00000000c01d92c0> (a sun.nio.ch.EPollSelectorImpl)
        at sun.nio.ch.SelectorImpl.select(Unknown Source)
        at io.netty.channel.nio.NioEventLoop.select(NioEventLoop.java:635)
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:319)
        at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:101)
        at java.lang.Thread.run(Unknown Source)

The clients are created using a thread pool, and a connection timeout is configured, so why do connections time out so frequently? Is the server the cause?

    public void run() {
        System.out.println(tnum + " connecting...");
        try {
            Bootstrap bootstrap = new Bootstrap();
            bootstrap.group(group)
                    .channel(NioSocketChannel.class)
                    .option(ChannelOption.CONNECT_TIMEOUT_MILLIS, 30000)
                    .handler(loadClientInitializer);

            // Start the connection attempt.
            ChannelFuture future = bootstrap.connect(host, port);
            future.channel().attr(AttrNum).set(tnum);
            // sync() blocks this pool thread until the attempt completes
            // and rethrows the cause if the connection failed.
            future.sync();
            if (future.isSuccess()) {
                System.out.println(tnum + " login success.");
                goSend(tnum, future.channel());
            } else {
                // Not reached after sync(): a failed connect ends up in the catch block.
                System.out.println(tnum + " login failed.");
            }
        } catch (Exception e) {
            XLog.error(e);
        } finally {
            // group.shutdownGracefully();
        }
    }

Answer 1:

Does the high CPU usage have something to do with this?

It might. I'd diagnose this problem in the following way (on a Linux box):

Find threads which are eating CPU

Using pidstat, I'd find which threads are eating CPU and in which mode (user or kernel) the time is spent.

$ pidstat -p [java-process-pid] -tu 1 | awk '$9 > 50'

This command shows threads using more than 50% of CPU time. The awk filter assumes that %CPU is the ninth column, which depends on the sysstat version and time format. You can inspect what those threads are doing using jstack, VisualVM or Java Flight Recorder.

If the CPU-hungry threads and the BLOCKED threads are the same, the CPU usage is probably related to lock contention.
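To compare the two sets of threads, the thread IDs have to be matched first: pidstat prints the OS thread ID (TID) in decimal, while jstack prints the same ID in hexadecimal as nid. The following is a minimal sketch of that conversion; the class and method names (NidMapper, toNid, tidFromJstackLine) are hypothetical helpers, not part of any tool.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Maps between pidstat thread IDs (decimal) and jstack "nid" values (hex). */
final class NidMapper {
    private static final Pattern NID = Pattern.compile("nid=0x([0-9a-fA-F]+)");

    /** Formats a decimal OS thread ID the way jstack prints it. */
    static String toNid(long tid) {
        return "0x" + Long.toHexString(tid);
    }

    /** Extracts the decimal OS thread ID from a jstack header line, or -1 if there is none. */
    static long tidFromJstackLine(String line) {
        Matcher m = NID.matcher(line);
        return m.find() ? Long.parseLong(m.group(1), 16) : -1L;
    }

    public static void main(String[] args) {
        // Header line taken from the thread dump in the question.
        String header = "\"nioEventLoopGroup-4-1\" prio=10 tid=0x00007fef28001800 "
                + "nid=0x1dbf waiting for monitor entry [0x00007fef9eec7000]";
        long tid = tidFromJstackLine(header);
        System.out.println("OS thread id: " + tid);        // prints 7615
        System.out.println("jstack nid:   " + toNid(tid)); // prints 0x1dbf
    }
}
```

So if pidstat reports TID 7615 as CPU-hungry, that is the BLOCKED nioEventLoopGroup-4-1 thread (nid=0x1dbf) from the dump in the question.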

Find reason for connection timeout

Basically, you get a connection timeout if the two operating systems can't finish the TCP handshake within the given time. There are several possible reasons for this:

Network link saturation. This can be diagnosed using sar -n DEV 1 and comparing the rxkB/s and txkB/s columns to your link's maximum throughput.
The server (Netty) doesn't call accept() within the given timeout. The accepting thread may be BLOCKED or starved of CPU time. You can find which threads are calling accept() (and therefore completing connection setup) using strace -f -e trace=accept -p [java-pid], and then check for possible reasons using pidstat/jstack.

You can also count the connection requests that have been received but not yet confirmed (handshake not completed) with netstat -an | grep -c SYN_RECV
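If you want to watch this number alongside the other TCP states during a load test, the netstat output can be tallied per state. The following is a minimal sketch, assuming the Linux net-tools netstat layout in which TCP rows carry the state in the sixth column; TcpStateCounter and countStates are hypothetical names.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

/** Tallies TCP connection states from `netstat -an` output. */
final class TcpStateCounter {
    /** Counts rows per state; only rows whose protocol starts with "tcp" are considered. */
    static Map<String, Integer> countStates(List<String> netstatLines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : netstatLines) {
            String[] cols = line.trim().split("\\s+");
            // Assumed layout: Proto Recv-Q Send-Q Local Foreign State
            if (cols.length >= 6 && cols[0].startsWith("tcp")) {
                counts.merge(cols[5], 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // Runs the same netstat command as above and prints one count per state.
        Process p = new ProcessBuilder("netstat", "-an").redirectErrorStream(true).start();
        List<String> lines;
        try (BufferedReader r = new BufferedReader(new InputStreamReader(p.getInputStream()))) {
            lines = r.lines().collect(Collectors.toList());
        }
        System.out.println(countStates(lines));
    }
}
```

A SYN_RECV count that keeps growing while clients time out would point at the server side rather than at the network link.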

Answer 2:

It would help if you could elaborate on what your Netty application is doing. Regardless, please make sure you are closing the channels. Note this from the Channel javadoc:

It is important to call close() or close(ChannelPromise) to release all resources once you are done with the Channel. This ensures all resources are released in a proper way, i.e. filehandles

If you are closing the channels, then the problem may be in the logic itself, such as an infinite loop or something similar, which could explain the high CPU usage.

Source: https://stackoverflow.com/questions/18120634/selectorimpl-is-blocked

Tags: java, selector, blocked