Question
I am currently experimenting with failure scenarios that might happen when communicating via the message broker RabbitMQ. The goal is to evaluate how such communication can be made more resilient.
In particular, I want to trigger a nack (negative acknowledgement) confirm when sending messages in publisher-confirms mode.
To do so, I send a message to a non-existent exchange via Spring AMQP's RabbitTemplate.send. In the callback registered via RabbitTemplate.setConfirmCallback, I then handle ack=false confirms by resending the message to an existing exchange (simulating that I took care of the cause of the nack).
A sample class and the related test are provided below; the complete sample project can be found in my GitHub repository. I use RabbitMQ 3.6 and Spring Boot/Spring AMQP 2.0.2.
When running the test, the callback is called with ack=false as expected.
However, resending the message hangs while a new channel is being created, and a timeout exception is thrown only after 10 minutes. A dump of the call stack and the logs are provided below.
One solution seems to be to send the message from a different thread, as proposed in another post.
If you uncomment the line service.runInSeparateThread = true; in the test, things work!
However, I don't really understand why things do or don't work, and I haven't read about this practice anywhere except in the post mentioned above. Is this expected behavior or a bug? Can someone explain the details?
Thanks a lot for your advice!
A call stack snapshot:
"AMQP Connection 127.0.0.1:5672@3968" prio=5 tid=0xe nid=NA waiting
java.lang.Thread.State: WAITING
at java.lang.Object.wait(Object.java:-1)
at com.rabbitmq.utility.BlockingCell.get(BlockingCell.java:73)
at com.rabbitmq.utility.BlockingCell.uninterruptibleGet(BlockingCell.java:120)
at com.rabbitmq.utility.BlockingValueOrException.uninterruptibleGetValue(BlockingValueOrException.java:36)
at com.rabbitmq.client.impl.AMQChannel$BlockingRpcContinuation.getReply(AMQChannel.java:494)
at com.rabbitmq.client.impl.AMQChannel.privateRpc(AMQChannel.java:288)
at com.rabbitmq.client.impl.AMQChannel.exnWrappingRpc(AMQChannel.java:138)
at com.rabbitmq.client.impl.ChannelN.open(ChannelN.java:133)
at com.rabbitmq.client.impl.ChannelManager.createChannel(ChannelManager.java:176)
at com.rabbitmq.client.impl.AMQConnection.createChannel(AMQConnection.java:542)
at org.springframework.amqp.rabbit.connection.SimpleConnection.createChannel(SimpleConnection.java:57)
at org.springframework.amqp.rabbit.connection.CachingConnectionFactory$ChannelCachingConnectionProxy.createBareChannel(CachingConnectionFactory.java:1156)
at org.springframework.amqp.rabbit.connection.CachingConnectionFactory$ChannelCachingConnectionProxy.access$200(CachingConnectionFactory.java:1144)
at org.springframework.amqp.rabbit.connection.CachingConnectionFactory.doCreateBareChannel(CachingConnectionFactory.java:585)
at org.springframework.amqp.rabbit.connection.CachingConnectionFactory.createBareChannel(CachingConnectionFactory.java:568)
at org.springframework.amqp.rabbit.connection.CachingConnectionFactory.getCachedChannelProxy(CachingConnectionFactory.java:538)
at org.springframework.amqp.rabbit.connection.CachingConnectionFactory.getChannel(CachingConnectionFactory.java:520)
at org.springframework.amqp.rabbit.connection.CachingConnectionFactory.access$1500(CachingConnectionFactory.java:94)
at org.springframework.amqp.rabbit.connection.CachingConnectionFactory$ChannelCachingConnectionProxy.createChannel(CachingConnectionFactory.java:1161)
at org.springframework.amqp.rabbit.core.RabbitTemplate.doExecute(RabbitTemplate.java:1803)
at org.springframework.amqp.rabbit.core.RabbitTemplate.execute(RabbitTemplate.java:1771)
at org.springframework.amqp.rabbit.core.RabbitTemplate.send(RabbitTemplate.java:859)
...
The logs:
...
10:21:24.613 [main] DEBUG org.springframework.amqp.rabbit.core.RabbitAdmin - declaring Exchange 'ExistentExchange'
10:21:24.630 [main] INFO com.example.rabbitmq.ProducerService - sending `initial Message`
10:21:24.648 [main] DEBUG org.springframework.amqp.rabbit.support.PublisherCallbackChannelImpl - Added listener org.springframework.amqp.rabbit.core.RabbitTemplate$MockitoMock$952329793@562c877a
10:21:24.648 [main] DEBUG org.springframework.amqp.rabbit.core.RabbitTemplate - Added publisher confirm channel: Cached Rabbit Channel: PublisherCallbackChannelImpl: AMQChannel(amqp://guest@127.0.0.1:5672/,1), conn: Proxy@3013909b Shared Rabbit Connection: SimpleConnection@12db3386 [delegate=amqp://guest@127.0.0.1:5672/, localPort= 1341] to map, size now 1
10:21:24.649 [main] DEBUG org.springframework.amqp.rabbit.core.RabbitTemplate - Executing callback RabbitTemplate$$Lambda$175/1694519286 on RabbitMQ Channel: Cached Rabbit Channel: PublisherCallbackChannelImpl: AMQChannel(amqp://guest@127.0.0.1:5672/,1), conn: Proxy@3013909b Shared Rabbit Connection: SimpleConnection@12db3386 [delegate=amqp://guest@127.0.0.1:5672/, localPort= 1341]
10:21:24.649 [main] DEBUG org.springframework.amqp.rabbit.core.RabbitTemplate - Publishing message (Body:'[B@67001148(byte[15])' MessageProperties [headers={}, contentType=application/octet-stream, contentLength=0, deliveryMode=PERSISTENT, priority=0, deliveryTag=0])on exchange [nonExistentExchange], routingKey = [nonExistentQueue]
10:21:24.659 [main] INFO com.example.rabbitmq.ProducerService - done with sending message
10:21:24.675 [AMQP Connection 127.0.0.1:5672] DEBUG org.springframework.amqp.rabbit.support.PublisherCallbackChannelImpl - PublisherCallbackChannelImpl: AMQChannel(amqp://guest@127.0.0.1:5672/,1) PC:Nack:(close):1
10:21:24.677 [AMQP Connection 127.0.0.1:5672] DEBUG org.springframework.amqp.rabbit.support.PublisherCallbackChannelImpl - Sending confirm PendingConfirm [correlationData=null cause=channel error; protocol method: #method<channel.close>(reply-code=404, reply-text=NOT_FOUND - no exchange 'nonExistentExchange' in vhost '/', class-id=60, method-id=40)]
10:21:24.677 [AMQP Connection 127.0.0.1:5672] INFO com.example.rabbitmq.ProducerService - In confirm callback, ack=false, cause=channel error; protocol method: #method<channel.close>(reply-code=404, reply-text=NOT_FOUND - no exchange 'nonExistentExchange' in vhost '/', class-id=60, method-id=40), correlationData=null
10:21:24.677 [AMQP Connection 127.0.0.1:5672] INFO com.example.rabbitmq.ProducerService - sending `resend Message`
10:21:24.678 [AMQP Connection 127.0.0.1:5672] DEBUG org.springframework.amqp.rabbit.support.PublisherCallbackChannelImpl - PublisherCallbackChannelImpl: AMQChannel(amqp://guest@127.0.0.1:5672/,1) PC:Nack:(close):1
10:21:24.679 [AMQP Connection 127.0.0.1:5672] DEBUG org.springframework.amqp.rabbit.support.PublisherCallbackChannelImpl - AMQChannel(amqp://guest@127.0.0.1:5672/,1) No listener for seq:1
10:21:24.679 [AMQP Connection 127.0.0.1:5672] DEBUG org.springframework.amqp.rabbit.core.RabbitTemplate - Removed publisher confirm channel: PublisherCallbackChannelImpl: AMQChannel(amqp://guest@127.0.0.1:5672/,1) from map, size now 0
10:21:24.679 [AMQP Connection 127.0.0.1:5672] DEBUG org.springframework.amqp.rabbit.core.RabbitTemplate - Removed publisher confirm channel: PublisherCallbackChannelImpl: AMQChannel(amqp://guest@127.0.0.1:5672/,1) from map, size now 0
10:21:24.679 [AMQP Connection 127.0.0.1:5672] DEBUG org.springframework.amqp.rabbit.support.PublisherCallbackChannelImpl - PendingConfirms cleared
ProducerService:
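package com.example.rabbitmq;

import java.util.concurrent.Executor;
import java.util.concurrent.Executors;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.amqp.core.Message;
import org.springframework.amqp.core.MessageProperties;
import org.springframework.amqp.rabbit.core.RabbitTemplate;
import org.springframework.amqp.rabbit.support.CorrelationData; // package used by Spring AMQP 2.0.x
import org.springframework.stereotype.Service;

@Service
public class ProducerService {

    static final String EXISTENT_EXCHANGE = "ExistentExchange";
    private static final String NON_EXISTENT_EXCHANGE = "nonExistentExchange";
    private static final String QUEUE_NAME = "nonExistentQueue";

    private final Logger logger = LoggerFactory.getLogger(getClass());
    private final RabbitTemplate rabbitTemplate;
    // Used only when runInSeparateThread is true, to resend off the confirm thread
    private final Executor executor = Executors.newCachedThreadPool();
    // false reproduces the hang; true applies the separate-thread workaround
    boolean runInSeparateThread = false;

    public ProducerService(RabbitTemplate rabbitTemplate) {
        this.rabbitTemplate = rabbitTemplate;
        rabbitTemplate.setConfirmCallback(this::confirmCallback);
    }

    // Called for every publisher confirm; the logs show it runs on the "AMQP Connection" thread
    private void confirmCallback(CorrelationData correlationData, boolean ack, String cause) {
        logger.info("In confirm callback, ack={}, cause={}, correlationData={}", ack, cause, correlationData);
        if (!ack) {
            // Simulate fixing the nack cause by resending to an exchange that exists
            if (runInSeparateThread) {
                executor.execute(() -> sendMessage("resend Message", EXISTENT_EXCHANGE));
            } else {
                sendMessage("resend Message", EXISTENT_EXCHANGE);
            }
        } else {
            logger.info("sending was acknowledged");
        }
    }

    public void produceMessage() {
        sendMessage("initial Message", NON_EXISTENT_EXCHANGE);
    }

    private void sendMessage(String messageBody, String exchangeName) {
        logger.info("sending `{}`", messageBody);
        // The queue name doubles as the routing key
        rabbitTemplate.send(exchangeName, QUEUE_NAME, new Message(messageBody.getBytes(), new MessageProperties()));
        logger.info("done with sending message");
    }
}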
ProducerServiceTest:
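package com.example.rabbitmq;

import static java.util.stream.Collectors.toList;
import static org.hamcrest.CoreMatchers.equalTo;
import static org.junit.Assert.assertThat;
import static org.mockito.ArgumentMatchers.any;
import static org.mockito.ArgumentMatchers.anyString;
import static org.mockito.Mockito.doAnswer;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;

import org.junit.After;
import org.junit.Before;
import org.junit.Test;
import org.junit.runner.RunWith;
import org.springframework.amqp.core.AmqpAdmin;
import org.springframework.amqp.core.DirectExchange;
import org.springframework.amqp.core.Message;
import org.springframework.amqp.rabbit.connection.CachingConnectionFactory;
import org.springframework.amqp.rabbit.core.RabbitTemplate;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.autoconfigure.amqp.RabbitAutoConfiguration;
import org.springframework.boot.test.mock.mockito.SpyBean;
import org.springframework.test.annotation.DirtiesContext;
import org.springframework.test.context.ContextConfiguration;
import org.springframework.test.context.junit4.SpringRunner;

@RunWith(SpringRunner.class)
@ContextConfiguration(classes = {RabbitAutoConfiguration.class, ProducerService.class})
@DirtiesContext
public class ProducerServiceTest {
    @Autowired
    private ProducerService service;
    @SpyBean
    private RabbitTemplate rabbitTemplate;
    @Autowired
    private AmqpAdmin amqpAdmin;
    @Autowired
    private CachingConnectionFactory cachingConnectionFactory;

    @Before
    public void setup() {
        cachingConnectionFactory.setPublisherConfirms(true);
        amqpAdmin.declareExchange(new DirectExchange(ProducerService.EXISTENT_EXCHANGE));
    }

    @After
    public void cleanup() {
        amqpAdmin.deleteExchange(ProducerService.EXISTENT_EXCHANGE);
    }

    @Test
    public void sendMessageToNonexistentExchange() throws InterruptedException {
        final CountDownLatch sentMessagesLatch = new CountDownLatch(2);
        // Synchronized because the resend may be recorded from another thread
        final List<Message> sentMessages = Collections.synchronizedList(new ArrayList<>());
        doAnswer(invocation -> {
            invocation.callRealMethod();
            sentMessages.add(invocation.getArgument(2));
            sentMessagesLatch.countDown();
            return null;
        }).when(rabbitTemplate).send(anyString(), anyString(), any(Message.class));
        // Uncomment to resend on a separate thread (the workaround)
        // service.runInSeparateThread = true;
        service.produceMessage();
        sentMessagesLatch.await();
        List<String> messageBodies = sentMessages.stream().map(message -> new String(message.getBody())).collect(toList());
        assertThat(messageBodies, equalTo(Arrays.asList("initial Message", "resend Message")));
    }
}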
Answer 1:
It could be considered a bug, I suppose, but it's an artifact of the way we cache channels to improve performance. The problem is that attempting to publish on a channel on the same thread that's delivering an ack for the same channel causes a deadlock in the client library.
We have an open issue to look into a solution (for a different reason); we just haven't gotten around to it yet. AFAIK, you are only the second user to hit this in the more than six years since we added support for confirms and returns.
EDIT
Actually, this is a different situation: the channel is not being reused, because it has already been closed. The template is trying to create a new channel, and that is what deadlocks. I don't see how we (Spring AMQP) can do anything about it; it's a limitation of the Java client: you cannot perform operations on the ack thread.
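To make the "you cannot perform operations on the ack thread" point concrete, here is a simplified, self-contained model in plain Java. It is an illustration under stated assumptions, not the RabbitMQ client's actual code; ConnectionThreadDeadlockDemo, openChannel and deliverNack are hypothetical names. A single-threaded executor stands in for the client's "AMQP Connection" thread, which (as the stack trace and logs suggest) both runs the confirm callback and processes the broker's reply to channel.open. If the callback blocks waiting for that reply, the only thread that could process it is the one that is waiting. Handing the work to another executor, as runInSeparateThread = true does, frees the connection thread.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Simplified, hypothetical model of the deadlock; NOT the amqp-client implementation.
public class ConnectionThreadDeadlockDemo {

    // Stands in for the single "AMQP Connection" thread, which dispatches confirm
    // callbacks AND processes RPC replies such as channel.open-ok (assumption).
    private final ExecutorService connectionThread = Executors.newSingleThreadExecutor();
    // Separate pool for work handed off from the callback (like runInSeparateThread = true).
    private final ExecutorService worker = Executors.newCachedThreadPool();

    // Models a blocking channel.open: only the connection thread can process the reply.
    private String openChannel(long timeoutMillis) throws Exception {
        CompletableFuture<String> reply = new CompletableFuture<>();
        // The reply is queued behind whatever the connection thread is doing right now.
        connectionThread.execute(() -> reply.complete("channel.open-ok"));
        return reply.get(timeoutMillis, TimeUnit.MILLISECONDS);
    }

    // Delivers a nack on the connection thread; the "resend" opens a channel inline or on the worker.
    public String deliverNack(boolean handOff) throws Exception {
        CompletableFuture<String> outcome = new CompletableFuture<>();
        connectionThread.execute(() -> {
            Runnable resend = () -> {
                try {
                    outcome.complete(openChannel(500));
                } catch (TimeoutException e) {
                    // The real client waits far longer (the question reports 10 minutes).
                    outcome.complete("deadlock: timed out waiting for channel.open-ok");
                } catch (Exception e) {
                    outcome.completeExceptionally(e);
                }
            };
            if (handOff) {
                worker.execute(resend); // connection thread stays free to process the reply
            } else {
                resend.run();           // blocks the only thread that could process the reply
            }
        });
        return outcome.get(5, TimeUnit.SECONDS);
    }

    public void shutdown() {
        connectionThread.shutdownNow();
        worker.shutdownNow();
    }

    public static void main(String[] args) throws Exception {
        ConnectionThreadDeadlockDemo demo = new ConnectionThreadDeadlockDemo();
        try {
            System.out.println("inline:   " + demo.deliverNack(false));
            System.out.println("hand-off: " + demo.deliverNack(true));
        } finally {
            demo.shutdown();
        }
    }
}
```

In the inline case the model times out after about 500 ms (a short stand-in for the real hang), while the hand-off case gets the reply almost immediately. The timeout exists only so the demo terminates instead of hanging forever.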
Source: https://stackoverflow.com/questions/50580507/java-rabbitmq-client-hangs-on-resend-via-thread-of-producer-commit-callback-afte