程序版本
Pulsar:2.8.0
amqp:2.8.0.1
问题描述
Pulsar支持通过RabbitMQ的客户端发送消息,但是实际使用过程中出现如下异常:
channel error; protocol method: #method<channel.close>(reply-code=404, reply-text=Unknown exchange:ex02, class-id=0, method-id=0)
详细日志如下
Exception in thread "main" java.io.IOException
at com.rabbitmq.client.impl.AMQChannel.wrap(AMQChannel.java:129)
at com.rabbitmq.client.impl.AMQChannel.wrap(AMQChannel.java:125)
at com.rabbitmq.client.impl.AMQChannel.exnWrappingRpc(AMQChannel.java:147)
at com.rabbitmq.client.impl.ChannelN.exchangeDeclare(ChannelN.java:783)
at com.rabbitmq.client.impl.recovery.AutorecoveringChannel.exchangeDeclare(AutorecoveringChannel.java:242)
at com.rabbitmq.client.impl.recovery.AutorecoveringChannel.exchangeDeclare(AutorecoveringChannel.java:249)
at AMQPClient.run(AMQPClient.java:32)
at AMQPClient.main(AMQPClient.java:15)
Caused by: com.rabbitmq.client.ShutdownSignalException: channel error; protocol method: #method<channel.close>(reply-code=404, reply-text=Unknown exchange:ex02, class-id=0, method-id=0)
at com.rabbitmq.utility.ValueOrException.getValue(ValueOrException.java:66)
at com.rabbitmq.utility.BlockingValueOrException.uninterruptibleGetValue(BlockingValueOrException.java:36)
at com.rabbitmq.client.impl.AMQChannel$BlockingRpcContinuation.getReply(AMQChannel.java:502)
at com.rabbitmq.client.impl.AMQChannel.privateRpc(AMQChannel.java:293)
at com.rabbitmq.client.impl.AMQChannel.exnWrappingRpc(AMQChannel.java:141)
... 5 more
Caused by: com.rabbitmq.client.ShutdownSignalException: channel error; protocol method: #method<channel.close>(reply-code=404, reply-text=Unknown exchange:ex02, class-id=0, method-id=0)
at com.rabbitmq.client.impl.ChannelN.asyncShutdown(ChannelN.java:522)
at com.rabbitmq.client.impl.ChannelN.processAsync(ChannelN.java:346)
at com.rabbitmq.client.impl.AMQChannel.handleCompleteInboundCommand(AMQChannel.java:182)
at com.rabbitmq.client.impl.AMQChannel.handleFrame(AMQChannel.java:114)
at com.rabbitmq.client.impl.AMQConnection.readFrame(AMQConnection.java:719)
at com.rabbitmq.client.impl.AMQConnection.access$300(AMQConnection.java:48)
at com.rabbitmq.client.impl.AMQConnection$MainLoop.run(AMQConnection.java:646)
at java.lang.Thread.run(Thread.java:748)
Process finished with exit code 130
问题复现流程
- 在Pulsar中配置AMQP的协议转换器。
- 启动Pulsar。
- 启动RabbitMQ客户端,出现如上错误,后来发现Pulsar上没有创建需要的命名空间。
- 在Pulsar中创建命名空间,仍然报错。
问题排查
查看Pulsar的日志,如下:
10:52:19.978 [pulsar-27-2] INFO org.apache.pulsar.broker.namespace.OwnershipCache - Trying to acquire ownership of public/vhost5/0x00000000_0xffffffff
10:52:20.021 [pulsar-ordered-OrderedExecutor-0-0-EventThread] INFO org.apache.pulsar.broker.namespace.OwnershipCache - Successfully acquired ownership of /namespace/public/vhost5/0x00000000_0xffffffff
10:52:20.021 [pulsar-ordered-OrderedExecutor-0-0-EventThread] INFO io.streamnative.pulsar.handlers.amqp.ConnectionContainer - ConnectionContainer [onLoad] namespaceBundle: public/vhost5/0x00000000_0xffffffff, brokerPort: 6650
10:52:20.021 [pulsar-ordered-OrderedExecutor-0-0-EventThread] INFO io.streamnative.pulsar.handlers.amqp.AmqpTopicManager - Find getBrokerServiceUrl Optional[LookupResult [type=BrokerUrl, lookupData=LookupData{brokerUrl=pulsar://172.20.140.23:6650, brokerUrlTls=null, httpUrl=http://172.20.140.23:9080, httpUrlTls=null}]], return Topic: persistent://public/vhost5/__amqp_exchange__ex02
10:52:20.024 [ForkJoinPool.commonPool-worker-5] INFO org.apache.pulsar.broker.cache.LocalZooKeeperCacheService - Successfully created local policies for /admin/local-policies/public/vhost5 -- Optional.empty
10:52:20.024 [pulsar-ordered-OrderedExecutor-7-0] INFO org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - Opening managed ledger public/vhost5/persistent/__amqp_exchange__ex02
10:52:20.025 [bookkeeper-ml-scheduler-OrderedScheduler-2-0] INFO org.apache.bookkeeper.mledger.impl.MetaStoreImpl - Creating '/managed-ledgers/public/vhost5/persistent/__amqp_exchange__ex02'
10:52:20.063 [metadata-store-30-1] INFO org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - [public/vhost5/persistent/__amqp_exchange__ex02] Creating ledger, metadata: {component=[109, 97, 110, 97, 103, 101, 100, 45, 108, 101, 100, 103, 101, 114], pulsar/managed-ledger=[112, 117, 98, 108, 105, 99, 47, 118, 104, 111, 115, 116, 53, 47, 112, 101, 114, 115, 105, 115, 116, 101, 110, 116, 47, 95, 95, 97, 109, 113, 112, 95, 101, 120, 99, 104, 97, 110, 103, 101, 95, 95, 101, 120, 48, 50], application=[112, 117, 108, 115, 97, 114]} - metadata ops timeout : 60 seconds
10:52:20.079 [main-EventThread] INFO org.apache.bookkeeper.client.LedgerCreateOp - Ensemble: [127.0.0.1:3181] for ledger: 19
10:52:20.079 [BookKeeperClientWorker-OrderedExecutor-0-0] INFO org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - [public/vhost5/persistent/__amqp_exchange__ex02] Created ledger 19
10:52:20.097 [bookkeeper-ml-scheduler-OrderedScheduler-2-0] INFO org.apache.bookkeeper.mledger.impl.ManagedLedgerFactoryImpl - [public/vhost5/persistent/__amqp_exchange__ex02] Successfully initialize managed ledger
10:52:20.098 [bookkeeper-ml-scheduler-OrderedScheduler-2-0] INFO org.apache.pulsar.broker.service.AbstractTopic - Disabling publish throttling for persistent://public/vhost5/__amqp_exchange__ex02
10:52:20.100 [bookkeeper-ml-scheduler-OrderedScheduler-2-0] WARN org.apache.pulsar.broker.service.persistent.PersistentTopic - [persistent://public/vhost5/__amqp_exchange__ex02] Error getting policies KeeperErrorCode = NoNode and isEncryptionRequired will be set to false
10:52:20.100 [bookkeeper-ml-scheduler-OrderedScheduler-2-0] INFO org.apache.pulsar.broker.service.persistent.PersistentTopic - [persistent://public/vhost5/__amqp_exchange__ex02] There are no replicated subscriptions on the topic
10:52:20.101 [bookkeeper-ml-scheduler-OrderedScheduler-2-0] WARN org.apache.pulsar.broker.service.BrokerService - Replication or dedup check failed. Removing topic from topics list persistent://public/vhost5/__amqp_exchange__ex02, java.util.concurrent.CompletionException: org.apache.pulsar.broker.service.BrokerServiceException$ServerMetadataException: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode
10:52:20.101 [bookkeeper-ml-scheduler-OrderedScheduler-2-0] ERROR io.streamnative.pulsar.handlers.amqp.AmqpTopicManager - Failed to getTopic persistent://public/vhost5/__amqp_exchange__ex02. exception: java.util.concurrent.CompletionException: org.apache.pulsar.broker.service.BrokerServiceException$ServerMetadataException: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode
10:52:20.101 [bookkeeper-ml-scheduler-OrderedScheduler-2-0] INFO io.streamnative.pulsar.handlers.amqp.ExchangeContainer - The exchange topic did not exist. namespacepublic/vhost5, exchangeName: ex02
10:52:26.284 [pulsar-io-29-2] ERROR io.streamnative.pulsar.handlers.amqp.AmqpConnection - [/172.21.34.67:52803] Got exception: readAddress(..) failed: Connection reset by peer
io.netty.channel.unix.Errors$NativeIoException: readAddress(..) failed: Connection reset by peer
10:52:26.284 [pulsar-io-29-2] INFO io.streamnative.pulsar.handlers.amqp.AmqpConnection - close netty channel [id: 0x98ceec6e, L:/172.20.140.23:5672 - R:/172.21.34.67:52803]
可以看到有一条关键信息:
Error getting policies KeeperErrorCode = NoNode and isEncryptionRequired will be set to false
查看源码确定报错位置在PersistentTopic的构造方法里:
public PersistentTopic(String topic, ManagedLedger ledger, BrokerService brokerService) throws NamingException {
// 省略代码 ...
try {
Policies policies = brokerService.pulsar().getConfigurationCache().policiesCache()
.get(AdminResource.path(POLICIES, TopicName.get(topic).getNamespace()))
.orElseThrow(() -> new KeeperException.NoNodeException());
this.isEncryptionRequired = policies.encryption_required;
setSchemaCompatibilityStrategy(policies);
isAllowAutoUpdateSchema = policies.is_allow_auto_update_schema;
schemaValidationEnforced = policies.schema_validation_enforced;
if (policies.inactive_topic_policies != null) {
inactiveTopicPolicies = policies.inactive_topic_policies;
}
updateUnackedMessagesAppliedOnSubscription(policies);
updateUnackedMessagesExceededOnConsumer(policies);
} catch (Exception e) {
log.warn("[{}] Error getting policies {} and isEncryptionRequired will be set to false",
topic, e.getMessage());
isEncryptionRequired = false;
updateUnackedMessagesAppliedOnSubscription(null);
updateUnackedMessagesExceededOnConsumer(null);
}
// 省略代码 ...
}
这里面会从zk中查询命名空间的策略信息,第一次没有创建命名空间,所以zk上肯定没有相应的策略信息,所以报错是可以理解的。
但是第二次创建了命名空间,这里还是查不到,而且通过zk客户端查询zk数据时是存在的。
所以说明Pulsar在客户端第一次连接时,发现没有该命名空间,然后就创建了命名空间,但是没有创建策略信息。第二次连接时,虽然创建了策略信息,但是只是更新了zk里面的数据,没有更新zk缓存。
解决办法
这个问题应该是Pulsar的bug。
临时解决办法就是把策略信息加载到内存里,有两种方法:
- 删除命名空间,然后重新创建。
- 重启Pulsar,使得重新加载zk中的数据到缓存里。