Pulsar's AutoRecovery feature

Submitted by 蓝咒 on 2020-07-29 04:04:12

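Pulsar supports scaling out and migrating without applications noticing.
Upgrading or scaling brokers is straightforward, so it is not covered here. Bookies, however, need some extra care.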

autorecovery
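Disable:
bookkeeper shell autorecovery -disable
Enable:
bookkeeper shell autorecovery -enable
Enable AutoRecovery when migrating bookies. It automatically re-replicates the messages stored on a bookie that has been shut down onto other available bookies, such as the newly added ones.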

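How to check ledger re-replication
List the under-replicated ledgers, which are the ledgers that still need copies. Once this list is empty, the data of every decommissioned bookie has been fully copied.
bookkeeper shell listunderreplicated
List the under-replicated ledgers that are missing a replica on one specific bookie:
bookkeeper shell listunderreplicated -missingreplica 172.16.4.224:3181
Show the metadata of a given ledger ID:
bookkeeper shell ledgermetadata -ledgerid 89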
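During a decommission you can combine the commands above into a polling loop. The loop keeps checking `listunderreplicated` for the removed bookie until nothing is left. This is only a sketch that rests on several assumptions:

- `wait_for_rereplication` is a hypothetical helper and is not part of BookKeeper.
- The 30-second default interval is arbitrary.
- AutoRecovery is enabled and running.
- `listunderreplicated` prints one numeric ledger id per line, which is the default ledger id format.

The helper ignores non-numeric lines, such as log output. It also treats a failed CLI call as "unknown" and retries instead of reporting success.

```shell
#!/bin/sh
# Hypothetical helper (not part of BookKeeper): block until no under-replicated
# ledger still reports a missing replica on the given bookie.
# Assumptions: AutoRecovery is enabled (bookkeeper shell autorecovery -enable)
# and listunderreplicated prints one numeric ledger id per line.
wait_for_rereplication() {
  bookie="$1"           # address of the removed bookie, e.g. 172.16.4.224:3181
  interval="${2:-30}"   # seconds between polls; 30 is an arbitrary default
  while :; do
    # If the CLI itself fails (e.g. metadata store unreachable), retry rather
    # than mistaking empty output for "fully re-replicated".
    if ! listing=$(bookkeeper shell listunderreplicated -missingreplica "$bookie" 2>/dev/null); then
      echo "listunderreplicated failed; retrying in ${interval}s" >&2
      sleep "$interval"
      continue
    fi
    # Count only purely numeric lines so log noise on stdout is ignored.
    pending=$(printf '%s\n' "$listing" | grep -cE '^[0-9]+$')
    if [ "$pending" -eq 0 ]; then
      echo "no under-replicated ledgers left for $bookie"
      return 0
    fi
    echo "$pending ledger(s) still missing a replica on $bookie; sleeping ${interval}s"
    sleep "$interval"
  done
}

# Example: wait_for_rereplication 172.16.4.224:3181
```

Run it on a host whose BookKeeper CLI is configured for the cluster, after the old bookie has been shut down. For any ledger it keeps reporting, you can inspect the ensemble with `bookkeeper shell ledgermetadata -ledgerid <id>`.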

Problem 1
https://github.com/apache/bookkeeper/issues/2001
I ran into this bug.
Symptoms:
13:34:36.437 [db-storage-cleanup-16-1] WARN org.apache.bookkeeper.bookie.storage.ldb.DbLedgerStorage - Failed to cleanup db indexes
org.apache.bookkeeper.bookie.Bookie$NoEntryException: Entry -1 not found in 630856964063500820
at org.apache.bookkeeper.bookie.storage.ldb.EntryLocationIndex.getLastEntryInLedgerInternal(EntryLocationIndex.java:123) ~[org.apache.bookkeeper-bookkeeper-server-4.9.0.jar:4.9.0]
at org.apache.bookkeeper.bookie.storage.ldb.EntryLocationIndex.removeOffsetFromDeletedLedgers(EntryLocationIndex.java:219) ~[org.apache.bookkeeper-bookkeeper-server-4.9.0.jar:4.9.0]
at org.apache.bookkeeper.bookie.storage.ldb.SingleDirectoryDbLedgerStorage.lambda$checkpoint$7(SingleDirectoryDbLedgerStorage.java:624) ~[org.apache.bookkeeper-bookkeeper-server-4.9.0.jar:4.9.0]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_181]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_181]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) [?:1.8.0_181]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) [?:1.8.0_181]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_181]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_181]
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [io.netty-netty-all-4.1.32.Final.jar:4.1.32.Final]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_181]
13:35:36.359 [db-storage-cleanup-16-1] INFO org.apache.bookkeeper.bookie.storage.ldb.EntryLocationIndex - Deleting indexes for ledgers: [32768, 32771, 32774, 32777, 32780, 32783, 32786, 32789, 32792, 32795, 32798, 32801, 32804, 32807, 32810, 32813, 32816, 32819, 32822, 32825, 32828, 32831, 32834, 32837, 32840, 32843, 32846, 32849, 32852, 32855, 32858, 32861, 32864, 32867, 32870, 32873, 32876, 32879

Not yet resolved.

Problem 2
There is also an error about not having enough available bookies:
12:19:53.378 [ReplicationWorker] WARN org.apache.bookkeeper.client.RackawareEnsemblePlacementPolicyImpl - Failed to find 1 bookies : excludeBookies [Bookie:172.16.4.229:3181, Bookie:172.16.4.230:3181, Bookie:172.16.4.222:3181], allBookies [Bookie:172.16.4.222:3181, Bookie:172.16.4.229:3181, Bookie:172.16.4.230:3181].
12:19:53.378 [ReplicationWorker] WARN org.apache.bookkeeper.client.RackawareEnsemblePlacementPolicyImpl - Failed to choose a bookie: excluded [Bookie:172.16.4.229:3181, Bookie:172.16.4.230:3181, Bookie:172.16.4.222:3181], fallback to choose bookie randomly from the cluster.
12:19:53.378 [ReplicationWorker] WARN org.apache.bookkeeper.client.RackawareEnsemblePlacementPolicyImpl - Failed to find 1 bookies : excludeBookies [Bookie:172.16.4.229:3181, Bookie:172.16.4.230:3181, Bookie:172.16.4.222:3181], allBookies [Bookie:172.16.4.229:3181, Bookie:172.16.4.230:3181, Bookie:172.16.4.222:3181].
12:19:53.378 [ReplicationWorker] WARN org.apache.bookkeeper.replication.ReplicationWorker - BKNotEnoughBookiesException while replicating the fragment
org.apache.bookkeeper.client.BKException$BKNotEnoughBookiesException: Not enough non-faulty bookies available
at org.apache.bookkeeper.client.RackawareEnsemblePlacementPolicyImpl.selectRandomInternal(RackawareEnsemblePlacementPolicyImpl.java:989) ~[org.apache.bookkeeper-bookkeeper-server-4.9.0.jar:4.9.0]
at org.apache.bookkeeper.client.RackawareEnsemblePlacementPolicyImpl.selectRandom(RackawareEnsemblePlacementPolicyImpl.java:907) ~[org.apache.bookkeeper-bookkeeper-server-4.9.0.jar:4.9.0]
at org.apache.bookkeeper.client.RackawareEnsemblePlacementPolicyImpl.selectFromNetworkLocation(RackawareEnsemblePlacementPolicyImpl.java:797) ~[org.apache.bookkeeper-bookkeeper-server-4.9.0.jar:4.9.0]
at org.apache.bookkeeper.client.RackawareEnsemblePlacementPolicy.selectFromNetworkLocation(RackawareEnsemblePlacementPolicy.java:200) ~[org.apache.bookkeeper-bookkeeper-server-4.9.0.jar:4.9.0]
at org.apache.bookkeeper.client.RackawareEnsemblePlacementPolicyImpl.selectFromNetworkLocation(RackawareEnsemblePlacementPolicyImpl.java:757) ~[org.apache.bookkeeper-bookkeeper-server-4.9.0.jar:4.9.0]
at org.apache.bookkeeper.client.RackawareEnsemblePlacementPolicy.selectFromNetworkLocation(RackawareEnsemblePlacementPolicy.java:221) ~[org.apache.bookkeeper-bookkeeper-server-4.9.0.jar:4.9.0]
at org.apache.bookkeeper.client.RackawareEnsemblePlacementPolicyImpl.replaceBookie(RackawareEnsemblePlacementPolicyImpl.java:659) ~[org.apache.bookkeeper-bookkeeper-server-4.9.0.jar:4.9.0]
at org.apache.bookkeeper.client.RackawareEnsemblePlacementPolicy.replaceBookie(RackawareEnsemblePlacementPolicy.java:114) ~[org.apache.bookkeeper-bookkeeper-server-4.9.0.jar:4.9.0]
at org.apache.bookkeeper.client.BookKeeperAdmin.getReplacementBookiesByIndexes(BookKeeperAdmin.java:997) ~[org.apache.bookkeeper-bookkeeper-server-4.9.0.jar:4.9.0]
at org.apache.bookkeeper.client.BookKeeperAdmin.replicateLedgerFragment(BookKeeperAdmin.java:1045) ~[org.apache.bookkeeper-bookkeeper-server-4.9.0.jar:4.9.0]
at org.apache.bookkeeper.replication.ReplicationWorker.rereplicate(ReplicationWorker.java:296) [org.apache.bookkeeper-bookkeeper-server-4.9.0.jar:4.9.0]
at org.apache.bookkeeper.replication.ReplicationWorker.rereplicate(ReplicationWorker.java:249) [org.apache.bookkeeper-bookkeeper-server-4.9.0.jar:4.9.0]
at org.apache.bookkeeper.replication.ReplicationWorker.run(ReplicationWorker.java:210) [org.apache.bookkeeper-bookkeeper-server-4.9.0.jar:4.9.0]
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [io.netty-netty-all-4.1.32.Final.jar:4.1.32.Final]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_181]
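The WARN lines above suggest why re-replication fails. Every bookie in allBookies also appears in excludeBookies, so RackawareEnsemblePlacementPolicyImpl has no candidate left to host the replacement replica. Presumably the ledger's ensemble already spans all three bookies in this cluster.

`candidate_bookies` is a hypothetical POSIX shell/awk helper and not part of BookKeeper. It only parses one such log line and prints allBookies minus excludeBookies, one address per line. Empty output corresponds to the BKNotEnoughBookiesException case.

```shell
#!/bin/sh
# Hypothetical log-parsing helper (not a BookKeeper tool).
# Input: one "Failed to find N bookies : excludeBookies [...], allBookies [...]"
# WARN line. Output: bookies in allBookies that are not excluded.
candidate_bookies() {
  printf '%s\n' "$1" | awk '
    /excludeBookies/ && /allBookies/ {
      ex = $0
      sub(/.*excludeBookies [[]/, "", ex)   # drop text up to "excludeBookies ["
      sub(/[]].*/, "", ex)                   # drop from the first "]" onward
      al = $0
      sub(/.*allBookies [[]/, "", al)
      sub(/[]].*/, "", al)
      # Record every excluded address without its "Bookie:" prefix.
      n = split(ex, e, ", *")
      for (i = 1; i <= n; i++) { sub(/^Bookie:/, "", e[i]); excluded[e[i]] = 1 }
      # Print the addresses that remain available as replacement targets.
      m = split(al, a, ", *")
      for (i = 1; i <= m; i++) {
        sub(/^Bookie:/, "", a[i])
        if (!(a[i] in excluded)) print a[i]
      }
    }'
}
```

Passing the first WARN line above to `candidate_bookies` prints nothing, which is consistent with the exception that follows it.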
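I asked Pulsar expert sijie about this. Following his advice, I disabled the AutoRecovery feature on the bookies and then restarted all of them, and the error was no longer thrown. If you hit the same problem, give this a try.

When shutting down bookies, it is best to stop the producers first; otherwise messages may be sent more than once. Pulsar's message transaction feature should solve this. The author attributed it to version 2.4, but transactions first shipped in Pulsar 2.7.0.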