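At the end of my previous post, 【RocketMQ中Broker的消息存储源码分析】 (a source-code analysis of message storage in the RocketMQ Broker), I briefly mentioned how the CommitLog is flushed to disk. This post builds heavily on that one.
To flush the CommitLog, the Broker starts a thread that keeps writing buffered content to disk (the CommitLog files). There are two modes: asynchronous flush and synchronous flush.
Asynchronous flush can in turn take one of two paths:
① buffer in mappedByteBuffer -> write to disk (synchronous flush uses this path too)
② buffer in writeBuffer -> buffer in fileChannel -> write to disk (the case described earlier, where the in-memory byte buffer is enabled)
The two CommitLog flush modes: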
    public enum FlushDiskType {
        SYNC_FLUSH,
        ASYNC_FLUSH
    }
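That is, synchronous and asynchronous. Synchronous flush is implemented by GroupCommitService and asynchronous flush by FlushRealTimeService; asynchronous flush is the default.
In asynchronous mode, if the in-memory byte buffer is enabled, CommitRealTimeService is started in addition to FlushRealTimeService.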
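The selection rule just described can be summarized in a few lines. This is only a sketch that mirrors the prose: the class FlushServiceSelector and the method servicesFor are hypothetical names, and the services are represented by plain strings rather than the Broker's real thread classes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of which services run in which mode.
// It is not the Broker's actual wiring code.
class FlushServiceSelector {
    enum FlushDiskType { SYNC_FLUSH, ASYNC_FLUSH }

    static List<String> servicesFor(FlushDiskType type, boolean transientStorePoolEnable) {
        List<String> services = new ArrayList<>();
        if (type == FlushDiskType.SYNC_FLUSH) {
            // Synchronous flush: one service that groups waiting requests.
            services.add("GroupCommitService");
        } else {
            // Asynchronous flush: periodic flush service.
            services.add("FlushRealTimeService");
            if (transientStorePoolEnable) {
                // Extra commit stage: writeBuffer -> fileChannel.
                services.add("CommitRealTimeService");
            }
        }
        return services;
    }

    public static void main(String[] args) {
        System.out.println(servicesFor(FlushDiskType.SYNC_FLUSH, false));
        System.out.println(servicesFor(FlushDiskType.ASYNC_FLUSH, false));
        System.out.println(servicesFor(FlushDiskType.ASYNC_FLUSH, true));
    }
}
```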
Synchronous flush:
The GroupCommitService thread is started; its run method:
    public void run() {
        CommitLog.log.info(this.getServiceName() + " service started");

        while (!this.isStopped()) {
            try {
                this.waitForRunning(10);
                this.doCommit();
            } catch (Exception e) {
                CommitLog.log.warn(this.getServiceName() + " service has exception. ", e);
            }
        }

        // Under normal circumstances shutdown, wait for the arrival of the
        // request, and then flush
        try {
            Thread.sleep(10);
        } catch (InterruptedException e) {
            CommitLog.log.warn("GroupCommitService Exception, ", e);
        }

        synchronized (this) {
            this.swapRequests();
        }

        this.doCommit();

        CommitLog.log.info(this.getServiceName() + " service end");
    }
The doCommit call inside the loop keeps flushing to disk.
The doCommit method:
    private void doCommit() {
        synchronized (this.requestsRead) {
            if (!this.requestsRead.isEmpty()) {
                for (GroupCommitRequest req : this.requestsRead) {
                    // There may be a message in the next file, so a maximum of
                    // two times the flush
                    boolean flushOK = false;
                    for (int i = 0; i < 2 && !flushOK; i++) {
                        flushOK = CommitLog.this.mappedFileQueue.getFlushedWhere() >= req.getNextOffset();

                        if (!flushOK) {
                            CommitLog.this.mappedFileQueue.flush(0);
                        }
                    }

                    req.wakeupCustomer(flushOK);
                }

                long storeTimestamp = CommitLog.this.mappedFileQueue.getStoreTimestamp();
                if (storeTimestamp > 0) {
                    CommitLog.this.defaultMessageStore.getStoreCheckpoint().setPhysicMsgTimestamp(storeTimestamp);
                }

                this.requestsRead.clear();
            } else {
                // Because of individual messages is set to not sync flush, it
                // will come to this process
                CommitLog.this.mappedFileQueue.flush(0);
            }
        }
    }
GroupCommitService manages two Lists:
    private volatile List<GroupCommitRequest> requestsWrite = new ArrayList<GroupCommitRequest>();
    private volatile List<GroupCommitRequest> requestsRead = new ArrayList<GroupCommitRequest>();
Each GroupCommitRequest wraps an offset:
    private final long nextOffset;
This is where the handleDiskFlush method mentioned at the end of the previous post comes in:
    public void handleDiskFlush(AppendMessageResult result, PutMessageResult putMessageResult, MessageExt messageExt) {
        // Synchronization flush
        if (FlushDiskType.SYNC_FLUSH == this.defaultMessageStore.getMessageStoreConfig().getFlushDiskType()) {
            final GroupCommitService service = (GroupCommitService) this.flushCommitLogService;
            if (messageExt.isWaitStoreMsgOK()) {
                GroupCommitRequest request = new GroupCommitRequest(result.getWroteOffset() + result.getWroteBytes());
                service.putRequest(request);
                boolean flushOK = request.waitForFlush(this.defaultMessageStore.getMessageStoreConfig().getSyncFlushTimeout());
                if (!flushOK) {
                    log.error("do groupcommit, wait for flush failed, topic: " + messageExt.getTopic() + " tags: " + messageExt.getTags()
                        + " client address: " + messageExt.getBornHostString());
                    putMessageResult.setPutMessageStatus(PutMessageStatus.FLUSH_DISK_TIMEOUT);
                }
            } else {
                service.wakeup();
            }
        }
        // Asynchronous flush
        else {
            if (!this.defaultMessageStore.getMessageStoreConfig().isTransientStorePoolEnable()) {
                flushCommitLogService.wakeup();
            } else {
                commitLogService.wakeup();
            }
        }
    }
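This method is called after the Broker has received a message from a Producer and finished writing it into the ByteBuffer.
As you can see, in SYNC_FLUSH mode it takes WroteOffset and WroteBytes from the AppendMessageResult and adds them to get nextOffset, wraps that nextOffset in a GroupCommitRequest, and then calls GroupCommitService's putRequest method to add the GroupCommitRequest to the requestsWrite List.
The putRequest method: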
    public synchronized void putRequest(final GroupCommitRequest request) {
        synchronized (this.requestsWrite) {
            this.requestsWrite.add(request);
        }
        if (hasNotified.compareAndSet(false, true)) {
            waitPoint.countDown(); // notify
        }
    }
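After the add, a CAS operation flips hasNotified (an atomic Boolean) from false to true, and waitPoint.countDown() wakes up the flush thread; this becomes relevant below.
Because this is synchronous flush, handleDiskFlush then calls the GroupCommitRequest's waitForFlush method to wait, up to the timeout, for the flush of that record to complete.
Asynchronous flush, by contrast, only calls wakeup to wake the flush task and does not wait. That is the difference between the two.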
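The wait-with-timeout handshake can be sketched with a plain CountDownLatch. The body of GroupCommitRequest is not shown in this post, so the class below (FlushRequestSketch) is a hypothetical stand-in that only reuses the method names waitForFlush and wakeupCustomer seen in the listings; how the real class implements them is an assumption.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for GroupCommitRequest (real implementation not shown here).
class FlushRequestSketch {
    private final long nextOffset;
    private final CountDownLatch latch = new CountDownLatch(1);
    private volatile boolean flushOK = false;

    FlushRequestSketch(long nextOffset) {
        this.nextOffset = nextOffset;
    }

    long getNextOffset() {
        return nextOffset;
    }

    // Called by the flush thread once it has handled this request.
    void wakeupCustomer(boolean flushOK) {
        this.flushOK = flushOK;
        latch.countDown();
    }

    // Called by the thread that stored the message.
    // Returns false on timeout, on interruption, or when the flush failed.
    boolean waitForFlush(long timeoutMillis) {
        try {
            boolean signalled = latch.await(timeoutMillis, TimeUnit.MILLISECONDS);
            return signalled && flushOK;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        FlushRequestSketch request = new FlushRequestSketch(1024);
        Thread flusher = new Thread(() -> request.wakeupCustomer(true));
        flusher.start();
        System.out.println("flushed in time: " + request.waitForFlush(5000));
        flusher.join();
    }
}
```

In the synchronous path the storing thread blocks in waitForFlush; in the asynchronous path there is no such request object at all, only a wakeup call.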
Back in doCommit, you will notice that it operates on the requestsRead List, whereas the request was just stored in the requestsWrite List.
This has to do with the waitForRunning method called in run:
    protected void waitForRunning(long interval) {
        if (hasNotified.compareAndSet(true, false)) {
            this.onWaitEnd();
            return;
        }

        //entry to wait
        waitPoint.reset();

        try {
            waitPoint.await(interval, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            log.error("Interrupted", e);
        } finally {
            hasNotified.set(false);
            this.onWaitEnd();
        }
    }
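Here a CAS operation tries to reset hasNotified from true to false. If it succeeds, a notification is already pending, so onWaitEnd is called and the method returns immediately. If it fails, the thread blocks in await until the putRequest call described above wakes it up. In other words, once a message sent by a Producer has been buffered and handleDiskFlush has been called, the flush thread is woken up to do its work. The flush thread also wakes up by itself once the timeout interval expires.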
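The notify-or-timeout pattern can be tried out in isolation. The sketch below is an assumption-laden simplification: WakeableWorkerSketch is a hypothetical class, and a plain monitor with wait/notifyAll stands in for the resettable waitPoint latch used in the listing. The return value (woken versus timed out) is added here only to make the behavior observable.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical simplification of the wakeup / waitForRunning pair.
// A monitor replaces the resettable latch used in the real code.
class WakeableWorkerSketch {
    private final AtomicBoolean hasNotified = new AtomicBoolean(false);
    private final Object waitPoint = new Object();

    // Producer side: flip the flag once and signal the waiting thread.
    void wakeup() {
        if (hasNotified.compareAndSet(false, true)) {
            synchronized (waitPoint) {
                waitPoint.notifyAll();
            }
        }
    }

    // Worker side: returns true if a notification was consumed,
    // false if the wait ended without one. intervalMillis must be > 0.
    boolean waitForRunning(long intervalMillis) {
        // Fast path: a notification arrived before we started waiting.
        if (hasNotified.compareAndSet(true, false)) {
            return true;
        }
        synchronized (waitPoint) {
            if (!hasNotified.get()) {
                try {
                    waitPoint.wait(intervalMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        }
        // Consume the notification, if any, so the next wait starts clean.
        return hasNotified.getAndSet(false);
    }

    public static void main(String[] args) {
        WakeableWorkerSketch worker = new WakeableWorkerSketch();
        worker.wakeup();
        System.out.println("woken: " + worker.waitForRunning(10));
        System.out.println("woken: " + worker.waitForRunning(10));
    }
}
```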
Now look at the onWaitEnd method:
    protected void onWaitEnd() {
        this.swapRequests();
    }

    private void swapRequests() {
        List<GroupCommitRequest> tmp = this.requestsWrite;
        this.requestsWrite = this.requestsRead;
        this.requestsRead = tmp;
    }
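As you can see, this simply swaps the two Lists.
It is an interesting trick: if you are familiar with the JVM, doesn't it look a lot like the copying algorithm used for the young generation?
While the flush thread is blocked, requestsWrite fills up with requests. When the flush thread is woken up, it first swaps requestsWrite and requestsRead, so the pending requests are now read from requestsRead, while requestsWrite becomes the empty List that new requests are added to. The cycle then repeats.
In doCommit, when requestsRead is not empty, requestsRead.clear() is called at the end, which confirms the description above.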
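The swap cycle is easy to reproduce on its own. In this sketch the class SwapListsSketch and its methods are hypothetical, the requests are reduced to bare offsets, and a single lock guards everything for brevity, which is coarser than the locking shown in the listings.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical double-buffer sketch: producers append to one list while
// the consumer drains the other, and the two are swapped between rounds.
class SwapListsSketch {
    private List<Long> requestsWrite = new ArrayList<>();
    private List<Long> requestsRead = new ArrayList<>();

    // Producer side: record a pending request.
    synchronized void put(long nextOffset) {
        requestsWrite.add(nextOffset);
    }

    // Consumer side, step 1: exchange the two lists.
    synchronized void swap() {
        List<Long> tmp = requestsWrite;
        requestsWrite = requestsRead;
        requestsRead = tmp;
    }

    // Consumer side, step 2: handle everything in the read list, then clear it
    // so that it is empty when it becomes the write list again.
    synchronized int drain() {
        int handled = requestsRead.size();
        requestsRead.clear();
        return handled;
    }

    synchronized int pendingWrites() {
        return requestsWrite.size();
    }

    public static void main(String[] args) {
        SwapListsSketch lists = new SwapListsSketch();
        lists.put(100);
        lists.put(200);
        lists.swap();
        lists.put(300); // lands in the fresh, empty write list
        System.out.println("handled: " + lists.drain());
        System.out.println("still pending: " + lists.pendingWrites());
    }
}
```

No request is ever copied between the lists; only the two references change places.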
Let's look closely at how the flush is carried out:
    for (GroupCommitRequest req : this.requestsRead) {
        // There may be a message in the next file, so a maximum of
        // two times the flush
        boolean flushOK = false;
        for (int i = 0; i < 2 && !flushOK; i++) {
            flushOK = CommitLog.this.mappedFileQueue.getFlushedWhere() >= req.getNextOffset();

            if (!flushOK) {
                CommitLog.this.mappedFileQueue.flush(0);
            }
        }

        req.wakeupCustomer(flushOK);
    }
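Iterating over requestsRead yields the NextOffset wrapped in each GroupCommitRequest.
flushedWhere records the offset reached by the last completed flush. If that position is greater than or equal to NextOffset, everything up to NextOffset has already been flushed and nothing more needs to be done; otherwise mappedFileQueue's flush method is called.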
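To see how the check and the flush interleave, the loop can be replayed against a toy model. Everything in the sketch is an assumption made for illustration: TwoPassFlushSketch is a hypothetical class, files are 100 bytes long, and one flush() call only advances flushedWhere to the end of the file it currently points into (or to the write position, whichever comes first).

```java
// Hypothetical toy model of the check-then-flush loop.
class TwoPassFlushSketch {
    static final long FILE_SIZE = 100; // tiny file size, for illustration only

    long flushedWhere;
    final long wrotePosition;
    int flushCalls = 0;

    TwoPassFlushSketch(long flushedWhere, long wrotePosition) {
        this.flushedWhere = flushedWhere;
        this.wrotePosition = wrotePosition;
    }

    // Assumption: one call flushes at most to the end of the current file.
    void flush() {
        flushCalls++;
        long fileEnd = (flushedWhere / FILE_SIZE + 1) * FILE_SIZE;
        flushedWhere = Math.min(fileEnd, wrotePosition);
    }

    // Same shape as the loop in the listing: check first, flush if needed.
    boolean handle(long nextOffset) {
        boolean flushOK = false;
        for (int i = 0; i < 2 && !flushOK; i++) {
            flushOK = flushedWhere >= nextOffset;
            if (!flushOK) {
                flush();
            }
        }
        return flushOK;
    }

    public static void main(String[] args) {
        TwoPassFlushSketch sameFile = new TwoPassFlushSketch(0, 50);
        System.out.println(sameFile.handle(50) + " after " + sameFile.flushCalls + " flush call(s)");

        TwoPassFlushSketch nextFile = new TwoPassFlushSketch(80, 150);
        System.out.println(nextFile.handle(150) + " after " + nextFile.flushCalls + " flush call(s)");
    }
}
```

In this model a request in the current file needs one flush and is confirmed by the check at the start of the second iteration. A request whose data sits in the next file triggers both flush calls, but because the check runs before each flush, the loop as written exits without checking again after the second one.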
MappedFileQueue's flush method:
    public boolean flush(final int flushLeastPages) {
        boolean result = true;
        MappedFile mappedFile = this.findMappedFileByOffset(this.flushedWhere, this.flushedWhere == 0);
        if (mappedFile != null) {
            long tmpTimeStamp = mappedFile.getStoreTimestamp();
            int offset = mappedFile.flush(flushLeastPages);
            long where = mappedFile.getFileFromOffset() + offset;
            result = where == this.flushedWhere;
            this.flushedWhere = where;
            if (0 == flushLeastPages) {
                this.storeTimestamp = tmpTimeStamp;
            }
        }

        return result;
    }
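This first uses flushedWhere, the offset reached by the last completed flush, to locate the MappedFile that maps the corresponding CommitLog file via findMappedFileByOffset.
I have covered MappedFile and its operations many times in earlier posts, so I won't repeat that here.
Once the MappedFile is found, its flush method is called.
MappedFile's flush method: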
    public int flush(final int flushLeastPages) {
        if (this.isAbleToFlush(flushLeastPages)) {
            if (this.hold()) {
                int value = getReadPosition();

                try {
                    //We only append data to fileChannel or mappedByteBuffer, never both.
                    if (writeBuffer != null || this.fileChannel.position() != 0) {
                        this.fileChannel.force(false);
                    } else {
                        this.mappedByteBuffer.force();
                    }
                } catch (Throwable e) {
                    log.error("Error occurred when force data to disk.", e);
                }

                this.flushedPosition.set(value);
                this.release();
            } else {
                log.warn("in flush, hold failed, flush offset = " + this.flushedPosition.get());
                this.flushedPosition.set(getReadPosition());
            }
        }
        return this.getFlushedPosition();
    }
First, the isAbleToFlush method:
    private boolean isAbleToFlush(final int flushLeastPages) {
        int flush = this.flushedPosition.get();
        int write = getReadPosition();

        if (this.isFull()) {
            return true;
        }

        if (flushLeastPages > 0) {
            return ((write / OS_PAGE_SIZE) - (flush / OS_PAGE_SIZE)) >= flushLeastPages;
        }

        return write > flush;
    }
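Here flush is the position reached by the last completed flush, and write is the position up to which message content has been written.
When flushLeastPages is greater than 0, the expression
    return ((write / OS_PAGE_SIZE) - (flush / OS_PAGE_SIZE)) >= flushLeastPages;
decides whether the page requirement is met. OS_PAGE_SIZE is 4K, i.e. one page is 4K.
Since this is synchronous flush, flushLeastPages is 0 and there is no page requirement: as long as the buffer holds unflushed content, it is flushed. In asynchronous flush, flushLeastPages is 4 by default, so the buffer is only written to the file once roughly 4 (pages) * 4K (page size) = 16K of messages has accumulated.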
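A few concrete numbers make the page arithmetic easier to follow. The sketch below copies the comparison into a standalone static method; the class name PageThresholdSketch and the explicit full parameter are illustrative choices, not part of the original class.

```java
// Standalone copy of the page-threshold comparison, for experimenting with numbers.
class PageThresholdSketch {
    static final int OS_PAGE_SIZE = 1024 * 4;

    static boolean isAbleToFlush(int flush, int write, int flushLeastPages, boolean full) {
        if (full) {
            return true; // a full file is always flushed
        }
        if (flushLeastPages > 0) {
            // Integer division: this compares page indices, not raw byte counts.
            return ((write / OS_PAGE_SIZE) - (flush / OS_PAGE_SIZE)) >= flushLeastPages;
        }
        return write > flush; // no page requirement: any unflushed byte is enough
    }

    public static void main(String[] args) {
        System.out.println(isAbleToFlush(0, 16384, 4, false));    // exactly 4 pages written
        System.out.println(isAbleToFlush(0, 16383, 4, false));    // one byte short of page 4
        System.out.println(isAbleToFlush(4095, 16384, 4, false)); // 12289 dirty bytes, page index difference 4
        System.out.println(isAbleToFlush(0, 1, 0, false));        // flushLeastPages == 0
    }
}
```

Because both positions are truncated to page indices before subtracting, the amount of dirty data that triggers a flush with flushLeastPages = 4 is only approximately 16K; it depends on where the two positions fall relative to page boundaries.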
Back in MappedFile's flush method, once isAbleToFlush has checked the write requirement:
    int value = getReadPosition();
    try {
        //We only append data to fileChannel or mappedByteBuffer, never both.
        if (writeBuffer != null || this.fileChannel.position() != 0) {
            this.fileChannel.force(false);
        } else {
            this.mappedByteBuffer.force();
        }
    } catch (Throwable e) {
        log.error("Error occurred when force data to disk.", e);
    }

    this.flushedPosition.set(value);
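getReadPosition first obtains the position up to which message content has been written. Because this is synchronous flush, which buffers messages in mappedByteBuffer (path ① above), the else branch runs and mappedByteBuffer's force method is called; through JDK NIO this writes the data cached in mappedByteBuffer into the CommitLog file.
Finally flushedPosition is updated.
Back in MappedFileQueue's flush method, after MappedFile's flush completes, flushedWhere is updated as well.
At this point the buffered data has been persisted and the synchronous flush is done.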
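For readers who have not used memory-mapped files, the JDK calls involved look like this in isolation. The sketch is independent of RocketMQ: MappedForceSketch, the temporary file, and the 4096-byte mapping size are all assumptions made for the example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;

// Minimal NIO sketch: write through a mapped buffer, then force it out.
class MappedForceSketch {
    static String demo(String text) {
        byte[] payload = text.getBytes(StandardCharsets.UTF_8);
        try {
            Path file = Files.createTempFile("commitlog-sketch", ".bin");
            file.toFile().deleteOnExit();
            try (FileChannel channel = FileChannel.open(file,
                    StandardOpenOption.READ, StandardOpenOption.WRITE)) {
                // Mapping in READ_WRITE mode grows the empty file to 4096 bytes.
                MappedByteBuffer mapped = channel.map(FileChannel.MapMode.READ_WRITE, 0, 4096);
                mapped.put(payload); // lands in the mapped memory region
                mapped.force();      // asks the OS to write the mapped pages to storage
            }
            // Reading the file back shows the content; it does not by itself prove durability.
            byte[] onDisk = Files.readAllBytes(file);
            return new String(Arrays.copyOf(onDisk, payload.length), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo("hello commitlog"));
    }
}
```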
Asynchronous flush:
① FlushRealTimeService (referenced in the code through the flushCommitLogService field):
    public void run() {
        CommitLog.log.info(this.getServiceName() + " service started");

        while (!this.isStopped()) {
            boolean flushCommitLogTimed = CommitLog.this.defaultMessageStore.getMessageStoreConfig().isFlushCommitLogTimed();

            int interval = CommitLog.this.defaultMessageStore.getMessageStoreConfig().getFlushIntervalCommitLog();
            int flushPhysicQueueLeastPages = CommitLog.this.defaultMessageStore.getMessageStoreConfig().getFlushCommitLogLeastPages();

            int flushPhysicQueueThoroughInterval =
                CommitLog.this.defaultMessageStore.getMessageStoreConfig().getFlushCommitLogThoroughInterval();

            boolean printFlushProgress = false;

            // Print flush progress
            long currentTimeMillis = System.currentTimeMillis();
            if (currentTimeMillis >= (this.lastFlushTimestamp + flushPhysicQueueThoroughInterval)) {
                this.lastFlushTimestamp = currentTimeMillis;
                flushPhysicQueueLeastPages = 0;
                printFlushProgress = (printTimes++ % 10) == 0;
            }

            try {
                if (flushCommitLogTimed) {
                    Thread.sleep(interval);
                } else {
                    this.waitForRunning(interval);
                }

                if (printFlushProgress) {
                    this.printFlushProgress();
                }

                long begin = System.currentTimeMillis();
                CommitLog.this.mappedFileQueue.flush(flushPhysicQueueLeastPages);
                long storeTimestamp = CommitLog.this.mappedFileQueue.getStoreTimestamp();
                if (storeTimestamp > 0) {
                    CommitLog.this.defaultMessageStore.getStoreCheckpoint().setPhysicMsgTimestamp(storeTimestamp);
                }
                long past = System.currentTimeMillis() - begin;
                if (past > 500) {
                    log.info("Flush data to disk costs {} ms", past);
                }
            } catch (Throwable e) {
                CommitLog.log.warn(this.getServiceName() + " service has exception. ", e);
                this.printFlushProgress();
            }
        }

        // Normal shutdown, to ensure that all the flush before exit
        boolean result = false;
        for (int i = 0; i < RETRY_TIMES_OVER && !result; i++) {
            result = CommitLog.this.mappedFileQueue.flush(0);
            CommitLog.log.info(this.getServiceName() + " service shutdown, retry " + (i + 1) + " times " + (result ? "OK" : "Not OK"));
        }

        this.printFlushProgress();

        CommitLog.log.info(this.getServiceName() + " service end");
    }
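flushCommitLogTimed: whether to flush on a fixed timer
interval: the flush interval, 500ms by default
flushPhysicQueueLeastPages: the minimum number of pages required for a flush, 4 by default
flushPhysicQueueThoroughInterval: the interval between thorough flushes, 10s by default
First, lastFlushTimestamp (the time of the last thorough flush) + flushPhysicQueueThoroughInterval is compared with the current time to decide whether a thorough flush is due; if it is, flushPhysicQueueLeastPages is set to 0.
Next comes the check on flushCommitLogTimed:
When flushCommitLogTimed is true, the thread sleeps for interval (500ms by default).
When flushCommitLogTimed is false, it calls waitForRunning and blocks with interval (500ms by default) as the timeout; it is woken early by the wakeup call in handleDiskFlush.
Finally, just as in synchronous flush, mappedFileQueue's flush method is called.
The only difference is that flushPhysicQueueLeastPages decides whether this is a thorough flush or one that follows the 4-page (16K) threshold.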
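The interplay between the page threshold and the thorough interval can be isolated into one small function. ThoroughFlushSketch and leastPagesFor are hypothetical names; the clock is passed in as a parameter so the behavior can be checked without sleeping.

```java
// Hypothetical extraction of the "is a thorough flush due?" decision.
class ThoroughFlushSketch {
    private long lastFlushTimestamp = 0;

    // Returns the page threshold to use for this round:
    // 0 when a thorough flush is due, otherwise the configured minimum.
    int leastPagesFor(long nowMillis, int configuredLeastPages, long thoroughIntervalMillis) {
        if (nowMillis >= lastFlushTimestamp + thoroughIntervalMillis) {
            lastFlushTimestamp = nowMillis; // restart the thorough-flush clock
            return 0;
        }
        return configuredLeastPages;
    }

    public static void main(String[] args) {
        ThoroughFlushSketch sketch = new ThoroughFlushSketch();
        long thoroughInterval = 10_000; // 10s, the default quoted above
        System.out.println(sketch.leastPagesFor(10_000, 4, thoroughInterval));
        System.out.println(sketch.leastPagesFor(10_500, 4, thoroughInterval));
        System.out.println(sketch.leastPagesFor(20_000, 4, thoroughInterval));
    }
}
```

With the defaults, most rounds only flush when about four pages are dirty, and at least once every 10 seconds a round flushes whatever is pending regardless of size.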
② CommitRealTimeService
This path works together with FlushRealTimeService.
CommitRealTimeService's run method:
    public void run() {
        CommitLog.log.info(this.getServiceName() + " service started");
        while (!this.isStopped()) {
            int interval = CommitLog.this.defaultMessageStore.getMessageStoreConfig().getCommitIntervalCommitLog();

            int commitDataLeastPages = CommitLog.this.defaultMessageStore.getMessageStoreConfig().getCommitCommitLogLeastPages();

            int commitDataThoroughInterval =
                CommitLog.this.defaultMessageStore.getMessageStoreConfig().getCommitCommitLogThoroughInterval();

            long begin = System.currentTimeMillis();
            if (begin >= (this.lastCommitTimestamp + commitDataThoroughInterval)) {
                this.lastCommitTimestamp = begin;
                commitDataLeastPages = 0;
            }

            try {
                boolean result = CommitLog.this.mappedFileQueue.commit(commitDataLeastPages);
                long end = System.currentTimeMillis();
                if (!result) {
                    this.lastCommitTimestamp = end; // result = false means some data committed.
                    //now wake up flush thread.
                    flushCommitLogService.wakeup();
                }

                if (end - begin > 500) {
                    log.info("Commit data to file costs {} ms", end - begin);
                }
                this.waitForRunning(interval);
            } catch (Throwable e) {
                CommitLog.log.error(this.getServiceName() + " service has exception. ", e);
            }
        }

        boolean result = false;
        for (int i = 0; i < RETRY_TIMES_OVER && !result; i++) {
            result = CommitLog.this.mappedFileQueue.commit(0);
            CommitLog.log.info(this.getServiceName() + " service shutdown, retry " + (i + 1) + " times " + (result ? "OK" : "Not OK"));
        }
        CommitLog.log.info(this.getServiceName() + " service end");
    }
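The logic is similar to FlushRealTimeService; only the parameters differ slightly:
interval: the commit interval, 200ms by default
commitDataLeastPages: the minimum number of pages required for a commit, 4 by default
commitDataThoroughInterval: the interval between thorough commits, 200ms by default
It is basically the same as FlushRealTimeService, except that it calls mappedFileQueue's commit method: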
    public boolean commit(final int commitLeastPages) {
        boolean result = true;
        MappedFile mappedFile = this.findMappedFileByOffset(this.committedWhere, this.committedWhere == 0);
        if (mappedFile != null) {
            int offset = mappedFile.commit(commitLeastPages);
            long where = mappedFile.getFileFromOffset() + offset;
            result = where == this.committedWhere;
            this.committedWhere = where;
        }

        return result;
    }
This closely resembles mappedFileQueue's flush method; the MappedFile is located via committedWhere.
Then MappedFile's commit method is called:
    public int commit(final int commitLeastPages) {
        if (writeBuffer == null) {
            //no need to commit data to file channel, so just regard wrotePosition as committedPosition.
            return this.wrotePosition.get();
        }
        if (this.isAbleToCommit(commitLeastPages)) {
            if (this.hold()) {
                commit0(commitLeastPages);
                this.release();
            } else {
                log.warn("in commit, hold failed, commit offset = " + this.committedPosition.get());
            }
        }

        // All dirty data has been committed to FileChannel.
        if (writeBuffer != null && this.transientStorePool != null && this.fileSize == this.committedPosition.get()) {
            this.transientStorePool.returnBuffer(writeBuffer);
            this.writeBuffer = null;
        }

        return this.committedPosition.get();
    }
Again this closely resembles MappedFile's flush method: after isAbleToCommit has checked the page requirement, commit0 is called.
MappedFile's commit0 method:
    protected void commit0(final int commitLeastPages) {
        int writePos = this.wrotePosition.get();
        int lastCommittedPosition = this.committedPosition.get();

        if (writePos - this.committedPosition.get() > 0) {
            try {
                ByteBuffer byteBuffer = writeBuffer.slice();
                byteBuffer.position(lastCommittedPosition);
                byteBuffer.limit(writePos);
                this.fileChannel.position(lastCommittedPosition);
                this.fileChannel.write(byteBuffer);
                this.committedPosition.set(writePos);
            } catch (Throwable e) {
                log.error("Error occurred when commit data to FileChannel.", e);
            }
        }
    }
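As mentioned earlier, with this path messages are first buffered in writeBuffer rather than in the mappedByteBuffer used before.
Here you can clearly see that the content of writeBuffer from lastCommittedPosition (the last committed position) to writePos (the end of the buffered messages) is written to the same position in fileChannel; it has not been forced to disk yet.
After the data has been handed to fileChannel, committedPosition is updated.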
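The slice, position, limit sequence is standard NIO and can be exercised outside the Broker. In the sketch below, CommitToChannelSketch, its append helper, the 64-byte buffer and the temporary file are all assumptions for illustration; unlike the listing, the write is wrapped in a hasRemaining loop as a defensive measure.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical sketch: copy the uncommitted range of an in-memory buffer
// to the same position of a file channel.
class CommitToChannelSketch {
    static void commit0(ByteBuffer writeBuffer, FileChannel fileChannel,
                        int lastCommittedPosition, int writePos) throws IOException {
        if (writePos - lastCommittedPosition > 0) {
            ByteBuffer byteBuffer = writeBuffer.slice(); // independent position and limit
            byteBuffer.position(lastCommittedPosition);
            byteBuffer.limit(writePos);
            fileChannel.position(lastCommittedPosition);
            while (byteBuffer.hasRemaining()) {
                fileChannel.write(byteBuffer);
            }
        }
    }

    // Appends through a slice so the position of writeBuffer itself stays at 0.
    static void append(ByteBuffer writeBuffer, int pos, byte[] data) {
        ByteBuffer view = writeBuffer.slice();
        view.position(pos);
        view.put(data);
    }

    static String demo() {
        try {
            Path file = Files.createTempFile("commit-sketch", ".bin");
            file.toFile().deleteOnExit();
            ByteBuffer writeBuffer = ByteBuffer.allocateDirect(64);
            try (FileChannel channel = FileChannel.open(file,
                    StandardOpenOption.READ, StandardOpenOption.WRITE)) {
                append(writeBuffer, 0, "hello".getBytes(StandardCharsets.US_ASCII));
                commit0(writeBuffer, channel, 0, 5);   // first commit: bytes [0, 5)
                append(writeBuffer, 5, " world".getBytes(StandardCharsets.US_ASCII));
                commit0(writeBuffer, channel, 5, 11);  // second commit: bytes [5, 11)
                channel.force(false);                  // the separate flush step
            }
            return new String(Files.readAllBytes(file), StandardCharsets.US_ASCII);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The commit step and the force step are deliberately separate here, matching the split between the commit service and the flush service.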
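Back in the commit method: once the data has been written to fileChannel, it checks whether committedPosition has reached fileSize, i.e. whether the entire content of writeBuffer has been committed.
If so, the writeBuffer is recycled through transientStorePool's returnBuffer method.
transientStorePool is essentially backed by a double-ended queue (Deque) of buffers sized for CommitLog files.
TransientStorePool: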
    public class TransientStorePool {
        private static final InternalLogger log = InternalLoggerFactory.getLogger(LoggerName.STORE_LOGGER_NAME);

        private final int poolSize;
        private final int fileSize;
        private final Deque<ByteBuffer> availableBuffers;
        private final MessageStoreConfig storeConfig;

        public TransientStorePool(final MessageStoreConfig storeConfig) {
            this.storeConfig = storeConfig;
            this.poolSize = storeConfig.getTransientStorePoolSize();
            this.fileSize = storeConfig.getMapedFileSizeCommitLog();
            this.availableBuffers = new ConcurrentLinkedDeque<>();
        }
        ......
    }
The returnBuffer method:
    public void returnBuffer(ByteBuffer byteBuffer) {
        byteBuffer.position(0);
        byteBuffer.limit(fileSize);
        this.availableBuffers.offerFirst(byteBuffer);
    }
Here you can clearly see that the byteBuffer is indeed reset and put back into the pool for reuse.
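A stripped-down pool shows the borrow and return cycle end to end. BufferPoolSketch is a hypothetical class: only returnBuffer follows the listing, while the constructor pre-filling the deque, borrowBuffer and available are assumptions added so the example is self-contained.

```java
import java.nio.ByteBuffer;
import java.util.Deque;
import java.util.concurrent.ConcurrentLinkedDeque;

// Hypothetical minimal buffer pool built on a deque.
class BufferPoolSketch {
    private final int fileSize;
    private final Deque<ByteBuffer> availableBuffers = new ConcurrentLinkedDeque<>();

    BufferPoolSketch(int poolSize, int fileSize) {
        this.fileSize = fileSize;
        for (int i = 0; i < poolSize; i++) {
            // Assumption: buffers are allocated up front, one per pool slot.
            availableBuffers.offer(ByteBuffer.allocateDirect(fileSize));
        }
    }

    // Returns null when the pool is exhausted.
    ByteBuffer borrowBuffer() {
        return availableBuffers.pollFirst();
    }

    // Same steps as the listing: reset position and limit, then put it back in front.
    void returnBuffer(ByteBuffer byteBuffer) {
        byteBuffer.position(0);
        byteBuffer.limit(fileSize);
        availableBuffers.offerFirst(byteBuffer);
    }

    int available() {
        return availableBuffers.size();
    }

    public static void main(String[] args) {
        BufferPoolSketch pool = new BufferPoolSketch(2, 16);
        ByteBuffer buffer = pool.borrowBuffer();
        buffer.position(8);
        System.out.println("available while borrowed: " + pool.available());
        pool.returnBuffer(buffer);
        System.out.println("available after return: " + pool.available());
    }
}
```

Because the buffer is offered at the front of the deque, the most recently returned buffer is the next one handed out.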
Back to MappedFileQueue's commit method:
    public boolean commit(final int commitLeastPages) {
        boolean result = true;
        MappedFile mappedFile = this.findMappedFileByOffset(this.committedWhere, this.committedWhere == 0);
        if (mappedFile != null) {
            int offset = mappedFile.commit(commitLeastPages);
            long where = mappedFile.getFileFromOffset() + offset;
            result = where == this.committedWhere;
            this.committedWhere = where;
        }

        return result;
    }
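After mappedFile's commit finishes, where is compared with committedWhere to determine whether anything was actually written to fileChannel. result is false only when data really was written!
committedWhere is then updated and result is returned.
Back in CommitRealTimeService's run method, result is checked after the commit.
Only when data really was written to fileChannel is flushCommitLogService.wakeup() called, which wakes the FlushRealTimeService flush thread.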
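The inverted meaning of result (true means nothing changed) is easy to misread, so here is the comparison on its own. CommitProgressSketch is a hypothetical class; the new position is passed in directly instead of being computed from a MappedFile.

```java
// Hypothetical sketch of the "result == false means progress" convention.
class CommitProgressSketch {
    private long committedWhere = 0;

    // Returns true when the committed position did NOT move.
    boolean commit(long newCommittedPosition) {
        boolean result = newCommittedPosition == committedWhere;
        committedWhere = newCommittedPosition;
        return result;
    }

    // The caller wakes the flush thread only when something was committed.
    boolean shouldWakeFlushThread(long newCommittedPosition) {
        return !commit(newCommittedPosition);
    }

    public static void main(String[] args) {
        CommitProgressSketch sketch = new CommitProgressSketch();
        System.out.println("wake flush thread: " + sketch.shouldWakeFlushThread(0));
        System.out.println("wake flush thread: " + sketch.shouldWakeFlushThread(4096));
        System.out.println("wake flush thread: " + sketch.shouldWakeFlushThread(4096));
    }
}
```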
The only difference from the FlushRealTimeService flow analyzed earlier lies in MappedFile's flush method:
    if (writeBuffer != null || this.fileChannel.position() != 0) {
        this.fileChannel.force(false);
    } else {
        this.mappedByteBuffer.force();
    }
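Earlier, with the in-memory byte buffer disabled, it was the content of mappedByteBuffer that got written to disk.
This time it is finally fileChannel's turn.
Look at the condition: it holds when writeBuffer is not null, or when fileChannel's position is not 0.
writeBuffer becomes null after TransientStorePool has reclaimed it, which is why fileChannel's position is checked as well.
So with the in-memory byte buffer enabled, data actually passes through two levels of buffering before it reaches the disk.
That completes the whole process of message persistence and flushing in the Broker.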
Original source: https://www.cnblogs.com/a526583280/p/11312750.html
Source: oschina
Link: https://my.oschina.net/u/4277109/blog/3257099