[RIP-44] Support DLedger Controller #4484

Merged
merged 66 commits into from
Jul 21, 2022

RongtongJin
Contributor

What is the purpose of the change

RocketMQ 4.5.0 introduced DLedger mode (Raft). In this architecture, a Raft commitlog replaces the original commitlog so that the broker group can fail over. However, relying on Raft for replication brings several disadvantages:

  1. To have failover ability, the broker group must have 3 or more replicas.

  2. Acks from replicas must strictly follow the majority rule of the Raft protocol: a 3-replica architecture needs acks from 2 replicas before returning, and a 5-replica architecture needs acks from 3.

  3. Because the store in DLedger mode relies on OpenMessaging DLedger, the native storage and replication capabilities of RocketMQ (such as transientStorePool and zero-copy) cannot be reused, and maintenance becomes harder as well.
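
The majority rule in points 1 and 2 comes down to simple arithmetic. The following is a minimal sketch, not code from this PR; the class and method names are hypothetical and chosen only for illustration.

```java
// Hypothetical illustration of the Raft majority rule described above.
// Nothing here is taken from RocketMQ or DLedger.
public final class QuorumSketch {
    private QuorumSketch() {}

    /** Smallest number of acks that forms a majority among `replicas` nodes. */
    public static int majorityAcks(int replicas) {
        if (replicas < 1) {
            throw new IllegalArgumentException("replicas must be >= 1");
        }
        return replicas / 2 + 1;
    }

    /** Number of replicas that may fail while a majority is still reachable. */
    public static int tolerableFailures(int replicas) {
        return replicas - majorityAcks(replicas);
    }

    public static void main(String[] args) {
        for (int n : new int[] {2, 3, 5}) {
            System.out.println(n + " replicas: acks needed = " + majorityAcks(n)
                + ", tolerable failures = " + tolerableFailures(n));
        }
    }
}
```

With 2 replicas the majority is 2, so no failure can be tolerated, which is why failover under Raft needs at least 3 replicas.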

To address these problems, I would like to start RIP-44, Support DLedger Controller. With this improvement, the DLedger (Raft) capability is lifted into an upper layer and becomes an optional, loosely coupled coordination component named DLedger Controller.

Once DLedger Controller is deployed, the master-slave architecture also gains failover capability. The DLedger Controller can optionally be embedded into the NameServer (the NameServer itself remains stateless, and election capability is unavailable when the majority is down), or it can be deployed independently.

DLedger Controller is an optional component that does not change the existing operation and maintenance mode. Unlike other components, its downtime does not affect online services. In addition, RIP-44 unifies the storage and replication of RocketMQ, which lowers maintenance costs and speeds up development iterations. In terms of compatibility, an existing master-slave deployment can be upgraded without compatibility problems.

I have already done part of the work with @hzh0425. Our proposals are available at the links below:

https://docs.google.com/document/d/1tSJkor_3Js4NBaVA0UENGyM8Mh0SrRMXszRyI91hjJ8/edit?usp=sharing

Chinese version:

https://shimo.im/docs/N2A1Mz9QZltQZoAD/

Brief changelog

Refer to https://shimo.im/docs/N2A1Mz9QZltQZoAD#anchor-qJhl

Verifying this change

Refer to the UTs, ITs, and testing report.

Follow this checklist to help us incorporate your contribution quickly and easily. It would be helpful if you could complete the first 5 checklist items (the last one is optional) before requesting a community review of your PR.

  • Make sure there is a GitHub issue filed for the change (usually before you start working on it). Trivial changes like typos do not require a GitHub issue. Your pull request should address just this issue, without pulling in other changes: one PR resolves one issue.
  • Format the pull request title like [ISSUE #123] Fix UnknownException when host config not exist. Each commit in the pull request should have a meaningful subject line and body.
  • Write a pull request description that is detailed enough to understand what the pull request does, how, and why.
  • Write the necessary unit tests (over 80% coverage) to verify your logic; prefer mocks where cross-module dependencies exist. If a new feature or significant change is committed, please remember to add integration tests in the test module.
  • Run mvn -B clean apache-rat:check findbugs:findbugs checkstyle:checkstyle to make sure basic checks pass. Run mvn clean install -DskipITs to make sure unit tests pass. Run mvn clean test-compile failsafe:integration-test to make sure integration tests pass.
  • If this contribution is large, please file an Apache Individual Contributor License Agreement.

rongtong.jrt and others added 30 commits April 29, 2022 16:51
* feature: initial, add controllerApi, event, request and response

* feature: Apply event in ReplicasInfoManager

* feature: Done the work in ReplicasInfoManager
next step: try test

* feature: Add some test for ReplicasInfoManager

* feature: Build the architecture of dledgerController

* feature: Done the work in controller

* style: review code

* feature: add controllerProcessor in name-srv

* style: use defensive copy in constructor;

* style: review code

* style: review code

* feature: let controller api return RemotingCommand

* feature:
1.remove originMasterId in replicasInfo
2.add DledgerControllerConfig

* feature:
1.add option isProcessReadEvent.
2.add ControllerConfig

* feature:
add namesrv into dledgerController to predict whether the broker is alive.

* style: code review

* feature: process initial log when controller become leader

* style: review code

* style: review code

* style: review code

* style: change version

* fixbug
* feature: support auto switch role ha service

* feature:
1.add EpochStartOffset in ha protocal
2.notify AutoSwitchHAService when delete expired files
3.add more tests for AutoSwitchHAService

* feature:
1.transfer syncFromLastFile from slave to master in handshake state
2.return false if find consistent point failed
…ger-controller

# Conflicts:
#	store/src/main/java/org/apache/rocketmq/store/ha/DefaultHAService.java
* feature:
1.add replicasManager and controllerProxy

* feature:
1.add brokerHaAddress in controller.

* feature:
1.add replicasManager

* feature:
1.add brokerController to replicasManager.
2.change brokerController when change role.

* feature: add api message empty constructor

* feature: move set from header to remotingRequest body

* feature: modify autoSwitchHaClient's rpc protocol, add slaveId, slaveAddress.

* feature: review code

* feature: review code

* add some debug info

* feature: let ha service get masterHaAdress after register to name-srv

* feature: review code xxxx

* feature: let controller return err remark

* style: review

* style: review code

* feature: add more integrationTest

* style: review code

* style: review code

* fix: port already bind
* feature: add new module controller

* feature: add heartbeat manager

* feature: link requestProcessor and heartbeatmanager

* feature: add controllerManager and startup

* feature: remove namesrv's duplicate controller code

* review code
* let broker send heartbeat to controller

* code remview

* code review

* fix bug

* add state in replicasmanager

* add brokerId when getReplicasInfo

* code review
* feature: add lastCatchupTime ms and expandInSyncStateSet

* code review

* add shrink and expand inSyncStateSet in AutoSwitchHAService

* add option allAckInSyncStateSet

* let replicasManager use AutoSwitchHAService's expand and shrink inSyncStateSet api.

* fix bug

* use CopyOnWriteArraySet to replace lock

* code review

* code review

* code review

* code review
* merge branch support_async_learner

* use isSlave() to replace BrokerRole == Slave

* mark asyncLearner

* code review

* Revert "use isSlave() to replace BrokerRole == Slave"

This reverts commit 6599f97.

* review

* remove asyncLeaner role

* code review

* code review
* modify pom for using dledger

* rename option

* 1.send heartbeat to haclient when get epochEntry failed to hold connection.
2.update lastCatchupTimeMs in time.

* fix some bugs
* add tool getSyncStateDataCommand

* get controller leaderAddr when execute command

* modify getControllerMetadata api

* code review

* code review

* add tool get brokerEpochCache

* init command

* set lastEpochEndOffset

* take maxPhyOffset in EpochCache
…tion (#4399)

* fix bug

* throw exception in haWriter

* fix haconnection can't read msg.

* set channel buf = 0

* code review

* code review

* change client address to slave address

* Revert "change client address to slave address"

This reverts commit 198bcdb.
* add tool get controller metadata

* fix bug
* Polish switching logic and auto switch ha code

* Make UT can pass

* Polish the code
)

* reuse dledger remotingServer in Controller

* reuse dledger remotingServer in Namesrv when startup controller

* fix some bugs in controller

* trigger ci

* code review

* fix controllerManagerTest bug
…e's map (#4414)

* record lastCaughtupTimeMs in map

* code reivew
…ode (#4413)

* add design document

* add quickstart document

* review

* add dledgerController design

* add license
* Fix bug that do not remove caughtUpTime in connectionCaughtUpTimeTable

* Polish the comment

* Remove replicas from syncStateSet if connection disconnect and ha service not shutdown
* add broker api --notifyBrokerRoleChanged --

* add broker api --notifyBrokerRoleChanged --

* let controller inform broker when role changed

* code reivew
# Conflicts:
#	broker/src/main/java/org/apache/rocketmq/broker/processor/AdminBrokerProcessor.java
#	common/src/main/java/org/apache/rocketmq/common/protocol/RequestCode.java
#	store/src/main/java/org/apache/rocketmq/store/config/MessageStoreConfig.java
#	tools/src/main/java/org/apache/rocketmq/tools/command/MQAdminStartup.java
…ketmq into 5.0.0-beta-dledger-controller

# Conflicts:
#	pom.xml
@RongtongJin RongtongJin changed the base branch from 5.0.0-beta to develop July 13, 2022 07:09
* Add CRC32 verification when saving checkpoint file

* Add license to header
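
The commit above mentions CRC32 verification when saving the checkpoint file. The PR page does not show how that is implemented, so the following is only a generic sketch of the technique using the JDK's `java.util.zip.CRC32`. The class name, the method names, and the on-disk layout (an 8-byte checksum followed by the payload) are assumptions made for illustration.

```java
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

// Hypothetical sketch: guard a checkpoint payload with a CRC32 checksum.
// The layout [8-byte CRC32 value][payload] is an assumption, not RocketMQ's format.
public final class CheckpointCrcSketch {
    private CheckpointCrcSketch() {}

    private static long crc32(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        return crc.getValue();
    }

    /** Prefixes the payload with its checksum; the result is what would be written to disk. */
    public static byte[] encode(byte[] payload) {
        ByteBuffer buf = ByteBuffer.allocate(Long.BYTES + payload.length);
        buf.putLong(crc32(payload));
        buf.put(payload);
        return buf.array();
    }

    /** Returns the payload, or throws if the stored checksum does not match the content. */
    public static byte[] decode(byte[] fileBytes) {
        if (fileBytes.length < Long.BYTES) {
            throw new IllegalStateException("checkpoint too short");
        }
        ByteBuffer buf = ByteBuffer.wrap(fileBytes);
        long stored = buf.getLong();
        byte[] payload = new byte[buf.remaining()];
        buf.get(payload);
        if (stored != crc32(payload)) {
            throw new IllegalStateException("checkpoint CRC mismatch");
        }
        return payload;
    }
}
```

A loader built this way can treat a mismatch as a corrupted or partially written checkpoint instead of trusting its contents.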
@RongtongJin
Contributor Author

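Review comments from June 29:

  1. SyncStateSet should be keyed by brokerId (the unique number the Controller assigns to each broker) rather than by IP; the HA synchronization part must change accordingly.
  2. Clarify what InSyncReplicas and minInSyncReplicas mean in Controller mode. When the size of SyncStateSet is less than minInSyncReplicas, replicated writes no longer succeed and return PutMessageStatus.IN_SYNC_REPLICAS_NOT_ENOUGH.
  3. Stop reusing the NameServer heartbeat and always use the Controller's own heartbeat; remove the controllerDeployedStandAlone parameter. Add fields such as maxOffset, confirmOffset, and last epoch to the heartbeat packet. These will serve as the basis for election.
  4. After an election, the SyncStateSet should be inherited rather than reset to contain only the new master. One scenario: the newly elected master goes down right away, and then no master can be elected. (However, judging by the current implementation, if we switch to inheritance and a slave does not connect to the master immediately, failure recovery takes longer, and the master will update the SyncStateSet right away. A slave connecting to the master must first obtain haMasterAddress from the namesrv registration result, which takes some time (consider maintaining haMasterAddress in the controller).) In this case, also watch out for confirmOffset rollback (analyze this first).
  5. The Controller's format should carry the rocketmq message header so that it can be extended later.
  6. For the case where the disk is bad but the heartbeat is healthy, PutMessageStatus needs to be broken down further (so that it can indicate a bad disk). If the disk fails, an election and switchover is likewise required.

Points 2 and 3 have been fixed. The fix for point 1 is in progress by @hzh0425.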
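
Point 2 of the review can be pictured with a short sketch. This is an illustration only, not the code merged in this PR: the class, the method, and the local `PutStatus` enum are hypothetical stand-ins (the real status mentioned in the review is `PutMessageStatus.IN_SYNC_REPLICAS_NOT_ENOUGH`), and representing the SyncStateSet as a set of `Long` brokerIds is an assumption based on point 1.

```java
import java.util.Set;

// Hypothetical sketch of the minInSyncReplicas check from review point 2.
// PutStatus is a local stand-in for the store's status enum.
public final class InSyncCheckSketch {
    private InSyncCheckSketch() {}

    public enum PutStatus { PUT_OK, IN_SYNC_REPLICAS_NOT_ENOUGH }

    /** Rejects the write when fewer than minInSyncReplicas brokers are in sync. */
    public static PutStatus checkBeforeWrite(Set<Long> syncStateSet, int minInSyncReplicas) {
        return syncStateSet.size() < minInSyncReplicas
            ? PutStatus.IN_SYNC_REPLICAS_NOT_ENOUGH
            : PutStatus.PUT_OK;
    }

    public static void main(String[] args) {
        // Only the master (brokerId 1, assumed) is in sync, but 2 replicas are required.
        System.out.println(checkBeforeWrite(Set.of(1L), 2));
        // Master and one slave are in sync, so the write may proceed.
        System.out.println(checkBeforeWrite(Set.of(1L, 2L), 2));
    }
}
```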

…-controller

# Conflicts:
#	distribution/bin/mqshutdown
#	pom.xml
Member

@ShannonDing ShannonDing left a comment

LGTM

Contributor

@lizhiboo lizhiboo left a comment

LGTM

@duhenglucky duhenglucky merged commit efbc4a1 into develop Jul 21, 2022
@hzh0425
Member

hzh0425 commented Jul 21, 2022

Thanks~

@RongtongJin RongtongJin deleted the 5.0.0-beta-dledger-controller branch July 22, 2022 04:31
9 participants