[RIP-44] Support DLedger Controller #4484
Merged
* feature: initial, add controllerApi, event, request and response * feature: Apply event in ReplicasInfoManager * feature: Done the work in ReplicasInfoManager next step: try test * feature: Add some test for ReplicasInfoManager * feature: Build the architecture of dledgerController * feature: Done the work in controller * style: review code * feature: add controllerProcessor in name-srv * style: use defensive copy in constructor; * style: review code * style: review code * feature: let controller api return RemotingCommand * feature: 1.remove originMasterId in replicasInfo 2.add DledgerControllerConfig * feature: 1.add option isProcessReadEvent. 2.add ControllerConfig * feature: add namesrv into dledgerController to predict whether the broker is alive. * style: code review * feature: process initial log when controller become leader * style: review code * style: review code * style: review code * style: change version * fixbug
* feature: support auto switch role ha service * feature: 1.add EpochStartOffset in ha protocal 2.notify AutoSwitchHAService when delete expired files 3.add more tests for AutoSwitchHAService * feature: 1.transfer syncFromLastFile from slave to master in handshake state 2.return false if find consistent point failed
…ger-controller # Conflicts: # store/src/main/java/org/apache/rocketmq/store/ha/DefaultHAService.java
* feature: 1.add replicasManager and controllerProxy * feature: 1.add brokerHaAddress in controller. * feature: 1.add replicasManager * feature: 1.add brokerController to replicasManager. 2.change brokerController when change role. * feature: add api message empty constructor * feature: move set from header to remotingRequest body * feature: modify autoSwitchHaClient's rpc protocol, add slaveId, slaveAddress. * feature: review code * feature: review code * add some debug info * feature: let ha service get masterHaAdress after register to name-srv * feature: review code xxxx * feature: let controller return err remark * style: review * style: review code * feature: add more integrationTest * style: review code * style: review code * fix: port already bind
* feature: add new module controller * feature: add heartbeat manager * feature: link requestProcessor and heartbeatmanager * feature: add controllerManager and startup * feature: remove namesrv's duplicate controller code * review code
* let broker send heartbeat to controller * code remview * code review * fix bug * add state in replicasmanager * add brokerId when getReplicasInfo * code review
* feature: add lastCatchupTime ms and expandInSyncStateSet * code review * add shrink and expand inSyncStateSet in AutoSwitchHAService * add option allAckInSyncStateSet * let replicasManager use AutoSwitchHAService's expand and shrink inSyncStateSet api. * fix bug * use CopyOnWriteArraySet to replace lock * code review * code review * code review * code review
* merge branch support_async_learner * use isSlave() to replace BrokerRole == Slave * mark asyncLearner * code review * Revert "use isSlave() to replace BrokerRole == Slave" This reverts commit 6599f97. * review * remove asyncLeaner role * code review * code review
* modify pom for using dledger * rename option * 1.send heartbeat to haclient when get epochEntry failed to hold connection. 2.update lastCatchupTimeMs in time. * fix some bugs
* add tool getSyncStateDataCommand * get controller leaderAddr when execute command * modify getControllerMetadata api * code review * code review * add tool get brokerEpochCache * init command * set lastEpochEndOffset * take maxPhyOffset in EpochCache
* add tool get controller metadata * fix bug
* Polish switching logic and auto switch ha code * Make UT can pass * Polish the code
…e's map (#4414) * record lastCaughtupTimeMs in map * code reivew
…ode (#4413) * add design document * add quickstart document * review * add dledgerController design * add license
* Fix bug that do not remove caughtUpTime in connectionCaughtUpTimeTable * Polish the comment * Remove replicas from syncStateSet if connection disconnect and ha service not shutdown
* add broker api --notifyBrokerRoleChanged -- * add broker api --notifyBrokerRoleChanged -- * let controller inform broker when role changed * code reivew
…is no longer reused
# Conflicts: # broker/src/main/java/org/apache/rocketmq/broker/processor/AdminBrokerProcessor.java # common/src/main/java/org/apache/rocketmq/common/protocol/RequestCode.java # store/src/main/java/org/apache/rocketmq/store/config/MessageStoreConfig.java # tools/src/main/java/org/apache/rocketmq/tools/command/MQAdminStartup.java
…ketmq into 5.0.0-beta-dledger-controller # Conflicts: # pom.xml
# Conflicts: # .travis.yml
2. Make changeToMaster and changeToSlave have default implementation
Fixes for points 2 and 3 are done. The fix for point 1 is in progress by @hzh0425.
ShannonDing
reviewed
Jul 20, 2022
…-controller # Conflicts: # distribution/bin/mqshutdown # pom.xml
ShannonDing
reviewed
Jul 20, 2022
broker/src/main/java/org/apache/rocketmq/broker/controller/ReplicasManager.java
odbozhou
approved these changes
Jul 20, 2022
ShannonDing
approved these changes
Jul 20, 2022
LGTM
lizhiboo
approved these changes
Jul 20, 2022
LGTM
duhenglucky
approved these changes
Jul 20, 2022
Thanks~
What is the purpose of the change
RocketMQ 4.5.0 introduced the DLedger mode (Raft). In this architecture a Raft commitlog replaces the original commitlog so that the broker group can fail over. However, because replication is handled by Raft, the architecture has some disadvantages:
To have failover ability, the number of replicas in the broker group must be 3 or more.
Acks from replicas must strictly follow the majority rule of the Raft protocol: a 3-replica architecture requires acks from 2 replicas before returning, and a 5-replica architecture requires acks from 3.
Since the store module relies on OpenMessaging DLedger in DLedger mode, the native storage and replication capabilities of RocketMQ (such as transientStorePool and zero-copy) cannot be reused, and maintenance becomes more difficult.
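The majority rule in the second point is simple arithmetic: a group of n replicas needs floor(n/2) + 1 acks. The sketch below works this out for a few group sizes. It assumes the leader's own write counts as one ack, and the class and method names are illustrative only; they are not part of RocketMQ or DLedger.

```java
// Illustrative helper only; not RocketMQ or DLedger code.
final class RaftQuorum {
    private RaftQuorum() { }

    /** Acks needed to commit in a group of the given size (assumes the leader counts as one ack). */
    static int majorityAcks(int replicas) {
        if (replicas < 1) {
            throw new IllegalArgumentException("replicas must be >= 1");
        }
        return replicas / 2 + 1;
    }

    /** Number of replicas that can fail while a majority can still be formed. */
    static int toleratedFailures(int replicas) {
        return replicas - majorityAcks(replicas);
    }

    public static void main(String[] args) {
        for (int n : new int[] {2, 3, 5}) {
            System.out.printf("replicas=%d acks=%d toleratedFailures=%d%n",
                    n, majorityAcks(n), toleratedFailures(n));
        }
    }
}
```

With 2 replicas no failure can be tolerated, which is why the first point requires 3 or more replicas for failover under Raft.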
To address these problems, I would like to start RIP-44 Support DLedger Controller. With this improvement, the DLedger (Raft) capability is moved up to an upper layer and becomes an optional, loosely coupled coordination component named DLedger Controller.
Once DLedger Controller is deployed, the master-slave architecture also gains failover capability. The DLedger Controller can optionally be embedded into the NameServer (the NameServer itself remains stateless and cannot provide election capabilities when the majority is down), or it can be deployed independently.
DLedger Controller is an optional component that does not change the previous operation and maintenance mode. As with other components, its downtime does not affect online services. In addition, RIP-44 unifies the storage and replication of RocketMQ, resulting in lower maintenance costs and faster development iterations. In terms of compatibility, the master-slave architecture can be upgraded without compatibility problems.
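To make the failover idea concrete, here is a rough sketch of how a controller might pick a new master from the replicas that were in sync with the failed one. Everything in it is hypothetical: the class name, the method signature, and the selection rule (first alive candidate in sorted order) are assumptions for illustration and do not reproduce the election logic implemented in this PR.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Optional;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch; not the controller code from this PR.
final class MasterElectionSketch {
    private MasterElectionSketch() { }

    /**
     * Picks a new master among replicas that were in sync with the old master.
     * Assumption: only in-sync, alive brokers are eligible, so no acknowledged data is lost.
     */
    static Optional<String> electMaster(Set<String> syncStateSet,
                                        Set<String> aliveBrokers,
                                        String oldMaster) {
        // Iterate in sorted order so the choice is deterministic.
        for (String candidate : new TreeSet<>(syncStateSet)) {
            if (!candidate.equals(oldMaster) && aliveBrokers.contains(candidate)) {
                return Optional.of(candidate);
            }
        }
        // No eligible replica: the group stays without a master.
        return Optional.empty();
    }

    public static void main(String[] args) {
        Set<String> syncStateSet = new HashSet<>(Arrays.asList("broker-a", "broker-b"));
        Set<String> alive = new HashSet<>(Arrays.asList("broker-b", "broker-c"));
        System.out.println(electMaster(syncStateSet, alive, "broker-a"));
    }
}
```

In this sketch a two-replica group can still fail over as long as the slave is in the sync set and alive, which is the contrast with the Raft majority requirement described above.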
I've already done part of the work with @hzh0425. Our proposals are available at the links below:
https://docs.google.com/document/d/1tSJkor_3Js4NBaVA0UENGyM8Mh0SrRMXszRyI91hjJ8/edit?usp=sharing
Chinese version:
https://shimo.im/docs/N2A1Mz9QZltQZoAD/
Brief changelog
Refer to https://shimo.im/docs/N2A1Mz9QZltQZoAD#anchor-qJhl
Verifying this change
Refer to the UTs, ITs, and testing report.
Follow this checklist to help us incorporate your contribution quickly and easily. Notice, it would be helpful if you could finish the following 5 checklist items (the last one is not necessary) before requesting the community to review your PR.
Format the pull request title like [ISSUE #123] Fix UnknownException when host config not exist. Each commit in the pull request should have a meaningful subject line and body.
Run mvn -B clean apache-rat:check findbugs:findbugs checkstyle:checkstyle to make sure basic checks pass. Run mvn clean install -DskipITs to make sure unit tests pass. Run mvn clean test-compile failsafe:integration-test to make sure integration tests pass.