forked from apache/kudu
-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[java] update master table locations cache #10
Closed
Closed
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
zhangyifan27
commented
Jul 4, 2022
java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduClient.java
Outdated
Show resolved
Hide resolved
zhangyifan27
commented
Jul 4, 2022
java/kudu-client/src/main/java/org/apache/kudu/client/TableLocationsCache.java
Show resolved
Hide resolved
A better way is to disconnect the old channel. |
cb5457b
to
8bd6406
Compare
Recently a master in our cluster was down because of network issues and somehow the server didn't close the connections to some clients. Then these clients keep trying to connect to the dead master but can't receive response until rpc timeout, even when this server is up the client still send rpc through the old channel and can't connect to the server and the new leader master. The only solution is to restart clients. This patch fixes the issue that java client can't invalidate stale locations of the leader master. Maybe in this case we also need a better way to trigger connection shutdown for an inactive channel. Change-Id: Ia2877518866ac4c2d1dda6427ce57d08df48a864
8bd6406
to
469b6b4
Compare
zhangyifan27
pushed a commit
that referenced
this pull request
Jun 12, 2024
It turned out that auto leader rebalancing task wasn't explicitly shutdown upon shutting down catalog manager. That lead to race conditions as reported by TSAN, at least in test scenarios (see below). This patch addresses the issue. WARNING: ThreadSanitizer: data race (pid=23827) Write of size 1 at 0x7b4000008208 by main thread: #0 AnnotateRWLockDestroy thirdparty/src/llvm-11.0.0.src/projects/compiler-rt/lib/tsan/rtl/tsan_interface_ann.cpp:264 (auto_rebalancer-test+0x33575e) #1 kudu::rw_spinlock::~rw_spinlock() src/kudu/util/locks.h:89:5 (libmaster.so+0x359376) #2 kudu::master::TSManager::~TSManager() src/kudu/master/ts_manager.cc:108:1 (libmaster.so+0x4ad201) #3 kudu::master::TSManager::~TSManager() src/kudu/master/ts_manager.cc:107:25 (libmaster.so+0x4ad229) #4 std::__1::default_delete<kudu::master::TSManager>::operator()(kudu::master::TSManager*) const thirdparty/installed/tsan/include/c++/v1/memory:2262:5 (libmaster.so+0x407ce7) #5 std::__1::unique_ptr<kudu::master::TSManager, std::__1::default_delete<kudu::master::TSManager> >::reset(kudu::master::TSManager*) thirdparty/installed/tsan/include/c++/v1/memory:2517:7 (libmaster.so+0x40157d) #6 std::__1::unique_ptr<kudu::master::TSManager, std::__1::default_delete<kudu::master::TSManager> >::~unique_ptr() thirdparty/installed/tsan/include/c++/v1/memory:2471:19 (libmaster.so+0x4015eb) #7 kudu::master::Master::~Master() src/kudu/master/master.cc:263:1 (libmaster.so+0x3f7a4a) #8 kudu::master::Master::~Master() src/kudu/master/master.cc:261:19 (libmaster.so+0x3f7dc9) #9 std::__1::default_delete<kudu::master::Master>::operator()(kudu::master::Master*) const thirdparty/installed/tsan/include/c++/v1/memory:2262:5 (libmaster.so+0x435627) #10 std::__1::unique_ptr<kudu::master::Master, std::__1::default_delete<kudu::master::Master> >::reset(kudu::master::Master*) thirdparty/installed/tsan/include/c++/v1/memory:2517:7 (libmaster.so+0x42e6ed) #11 kudu::master::MiniMaster::Shutdown() src/kudu/master/mini_master.cc:120:13 (libmaster.so+0x4c2612) ... Previous atomic write of size 4 at 0x7b4000008208 by thread T439 (mutexes: write M1141235379631443968): #0 __tsan_atomic32_compare_exchange_strong thirdparty/src/llvm-11.0.0.src/projects/compiler-rt/lib/tsan/rtl/tsan_interface_atomic.cpp:780 (auto_rebalancer-test+0x33eb60) #1 base::subtle::Release_CompareAndSwap(int volatile*, int, int) /src/kudu/gutil/atomicops-internals-tsan.h:88:3 (libmaster.so+0x2e2b34) #2 kudu::rw_semaphore::unlock_shared() src/kudu/util/rw_semaphore.h:91:19 (libmaster.so+0x2e29c8) #3 kudu::rw_spinlock::unlock_shared() src/kudu/util/locks.h:99:10 (libmaster.so+0x2e28ef) #4 std::__1::shared_lock<kudu::rw_spinlock>::~shared_lock() /thirdparty/installed/tsan/include/c++/v1/shared_mutex:369:19 (libmaster.so+0x2e23e0) #5 kudu::master::TSManager::GetAllDescriptors(std::__1::vector<std::__1::shared_ptr<kudu::master::TSDescriptor>, std::__1::allocator<std::__1::shared_ptr<kudu::master::TSDescriptor> > >*) const src/kudu/master/ts_manager.cc:206:1 (libmaster.so+0x4adeb6) #6 kudu::master::AutoLeaderRebalancerTask::RunLeaderRebalancer() src/kudu/master/auto_leader_rebalancer.cc:405:16 (libmaster.so+0x2fb51b) #7 kudu::master::AutoLeaderRebalancerTask::RunLoop() src/kudu/master/auto_leader_rebalancer.cc:445:7 (libmaster.so+0x2fbaa9) This is a follow-up to 10efaf2. Change-Id: Iccd66d00280d22b37386230874937e5260f07f3b Reviewed-on: http://gerrit.cloudera.org:8080/21417 Reviewed-by: Wang Xixu <[email protected]> Tested-by: Alexey Serbin <[email protected]> Reviewed-by: Yifan Zhang <[email protected]>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Recently a master in our cluster is down because of network issues and somehow the server didn't close the connections to some clients. Then these clients keep trying to connect to the dead master but can't receive response until timeout, even when this server is up the client still send rpc through the old channel and can't connect to the server and the new leader master. The only solution is to restart clients.
This patch fixes the issue that java client can't invalidate stale locations of the leader master. Maybe in this case we also need a better way to trigger connection shutdown for an inactive channel.