forked from apache/kudu
Kudu-2915 #6
Closed
aa53871 to e3eb2e2
66899b6 to 022de63
3cf21d9 to 525927c
51f9c1d to 565c2ea
Add a 'kudu tserver unregister' tool to unregister a tserver from the master. This tool will be useful when we want to decommission a tserver without restarting the masters. By default, it removes the dead tserver from the master's in-memory map and from the persisted catalog. It's also possible to unregister a tserver that is not presumed dead by adding '-force_unregister_live_tserver', and to keep the tserver's persisted state by adding '-remove_tserver_state=false'.

Change-Id: If1f5c2979a8d14428f4bcc8e850c57ce228c793a
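A minimal C++ sketch of the flag semantics described above, assuming a simplified model in which the master keeps a map from tserver UUID to a "presumed dead" bit plus a set of UUIDs with persisted state. `MasterState`, `UnregisterTServer()` and `UnregisterResult` are hypothetical names, not Kudu's API, and the reading that the in-memory entry is always dropped while only the persisted state depends on '-remove_tserver_state' is an assumption drawn from the wording above.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Hypothetical outcome codes for this sketch only.
enum class UnregisterResult { kOk, kNotFound, kLiveWithoutForce };

// Hypothetical, simplified model of the master's view of tablet servers.
struct MasterState {
  std::map<std::string, bool> in_memory;  // tserver UUID -> presumed dead?
  std::set<std::string> persisted;        // UUIDs with persisted state
};

UnregisterResult UnregisterTServer(MasterState* state,
                                   const std::string& uuid,
                                   bool force_unregister_live_tserver,
                                   bool remove_tserver_state) {
  auto it = state->in_memory.find(uuid);
  if (it == state->in_memory.end()) {
    return UnregisterResult::kNotFound;
  }
  const bool presumed_dead = it->second;
  // A tserver that is not presumed dead is only removed when forced.
  if (!presumed_dead && !force_unregister_live_tserver) {
    return UnregisterResult::kLiveWithoutForce;
  }
  // Assumption: the in-memory entry is always dropped on success.
  state->in_memory.erase(it);
  // Persisted state is dropped by default; false keeps it.
  if (remove_tserver_state) {
    state->persisted.erase(uuid);
  }
  return UnregisterResult::kOk;
}
```

Returning a distinct code for the "live but not forced" case keeps the safety check visible to the caller instead of silently doing nothing.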
565c2ea to ca77518
zhangyifan27 pushed a commit that referenced this pull request Jun 12, 2024
It turned out that the auto leader rebalancing task wasn't explicitly shut down upon shutting down the catalog manager. That led to race conditions as reported by TSAN, at least in test scenarios (see below). This patch addresses the issue.

WARNING: ThreadSanitizer: data race (pid=23827)
  Write of size 1 at 0x7b4000008208 by main thread:
    #0 AnnotateRWLockDestroy thirdparty/src/llvm-11.0.0.src/projects/compiler-rt/lib/tsan/rtl/tsan_interface_ann.cpp:264 (auto_rebalancer-test+0x33575e)
    #1 kudu::rw_spinlock::~rw_spinlock() src/kudu/util/locks.h:89:5 (libmaster.so+0x359376)
    #2 kudu::master::TSManager::~TSManager() src/kudu/master/ts_manager.cc:108:1 (libmaster.so+0x4ad201)
    #3 kudu::master::TSManager::~TSManager() src/kudu/master/ts_manager.cc:107:25 (libmaster.so+0x4ad229)
    #4 std::__1::default_delete<kudu::master::TSManager>::operator()(kudu::master::TSManager*) const thirdparty/installed/tsan/include/c++/v1/memory:2262:5 (libmaster.so+0x407ce7)
    #5 std::__1::unique_ptr<kudu::master::TSManager, std::__1::default_delete<kudu::master::TSManager> >::reset(kudu::master::TSManager*) thirdparty/installed/tsan/include/c++/v1/memory:2517:7 (libmaster.so+0x40157d)
    #6 std::__1::unique_ptr<kudu::master::TSManager, std::__1::default_delete<kudu::master::TSManager> >::~unique_ptr() thirdparty/installed/tsan/include/c++/v1/memory:2471:19 (libmaster.so+0x4015eb)
    #7 kudu::master::Master::~Master() src/kudu/master/master.cc:263:1 (libmaster.so+0x3f7a4a)
    #8 kudu::master::Master::~Master() src/kudu/master/master.cc:261:19 (libmaster.so+0x3f7dc9)
    #9 std::__1::default_delete<kudu::master::Master>::operator()(kudu::master::Master*) const thirdparty/installed/tsan/include/c++/v1/memory:2262:5 (libmaster.so+0x435627)
    #10 std::__1::unique_ptr<kudu::master::Master, std::__1::default_delete<kudu::master::Master> >::reset(kudu::master::Master*) thirdparty/installed/tsan/include/c++/v1/memory:2517:7 (libmaster.so+0x42e6ed)
    #11 kudu::master::MiniMaster::Shutdown() src/kudu/master/mini_master.cc:120:13 (libmaster.so+0x4c2612)
    ...

  Previous atomic write of size 4 at 0x7b4000008208 by thread T439 (mutexes: write M1141235379631443968):
    #0 __tsan_atomic32_compare_exchange_strong thirdparty/src/llvm-11.0.0.src/projects/compiler-rt/lib/tsan/rtl/tsan_interface_atomic.cpp:780 (auto_rebalancer-test+0x33eb60)
    #1 base::subtle::Release_CompareAndSwap(int volatile*, int, int) /src/kudu/gutil/atomicops-internals-tsan.h:88:3 (libmaster.so+0x2e2b34)
    #2 kudu::rw_semaphore::unlock_shared() src/kudu/util/rw_semaphore.h:91:19 (libmaster.so+0x2e29c8)
    #3 kudu::rw_spinlock::unlock_shared() src/kudu/util/locks.h:99:10 (libmaster.so+0x2e28ef)
    #4 std::__1::shared_lock<kudu::rw_spinlock>::~shared_lock() /thirdparty/installed/tsan/include/c++/v1/shared_mutex:369:19 (libmaster.so+0x2e23e0)
    #5 kudu::master::TSManager::GetAllDescriptors(std::__1::vector<std::__1::shared_ptr<kudu::master::TSDescriptor>, std::__1::allocator<std::__1::shared_ptr<kudu::master::TSDescriptor> > >*) const src/kudu/master/ts_manager.cc:206:1 (libmaster.so+0x4adeb6)
    #6 kudu::master::AutoLeaderRebalancerTask::RunLeaderRebalancer() src/kudu/master/auto_leader_rebalancer.cc:405:16 (libmaster.so+0x2fb51b)
    #7 kudu::master::AutoLeaderRebalancerTask::RunLoop() src/kudu/master/auto_leader_rebalancer.cc:445:7 (libmaster.so+0x2fbaa9)

This is a follow-up to 10efaf2.

Change-Id: Iccd66d00280d22b37386230874937e5260f07f3b
Reviewed-on: http://gerrit.cloudera.org:8080/21417
Reviewed-by: Wang Xixu <[email protected]>
Tested-by: Alexey Serbin <[email protected]>
Reviewed-by: Yifan Zhang <[email protected]>
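The ordering problem behind this report can be sketched in C++: a background loop (standing in for AutoLeaderRebalancerTask) reads a shared registry (standing in for TSManager), so its owner must stop and join that loop before destroying the registry. `Registry`, `PeriodicTask` and `Owner` are hypothetical names, and the sketch assumes a plain condition-variable loop rather than Kudu's actual task machinery.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstddef>
#include <memory>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical shared registry (plays the role of TSManager).
struct Registry {
  std::mutex lock;
  std::vector<int> descriptors{1, 2, 3};

  std::size_t Count() {
    std::lock_guard<std::mutex> l(lock);
    return descriptors.size();
  }
};

// Hypothetical background task (plays the role of AutoLeaderRebalancerTask):
// its loop periodically reads the registry until Shutdown() is called.
class PeriodicTask {
 public:
  explicit PeriodicTask(Registry* registry) : registry_(registry) {}
  ~PeriodicTask() { Shutdown(); }

  void Start() { thread_ = std::thread([this] { RunLoop(); }); }

  // Signals the loop to exit and joins the thread; safe to call repeatedly.
  void Shutdown() {
    {
      std::lock_guard<std::mutex> l(mu_);
      closing_ = true;
    }
    cv_.notify_all();
    if (thread_.joinable()) thread_.join();
  }

  int iterations() {
    std::lock_guard<std::mutex> l(mu_);
    return iterations_;
  }

 private:
  void RunLoop() {
    std::unique_lock<std::mutex> l(mu_);
    while (!closing_) {
      l.unlock();
      registry_->Count();  // touches state owned by someone else
      l.lock();
      ++iterations_;
      // Sleep up to 1 ms, waking early if Shutdown() is requested.
      cv_.wait_for(l, std::chrono::milliseconds(1), [this] { return closing_; });
    }
  }

  Registry* registry_;
  std::mutex mu_;
  std::condition_variable cv_;
  bool closing_ = false;
  int iterations_ = 0;
  std::thread thread_;
};

// Hypothetical owner (plays the role of the master / catalog manager).
struct Owner {
  std::unique_ptr<Registry> registry = std::make_unique<Registry>();
  std::unique_ptr<PeriodicTask> task =
      std::make_unique<PeriodicTask>(registry.get());

  void Shutdown() {
    task->Shutdown();  // 1. stop and join the reader first
    registry.reset();  // 2. only then destroy what it reads
  }
};
```

Declaring `registry` before `task` in `Owner` would also make the implicit destruction order safe, since members are destroyed in reverse declaration order; the explicit `Shutdown()` keeps that ordering visible instead of relying on member layout.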
zhangyifan27 pushed a commit that referenced this pull request Oct 11, 2024
The race condition was reported by TSAN as follows (with some information omitted):

WARNING: ThreadSanitizer: data race (pid=1924273)
  Write of size 8 at 0x7b30002fe7c0 by thread T6 (mutexes: write M247597861, write M247597860, write M247597300):
    #0 std::__1::enable_if<(...), void>::type std::__1::swap<kudu::BlockId*>(...) thirdparty/installed/tsan/include/c++/v1/type_traits:4076:9
    ...
    #4 kudu::tablet::RowSetMetadata::CommitRedoDeltaDataBlock(...) src/kudu/tablet/rowset_metadata.cc:197:22
    #5 kudu::tablet::DeltaTracker::FlushDMS(...) src/kudu/tablet/delta_tracker.cc:826:23
    #6 kudu::tablet::DeltaTracker::Flush(...) src/kudu/tablet/delta_tracker.cc:877:14
    #7 kudu::tablet::DiskRowSet::FlushDeltas(...) src/kudu/tablet/diskrowset.cc:552:26
    ...

  Previous read of size 8 at 0x7b30002fe7c0 by thread T34 (mutexes: write M247598319, write M919714229363433616, write M303002710007881612):
    #0 std::__1::vector<...>::size() const thirdparty/installed/tsan/include/c++/v1/vector:658:61
    #1 kudu::tablet::RowSetMetadata::GetAllBlocks() const src/kudu/tablet/rowset_metadata.cc:306:37
    #2 kudu::tablet::TabletMetadata::UpdateUnlocked(...) src/kudu/tablet/tablet_metadata.cc:677:40
    #3 kudu::tablet::TabletMetadata::UpdateAndFlush(...) src/kudu/tablet/tablet_metadata.cc:549:5
    #4 kudu::tablet::Tablet::FlushMetadata(...) src/kudu/tablet/tablet.cc:1992:21
    #5 kudu::tablet::Tablet::HandleEmptyCompactionOrFlush() src/kudu/tablet/tablet.cc:2308:3
    #6 kudu::tablet::Tablet::DeleteAncientDeletedRowsets() src/kudu/tablet/tablet.cc:3084:3
    ...

Change-Id: I07103269526d0ee98b0bb19e76e11f7d47a5b217
Reviewed-on: http://gerrit.cloudera.org:8080/21799
Reviewed-by: Abhishek Chennaka <[email protected]>
Tested-by: Alexey Serbin <[email protected]>
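The commit message above quotes only the TSAN report, so the exact fix is not shown here. One common remedy, assumed for this sketch, is to have the writer and the reader take the same lock, so the reader never observes the block vector while it is being swapped. `RowSetMeta`, `CommitRedoDeltaBlock()` and `GetAllBlocks()` below are hypothetical simplifications, not Kudu's RowSetMetadata.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for RowSetMetadata: one thread commits redo delta
// blocks while another enumerates all blocks for a metadata flush.
class RowSetMeta {
 public:
  // Writer path: builds a new vector and swaps it in, analogous to the
  // std::swap frame in the report above.
  void CommitRedoDeltaBlock(std::uint64_t block_id) {
    std::lock_guard<std::mutex> l(lock_);
    std::vector<std::uint64_t> updated = redo_blocks_;
    updated.push_back(block_id);
    redo_blocks_.swap(updated);
  }

  // Reader path: takes the same lock and returns a copy made under it,
  // so callers never see the vector mid-swap.
  std::vector<std::uint64_t> GetAllBlocks() const {
    std::lock_guard<std::mutex> l(lock_);
    return redo_blocks_;
  }

 private:
  mutable std::mutex lock_;
  std::vector<std::uint64_t> redo_blocks_;
};
```

Returning a copy rather than a reference keeps the critical section short and means the caller can iterate without holding the lock.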
zhangyifan27 pushed a commit that referenced this pull request Feb 5, 2025
The thread pool of the DNS resolver should be shut down along with the messenger in ServerBase to prevent retrying RPCs that failed as collateral damage of the shutdown in progress. Those RPCs might be retried by invoking rpc::Proxy::RefreshDnsAndEnqueueRequest(), etc.

On a related note, I also added a guard to protect ThreadPool::tokens_ in the destructor of the ThreadPool class, as is done elsewhere. I also snuck in an update to call DCHECK() in a loop only when the DCHECK_IS_ON() macro evaluates to 'true'.

This addresses flakiness reported in at least one of the RemoteKsckTest scenarios (e.g., TestFilterOnNotabletTable in [1]). One of the related TSAN reports looked like this:

RemoteKsckTest.TestFilterOnNotabletTable:
WARNING: ThreadSanitizer: data race
  Read of size 8 at 0x7b54001e5118 by main thread:
    #0 std::__1::__hash_table<kudu::ThreadPoolToken*, ...>::size() const
    #1 std::__1::unordered_set<kudu::ThreadPoolToken*, ...>::size() const
    #2 kudu::ThreadPool::~ThreadPool()
    ...
    #6 kudu::kserver::KuduServer::~KuduServer()
    #7 kudu::tserver::TabletServer::~TabletServer()
    ...

  Previous write of size 8 at 0x7b54001e5118 by thread T262 ...:
    #0 std::__1::__hash_table<kudu::ThreadPoolToken*, ...>::remove(...)
    ...
    #4 kudu::ThreadPool::ReleaseToken(...)
    #5 kudu::ThreadPoolToken::~ThreadPoolToken()
    ...
    #24 kudu::consensus::LeaderElection::~LeaderElection()
    ...
    #35 kudu::rpc::Proxy::RefreshDnsAndEnqueueRequest(...)
    ...
    #41 kudu::DnsResolver::RefreshAddressesAsync()
    ...

  Thread T262 'dns-resolver [w' (tid=29102, running) created by thread T182 at:
    #0 pthread_create
    #1 kudu::Thread::StartThread(...)
    #2 kudu::Thread::Create(...)
    #3 kudu::ThreadPool::CreateThread()
    #4 kudu::ThreadPool::DoSubmit(..., kudu::ThreadPoolToken*)
    #5 kudu::ThreadPool::Submit(...)
    #6 kudu::DnsResolver::RefreshAddressesAsync(..)
    #7 kudu::rpc::Proxy::RefreshDnsAndEnqueueRequest(...)
    #8 kudu::rpc::Proxy::AsyncRequest(...)
    ...
    #15 kudu::rpc::OutboundCall::CallCallback()
    #16 kudu::rpc::OutboundCall::SetFailed()
    #17 kudu::rpc::Connection::Shutdown()
    #18 kudu::rpc::ReactorThread::ShutdownInternal()
    ...
    #25 kudu::rpc::ReactorThread::RunThread()
    ...

[1] http://dist-test.cloudera.org:8080/test_drilldown?test_name=ksck_remote-test

Change-Id: I525f1078a349dbd2926938bb4fcc3e80888dfbb4
Reviewed-on: http://gerrit.cloudera.org:8080/22434
Tested-by: Alexey Serbin <[email protected]>
Reviewed-by: Abhishek Chennaka <[email protected]>
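The ThreadPool part of this change can be sketched as a destructor that takes the same lock as ReleaseToken() before touching tokens_, with a consistency loop compiled only into debug builds. `Pool` and `Token` are hypothetical simplifications of ThreadPool and ThreadPoolToken, and `#ifndef NDEBUG` with `assert()` stands in for glog's DCHECK_IS_ON() and DCHECK(). The sketch assumes every token is destroyed before its pool: the lock removes the data race on tokens_, but it does not by itself keep a token from outliving its pool.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <unordered_set>

class Pool;

// Hypothetical token: registers with its pool on construction and
// unregisters on destruction (compare ThreadPoolToken / ReleaseToken).
class Token {
 public:
  explicit Token(Pool* pool);
  ~Token();
  const Pool* pool() const { return pool_; }

 private:
  Pool* pool_;
};

// Hypothetical pool: tokens_ is only ever accessed with lock_ held.
class Pool {
 public:
  ~Pool() {
    // Guard tokens_ here as everywhere else: a token may be released from
    // another thread while the pool is being torn down.
    std::lock_guard<std::mutex> l(lock_);
#ifndef NDEBUG
    // Debug-only consistency loop; release builds skip it entirely.
    for (const Token* t : tokens_) {
      assert(t->pool() == this);
    }
#endif
  }

  std::size_t NumTokens() const {
    std::lock_guard<std::mutex> l(lock_);
    return tokens_.size();
  }

 private:
  friend class Token;

  void AddToken(Token* t) {
    std::lock_guard<std::mutex> l(lock_);
    tokens_.insert(t);
  }

  void ReleaseToken(Token* t) {
    std::lock_guard<std::mutex> l(lock_);
    tokens_.erase(t);
  }

  mutable std::mutex lock_;
  std::unordered_set<Token*> tokens_;
};

Token::Token(Pool* pool) : pool_(pool) { pool_->AddToken(this); }
Token::~Token() { pool_->ReleaseToken(this); }
```

Wrapping the whole loop in the debug-only guard, rather than relying on each assertion compiling to a no-op, avoids iterating the set at all in release builds.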
No description provided.