Commit 0bf5049

yamahata authored and bonzini committed
KVM: Drop kvm_count_lock and instead protect kvm_usage_count with kvm_lock
Drop kvm_count_lock and instead protect kvm_usage_count with kvm_lock now that KVM hooks CPU hotplug during the ONLINE phase, which can sleep. Previously, KVM hooked the STARTING phase, which is not allowed to sleep and thus could not take kvm_lock (a mutex). This effectively allows the task that's initiating hardware enabling/disabling to be preempted and/or migrated.

Note, the Documentation/virt/kvm/locking.rst statement that kvm_count_lock is "raw" because hardware enabling/disabling needs to be atomic with respect to migration is wrong on multiple fronts. First, while regular spinlocks can be preempted, the task holding the lock cannot be migrated. Second, preventing migration is not required. on_each_cpu() disables preemption, which ensures that cpus_hardware_enabled correctly reflects hardware state. The task may be preempted/migrated between bumping kvm_usage_count and invoking on_each_cpu(), but that's perfectly ok as kvm_usage_count is still protected, e.g. other tasks that call hardware_enable_all() will be blocked until the preempted/migrated owner exits its critical section.

KVM does have lockless accesses to kvm_usage_count in the suspend/resume flows, but those are safe because all tasks must be frozen prior to suspending CPUs, and a task cannot be frozen while it holds one or more locks (userspace tasks are frozen via a fake signal).

Preemption doesn't need to be explicitly disabled in the hotplug path. The hotplug thread is pinned to the CPU that's being hotplugged, and KVM only cares about having a stable CPU, i.e. to ensure hardware is enabled on the correct CPU. Lockdep, i.e. check_preemption_disabled(), plays nice with this state too, as is_percpu_thread() is true for the hotplug thread.

Signed-off-by: Isaku Yamahata <[email protected]>
Co-developed-by: Sean Christopherson <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
1 parent 2c106f2 commit 0bf5049
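
The commit message argues that a sleeping lock is sufficient for kvm_usage_count because only the count and the first/last-user transition need to be serialized. A minimal user-space sketch of that pattern follows. It is not KVM code: every name here (model_lock, model_usage_count, model_hw_enabled, model_fail_enable, model_enable_all, model_disable_all) is hypothetical, a pthread mutex stands in for kvm_lock, and the rollback on failure is an assumption of the sketch rather than something shown in this commit.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical user-space stand-ins; none of these names exist in KVM. */
static pthread_mutex_t model_lock = PTHREAD_MUTEX_INITIALIZER;
static int model_usage_count;   /* protected by model_lock */
static bool model_hw_enabled;   /* protected by model_lock */
static bool model_fail_enable;  /* knob: make a first-user enable fail */

/* Returns 0 on success, -1 if the first user failed to enable "hardware". */
int model_enable_all(void)
{
	int ret = 0;

	pthread_mutex_lock(&model_lock);
	model_usage_count++;
	if (model_usage_count == 1) {
		if (model_fail_enable) {
			/* Assumed rollback policy for this sketch only. */
			model_usage_count--;
			ret = -1;
		} else {
			model_hw_enabled = true;
		}
	}
	pthread_mutex_unlock(&model_lock);
	return ret;
}

/* The last user to leave disables "hardware". */
void model_disable_all(void)
{
	pthread_mutex_lock(&model_lock);
	assert(model_usage_count > 0);
	if (--model_usage_count == 0)
		model_hw_enabled = false;
	pthread_mutex_unlock(&model_lock);
}
```

A thread that is descheduled while holding model_lock simply keeps other callers blocked until it resumes, which is the property the commit message relies on for kvm_lock.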

2 files changed: +34 -21 lines changed

Documentation/virt/kvm/locking.rst (+10 -9)
@@ -9,6 +9,8 @@ KVM Lock Overview
 
 The acquisition orders for mutexes are as follows:
 
+- cpus_read_lock() is taken outside kvm_lock
+
 - kvm->lock is taken outside vcpu->mutex
 
 - kvm->lock is taken outside kvm->slots_lock and kvm->irq_lock
@@ -225,15 +227,10 @@ time it will be set using the Dirty tracking mechanism described above.
 :Type: mutex
 :Arch: any
 :Protects: - vm_list
-
-``kvm_count_lock``
-^^^^^^^^^^^^^^^^^^
-
-:Type: raw_spinlock_t
-:Arch: any
-:Protects: - hardware virtualization enable/disable
-:Comment: 'raw' because hardware enabling/disabling must be atomic /wrt
-          migration.
+           - kvm_usage_count
+           - hardware virtualization enable/disable
+:Comment: KVM also disables CPU hotplug via cpus_read_lock() during
+          enable/disable.
 
 ``kvm->mn_invalidate_lock``
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -297,3 +294,7 @@ time it will be set using the Dirty tracking mechanism described above.
 :Type: mutex
 :Arch: x86
 :Protects: loading a vendor module (kvm_amd or kvm_intel)
+:Comment: Exists because using kvm_lock leads to deadlock. cpu_hotplug_lock is
+          taken outside of kvm_lock, e.g. in KVM's CPU online/offline callbacks, and
+          many operations need to take cpu_hotplug_lock when loading a vendor module,
+          e.g. updating static calls.
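
The ordering rule added to locking.rst above (cpus_read_lock() outside kvm_lock) can be illustrated with a small user-space model. This is only a sketch under stated assumptions: a pthread read-write lock stands in for cpu_hotplug_lock, a pthread mutex stands in for kvm_lock, and all identifiers (order_hotplug_lock, order_kvm_lock, order_enabled_cpus, order_enable_path, order_cpu_online) are hypothetical. Both paths acquire the hotplug lock first and the mutex second, so neither can wait on the other in the reverse order.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>

/* Hypothetical stand-ins for cpu_hotplug_lock and kvm_lock. */
static pthread_rwlock_t order_hotplug_lock = PTHREAD_RWLOCK_INITIALIZER;
static pthread_mutex_t order_kvm_lock = PTHREAD_MUTEX_INITIALIZER;
static int order_enabled_cpus;  /* protected by order_kvm_lock */

/* Enable path: block hotplug (read side) first, then take the inner mutex. */
void order_enable_path(int ncpus)
{
	pthread_rwlock_rdlock(&order_hotplug_lock);
	pthread_mutex_lock(&order_kvm_lock);
	order_enabled_cpus = ncpus;
	pthread_mutex_unlock(&order_kvm_lock);
	pthread_rwlock_unlock(&order_hotplug_lock);
}

/*
 * Hotplug path: the write side is modeled as already held by the hotplug
 * core when the online callback runs, and the callback then takes the
 * inner mutex. Same outer-then-inner order as the enable path.
 */
void order_cpu_online(void)
{
	pthread_rwlock_wrlock(&order_hotplug_lock);
	pthread_mutex_lock(&order_kvm_lock);
	order_enabled_cpus++;
	pthread_mutex_unlock(&order_kvm_lock);
	pthread_rwlock_unlock(&order_hotplug_lock);
}
```

Taking the mutex first and the hotplug lock second in any path would invert this order, which is the deadlock the new vendor-module lock comment describes.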

virt/kvm/kvm_main.c (+24 -12)
@@ -100,7 +100,6 @@ EXPORT_SYMBOL_GPL(halt_poll_ns_shrink);
  */
 
 DEFINE_MUTEX(kvm_lock);
-static DEFINE_RAW_SPINLOCK(kvm_count_lock);
 LIST_HEAD(vm_list);
 
 static cpumask_var_t cpus_hardware_enabled;
@@ -5123,17 +5122,18 @@ static int kvm_online_cpu(unsigned int cpu)
 	 * be enabled. Otherwise running VMs would encounter unrecoverable
 	 * errors when scheduled to this CPU.
 	 */
-	raw_spin_lock(&kvm_count_lock);
+	mutex_lock(&kvm_lock);
 	if (kvm_usage_count) {
 		WARN_ON_ONCE(atomic_read(&hardware_enable_failed));
 
 		hardware_enable_nolock(NULL);
+
 		if (atomic_read(&hardware_enable_failed)) {
 			atomic_set(&hardware_enable_failed, 0);
 			ret = -EIO;
 		}
 	}
-	raw_spin_unlock(&kvm_count_lock);
+	mutex_unlock(&kvm_lock);
 	return ret;
 }
 
@@ -5149,10 +5149,10 @@ static void hardware_disable_nolock(void *junk)
 
 static int kvm_offline_cpu(unsigned int cpu)
 {
-	raw_spin_lock(&kvm_count_lock);
+	mutex_lock(&kvm_lock);
 	if (kvm_usage_count)
 		hardware_disable_nolock(NULL);
-	raw_spin_unlock(&kvm_count_lock);
+	mutex_unlock(&kvm_lock);
 	return 0;
 }
 
@@ -5168,9 +5168,9 @@ static void hardware_disable_all_nolock(void)
 static void hardware_disable_all(void)
 {
 	cpus_read_lock();
-	raw_spin_lock(&kvm_count_lock);
+	mutex_lock(&kvm_lock);
 	hardware_disable_all_nolock();
-	raw_spin_unlock(&kvm_count_lock);
+	mutex_unlock(&kvm_lock);
 	cpus_read_unlock();
 }
 
@@ -5187,7 +5187,7 @@ static int hardware_enable_all(void)
 	 * enable hardware multiple times.
 	 */
 	cpus_read_lock();
-	raw_spin_lock(&kvm_count_lock);
+	mutex_lock(&kvm_lock);
 
 	kvm_usage_count++;
 	if (kvm_usage_count == 1) {
@@ -5200,7 +5200,7 @@ static int hardware_enable_all(void)
 		}
 	}
 
-	raw_spin_unlock(&kvm_count_lock);
+	mutex_unlock(&kvm_lock);
 	cpus_read_unlock();
 
 	return r;
@@ -5806,17 +5806,29 @@ static void kvm_init_debug(void)
 
 static int kvm_suspend(void)
 {
+	/*
+	 * Secondary CPUs and CPU hotplug are disabled across the suspend/resume
+	 * callbacks, i.e. no need to acquire kvm_lock to ensure the usage count
+	 * is stable. Assert that kvm_lock is not held to ensure the system
+	 * isn't suspended while KVM is enabling hardware. Hardware enabling
+	 * can be preempted, but the task cannot be frozen until it has dropped
+	 * all locks (userspace tasks are frozen via a fake signal).
+	 */
+	lockdep_assert_not_held(&kvm_lock);
+	lockdep_assert_irqs_disabled();
+
 	if (kvm_usage_count)
 		hardware_disable_nolock(NULL);
 	return 0;
 }
 
 static void kvm_resume(void)
 {
-	if (kvm_usage_count) {
-		lockdep_assert_not_held(&kvm_count_lock);
+	lockdep_assert_not_held(&kvm_lock);
+	lockdep_assert_irqs_disabled();
+
+	if (kvm_usage_count)
 		hardware_enable_nolock(NULL);
-	}
 }
 
 static struct syscore_ops kvm_syscore_ops = {