Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deadlock introduced in 5.x #195

Closed
donkeyteethUX opened this issue Mar 25, 2022 · 7 comments · Fixed by #202
Closed

Deadlock introduced in 5.x #195

donkeyteethUX opened this issue Mar 25, 2022 · 7 comments · Fixed by #202
Labels
bug Something isn't working help wanted Extra attention is needed p-critical Critical issues

Comments

@donkeyteethUX
Copy link

donkeyteethUX commented Mar 25, 2022

Upgrading from dashmap 4.x to 5.x caused my application to deadlock. Surprisingly, the deadlock is readily reproducible with a simple example, and the breaking change could be found by binary searching commits.

The following example works fine on commit 45058db and deadlocks on every commit thereafter. So apparently the issue stems from switching to parking lot for locking.

/// Crate a dashmap and fill with some junk
/// Spawn reader & writer threads. The reader holds multiple Refs at once.
pub fn main() {
    let map_1 = Arc::new(DashMap::<i32, String>::default());
    let map_2 = map_1.clone();

    for i in 0..1000 {
        map_1.insert(i, "foobar".to_string());
    }

    let _writer = std::thread::spawn(move || loop {
        println!("writer iteration");
        for i in 0..1000 {
            let mut item = map_1.get_mut(&i).unwrap();
            *item = "foobaz".to_string();
        }
    });

    let _reader = std::thread::spawn(move || loop {
        println!("reader iteration");
        for i in 0..1000 {
            let j = i32::min(i + 100, 1000);
            let _v: Vec<_> = (i..j).map(|k| map_2.get(&k)).collect();
        }
    });

    std::thread::sleep(Duration::from_secs(10));
}
@donkeyteethUX
Copy link
Author

@bsilver8192 points out that this could be a consequence of parking_lot's fair locking policy.

readers trying to acquire the lock will block even if the lock is unlocked when there are writers waiting to acquire the lock

This is potentially problematic for DashMap, since calls to get() for elements in the same shard could lead to the following:

  1. Reader acquires read lock to shard 1.
  2. Writer blocks waiting for write lock to shard 1.
  3. Reader tries to read something else from shard 1, but blocks due to fair locking.

@xacrimon xacrimon added bug Something isn't working help wanted Extra attention is needed p-critical Critical issues labels Mar 26, 2022
@riptl
Copy link

riptl commented Mar 30, 2022

I can confirm we've seen this deadlock too going from 4.0.2 to 5.2.0. If we want to fix re-entrancy, how about optionally storing read-guards on thread-local storage so they can be re-used?

@bsilver8192
Copy link

When would you release the read guards from thread-local storage?

@riptl
Copy link

riptl commented Mar 30, 2022

Reference counting maybe? 🤔 (when the number of borrows of a read guard reaches zero)

@bsilver8192
Copy link

That only solves the problem with overlapping read guard lifetimes within a single thread, but there are other situations where fair locks deadlock. For example:

  1. Thread A acquires read lock to shard 1
  2. Thread B acquires read lock to shard 2
  3. Thread W blocks acquiring write lock to shard 1 (behind A)
  4. Thread X blocks acquiring write lock to shard 2 (behind B)
  5. Thread A blocks acquiring read lock to shard 2 (behind X)
  6. Thread B blocks acquiring read lock to shard 1 (behind W)

There's a dependency cycle here: A waiting on X waiting on B waiting on W waiting on A.

@xacrimon
Copy link
Owner

Looking into this, probably going to result in switching away from parking lot sadly since it doesn't offer an unfair API. Can't promise when I can land a patch for this due to other life commitments but hopefully within a month.

@bsilver8192
Copy link

I'm not super familiar with parking_lot, but I think implementing an unfair rwlock using the parking_lot_core API would be doable. There's a lot of code involved with parking_lot::RwLock, some of which would be useful for an unfair version. I'm not sure how much of it matters for an unfair rwlock without all the fancy variants of the operations that the default one supports.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working help wanted Extra attention is needed p-critical Critical issues
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants