Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix SimpleBatcher apparent deadlock #2196 #3148

Merged
merged 5 commits into from
Feb 26, 2025

Conversation

ggivo
Copy link
Contributor

@ggivo ggivo commented Jan 31, 2025

Closes #2196

Problem:

  • Flush Operation Conflict:
      - When one thread (T1) performs a flush, it may read and dispatch batched commands before resetting the flushing flag.
      - If another thread (T2) adds a command and forced flush at this moment. Command might be added to the queue but does not trigger a flush.
      - As a result, the command remains in the queue until the next flush request, causing a delay in dispatching.

  • Flag Reset Between Iterations:
      - During a default flush operation, if multiple batches are processed, the flushing flag is reset between iterations.
      - This allows another thread to take over, potentially causing the initial thread to return BatchTasks.EMPTY instead of properly processed commands.

  1. T1 -> batch(command, CommandBatching.flush()
  2. T1 -> flushing.compareAndSet(false, true) == true
  3. T1 -> flush()->doFlush()
  4. T2 -> batch(command, CommandBatching.flush()
  5. T2 -> flushing.compareAndSet(false, true) == false  #already flushing will skip doFlush  and command remain not dispatched
  6. T1 -> batch() completes
  7. T2 -> batch() completes

Fix:

If force flush is requested while flushing, perform additional flush iteration after ongoing completes

Make sure that:

  • You have read the contribution guidelines.
  • You have created a feature request first to discuss your contribution intent. Please reference the feature request ticket number in the pull request.
  • You applied code formatting rules using the mvn formatter:format target. Don’t submit any formatting related changes.
  • You submit test cases (unit or integration tests) that back your changes.

ggivo added 2 commits January 31, 2025 16:31
### Problem:
- **Flush Operation Conflict:**
  - When one thread (T1) performs a flush, it may read and dispatch batched commands before resetting the `flushing` flag.
  - If another thread (T2) adds a command and forced flush at this moment. Command might be added to the queue but does not trigger a flush.
  - As a result, the command remains in the queue until the next flush request, causing a delay in dispatching.

- **Flag Reset Between Iterations:**
  - During a default flush operation, if multiple batches are processed, the `flushing` flag is reset between iterations.
  - This allows another thread to take over, potentially causing the initial thread to return `BatchTasks.EMPTY` instead of properly processed commands.

1. T1 -> batch(command, CommandBatching.flush()
2. T1 -> flushing.compareAndSet(false, true) == true
3. T1 -> flush()->doFlush()
4. T2 -> batch(command, CommandBatching.flush()
5. T2 -> flushing.compareAndSet(false, true) == false  #already flushing will skip doFlush  and command remain not dispatched
6. T1 -> batch() completes
7. T2 -> batch() completes

### Fix: If force flush is requested while flushing, perform additional flush iteration after ongoing completes
@ggivo ggivo self-assigned this Jan 31, 2025
@ggivo ggivo requested a review from tishun January 31, 2025 16:06
@tishun tishun added the type: bug A general bug label Feb 3, 2025
@tishun tishun added this to the 6.6.0.RELEASE milestone Feb 3, 2025
Copy link
Collaborator

@tishun tishun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My head really hurts after reading this code, but I am not sure I can simplify it any more than you have.

Perhaps we need to do some bigger refactoring around the flushing logic, but this bug is not the time.

@ggivo ggivo merged commit a127d5a into redis:main Feb 26, 2025
7 checks passed
@ggivo ggivo deleted the issue_2196_batcher_deadlock branch February 26, 2025 17:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type: bug A general bug
Projects
None yet
Development

Successfully merging this pull request may close these issues.

SimpleBatcher apparent deadlock
2 participants