Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Lettuce uses experimental io_uring transport if it is present on the classpath #3222

Open
gmethvin opened this issue Mar 20, 2025 · 7 comments

Comments

@gmethvin
Copy link

gmethvin commented Mar 20, 2025

Bug Report

Lettuce automatically uses the Netty incubator io_uring transport if it is available on the classpath, even though this support is considered experimental. This is a problem as this transport has a bug causing high iowait CPU usage, which results in application timeouts and other issues. If the support is considered experimental—as it should be given this bug—there should be some explicit configuration required in order to use this transport.

Current Behavior

See netty/netty-incubator-transport-io_uring#247

Input Code

No code is required because Lettuce automatically uses the transport if it is added explicitly or pulled in by another library.

Expected behavior/code

Lettuce should default to the epoll transport on Linux, which is production-ready and does not have the above-mentioned bug.

Environment

  • Lettuce version(s): since 6.1.0.M1
  • Redis version: N/A

Possible Solution

Change the default for the io.lettuce.core.iouring system property to false.

Additional context

In our case, the transport was pulled in by the zio-http library. This library includes the io_uring transport on the classpath but requires explicit configuration to use it, defaulting to epoll. Arguably zio-http should not include this dependency by default, and we have opened an issue there as well, but still I think Lettuce is at fault for using an experimental transport simply because it is available on the classpath.

I'm aware that it's common practice among Netty libraries to detect which transports are available and automatically choose the "best" one, but I'm not aware of any other popular library that defaults to io_uring.

@tishun
Copy link
Collaborator

tishun commented Mar 24, 2025

Background

The support for io_uring was added as part of #1522 as part of 6.1.x in Feb 2021
There is no information why the io_uring takes precedence compared to epoll
@mp911de this was a long time ago, but if you have some recollection please let me know.

Current situation

Easily reproducible, if both libraries are present in the class path the io_uring is used.
At this point I have to agree that - given the current status of the io_uring support - we should instead default to epoll.

Suggested approach

As this was never part of a contract, at least I was not able to find one, I will consider this an internal behaviour.
Choosing io_uring when both packages are available is more of a side effect than a premeditated decision, as having them both available seems a wrong configuration to start with (as suggested by @gmethvin).

Otherwise we would have to consider it a breaking change.

@mp911de
Copy link
Collaborator

mp911de commented Mar 24, 2025

In many Linux arrangements it is common to use epoll because of its improved performance profile.

uring isn’t that common and no one should add uring by accident. I consider pulling in that library transitively a bug (except if there’s a dedicated artifact). So having epoll on the class path would be a rather normal arrangement. Uring should only come into play in very specific scenarios and it then should take precedence. Conflict resolution is up to the runtime by using config properties.

Updating a library default for one specific case whose origin is a buggy other library (requiring something that might not be used) is neither a proper motivation nor good stewardship in terms of compatibility.

I/O choice isn’t something that immediately breaks in tests but a rather silent change in production that only gets visible in certain situations (I.e. when there is already high load and the underlying transport choice breaks the previous behavior).

That being said, please use configuration for what is is supposed to be used instead of changing defaults for all users.

Apologies if that sounds rude, however I have a strong opinion on breaking and on requiring things.

@gmethvin
Copy link
Author

I completely agree with the sentiment: library maintainers should be conservative when it comes to changing defaults.

From a library stewardship perspective, the main problem is that Lettuce updated the default transport to an experimental library—which is pretty clearly a breaking change—and it was done in a minor version with no documentation that the default was changing. The documentation only says that "Lettuce defaults to native transports if the appropriate library is available within its runtime" and does not indicate which one is used. I think a reasonable person would expect either the non-experimental transport to be used by default, or to have to explicitly select the transport somehow, perhaps via the system properties.

Just like any other third-party library, the io_uring transport might be on the classpath for some other reason unrelated to Lettuce. Clearly there are other Netty libraries that support io_uring, and the user might be intentionally using it with one of those libraries, but it's not safe to assume that the user wants to use it with Lettuce just because they're using it somewhere else.

One solution is to require that io.lettuce.core.iouring is set to either false or true if the io_uring transport is on the classpath, and otherwise throw an exception upon initialization. This way, someone upgrading to the new version of Lettuce won’t be surprised by the transport silently changing. While this is a breaking change, I'd argue it's completely reasonable because io_uring support is experimental.

@tishun
Copy link
Collaborator

tishun commented Mar 25, 2025

@mp911de, your input is very valuable, thank you for sharing, this is mainly why I tagged you here.

@gmethvin that is not to say that I don't sympathize with your pain, opening an issue meant you spent some time dealing with this and would like to make it easier for others. Any user of the zio-http lib that also uses Lettuce is bound to hit the same issue, and there might be many other similar situations.

At this point my personal opinion is that there is a chance - albeit small - that we cause some serious change in behaviour for some of the library consumers if we change the defaults. Even more so - this change might not manifest in an obvious way and could just make some other unfortunate engineer spend nights trying to understand what happened.

My suggestion is:

  • extend the documentation to clearly state how the driver currently works
  • provide a trace in the log that a "conflict" exists (two native libraries are present)
  • ... and how it was resolved (defaulting to io_uring)
  • consider changing this for 7.0.x (or some next major release) as a breaking change, but not before that

Does this seem fair to everyone?

@mp911de
Copy link
Collaborator

mp911de commented Mar 25, 2025

Release notes do call out uring support. Minor releases are feature releases, and adding something that wasn't there before that also requires explicit opt-in by adding a dependency isn't a breaking change.

I agree with @tishun's first three suggestions, specifically documentation to remove the surprise moment. I should have had done that much earlier.

With uring being experimental I expect some changes in either direction once the transport isn't optional anymore (either discarded or being promoted to proper).

I like the suggestion to be more explicit, that is, if a transport property is set to true, to fail if the transport is not available. That could become a general theme and with that, justifying the switch in defaults becomes easier.

@gmethvin
Copy link
Author

At this point my personal opinion is that there is a chance - albeit small - that we cause some serious change in behaviour for some of the library consumers if we change the defaults. Even more so - this change might not manifest in an obvious way and could just make some other unfortunate engineer spend nights trying to understand what happened.

I agree with that. This is what happened to us as a result of the change in defaults in 6.1. So there is some group of users who should be opting out but aren’t aware they need to do that. But now another group of users expect it to be enabled by default, so you don’t want to silently change the behavior again, even if the original breaking change was unjustified.

I still think the best solution is to require users to explicitly indicate their preference. That should resolve any ambiguity and allows you to provide an error message explaining the situation.

extend the documentation to clearly state how the driver currently works

Definitely a good idea to clarify the docs, but I doubt that would've helped us because it was hard to tell initially that the iowait issue was coming from Lettuce.

@gmethvin
Copy link
Author

Basically I'm suggesting that instead of

provide a trace in the log that a "conflict" exists (two native libraries are present)

which realistically few people will ever notice in production, you instead throw an exception and refuse to create the client until the user has resolved the conflict somehow, either by removing a library, explicitly configuring which one to use via system properties, or through some other configuration mechanism.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants