[Bug]: Severe performance degradation in Nessie v.0.102.3 #10536

Open
bigluck opened this issue Mar 13, 2025 · 0 comments

bigluck commented Mar 13, 2025

What happened

We are experiencing critical performance issues with Nessie v.0.102.3, specifically with the /v2/trees endpoint when retrieving branch listings. In severe cases, these performance problems are causing server failures.

  • Nessie Version: v.0.102.3
  • DB repo ID: 3
  • Data Volume:
    • 7 branches
    • 38k tags
    • 38,312 elements in the refs table for this repo_id
    • 120,000 total refs on the development RDS instance

Following the support team's recommendation, we disabled the NESSIE.VERSION.STORE.PERSIST.CACHE-ENABLE-SOFT-REFERENCES option on Wednesday, March 12 (#10526). While this change has improved API stability, it has also significantly increased response times.

Image

Current Behavior

  • Response times for the /v2/trees endpoint are 4-5 seconds when retrieving just 7 branches
  • Before we disabled the soft-references cache option, response times were already high at 2-3 seconds
  • The server occasionally crashes under this load

Expected Behavior

The /v2/trees endpoint should return results in milliseconds for such a small result set (7 branches), regardless of the total number of refs/tags in the system, especially for a simple HTTP request like this one:

/api/v2/trees?fetch=MINIMAL&filter=refType+%3D%3D+%22BRANCH%22&max-records=500
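
For readability, here is a small sketch that rebuilds the query string above from its decoded parameters using only the Python standard library. It shows that the `filter` parameter is simply `refType == "BRANCH"` once URL-decoded; the variable names are illustrative and nothing here is Nessie-specific.

```python
from urllib.parse import parse_qs, urlencode

# Decoded form of the parameters used in the request above.
params = {
    "fetch": "MINIMAL",
    "filter": 'refType == "BRANCH"',
    "max-records": 500,
}

# urlencode() uses quote_plus, so spaces become "+" and "=" / '"' are percent-encoded.
query = urlencode(params)
print("/api/v2/trees?" + query)

# Round-trip: decode the query string back into the filter expression.
decoded_filter = parse_qs(query)["filter"][0]
print(decoded_filter)
```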

We suspect that Nessie is not applying the appropriate database-level filters. It appears to be iterating through the entire collection of roughly 38,000 refs for this repo and filtering them in memory rather than using efficient database queries.
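
To illustrate the difference we have in mind, here is a minimal sketch using an in-memory SQLite database. The `refs` table layout, the `ref_type` column, and the function names are all hypothetical and are not Nessie's actual schema or code; the row counts mirror the numbers reported above (7 branches, 38,312 refs in total).

```python
import sqlite3


def build_demo_db(n_branches=7, n_tags=38_305):
    """Create a hypothetical refs table (NOT Nessie's real schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE refs (repo_id INTEGER, ref_name TEXT, ref_type TEXT)"
    )
    rows = [(3, f"branch-{i}", "BRANCH") for i in range(n_branches)]
    rows += [(3, f"tag-{i}", "TAG") for i in range(n_tags)]
    conn.executemany("INSERT INTO refs VALUES (?, ?, ?)", rows)
    conn.execute("CREATE INDEX refs_by_type ON refs (repo_id, ref_type)")
    return conn


def branches_filtered_in_memory(conn, repo_id):
    """Fetch every ref for the repo, then filter in application code."""
    scanned = 0
    names = []
    cursor = conn.execute(
        "SELECT ref_name, ref_type FROM refs WHERE repo_id = ?", (repo_id,)
    )
    for name, ref_type in cursor:
        scanned += 1
        if ref_type == "BRANCH":
            names.append(name)
    return names, scanned


def branches_filtered_in_db(conn, repo_id):
    """Let the database apply the filter, so only matching rows are returned."""
    rows = conn.execute(
        "SELECT ref_name FROM refs WHERE repo_id = ? AND ref_type = 'BRANCH'",
        (repo_id,),
    ).fetchall()
    return [row[0] for row in rows], len(rows)


conn = build_demo_db()
in_memory_names, rows_scanned = branches_filtered_in_memory(conn, 3)
in_db_names, rows_fetched = branches_filtered_in_db(conn, 3)
print(rows_scanned, rows_fetched)
```

Both functions return the same 7 branch names, but the first one pulls every row for the repo into the application before discarding almost all of them. Whether this matches what Nessie actually does is exactly what we are asking.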

How to reproduce it

Unknown.
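
A possible starting point for measuring the latency of the request shown above, as a sketch only. The `time_request` helper and its parameters are hypothetical names, and the base URL in the comment assumes a Nessie server on `localhost:19120`; adjust it to your deployment.

```python
import time
import urllib.request


def time_request(url, opener=urllib.request.urlopen, timeout=30.0):
    """Return (elapsed_seconds, body_size_in_bytes) for a single GET request.

    `opener` is injectable so the helper can be exercised without a server.
    """
    start = time.perf_counter()
    with opener(url, timeout=timeout) as response:
        body = response.read()
    return time.perf_counter() - start, len(body)


# Example usage (assumed base URL; uncomment to run against a live server):
# url = ("http://localhost:19120/api/v2/trees"
#        "?fetch=MINIMAL&filter=refType+%3D%3D+%22BRANCH%22&max-records=500")
# elapsed, size = time_request(url)
# print(f"{elapsed:.3f}s, {size} bytes")
```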

Nessie server type (docker/uber-jar/built from source) and version

docker

Client type (Ex: UI/Spark/pynessie ...) and version

No response

Additional information

Image

This is the trace from a request that took 4 seconds to generate a response; it contains:

  • 632 ObservingPersist.fetchReferences
  • 31354 ObservingPersist.fetchTypedObj
  • 638 ObservingPersist.fetchReference

  1. Why does a query to retrieve only 7 branches take 4-5 seconds to execute?
  2. Is there a configuration setting or optimization that would allow Nessie to properly filter at the database level?
  3. Are there any known issues with the /trees endpoint when dealing with repositories that have a large number of refs in the db?

This performance issue is significantly affecting our dev and prod systems. The excessive response times and occasional server failures are unfortunately blocking critical operations.
