Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: compaction: no index upload scheduled if no on-demand downloads #3741

Merged
merged 1 commit into from
Mar 3, 2023

Conversation

problame
Copy link
Contributor

@problame problame commented Mar 3, 2023

Commit

0cf7fd0fb82b082d02dfadd9d6a488a7f799d72f
Compaction with on-demand download (#3598)

introduced a subtle bug: if we don't have to do on-demand downloads, we only take one ROUND in fn compact() and exit early. Thereby, we miss scheduling the index part upload for any layers created by fn compact_inner().

Before that commit, we didn't have this problem.
So, this patch fixes it.

Since no regression test caught this, I went ahead and extended the timeline size tests to assert that, if remote storage is configured,

  1. pageserver_remote_physical_size matches the other physical sizes
  2. file sizes reported by the layer map info endpoint match the other physical size metrics

Without the pageserver code fix, the regression test would fail at the physical size assertion, complaining that any of the resident physical size != remote physical size metric 50790400.0 != 18399232.0
I figured out what the problem is by comparing the remote storage and local directories like so, and noticed that the image layer in the local directory wasn't present on the remote side. It's size was exactly the difference
50790400.0 - 18399232.0 =32391168.0

fixes #3738

@problame problame requested review from a team as code owners March 3, 2023 11:19
@problame problame requested a review from LizardWizzard March 3, 2023 12:10
Copy link
Contributor

@lubennikovaav lubennikovaav left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good find
Thank you for fixing it.

@problame problame changed the base branch from main to problame/use-parse_metrics-everywhere March 3, 2023 13:03
@problame problame requested a review from a team as a code owner March 3, 2023 13:03
@problame problame requested review from petuhovskiy and removed request for a team March 3, 2023 13:03
@problame problame force-pushed the problame/iss-3738-remote-physical-size-lags-behind branch from 054726a to 29b776f Compare March 3, 2023 13:03
Base automatically changed from problame/use-parse_metrics-everywhere to main March 3, 2023 13:53
@problame problame force-pushed the problame/iss-3738-remote-physical-size-lags-behind branch from 29b776f to a97fd9c Compare March 3, 2023 13:55
@problame problame enabled auto-merge (rebase) March 3, 2023 13:55
@problame problame disabled auto-merge March 3, 2023 14:04
Commit

    0cf7fd0
    Compaction with on-demand download (#3598)

introduced a subtle bug: if we don't have to do on-demand downloads,
we only take one ROUND in fn compact() and exit early.
Thereby, we miss scheduling the index part upload for any layers
created by fn compact_inner().

Before that commit, we didn't have this problem.
So, this patch fixes it.

Since no regression test caught this, I went ahead and extended the
timeline size tests to assert that, if remote storage is configured,
1. pageserver_remote_physical_size matches the other physical sizes
2. file sizes reported by the layer map info endpoint match the other
   physical size metrics

Without the pageserver code fix, the regression test would
fail at the physical size assertion, complaining that
any of the resident physical size != remote physical size metric
50790400.0 != 18399232.0
I figured out what the problem is by comparing the remote storage
and local directories like so, and noticed that the image layer
in the local directory wasn't present on the remote side.
It's size was exactly the difference
    50790400.0 - 18399232.0  =32391168.0

fixes #3738
@problame problame force-pushed the problame/iss-3738-remote-physical-size-lags-behind branch from a97fd9c to 1ec0fd7 Compare March 3, 2023 14:11
@problame
Copy link
Contributor Author

problame commented Mar 3, 2023

Thanks for the Python lesson, @vadim2404 , learning something new each time :)

@problame problame mentioned this pull request Mar 3, 2023
Copy link
Contributor

@LizardWizzard LizardWizzard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, thanks!

@problame problame merged commit 66a5159 into main Mar 3, 2023
@problame problame deleted the problame/iss-3738-remote-physical-size-lags-behind branch March 3, 2023 15:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

pageserver_remote_physical_size metric lags behind
4 participants