
ondemand_download_large_rel: solve flakyness #3697

Merged: 4 commits merged into main on May 17, 2023

Conversation

koivunej (Member) commented Feb 23, 2023

Disable background tasks so that compaction does not download all the layers; also stop the safekeepers before checkpointing, and use a read-only endpoint.

Fixes: #3666
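
For illustration, a minimal sketch of that stabilization, assuming a hypothetical test environment object; the method names (set_tenant_config, checkpoint, create_readonly_endpoint) and the config keys are stand-ins and not copied from the actual Neon test fixtures.

```python
# Hedged sketch of the setup described above; `env` and its methods are
# hypothetical stand-ins for the test_runner fixtures, not the real API.
def stabilize_before_restart(env, tenant_id, timeline_id, last_lsn):
    # Disable background gc/compaction so neither can trigger on-demand
    # downloads of layers the test expects to remain only in remote storage.
    # (The actual config keys and values may differ.)
    env.set_tenant_config(tenant_id, {"gc_period": "0s", "compaction_period": "0s"})

    # Stop safekeepers first so no new WAL arrives after the checkpoint,
    # keeping the set of layers (and therefore the expected downloads) fixed.
    for sk in env.safekeepers:
        sk.stop()

    # Flush in-memory data to layer files at a known point.
    env.checkpoint(tenant_id, timeline_id)

    # A read-only endpoint pinned at last_lsn cannot write new WAL, so the
    # layer set observed by the test stays stable across the restart.
    return env.create_readonly_endpoint(branch="main", lsn=last_lsn)
```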

koivunej (Member, Author) commented Feb 23, 2023

I'm not sure this will actually fix it, since here it looks like the basebackup downloaded all of the layers even though they were compacted: https://neon-github-public-dev.s3.amazonaws.com/reports/pr-3696/debug/4252834876/index.html#suites/b97efae3a617afb71cb8142f5afa5224/2486b335203eff3a/

Could it be that, after this test workload, we cannot guarantee any layer shuffling such that starting up the compute requires fewer than the full set of layers to be downloaded?

koivunej (Member, Author) commented Feb 23, 2023

Never mind, I probably misread that log; there is of course also the initial size calculation that we are racing against. So, the earlier question rephrased:

Are we correct to expect that there is a compaction result which lets basebackup, or basebackup plus initial size calculation, download fewer layers than the query does?

Also, I wonder whether the compaction upload could exceed the timeout with real_s3: it will be racing against the 10s termination timeout, not the 20s wait_for_upload, because the LSN does not change and wait_for_upload cannot tell whether compacted layers or image layers are still being uploaded. This effect has been hit on the same issue; I also think this is why the timetravel test got a sleep in #3616.

Just hit the 10s timeout on #3696: https://neon-github-public-dev.s3.amazonaws.com/reports/pr-3696/release/4253557268/index.html#suites/b97efae3a617afb71cb8142f5afa5224/d1c6ca3a23b97680/

@koivunej force-pushed the disable_gc_in_test_expecting_no_gc branch from f53bc6f to 9a65051 on March 7, 2023 at 10:53
koivunej (Member, Author) commented Mar 7, 2023

Rebased...


koivunej (Member, Author) commented Mar 7, 2023

> Also, I wonder whether the compaction upload could exceed the timeout with real_s3: it will be racing against the 10s termination timeout, not the 20s wait_for_upload, because the LSN does not change and wait_for_upload cannot tell whether compacted layers or image layers are still being uploaded.

I should probably try the wait_for_upload_queue_empty added in #3741 here, and also in the timetravel test case that got the sleep added earlier. Though it is not enough to await completion of the already-queued uploads: we first want to await new work appearing, and then await its completion. If that makes sense.
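
To make that two-phase wait concrete, here is a small self-contained sketch; `get_upload_queue_depth` is a hypothetical callable (for example backed by a metric the test can read), not an actual pageserver API.

```python
import time

def wait_until(cond, timeout=20.0, interval=0.5):
    """Poll `cond()` until it returns True or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if cond():
            return
        time.sleep(interval)
    raise TimeoutError("condition not met within timeout")

def wait_compaction_uploads_settled(get_upload_queue_depth):
    # Phase 1: wait for compaction to enqueue new uploads, so we do not
    # return during the quiet window before any work has been scheduled.
    wait_until(lambda: get_upload_queue_depth() > 0)
    # Phase 2: wait for the queue to drain, i.e. the compacted/image layers
    # have actually reached remote storage.
    wait_until(lambda: get_upload_queue_depth() == 0)
```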

koivunej (Member, Author) commented:

This probably needs some feedback and ideas per my above comments.

@koivunej force-pushed the disable_gc_in_test_expecting_no_gc branch from 9a65051 to 8c81443 on May 17, 2023 at 08:27
koivunej (Member, Author) commented:

Let's scope the general failure to await uploads while gc/compaction work is still pending out of this PR. Rebased; this should be good now.

github-actions bot commented May 17, 2023

992 tests run: 946 passed, 0 failed, 46 skipped (full report)


Flaky tests (1)

Postgres 14
  • test_ondemand_download_timetravel[real_s3]: ✅ debug

The comment gets automatically updated with the latest test results: c7c5f81 at 2023-05-17T13:20:22.957Z

@koivunej force-pushed the disable_gc_in_test_expecting_no_gc branch from a64a4fb to 821014a on May 17, 2023 at 12:55
Commit message: "this might cover the last of flakyness."

Co-authored-by: Christian Schwarz <[email protected]>
@koivunej force-pushed the disable_gc_in_test_expecting_no_gc branch from 821014a to c7c5f81 on May 17, 2023 at 12:57
@koivunej changed the title from "ondemand_download_large_rel: disable gc and compaction on restart" to "ondemand_download_large_rel: solve flakyness" on May 17, 2023
@koivunej requested a review from problame on May 17, 2023 at 12:58
@koivunej enabled auto-merge (squash) on May 17, 2023 at 13:11
@koivunej merged commit 918cd25 into main on May 17, 2023
@koivunej deleted the disable_gc_in_test_expecting_no_gc branch on May 17, 2023 at 14:19

Linked issue: test_ondemand_download::test_ondemand_download_large_rel flakyness (#3666)