Commit e461f08

kevans91, lunny, lafriks
authored
[RFC] Make archival asynchronous (#11296)
* Make archival asynchronous

The prime benefit being sought here is for large archives to not clog up the rendering process and cause unsightly proxy timeouts. As a secondary benefit, archive-in-progress is moved out of the way into a /tmp file so that new archival requests for the same commit will not get fulfilled based on an archive that isn't yet finished.

This asynchronous system is fairly primitive; request comes in, we'll spawn off a new goroutine to handle it, then we'll mark it as done. Status requests will see if the file exists in the final location, and report the archival as done when it exists.

Fixes #11265

* Archive links: drop initial delay to three-quarters of a second

Some, or perhaps even most, archives will not take all that long to archive. The archive process starts as soon as the download button is initially clicked, so in theory they could be done quite quickly. Drop the initial delay down to three-quarters of a second to make it more responsive in the common case of the archive being quickly created.

* archiver: restructure a little bit to facilitate testing

This introduces two sync.Cond pointers to the archiver package. If they're non-nil when we go to process a request, we'll wait until signalled (at all) to proceed. The tests will then create the sync.Cond so that it can signal at-will and sanity-check the state of the queue at different phases.

The author believes that nil-checking these two sync.Cond pointers on every archive processing will introduce minimal overhead with no impact on maintainability.

* gofmt nit: no space around binary + operator

* services: archiver: appease golangci-lint, lock queueMutex

Locking/unlocking the queueMutex is allowed, but not required, for Cond.Signal() and Cond.Broadcast(). The magic at play here is just a little too much for golangci-lint, as we take the address of queueMutex and this is mostly used in archiver.go; the variable still gets flagged as unused.

* archiver: tests: fix several timing nits

Once we've signaled a cond var, it may take some small amount of time for the goroutines released to hit the spot we're wanting them to be at. Give them an appropriate amount of time.

* archiver: tests: no underscore in var name, ungh

* archiver: tests: Test* is run in a separate context than TestMain

We must setup the mutex/cond variables at the beginning of any test that's going to use it, or else these will be nil when the test is actually ran.

* archiver: tests: hopefully final tweak

Things got shuffled around such that we carefully build up and release requests from the queue, so we can validate the state of the queue at each step. Fix some assertions that no longer hold true as fallout.

* repo: Download: restore some semblance of previous behavior

When archival was made async, the GET endpoint was only useful if a previous POST had initiated the download. This commit restores the previous behavior, to an extent; we'll now submit the archive request there and return a "202 Accepted" to indicate that it's processing if we didn't manage to complete the request within ~2 seconds of submission.

This lets a client directly GET the archive, and gives them some indication that they may attempt to GET it again at a later time.

* archiver: tests: simplify a bit further

We don't need to risk failure and use time.ParseDuration to get 2 * time.Second.

else if isn't really necessary if the conditions are simple enough and lead to the same result.

* archiver: tests: resolve potential source of flakiness

Increase all timeouts to 10 seconds; these aren't hard-coded sleeps, so there's no guarantee we'll actually take that long. If we need longer to not have a false-positive, then so be it.

While here, various assert.{Not,}Equal arguments are flipped around so that the wording in error output reflects reality, where the expected argument is second and actual third.

* archiver: setup infrastructure for notifying consumers of completion

This API will *not* allow consumers to subscribe to specific requests being completed, just *any* request being completed. The caller is responsible for determining if their request is satisfied and waiting again if needed.

* repo: archive: make GET endpoint synchronous again

If the request isn't complete, this endpoint will now submit the request and wait for completion using the new API. This may still be susceptible to timeouts for larger repos, but other endpoints now exist that the web interface will use to negotiate its way through larger archive processes.

* archiver: tests: amend test to include WaitForCompletion()

This is a trivial one, so go ahead and include it.

* archiver: tests: fix test by calling NewContext()

The mutex is otherwise uninitialized, so we need to ensure that we're actually initializing it if we plan to test it.

* archiver: tests: integrate new WaitForCompletion a little better

We can use this to wait for archives to come in, rather than spinning and hoping with a timeout.

* archiver: tests: combine numQueued declaration with next-instruction assignment

* routers: repo: reap unused archiving flag from DownloadStatus()

This had some planned usage before, indicating whether this request initiated the archival process or not. After several rounds of refactoring, this use was deemed not necessary for much of anything and got boiled down to !complete in all cases.

* services: archiver: restructure to use a channel

We now offer two forms of waiting for a request:

- WaitForCompletion: wait for completion with no timeout
- TimedWaitForCompletion: wait for completion with timeout

In both cases, we wait for the given request's cchan to close; in the latter case, we do so with the caller-provided timeout. This completely removes the need for busy-wait loops in Download/InitiateDownload, as it's fairly clean to wait on a channel with timeout.

* services: archiver: use defer to unlock now that we can

This previously carried the lock into the goroutine, but an intermediate step just added the request to archiveInProgress outside of the new goroutine and removed the need for the goroutine to start out with it.

* Revert "archiver: tests: combine numQueued declaration with next-instruction assignment"

This reverts commit bcc5214.

Revert "archiver: tests: integrate new WaitForCompletion a little better"

This reverts commit 9fc8bed.

Revert "archiver: tests: fix test by calling NewContext()"

This reverts commit 709c356.

Revert "archiver: tests: amend test to include WaitForCompletion()"

This reverts commit 75261f5.

* archiver: tests: first attempt at WaitForCompletion() tests

* archiver: tests: slight improvement, less busy-loop

Just wait for the requests to complete in order, instead of busy-waiting with a timeout. This is slightly less fragile.

While here, reverse the arguments of a nearby assert.Equal() so that expected/actual are correct in any test output.

* archiver: address lint nits

* services: archiver: only close the channel once

* services: archiver: use a struct{} for the wait channel

This makes it obvious that the channel is only being used as a signal, rather than anything useful being piped through it.

* archiver: tests: fix expectations

Move the close of the channel into doArchive() itself; notably, before these goroutines move on to waiting on the Release cond.

The tests are adjusted to reflect that we can't WaitForCompletion() after they've already completed, as WaitForCompletion() doesn't indicate that they've been released from the queue yet.

* archiver: tests: set cchan to nil for comparison

* archiver: move ctx.Error's back into the route handlers

We shouldn't be setting this in a service, we should just be validating the request that we were handed.

* services: archiver: use regex to match a hash

This makes sure we don't try and use refName as a hash when it's clearly not one, e.g. heads/pull/foo.

* routers: repo: remove the weird /archive/status endpoint

We don't need to do this anymore, we can just continue POSTing to the archive/* endpoint until we're told the download's complete. This avoids a potential naming conflict, where a ref could start with "status/"

* archiver: tests: bump reasonable timeout to 15s

* archiver: tests: actually release timedReq

* archiver: tests: run through inFlight instead of manually checking

While we're here, add a test for manually re-processing an archive that's already been complete. Re-open the channel and mark it incomplete, so that doArchive can just mark it complete again.

* initArchiveLinks: prevent default behavior from clicking

* archiver: alias gitea's context, golang context import pending

* archiver: simplify logic, just reconstruct slices

While the previous logic was perhaps slightly more efficient, the new variant's readability is much improved.

* archiver: don't block shutdown on waiting for archive

The technique established launches a goroutine to do the wait, which will close a wait channel upon termination. For the timeout case, we also send back a value indicating whether the timeout was hit or not.

The timeouts are expected to be relatively small, but still a multi-second delay to shutdown due to this could be unfortunate.

* archiver: simplify shutdown logic

We can just grab the shutdown channel from the graceful manager instead of constructing a channel to halt the caller and/or pass a result back.

* Style issues

* Fix mis-merge

Co-authored-by: Lunny Xiao <[email protected]>
Co-authored-by: Lauris BH <[email protected]>
1 parent 1b65536 commit e461f08

43 files changed, +1356 -85 lines changed

integrations/api_repo_test.go

+3 -3
@@ -77,9 +77,9 @@ func TestAPISearchRepo(t *testing.T) {
 		expectedResults
 	}{
 		{name: "RepositoriesMax50", requestURL: "/api/v1/repos/search?limit=50&private=false", expectedResults: expectedResults{
-			nil: {count: 27},
-			user: {count: 27},
-			user2: {count: 27}},
+			nil: {count: 28},
+			user: {count: 28},
+			user2: {count: 28}},
 		},
 		{name: "RepositoriesMax10", requestURL: "/api/v1/repos/search?limit=10&private=false", expectedResults: expectedResults{
 			nil: {count: 10},

@@ -0,0 +1 @@
+ref: refs/heads/master

@@ -0,0 +1,6 @@
+[core]
+	repositoryformatversion = 0
+	filemode = false
+	bare = true
+	symlinks = false
+	ignorecase = true

@@ -0,0 +1 @@
+Unnamed repository; edit this file 'description' to name the repository.

@@ -0,0 +1,15 @@
+#!/bin/sh
+#
+# An example hook script to check the commit log message taken by
+# applypatch from an e-mail message.
+#
+# The hook should exit with non-zero status after issuing an
+# appropriate message if it wants to stop the commit. The hook is
+# allowed to edit the commit message file.
+#
+# To enable this hook, rename this file to "applypatch-msg".
+
+. git-sh-setup
+commitmsg="$(git rev-parse --git-path hooks/commit-msg)"
+test -x "$commitmsg" && exec "$commitmsg" ${1+"$@"}
+:

@@ -0,0 +1,24 @@
+#!/bin/sh
+#
+# An example hook script to check the commit log message.
+# Called by "git commit" with one argument, the name of the file
+# that has the commit message. The hook should exit with non-zero
+# status after issuing an appropriate message if it wants to stop the
+# commit. The hook is allowed to edit the commit message file.
+#
+# To enable this hook, rename this file to "commit-msg".
+
+# Uncomment the below to add a Signed-off-by line to the message.
+# Doing this in a hook is a bad idea in general, but the prepare-commit-msg
+# hook is more suited to it.
+#
+# SOB=$(git var GIT_AUTHOR_IDENT | sed -n 's/^\(.*>\).*$/Signed-off-by: \1/p')
+# grep -qs "^$SOB" "$1" || echo "$SOB" >> "$1"
+
+# This example catches duplicate Signed-off-by lines.
+
+test "" = "$(grep '^Signed-off-by: ' "$1" |
+    sort | uniq -c | sed -e '/^[ ]*1[ ]/d')" || {
+    echo >&2 Duplicate Signed-off-by lines.
+    exit 1
+}

@@ -0,0 +1,114 @@
+#!/usr/bin/perl
+
+use strict;
+use warnings;
+use IPC::Open2;
+
+# An example hook script to integrate Watchman
+# (https://facebook.github.io/watchman/) with git to speed up detecting
+# new and modified files.
+#
+# The hook is passed a version (currently 1) and a time in nanoseconds
+# formatted as a string and outputs to stdout all files that have been
+# modified since the given time. Paths must be relative to the root of
+# the working tree and separated by a single NUL.
+#
+# To enable this hook, rename this file to "query-watchman" and set
+# 'git config core.fsmonitor .git/hooks/query-watchman'
+#
+my ($version, $time) = @ARGV;
+
+# Check the hook interface version
+
+if ($version == 1) {
+    # convert nanoseconds to seconds
+    $time = int $time / 1000000000;
+} else {
+    die "Unsupported query-fsmonitor hook version '$version'.\n" .
+        "Falling back to scanning...\n";
+}
+
+my $git_work_tree;
+if ($^O =~ 'msys' || $^O =~ 'cygwin') {
+    $git_work_tree = Win32::GetCwd();
+    $git_work_tree =~ tr/\\/\//;
+} else {
+    require Cwd;
+    $git_work_tree = Cwd::cwd();
+}
+
+my $retry = 1;
+
+launch_watchman();
+
+sub launch_watchman {
+
+    my $pid = open2(\*CHLD_OUT, \*CHLD_IN, 'watchman -j --no-pretty')
+        or die "open2() failed: $!\n" .
+        "Falling back to scanning...\n";
+
+    # In the query expression below we're asking for names of files that
+    # changed since $time but were not transient (ie created after
+    # $time but no longer exist).
+    #
+    # To accomplish this, we're using the "since" generator to use the
+    # recency index to select candidate nodes and "fields" to limit the
+    # output to file names only. Then we're using the "expression" term to
+    # further constrain the results.
+    #
+    # The category of transient files that we want to ignore will have a
+    # creation clock (cclock) newer than $time_t value and will also not
+    # currently exist.
+
+    my $query = <<"    END";
+        ["query", "$git_work_tree", {
+            "since": $time,
+            "fields": ["name"],
+            "expression": ["not", ["allof", ["since", $time, "cclock"], ["not", "exists"]]]
+        }]
+    END
+
+    print CHLD_IN $query;
+    close CHLD_IN;
+    my $response = do {local $/; <CHLD_OUT>};
+
+    die "Watchman: command returned no output.\n" .
+        "Falling back to scanning...\n" if $response eq "";
+    die "Watchman: command returned invalid output: $response\n" .
+        "Falling back to scanning...\n" unless $response =~ /^\{/;
+
+    my $json_pkg;
+    eval {
+        require JSON::XS;
+        $json_pkg = "JSON::XS";
+        1;
+    } or do {
+        require JSON::PP;
+        $json_pkg = "JSON::PP";
+    };
+
+    my $o = $json_pkg->new->utf8->decode($response);
+
+    if ($retry > 0 and $o->{error} and $o->{error} =~ m/unable to resolve root .* directory (.*) is not watched/) {
+        print STDERR "Adding '$git_work_tree' to watchman's watch list.\n";
+        $retry--;
+        qx/watchman watch "$git_work_tree"/;
+        die "Failed to make watchman watch '$git_work_tree'.\n" .
+            "Falling back to scanning...\n" if $? != 0;
+
+        # Watchman will always return all files on the first query so
+        # return the fast "everything is dirty" flag to git and do the
+        # Watchman query just to get it over with now so we won't pay
+        # the cost in git to look up each individual file.
+        print "/\0";
+        eval { launch_watchman() };
+        exit 0;
+    }
+
+    die "Watchman: $o->{error}.\n" .
+        "Falling back to scanning...\n" if $o->{error};
+
+    binmode STDOUT, ":utf8";
+    local $, = "\0";
+    print @{$o->{files}};
+}

@@ -0,0 +1,15 @@
+#!/usr/bin/env bash
+data=$(cat)
+exitcodes=""
+hookname=$(basename $0)
+GIT_DIR=${GIT_DIR:-$(dirname $0)}
+
+for hook in ${GIT_DIR}/hooks/${hookname}.d/*; do
+    test -x "${hook}" && test -f "${hook}" || continue
+    echo "${data}" | "${hook}"
+    exitcodes="${exitcodes} $?"
+done
+
+for i in ${exitcodes}; do
+    [ ${i} -eq 0 ] || exit ${i}
+done

@@ -0,0 +1,2 @@
+#!/usr/bin/env bash
+"$GITEA_ROOT/gitea" hook --config="$GITEA_ROOT/$GITEA_CONF" post-receive

@@ -0,0 +1,8 @@
+#!/bin/sh
+#
+# An example hook script to prepare a packed repository for use over
+# dumb transports.
+#
+# To enable this hook, rename this file to "post-update".
+
+exec git update-server-info

@@ -0,0 +1,14 @@
+#!/bin/sh
+#
+# An example hook script to verify what is about to be committed
+# by applypatch from an e-mail message.
+#
+# The hook should exit with non-zero status after issuing an
+# appropriate message if it wants to stop the commit.
+#
+# To enable this hook, rename this file to "pre-applypatch".
+
+. git-sh-setup
+precommit="$(git rev-parse --git-path hooks/pre-commit)"
+test -x "$precommit" && exec "$precommit" ${1+"$@"}
+:

@@ -0,0 +1,49 @@
+#!/bin/sh
+#
+# An example hook script to verify what is about to be committed.
+# Called by "git commit" with no arguments. The hook should
+# exit with non-zero status after issuing an appropriate message if
+# it wants to stop the commit.
+#
+# To enable this hook, rename this file to "pre-commit".
+
+if git rev-parse --verify HEAD >/dev/null 2>&1
+then
+    against=HEAD
+else
+    # Initial commit: diff against an empty tree object
+    against=$(git hash-object -t tree /dev/null)
+fi
+
+# If you want to allow non-ASCII filenames set this variable to true.
+allownonascii=$(git config --bool hooks.allownonascii)
+
+# Redirect output to stderr.
+exec 1>&2
+
+# Cross platform projects tend to avoid non-ASCII filenames; prevent
+# them from being added to the repository. We exploit the fact that the
+# printable range starts at the space character and ends with tilde.
+if [ "$allownonascii" != "true" ] &&
+    # Note that the use of brackets around a tr range is ok here, (it's
+    # even required, for portability to Solaris 10's /usr/bin/tr), since
+    # the square bracket bytes happen to fall in the designated range.
+    test $(git diff --cached --name-only --diff-filter=A -z $against |
+      LC_ALL=C tr -d '[ -~]\0' | wc -c) != 0
+then
+    cat <<\EOF
+Error: Attempt to add a non-ASCII file name.
+
+This can cause problems if you want to work with people on other platforms.
+
+To be portable it is advisable to rename the file.
+
+If you know what you are doing you can disable this check using:
+
+  git config hooks.allownonascii true
+EOF
+    exit 1
+fi
+
+# If there are whitespace errors, print the offending file names and fail.
+exec git diff-index --check --cached $against --

@@ -0,0 +1,53 @@
+#!/bin/sh
+
+# An example hook script to verify what is about to be pushed. Called by "git
+# push" after it has checked the remote status, but before anything has been
+# pushed. If this script exits with a non-zero status nothing will be pushed.
+#
+# This hook is called with the following parameters:
+#
+# $1 -- Name of the remote to which the push is being done
+# $2 -- URL to which the push is being done
+#
+# If pushing without using a named remote those arguments will be equal.
+#
+# Information about the commits which are being pushed is supplied as lines to
+# the standard input in the form:
+#
+# <local ref> <local sha1> <remote ref> <remote sha1>
+#
+# This sample shows how to prevent push of commits where the log message starts
+# with "WIP" (work in progress).
+
+remote="$1"
+url="$2"
+
+z40=0000000000000000000000000000000000000000
+
+while read local_ref local_sha remote_ref remote_sha
+do
+    if [ "$local_sha" = $z40 ]
+    then
+        # Handle delete
+        :
+    else
+        if [ "$remote_sha" = $z40 ]
+        then
+            # New branch, examine all commits
+            range="$local_sha"
+        else
+            # Update to existing branch, examine new commits
+            range="$remote_sha..$local_sha"
+        fi
+
+        # Check for WIP commit
+        commit=`git rev-list -n 1 --grep '^WIP' "$range"`
+        if [ -n "$commit" ]
+        then
+            echo >&2 "Found WIP commit in $local_ref, not pushing"
+            exit 1
+        fi
+    fi
+done
+
+exit 0
