Skip to content

cmd/go: add per-test timeouts #48157

Open
@rsc

Description

@rsc
Contributor

Tests have an overall timeout for the entire binary but no timeout for a specific test case.
You often want to limit any particular test case to a time much shorter than the overall binary.
I propose to add the concept of per-test (function) timeouts to the go command user experience as follows.

  1. Each test gets a per-test timeout. The timer for a given test only ticks down when the test is running. It does not tick down when the test is blocked in t.Parallel, nor when it is blocked in t.Run running a subtest.

  2. The default per-test case timeout is 1m (one minute). If the new -testtimeout flag is specified explicitly, then that sets a different default. If the -testtimeout flag is omitted but -timeout is specified explicitly, then that sets the default too. This way, if you have one really long test and use go test -timeout=30m, the per-case timeout doesn't kick in after 1 minute and kill it anyway.

  3. There is a new testing.TB method SetTimeout(d time.Duration) that allows a test to set its own timeout. Calling SetTimeout does not reset the timer. If a test runs for 30 seconds and then calls t.SetTimeout(1*time.Second), it gets killed for having timed out. A timeout set this way is inherited by subtests. (They each have their own timer.)

  4. When a test timeout happens, the whole process still gets killed. There's nothing we can really do about that. But the failure does say which test function timed out.

This change will help users generally, and it will also help fuzzing, because any individual invocation of a fuzz function will now have a testing.T with a 1-minute timer, providing for detection of inputs that cause infinite loops or very slow execution.

Activity

changed the title [-]cmd/go: add per-test timeouts[/-] [+]proposal: cmd/go: add per-test timeouts[/+] on Sep 2, 2021
added this to the Proposal milestone on Sep 2, 2021
bcmills

bcmills commented on Sep 2, 2021

@bcmills
Contributor

When a test timeout happens, the whole process still gets killed. There's nothing we can really do about that. But the failure does say which test function timed out.

I suspect this would help (at least somewhat) with the use-cases motivating #39038, too.

cespare

cespare commented on Sep 2, 2021

@cespare
Contributor

This sounds great. Besides being an overall better experience when there's a hanging test, it may allow for simpler test code in some situations. Sometimes I've written tests which exercise concurrent code which, if it fails, is expected to hang/deadlock; then I spawn my own timer to fail the test if it hasn't completed after, say, 5s. Depending on whether the kill-the-whole-test behavior is acceptable, I could change these to simply use t.SetTimeout(5 * time.Second) instead.

cespare

cespare commented on Sep 2, 2021

@cespare
Contributor

Would there still be a 10m default -timeout?

I would prefer there not to be, since hitting that timeout without hitting a 1m per-test timeout seems like a signal that we have a giant test suite operating correctly, not that we have a hanging/broken test.

mvdan

mvdan commented on Sep 3, 2021

@mvdan
Member

So, as per my reading, -testtimeout or -timeout just set the per-test default timeout, so T.SetTimeout always takes precedence. Is that correct?

I think that's the most intuitive to the test author. If someone wants to constrain the overall run-time of tests, they'd use the global -timeout, presumably.

  1. If the -testtimeout flag is omitted but -timeout is specified explicitly, then that sets the default too.

I understand the reasoning behind this, but I'm also slightly uneasy about it. Many CIs currently use lines like go test -timeout=30m ./...; that means they would get zero net benefit from this proposal, unless they noticed they'd need to change to go test -timeout=30m -testtimeout=1m ./....

  1. When a test timeout happens, the whole process still gets killed. There's nothing we can really do about that.

Could you expand on the reason why we can't do better?

As a separate point; would https://pkg.go.dev/testing#T.Deadline be updated to take into account the timeout of the running test too? I assume it would be, but just clarifying since this proposal doesn't say.

Overall I really like this idea, for the reasons Bryan and Caleb mentioned too. I think my only main worry is that setting a per-test timeout with T.SetTimeout would be hard. Some machines are one or even two orders of magnitude faster than others in terms of CPU, disk, or network, especially when you take into account busy machines, like public shared CI runners.

This is not a problem for the -timeout or -testtimeout flags; they can be changed depending on the context. I can use -testtimeout=1m on my laptop, and -testtimeout=3m on the public CI, for example. However, T.SetTimeout is fairly static and can't be adjusted depending on the environment, as far as I can see.

Maybe we could also think about "testing timeout scales"; so I could run my tests in a way that all per-test timeouts are tripled if I know my machine is significantly slower than an average laptop (e.g. public CI). I think Go's builders do something similar, where very slow computers such as small ARM/RISCV boards get many timeouts multiplied to avoid benign timeouts.

rogpeppe

rogpeppe commented on Sep 3, 2021

@rogpeppe
Contributor

When a test timeout happens, the whole process still gets killed. There's nothing we can really do about that. But the failure does say which test function timed out.

Would the failure include all messages logged up until the timeout?

rsc

rsc commented on Sep 3, 2021

@rsc
ContributorAuthor

I understand the reasoning behind this, but I'm also slightly uneasy about it. Many CIs currently use lines like go test -timeout=30m ./...; that means they would get zero net benefit from this proposal,

Zero net benefit, but also zero breakage. It's hard to have the latter without the former.
If those same CIs have individual test cases that run for 10 minutes, I'm trying to avoid breaking them.
Or do you think it's OK to break them? (Honest question.)

If we were starting from scratch it would be much easier to say the two are separate.

rsc

rsc commented on Sep 3, 2021

@rsc
ContributorAuthor

@rogpeppe:

Would the failure include all messages logged up until the timeout?

Yes, I would hope so. I think that's possible independent of this proposal though. Right now we just do a panic but I don't see why we couldn't go around flushing output first. (Maybe it's tricky, I don't know.)

mvdan

mvdan commented on Sep 3, 2021

@mvdan
Member

Zero net benefit, but also zero breakage. It's hard to have the latter without the former.

That's fair - I don't have a better alternative in mind. I was mainly highlighting this because we should make it clear in the release notes: that any go test -timeout users will not see this benefit unless they tweak their flags.

rsc

rsc commented on Sep 3, 2021

@rsc
ContributorAuthor

Could you expand on the reason why we can't do better? [than crashing the whole process]

We have no way to forcefully stop a goroutine asynchronously. The reason we have no way is that it is unsafe: the goroutine may be holding locks, etc. There also may be other goroutines that are running as part of the test. There is no safe way to attribute which goroutines need stopping (and then also no way to safe way to stop them).

josharian

josharian commented on Sep 5, 2021

@josharian
Contributor

There is no safe way to attribute which goroutines need stopping (and then also no way to safe way to stop them).

If we had a testing.TB Context method then the tests and benchmarks could do cooperative graceful shutdown. (Examples are a bit trickier without further changes, but are also less likely to be problematic.)

rsc

rsc commented on Sep 8, 2021

@rsc
ContributorAuthor

This proposal has been added to the active column of the proposals project
and will now be reviewed at the weekly proposal review meetings.
— rsc for the proposal review group

84 remaining items

Loading
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    Status

    Accepted

    Relationships

    None yet

      Development

      No branches or pull requests

        Participants

        @rogpeppe@josharian@rsc@bitfield@cespare

        Issue actions

          cmd/go: add per-test timeouts · Issue #48157 · golang/go