-
Notifications
You must be signed in to change notification settings - Fork 650
Use patience
diff algorithm
#891
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR aims to improve performance in test diffing by using an alternate histogram diff algorithm for larger diffs. Changes include:
- Replacing diff.Text with a custom diffText function that selects the diff algorithm based on the size of the test output.
- Introducing new helper types and functions (inputPair, intoEditScript, and addRange) to facilitate the new diff algorithm.
- Adding a new dependency on github.com/octavore/delta in go.mod.
Reviewed Changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.
File | Description |
---|---|
internal/testutil/baseline/baseline.go | Introduces the new diffText function and associated helpers for histogram diffing. |
go.mod | Adds dependency on github.com/octavore/delta v0.6.1. |
Comments suppressed due to low confidence (1)
internal/testutil/baseline/baseline.go:241
- The function diffText references the 'strings' package without importing it; add 'import "strings"' to resolve the missing dependency.
if math.Abs(float64(strings.Count(expected, "\n")-strings.Count(actual, "\n"))) < 10000 {
I tried this out by disabling the Which I can rationalize as "starting at -1 in the original because the first line is missing", but maybe the old algorithm isn't great either. I'm not attached to the |
In the existing |
FWIW this seems fine to me with the nits I've mentioned fixed, though I wish we could use a more popular dependency to do the same thing. I've been meaning to send a PR to |
Actually, maybe https://pkg.go.dev/github.com/peter-evans/patience would work even better? |
Or, technically we also pull in |
The |
I'm biased towards whichever one involves less code on our part; that's the only real concern I have, just the code that was added here in order to use the new lib compared to the old one. And if we do switch, I don't mind biting the bullet and just using it for everything. There's not many PRs open that couldn't just re-accept baselines. |
patience
diff algorithm
I've measured this before/after, and this PR halves the time the tests spend diffing, more than halves the memory allocated in bytes, and halves the number of allocs. That and presumably, makes the other PR not time out! |
It's impossible to comment on the PR, but we don't need the |
I said So: diff --git a/internal/testutil/baseline/baseline.go b/internal/testutil/baseline/baseline.go
index 3767a2b5b..1f64731ae 100644
--- a/internal/testutil/baseline/baseline.go
+++ b/internal/testutil/baseline/baseline.go
@@ -2,7 +2,6 @@ package baseline
import (
"fmt"
- "io"
"os"
"path/filepath"
"regexp"
@@ -109,17 +108,14 @@ type inputPair struct {
b []string
}
-func diffText(oldName string, newName string, expected string, actual string, w io.Writer) error {
+func diffText(oldName string, newName string, expected string, actual string) string {
lines := patience.Diff(stringutil.SplitLines(expected), stringutil.SplitLines(actual))
- result := patience.UnifiedDiffTextWithOptions(lines, patience.UnifiedDiffOptions{
+ return patience.UnifiedDiffTextWithOptions(lines, patience.UnifiedDiffOptions{
Precontext: 3,
Postcontext: 3,
SrcHeader: oldName,
DstHeader: newName,
})
-
- _, err := w.Write([]byte(result))
- return err
}
func getBaselineDiff(t *testing.T, actual string, expected string, fileName string, fixupOld func(string) string) string {
@@ -129,14 +125,11 @@ func getBaselineDiff(t *testing.T, actual string, expected string, fileName stri
if actual == expected {
return NoContent
}
- var b strings.Builder
- if err := diffText("old."+fileName, "new."+fileName, expected, actual, &b); err != nil {
- return fmt.Sprintf("failed to diff the actual and expected content: %v\n", err)
- }
+
+ s := diffText("old."+fileName, "new."+fileName, expected, actual)
// Remove line numbers from unified diff headers; this avoids adding/deleting
// lines in our baselines from causing knock-on header changes later in the diff.
- s := b.String()
aCurLine := 1
bCurLine := 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR switches the test baseline diffing to use the “patience” algorithm for improved performance and whitespace-insensitive diffs.
- Introduce a new
diffText
helper usingpeter-evans/patience
inbaseline.go
- Remove
github.com/pkg/diff
, add6github.com/peter-evans/patience
ingo.mod
- Update all affected baseline
.diff
files undertestdata/baselines/...
to match the new diff output
Reviewed Changes
Copilot reviewed 20791 out of 20791 changed files in this pull request and generated 1 comment.
File | Description |
---|---|
internal/testutil/baseline/baseline.go | Replaced diff.Text usage with a new diffText helper leveraging patience |
go.mod | Removed github.com/pkg/diff , added github.com/peter-evans/patience |
testdata/baselines/reference/submodule/compiler/*.errors.txt.diff | Updated expected error diff outputs to align with the patience algorithm |
testdata/baselines/reference/submodule/compiler/2dArrays.symbols.diff | Updated symbol diff ordering/output from patience diff |
Comments suppressed due to low confidence (2)
internal/testutil/baseline/baseline.go:106
- [nitpick] Consider renaming
diffText
to something more descriptive likegenerateUnifiedDiff
orcomputeBaselineDiff
to clarify its purpose.
func diffText(oldName string, newName string, expected string, actual string) string {
internal/testutil/baseline/baseline.go:14
- If
assert
is no longer used in this file after the diff changes, remove the import to clean up unused dependencies.
"gotest.tools/v3/assert"
@@ -102,21 +103,27 @@ func readFileOrNoContent(fileName string) string { | |||
return string(content) | |||
} | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] Add a Go doc comment for diffText
, briefly explaining the parameters and the returned unified diff string to improve maintainability.
// diffText generates a unified diff string between the expected and actual text. | |
// Parameters: | |
// - oldName: The name of the source file or content being compared. | |
// - newName: The name of the destination file or content being compared. | |
// - expected: The expected content as a string. | |
// - actual: The actual content as a string. | |
// Returns: | |
// A unified diff string highlighting the differences between the expected and actual content. |
Copilot uses AI. Check for mistakes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But why would anyone add docs like this ever again if they can just ask copilot to guess at the docs like this? 🤔
...it's also just self-evident, in this case.
This solves the perf issue with test diffing seen in #791
The
patience
algorithm is suitable for use for all test diffs, but it makes some subtly different choices in edit order in some cases, and is whitespace-insensitive, so will result in some whitespace changes in the diffs.