
evaluator: general performance enhancements #804

Closed
cueckoo opened this issue Jul 3, 2021 · 6 comments


cueckoo commented Jul 3, 2021

Originally opened by @mpvl in cuelang/cue#804

This Issue is an umbrella issue for all performance-related issues. Below is a small selection of the type of performance improvements that are being considered.

Background

Graph unification at its core approaches O(n) (in the same way a hash table lookup approaches O(1)). CUE, however, introduces a few concepts that make evaluation more expensive: comprehensions and disjunctions. There is little one can do for comprehensions (especially list comprehensions), but at least they stand out visually and clearly signal their complexity.

Another one is disjunctions. Naive evaluation of disjunctions can lead to exponential evaluation time. We observed, however, that for most applications, including Protobuf oneofs and disjunctions of Kubernetes types, for instance, the evaluation is still linear. We will discuss possible optimizations below.

Note that CUE also allows references, which makes it, like YAML and XML, susceptible to the “billion laughs” attack. However, graph unification allows for an algorithmic trick called structure sharing, which would allow such configurations to be evaluated in O(n) time again. Although structure sharing is not implemented yet, CUE's current algorithm is compatible with it, so implementing it should not be a stretch.
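The effect of structure sharing can be sketched with a toy evaluator (a hypothetical Python model, not CUE's implementation): naively expanding a "billion laughs"-style reference chain does exponential work, while caching and sharing each evaluated definition keeps the work linear in the number of definitions.

```python
# Hypothetical sketch, not CUE's actual evaluator: definitions are either
# a leaf string or a list of references to other definitions.

def eval_naive(defs, name):
    """Expand every reference by re-evaluating it: exponential work."""
    body = defs[name]
    if isinstance(body, list):          # a list of references to expand
        return [eval_naive(defs, ref) for ref in body]
    return body                         # a leaf value

def eval_shared(defs, name, cache=None):
    """Evaluate each definition once and share the result: linear work."""
    if cache is None:
        cache = {}
    if name not in cache:
        body = defs[name]
        if isinstance(body, list):
            cache[name] = [eval_shared(defs, ref, cache) for ref in body]
        else:
            cache[name] = body
    return cache[name]

# A small "laughs" chain: each level references the previous one twice.
defs = {"lol0": "lol"}
for i in range(1, 10):
    defs[f"lol{i}"] = [f"lol{i-1}", f"lol{i-1}"]
```

With sharing, the result for "lol9" still denotes 2^9 leaves, but only 10 distinct nodes are ever built, and the two halves of the result are literally the same object.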

Disjunctions

Disjunction performance improvements have been extracted to #2002

General evaluation

Keep sorted lists of arcs

Currently, arcs are maintained as unsorted lists, and matching on labels is done by looping over them. This can be slow for structs with large sets of arcs. Merging sorted lists, on the other hand, is an O(n) operation (a merge of already-sorted inputs is O(n)), which is clearly more scalable and is necessary to approximate O(n) overall time complexity.
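The merge idea can be illustrated with a small sketch (illustrative Python, not CUE's adt internals; the names are invented): unifying the arcs of two structs whose arc lists are kept sorted by label takes a single O(n+m) pass instead of an O(n*m) scan per label.

```python
# Illustrative sketch: arcs are (label, value) pairs, kept sorted by label.

def unify_arcs_sorted(a, b, unify):
    """Merge two label-sorted arc lists in one pass, unifying shared labels."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        la, lb = a[i][0], b[j][0]
        if la == lb:                      # same field: unify the two values
            out.append((la, unify(a[i][1], b[j][1])))
            i += 1
            j += 1
        elif la < lb:                     # field only in a
            out.append(a[i])
            i += 1
        else:                             # field only in b
            out.append(b[j])
            j += 1
    out.extend(a[i:])                     # leftover arcs pass through
    out.extend(b[j:])
    return out
```

For example, merging [("a", 1), ("c", 3)] with [("b", 2), ("c", 3)] yields [("a", 1), ("b", 2), ("c", 3)] while touching each arc exactly once.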

Reuse of computation

Right now, updating specific values in a configuration, for instance with Fill, triggers a reevaluation of the entire tree. The plan is to make incremental updates possible. One possibility is to keep track of dependencies, mark nodes as “dirty” upon changes, and recompute the tree while reusing the values of nodes that remained unmarked.
Research in the field of spreadsheet computation and propagator networks can be helpful in this regard.
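A minimal sketch of the dirty-bit idea (a hypothetical Python model, not CUE's design): each node caches its value, an update marks the node's transitive dependents dirty, and only dirty nodes are recomputed on the next read.

```python
# Toy dependency graph with dirty-bit recomputation (illustrative only).

class Node:
    def __init__(self, compute, deps=()):
        self.compute = compute            # function of the deps' values
        self.deps = list(deps)
        self.dependents = []
        self.cache = None
        self.dirty = True                 # nothing computed yet
        for d in self.deps:
            d.dependents.append(self)

    def set(self, value):
        """Update a leaf input and mark everything downstream dirty."""
        self.compute = lambda: value
        self._mark_dirty()

    def _mark_dirty(self):
        if self.dirty:
            return                        # already marked; stop propagating
        self.dirty = True
        for d in self.dependents:
            d._mark_dirty()

    def value(self):
        if self.dirty:                    # recompute only when marked
            self.cache = self.compute(*[d.value() for d in self.deps])
            self.dirty = False
        return self.cache

a = Node(lambda: 1)
b = Node(lambda: 2)
s = Node(lambda x, y: x + y, [a, b])
```

After the first read, updating only `a` leaves `b`'s cached value untouched; `s` is recomputed from the one fresh and one cached input.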

Delayed error computation

Generating an error can be quite expensive in CUE, as it compiles a lot of information into the message. This is especially an issue for disjunctions, where errors are used to detect whether a disjunct can be eliminated. The v0.3 data structures allow for a different error model where an error would consist of just a few pointers to the point of origin, delaying this computation until the error is actually printed or analysed by the user.

Aside from improving performance, this delayed error computation also allows for better error reporting, as it allows for more expensive computations, such as more advanced error filtering and showing more detailed context.
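The shape of such a lazy error can be sketched as follows (hypothetical names, illustrative Python rather than CUE's API): the error records only its point of origin, and the expensive message is rendered on demand, so errors used merely to discard disjunction branches cost almost nothing.

```python
# Illustrative sketch of delayed error computation.

class LazyError:
    def __init__(self, origin, details_fn):
        self.origin = origin              # cheap: a reference to the source
        self._details_fn = details_fn     # deferred: formats the full message
        self._msg = None

    def message(self):
        if self._msg is None:             # pay the formatting cost only here
            self._msg = self._details_fn(self.origin)
        return self._msg

def try_branch(value, constraint):
    """Return a LazyError (no formatting done yet) if the branch fails."""
    if not constraint(value):
        return LazyError(value, lambda v: f"value {v!r} does not satisfy constraint")
    return None

# Disjunction elimination only checks *whether* an error exists:
branches = [try_branch(5, lambda v: v < 0), try_branch(5, lambda v: v > 0)]
survivors = [i for i, e in enumerate(branches) if e is None]
```

Only if the user ultimately asks for the error does `message()` run the formatting, which is also the point where more expensive filtering and context gathering could happen.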

Warren Abstract Machine

This is an optimization that only makes sense once the above optimizations have been addressed.

A Warren Abstract Machine is a concept originating from Prolog to make the evaluation of Prolog more efficient. In the context of CUE it could be especially useful for reducing memory allocations. Right now, CUE performs a considerable amount of allocation. v0.3 already greatly reduces this by keeping reusable per-node buffers, but there is still considerable room for further reducing allocations.
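The per-node buffer reuse mentioned above can be sketched as a simple free list (illustrative Python; the real evaluator is Go, and all names here are invented): instead of allocating a fresh scratch buffer for every node evaluated, buffers are returned to a pool and handed out again.

```python
# Illustrative free-list sketch of buffer reuse.

class BufferPool:
    def __init__(self):
        self.free = []
        self.allocations = 0              # count real allocations for the demo

    def get(self):
        if self.free:
            buf = self.free.pop()
            buf.clear()                   # reuse: wipe, don't reallocate
            return buf
        self.allocations += 1             # only allocate on a cold pool
        return []

    def put(self, buf):
        self.free.append(buf)

pool = BufferPool()
for _ in range(1000):                     # evaluate 1000 "nodes" sequentially
    buf = pool.get()
    buf.extend([1, 2, 3])                 # per-node scratch work
    pool.put(buf)
```

Sequentially evaluating a thousand nodes this way performs a single allocation rather than a thousand.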

Other improvements

  • special-case list processing
  • reduce size of adt.Vertex
  • optimize boolean computation (now always creates an adt.Bool, but is often not needed)
  • special-case arithmetic of small numbers
  • less memory allocation

nyarly commented Nov 10, 2021

I'm really concerned by "There is little one can do for comprehensions (especially list comprehensions), but at least they stand out visually and clearly signal their complexity."

Our use case for CUE is to simplify and abstract a "large" Kubernetes configuration. (Large here means ~100kloc).

We're using Kustomize as an intermediary, and building out CUE defs that we can unify in order to select features for different clusters. One of the abstraction targets has been to automate away the coupling of file names as written to disk and as referenced in kustomize.yaml files. Consequently, the contents of our kustomize.yaml is built up out of list comprehensions.

Performance has been an ongoing challenge for us, with builds in the 5-15 minute range. If they get much higher than 15 minutes, they quickly spiral into "build for an hour before the OOMKiller notices." Each release of CUE has been a significant improvement for us, but list comprehensions being a permanent limiting factor definitely gives me pause.


mpvl commented Nov 10, 2021

@nyarly:

"There is little one can do for comprehensions (especially list comprehensions), but at least they stand out visually and clearly signal their complexity."

Mind you, this is not specific to CUE: in any language that has comprehensions it is easy to get into trouble. Take a smallish list (say 1,000 elements) and then iterate over it in 5 nested for loops, and you'll be hard-pressed to find a language where this will be spiffy.
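The arithmetic behind that remark is worth spelling out: a 1,000-element list iterated in 5 nested loops visits every combination of elements.

```python
# 1,000 elements, 5 nested loops: the number of iterations is 1000**5.
iterations = 1000 ** 5
# That is 10**15 (a quadrillion) steps, slow in any language.
```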

There are some known performance issues that make the big-O performance considerably larger than it needs to be. Fixes for these are in the works. It will take some time, still, but they are high on the priority list.


nyarly commented Nov 10, 2021

Okay - nesting 1000 element loops 5 deep is an unsurprising complexity factor.

What I'm seeing is that unifying ~10 defs, each with something like output: { one: [ for k,v in input { ... }], two: [ for k,v in input { ... }] }, explodes with the number of defs involved.

Notionally, it seems like hundreds of elements in 5 sequential loops should still be in the O(n) range, and it sounds like you're implying that's a reasonable expectation.


mpvl commented Nov 10, 2021

@nyarly If you have a drilled-down reproducer with such unexpected behavior, I would love to have it.

Coincidentally, I'm working on redoing some of the comprehension code, which simplifies it significantly and will allow for more improvements down the line.


nyarly commented Nov 10, 2021

I haven't been able to produce a reduced case for this issue in particular. I might take another stab at it with the idea that it takes both unification and comprehension to trigger the issue.

I've been talking with @myitcv for a little while about hooking up with Unity, and I'd be thrilled to show you what we've been doing with CUE so far.

cueckoo pushed a commit that referenced this issue Sep 19, 2022
An incomplete error indicates that the error may be resolved by
making a configuration more specific. Sometimes, however, during
partial evaluation an incomplete error arises only because a value
is not fully evaluated yet; such an error may still resolve upon
further evaluation.

In this change we make some distinctions between these
cases to prevent non-applicable error messages from
bubbling up and rather propagate applicable error messages
more strictly.

This may also have a positive impact on performance, as
fewer incomplete evaluations need to be "retried".

Fixes #1837
Issue #804

Signed-off-by: Marcel van Lohuizen <[email protected]>
Change-Id: I2d1a7876d626da5b4a32db8e1cd6cd9b87eb63e9
Reviewed-on: https://review.gerrithub.io/c/cue-lang/cue/+/543574
Reviewed-by: Aram Hăvărneanu <[email protected]>
Unity-Result: CUEcueckoo <[email protected]>
TryBot-Result: CUEcueckoo <[email protected]>
@myitcv myitcv changed the title Roadmap: Performance enhancements evaluator: general performance enhancements Oct 14, 2022
@myitcv myitcv added roadmap/performance pre v1.0 and removed roadmap High level umbrella issue laying out important roadmap goals pre v1.0 labels Oct 14, 2022
@myitcv myitcv added the zGarden label Jun 15, 2023
@mvdan mvdan removed the zGarden label Feb 8, 2024

mvdan commented Oct 9, 2024

Since February 2024, we have started tracking performance issues and improvements in a more structured and hopefully useful way; we have a new top-level tracking issue at #2850, and a number of sub-issues such as #2854 to replace the "Reuse of computation" section above.

Given that we are actively posting regular updates to issue 2850, and that the work outlined in that issue and its sub-issues reflects the current performance challenges of the evaluator, I am going to close this issue in favor of those for the sake of clarity. We hadn't used or updated this issue in nearly three years, and all relevant issues and planned work should already be reflected in the new issues.

Please let me know if I missed anything :)

@mvdan mvdan closed this as not planned Oct 9, 2024