[SwiftLexicalLookup] Unqualified lookup caching #3068

MAJKFL · 2025-04-30T18:58:45Z

This PR introduces optional caching support to SwiftLexicalLookup. In order to use it, clients can pass an instance of LookupCache as a parameter to the lookup function.

LookupCache keeps track of cache member hits. In order to prevent the cache from taking too much memory, clients can call the LookupCache.evictEntriesWithoutHit function to remove members without a hit and reset the hit property for the remaining members. Calling this function every time after lookup effectively maintains one path from a leaf to the root of the scope tree in cache.

Clients can also optionally set the drop value:

/// Creates a new unqualified lookup cache.
/// `drop` parameter specifies how many eviction calls will be
/// ignored before evicting not-hit members of the cache.
///
/// Example cache eviction sequences (s - skip, e - evict):
/// - `drop = 0` - `e -> e -> e -> e -> e -> ...`
/// - `drop = 1` - `s -> e -> s -> e -> s -> ...`
/// - `drop = 3` - `s -> s -> s -> e -> s -> ...`
///
/// - Note: `drop = 0` effectively maintains exactly one path of cached results to
/// the root in the cache (assuming we evict cache members after each lookup in a sequence of lookups).
/// The higher the `drop` value, the more such paths can potentially be stored in the cache at any given moment.
/// Because of that, a higher `drop` value also translates to a higher number of cache-hits,
/// but it might not directly translate to better performance. Because of a larger memory footprint,
/// memory accesses could take longer, slowing down the eviction process. That's why the `drop` value
/// could be fine-tuned to maximize the performance given file size,
/// number of lookups, and amount of available memory.
public init(drop: Int = 0) {
  self.dropMod = drop + 1
}
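
As a rough illustration of the skip/evict cadence described in the doc comment, here is a minimal sketch. `EvictionCounter`, `shouldEvict`, and `sequence(drop:calls:)` are hypothetical names, and the sketch assumes the counter simply wraps modulo `drop + 1`; the PR's actual bookkeeping may differ.

```swift
/// Hypothetical stand-in for the eviction counter; not the PR's actual type.
struct EvictionCounter {
  private let dropMod: Int
  private var calls = 0

  init(drop: Int = 0) {
    precondition(drop >= 0, "drop must be non-negative")
    self.dropMod = drop + 1
  }

  /// Returns `true` when this call should evict and `false` when it is skipped.
  mutating func shouldEvict() -> Bool {
    calls = (calls + 1) % dropMod
    return calls == 0
  }
}

/// Renders the first `calls` eviction requests as a string of `s` (skip) and `e` (evict).
func sequence(drop: Int, calls: Int) -> String {
  var counter = EvictionCounter(drop: drop)
  return (0..<calls).map { _ in counter.shouldEvict() ? "e" : "s" }.joined(separator: " -> ")
}
```

Under this assumption, `drop` requests are skipped between every two evictions.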


MAJKFL · 2025-04-30T19:09:19Z

swiftlang/swift#81209

@swift-ci Please test

ahoppen

Without diving too deeply into the details: I am a little concerned about the cache eviction behavior and the fact that you need to manually call evictEntriesWithoutHit (which incidentally doesn’t seem to be called in this PR or swiftlang/swift#81209) and I think it’s easy for clients to forget to call it. Does this more complex cache eviction policy provide significant benefits over a simple LRU cache that keeps, say 100, cache entries? We could share the LRUCache type that we currently have in SwiftCompilerPluginMessageHandling for that. Curious to hear your opinion.

ahoppen · 2025-05-05T09:56:16Z

Sources/SwiftLexicalLookup/LookupCache.swift

+  /// memory accesses could take longer, slowing down the eviction process. That's why the `drop` value
+  /// could be fine-tuned to maximize the performance given file size,
+  /// number of lookups, and amount of available memory.
+  public init(drop: Int = 0) {


I’m not a fan of the drop naming here. I don’t have a better suggestion yet, maybe I’ll come up with one.

Yes, I agree it is a bit ambiguous. What about skip?

Sources/SwiftLexicalLookup/Scopes/ScopeSyntax.swift

MAJKFL · 2025-05-26T08:28:30Z

> Without diving too deeply into the details: I am a little concerned about the cache eviction behavior and the fact that you need to manually call evictEntriesWithoutHit (which incidentally doesn’t seem to be called in this PR or swiftlang/swift#81209) and I think it’s easy for clients to forget to call it.

Hi Alex, thank you for the suggestions and sorry for the late reply. I got quite busy with school. Thank you for pointing out evictEntriesWithoutHit is not called in the other PR. Originally, I called the method inside SyntaxProtocol.lookup after performing lookup, but ended up passing eviction to the client for extra flexibility. I must’ve forgotten to put it there. I think there’s enough evidence that it was a bad idea :).

> Does this more complex cache eviction policy provide significant benefits over a simple LRU cache that keeps, say 100, cache entries? We could share the LRUCache type that we currently have in SwiftCompilerPluginMessageHandling for that. Curious to hear your opinion.

The current implementation assumes that subsequent lookups happen in close proximity to the previous lookup, e.g. in the compiler during a single top-to-bottom scan (the best case). The algorithm follows the intuition that for any (close) subsequent lookup, we shouldn’t recompute more than one scope. In a top-to-bottom scan that maintains one path to the root, we always have a guaranteed cache hit in the first common ancestor. I think a sufficiently big LRU cache would behave similarly, but it would require more memory than this approach and would not provide additional speedup. I’ve also noticed that growing the cache too big leads to diminishing returns. I suppose this happens because less of the data structure can remain cached in memory.
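
To make the "one path to the root" intuition concrete, here is a toy sketch. `Scope` and `PathCache` are hypothetical types invented for illustration and are not part of SwiftLexicalLookup; the sketch only assumes that a cache hit marks the hit scope and all of its ancestors, and that eviction drops everything that was not marked.

```swift
/// Hypothetical toy scope; not a SwiftLexicalLookup type.
final class Scope {
  let id: Int
  let names: [String]
  let parent: Scope?

  init(id: Int, names: [String], parent: Scope? = nil) {
    self.id = id
    self.names = names
    self.parent = parent
  }
}

/// Hypothetical cache that keeps one result per scope and tracks hits.
final class PathCache {
  private(set) var results: [Int: [String]] = [:]
  private(set) var computedScopes = 0
  private var hits: Set<Int> = []

  /// Returns the names visible from `scope`, innermost first.
  func lookup(from scope: Scope) -> [String] {
    hits.insert(scope.id)
    if let cached = results[scope.id] {
      // Mark every ancestor as hit so the whole path to the root survives eviction.
      var current = scope.parent
      while let ancestor = current {
        hits.insert(ancestor.id)
        current = ancestor.parent
      }
      return cached
    }
    computedScopes += 1
    let inherited = scope.parent.map { lookup(from: $0) } ?? []
    let result = scope.names + inherited
    results[scope.id] = result
    return result
  }

  /// Removes entries without a hit and resets the hit markers.
  func evictEntriesWithoutHit() {
    results = results.filter { hits.contains($0.key) }
    hits.removeAll()
  }
}

let root = Scope(id: 0, names: ["global"])
let outer = Scope(id: 1, names: ["x"], parent: root)
let first = Scope(id: 2, names: ["y"], parent: outer)
let second = Scope(id: 3, names: ["z"], parent: outer)

let cache = PathCache()
_ = cache.lookup(from: first)             // computes `first`, `outer`, and `root`
cache.evictEntriesWithoutHit()
let visible = cache.lookup(from: second)  // computes only `second`; `outer` is a cache hit
cache.evictEntriesWithoutHit()            // `first` was not hit, so it is evicted
```

In this toy setup the second lookup recomputes exactly one scope, and after eviction only the path from `second` to `root` remains cached.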

I attach below a sketch I used when pitching the idea to @DougGregor that visualizes an optimal top-bottom scan. In each step, blue represents contents of the cache, red represents evicted entries and green arrows point at the lookup position.

I think SwiftLexicalLookup could still benefit from an LRU cache though. The current implementation lacks the ability to look up arbitrary previously evaluated names without reevaluating a large part of the syntax tree below. What if we still used the small, optimal cache from the current implementation for subsequent lookups and maintained a large LRU cache for symbols/leaves that would fill up alongside it? This way we would have the best of both worlds without blowing up the size of the LRU cache with intermediate scope results. What do you think about this idea?

ahoppen · 2025-05-26T12:23:28Z

Would it be possible to use an LRU cache and provide an eviction method that can be called to clean up the cache when we know that some parts of it are no longer relevant (what you described in the sketch above)? That way we would get reasonable out-of-the-box behavior and don’t have an ever-growing cache but also have the ability to keep the cache size low in cases where the client (here the compiler) cares about it and knows the access patterns.
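
A minimal sketch of that combination, assuming a capacity bound with least-recently-used eviction plus an opt-in cleanup call. `BoundedHitCache` is a hypothetical type written for illustration; it uses a plain array for recency order and is not how the existing `LRUCache` is implemented. Treating an insert as a hit is also an assumption.

```swift
/// Hypothetical bounded cache; not the `LRUCache` from swift-syntax.
struct BoundedHitCache<Key: Hashable, Value> {
  let capacity: Int
  private var storage: [Key: Value] = [:]
  private var order: [Key] = []  // least recently used first
  private var hits: Set<Key> = []

  init(capacity: Int) {
    precondition(capacity > 0, "capacity must be positive")
    self.capacity = capacity
  }

  var count: Int { storage.count }

  /// Returns the cached value and records a hit, or `nil` if `key` is not cached.
  mutating func value(for key: Key) -> Value? {
    guard let value = storage[key] else { return nil }
    touch(key)
    return value
  }

  /// Inserts a value (assumed to count as a hit) and enforces the capacity bound.
  mutating func insert(_ value: Value, for key: Key) {
    storage[key] = value
    touch(key)
    if order.count > capacity {
      let evicted = order.removeFirst()
      storage[evicted] = nil
      hits.remove(evicted)
    }
  }

  /// Optional cleanup: drops entries without a hit and resets the hit markers.
  mutating func evictEntriesWithoutHit() {
    let kept = hits
    order.removeAll { !kept.contains($0) }
    storage = storage.filter { kept.contains($0.key) }
    hits.removeAll()
  }

  private mutating func touch(_ key: Key) {
    order.removeAll { $0 == key }
    order.append(key)
    hits.insert(key)
  }
}
```

Clients that never call `evictEntriesWithoutHit` still get a cache bounded by `capacity`; clients that know their access pattern can shrink it further.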

MAJKFL · 2025-05-27T21:24:51Z

> That way we would get reasonable out-of-the-box behavior and don’t have an ever-growing cache but also have the ability to keep the cache size low in cases where the client (here the compiler) cares about it and knows the access patterns.

Ah yes, that’s a very good idea to have an upper bound for the size of the cache. I haven’t thought about it. I’ll try to look into how to extend LRUCache from SwiftCompilerPluginMessageHandling with the cleanup algorithm then. Should we hoist LRUCache to some other, shared place, or should it remain in SwiftCompilerPluginMessageHandling?

ahoppen · 2025-05-28T08:26:41Z

> Should we hoist LRUCache to some other, shared place, or should it remain in SwiftCompilerPluginMessageHandling?

We should hoist it up. We could put it into a new module or just stick it in the SwiftSyntax target at the package access level – I haven’t quite decided on that yet but I think it’s something that we could also change easily once the rest of the PR has taken shape.


MAJKFL · 2025-06-17T16:23:28Z

swiftlang/swift#81209

@swift-ci Please test

MAJKFL · 2025-06-18T15:01:53Z

swiftlang/swift#81209

@swift-ci Please test Windows Platform

ahoppen

Thanks for addressing my review comments. I just had a chance to look at the PR again and left a few comments inline.

ahoppen · 2025-06-20T09:34:33Z

Sources/SwiftLexicalLookup/LookupCache.swift

+  /// memory accesses could take longer, slowing down the eviction process. That's why the `drop` value
+  /// could be fine-tuned to maximize the performance given file size,
+  /// number of lookups, and amount of available memory.
+  public init(capacity: Int, drop: Int = 0) {


Just an idea: Would it make sense to move the drop parameter to evictEntriesWithoutHit? That way clients don’t have to think about the dropping cache eviction policy unless they start calling evictEntriesWithoutHit. It would also open up the option to vary the size of the cache dynamically depending on the shape of the code that we’re in (not sure if that’s useful or not). It would also remove the need for bypassDropCounter in that function because you could pass drop: 0 there, I think.

ahoppen · 2025-06-20T09:37:29Z

Sources/SwiftLexicalLookup/LookupCache.swift

+  /// `nil` if there's no cache entry for the given `id`.
+  /// Adds `id` and ids of all ancestors to the cache `hits`.
+  func getCachedAncestorResults(id: SyntaxIdentifier) -> [LookupResult]? {
+    guard let results = ancestorResultsCache[id] else { return nil }


If the user doesn’t call evictEntriesWithoutHit, hits will keep on growing indefinitely. Should we clear up hits periodically for elements that are no longer in the cache? E.g. as a kind of garbage collection if hits.count > capacity * 2. Or should we only keep track of hits if the user opts into it inside the initializer?
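
The garbage-collection idea could look roughly like the sketch below. `pruneHits` is a hypothetical free function used only for illustration; the threshold `capacity * 2` is taken from the comment above, and `keysInCache` stands for the key set exposed by the cache.

```swift
/// Hypothetical helper: once `hits` outgrows twice the capacity, drop the
/// hit markers for keys that are no longer present in the cache.
func pruneHits<Key: Hashable>(_ hits: inout Set<Key>, keysInCache: Set<Key>, capacity: Int) {
  guard hits.count > capacity * 2 else { return }
  hits.formIntersection(keysInCache)
}
```

With this check, `hits` stays within a constant factor of the capacity even if the client never evicts.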

ahoppen · 2025-06-20T09:44:12Z

Sources/SwiftLexicalLookup/LookupCache.swift

+    for key in ancestorResultsCache.keysInCache.union(sequentialResultsCache.keysInCache).subtracting(hits) {
+      ancestorResultsCache[key] = nil
+      sequentialResultsCache[key] = nil
+    }


Not sure how performance sensitive this is or how expensive the Set operations are but I would imagine it would be more performant if we just did the following because it requires no set modifications.

for key in ancestorResultsCache where !hits.contains(key) {
  ancestorResultsCache[key] = nil
}
for key in sequentialResultsCache where !hits.contains(key) {
  sequentialResultsCache[key] = nil
}
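
For comparison, here are the two eviction strategies on plain dictionaries. This is only a sketch: `evictWithSets` and `evictWithLoop` are hypothetical names, and a `Dictionary` stands in for the cache type, so it says nothing about the relative cost with the real `LRUCache`.

```swift
/// Builds an intermediate set of keys to remove, as in the diff above.
func evictWithSets(_ cache: inout [Int: String], hits: Set<Int>) {
  for key in Set(cache.keys).subtracting(hits) {
    cache[key] = nil
  }
}

/// Filters while iterating and builds no intermediate set.
/// Mutating inside the loop is safe because `cache.keys` iterates a snapshot.
func evictWithLoop(_ cache: inout [Int: String], hits: Set<Int>) {
  for key in cache.keys where !hits.contains(key) {
    cache[key] = nil
  }
}
```

Both variants leave the same entries behind; the second avoids allocating the union and difference sets.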

ahoppen · 2025-06-20T09:46:11Z

Sources/SwiftLexicalLookup/Scopes/ScopeImplementations.swift

@@ -698,7 +735,8 @@ import SwiftSyntax
  public func lookup(
    _ identifier: Identifier?,
    at lookUpPosition: AbsolutePosition,
-    with config: LookupConfig
+    with config: LookupConfig,
+    cache: LookupCache?


Should we default cache to nil to avoid API breakage?
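
One way to keep existing call sites compiling is sketched below. Swift protocol requirements cannot declare default argument values, so the sketch adds a forwarding overload in an extension instead. `ScopeLike`, `CacheStub`, and `FileScope` are hypothetical placeholders, not the PR's types.

```swift
/// Hypothetical placeholder for the cache type.
final class CacheStub {}

/// Hypothetical protocol standing in for a scope that supports lookup.
protocol ScopeLike {
  func lookup(_ name: String, cache: CacheStub?) -> [String]
}

extension ScopeLike {
  /// Source-compatible overload: call sites without a cache keep compiling.
  func lookup(_ name: String) -> [String] {
    lookup(name, cache: nil)
  }
}

struct FileScope: ScopeLike {
  func lookup(_ name: String, cache: CacheStub?) -> [String] {
    cache == nil ? ["\(name) (uncached)"] : ["\(name) (cached)"]
  }
}
```

For methods that are not protocol requirements, writing `cache: CacheStub? = nil` directly in the signature has the same effect for callers.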

ahoppen · 2025-06-20T09:47:34Z

Sources/SwiftLexicalLookup/Scopes/ScopeImplementations.swift

      )
    } else {
-      return lookupInParent(identifier, at: lookUpPosition, with: config)
+      // We're not using `lookupParent` to not cache the results here.


Why don’t we want to cache the results here? Same for the other places that have

We're not using `lookupParent` to not cache the results here.

I think adding the why as a comment would be helpful for future reference.

ahoppen · 2025-06-20T09:59:22Z

Sources/SwiftSyntax/LRUCache.swift

@@ -33,12 +32,14 @@ public class LRUCache<Key: Hashable, Value> {
  private unowned var tail: _Node?

  public let capacity: Int
+  public private(set) var keysInCache: Set<Key>


Should all the public members be marked as package if LRUCache is package now? Also, thanks for introducing the ability to use package in swift-syntax 🙏🏽

ahoppen · 2025-06-20T10:00:46Z

Sources/SwiftSyntax/LRUCache.swift

@rintaro Could you check if you have any concerns for the changes in LRUCache?

ahoppen · 2025-06-20T10:01:24Z

Tests/SwiftCompilerPluginTest/LRUCacheTests.swift

@@ -11,6 +11,7 @@
 //===----------------------------------------------------------------------===//

 @_spi(Testing) import SwiftCompilerPluginMessageHandling
+import SwiftSyntax


If LRUCache is in SwiftSyntax, I think we should also move the tests to SwiftSyntax.

ahoppen · 2025-06-20T10:05:41Z

Sources/SwiftLexicalLookup/Scopes/ScopeSyntax.swift

  ) -> [LookupResult] {
-    scope?.lookup(identifier, at: self.position, with: config) ?? []
+    if let cache, let identifier {


Just an idea: Could we avoid one nesting level by making this

guard let cache, let identifier else {
  return scope?.lookup(identifier, at: self.position, with: config, cache: cache) ?? []
}

ahoppen · 2025-06-20T10:09:38Z

Sources/SwiftLexicalLookup/Scopes/ScopeSyntax.swift

  ) -> [LookupResult] {
-    scope?.lookup(identifier, at: self.position, with: config) ?? []
+    if let cache, let identifier {


Does this mean that we don’t use the cache if you run lookup without an identifier? Shouldn’t we be able to return the results from the cache in that case without filtering?
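
A sketch of what this could look like, assuming the cache stores unfiltered results per scope and filtering happens only when an identifier is given. `UnfilteredCache` is a hypothetical type; scopes are keyed by `Int` and results are plain strings purely for illustration.

```swift
/// Hypothetical cache keyed by a scope id that stores unfiltered names.
struct UnfilteredCache {
  private var namesByScope: [Int: [String]] = [:]

  mutating func store(_ names: [String], for scope: Int) {
    namesByScope[scope] = names
  }

  /// Returns `nil` on a cache miss. With no identifier, the cached names are
  /// returned as-is; otherwise they are filtered down to matching names.
  func lookup(_ identifier: String?, in scope: Int) -> [String]? {
    guard let names = namesByScope[scope] else { return nil }
    guard let identifier else { return names }
    return names.filter { $0 == identifier }
  }
}
```

With this shape, lookups with and without an identifier share the same cache entries.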

MAJKFL added 5 commits February 21, 2025 11:50

Merge remote-tracking branch 'upstream/main' into rfc-upstream-sync

e547074

Add SwiftLexicalLookup result caching.

e7c02d8

Add SwiftLexicalLookup cache unit testing.

83e1ca3

Merge remote-tracking branch 'upstream/main' into swift-lexical-looku…

047d95b

…p-caching

Add missing cache parameters to lookup calls.

07aa269

MAJKFL requested review from ahoppen and bnbarham as code owners April 30, 2025 18:58

MAJKFL mentioned this pull request Apr 30, 2025

[SwiftLexicalLookup] Add support for caching in result validation swiftlang/swift#81209

Open

ahoppen reviewed May 5, 2025


Implement LookupCache with LRUCache. Move LRUCache.swift to swi…

2e8b39f

…ft-syntax module with package level access.

MAJKFL requested review from hamishknight and rintaro as code owners June 17, 2025 16:04

Fix order of imports in LRUCacheTests.swift.

703d83a

ahoppen reviewed Jun 20, 2025

