-
Notifications
You must be signed in to change notification settings - Fork 159
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(BA-932): Unload purged docker images #3923
base: main
Are you sure you want to change the base?
Conversation
5cbd26c
to
8d540f1
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PR Overview
This PR addresses BA-932 by ensuring that the Redis cache for docker images is cleared when images are purged.
- Registers each agent's installed images using a new mapping.
- Updates the Redis pipeline logic to add new images and remove old ones based on canonical image identifiers.
Reviewed Changes
File | Description |
---|---|
src/ai/backend/manager/registry.py | Introduces a new mapping (agent_installed_images) and updates Redis pipeline logic to manage image caches. |
changes/3923.fix.md | Documents the changes made to remove stale Redis image cache entries. |
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
Comments suppressed due to low confidence (2)
src/ai/backend/manager/registry.py:240
- [nitpick] Consider using typing.Dict and typing.Set (if supported by your project style) for consistency with the rest of the codebase's type hints.
agent_installed_images: dict[AgentId, set[str]]
src/ai/backend/manager/registry.py:3160
- Verify that loaded_image_canonicals contains only valid canonical image strings. The previous implementation caught ValueError from parsing invalid image strings; ensure that the new set comprehension guarantees valid values.
for image in loaded_image_canonicals:
for image, _ in loaded_images: | ||
try: | ||
await pipe.sadd(ImageRef.parse_image_str(image, "*").canonical, agent_id) | ||
except ValueError: | ||
# Skip opaque (non-Backend.AI) image. | ||
continue |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I intentionally omitted the logic to skip image caching when a ValueError occurs while parsing the image string since I think excluding the loaded images sent from the agent from only redis caching is meaningless.
I'm concerned that WebUI will display purged images as "Installation Complete" until the next |
@@ -1240,7 +1240,6 @@ class DockerAgent(AbstractAgent[DockerKernel, DockerKernelCreationContext]): | |||
monitor_docker_task: asyncio.Task | |||
agent_sockpath: Path | |||
agent_sock_task: asyncio.Task | |||
scan_images_timer: asyncio.Task |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The timer is created in the line below, but this variable is not used.
backend.ai/src/ai/backend/agent/agent.py
Line 755 in 1f77864
self.timer_tasks.append(aiotools.create_timer(self._scan_images_wrapper, 20.0)) |
images = msgpack.unpackb(zlib.decompress(agent_info["images"])) | ||
image_canonicals = set(img_info[0] for img_info in images) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume the 0th value means canonical, but that's not going to change, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, FYI,
Tuple's 0th value: image canonical.
Tuple's 1st value: image digest.
@@ -237,6 +237,7 @@ class AgentRegistry: | |||
pending_waits: set[asyncio.Task[None]] | |||
database_ptask_group: aiotools.PersistentTaskGroup | |||
webhook_ptask_group: aiotools.PersistentTaskGroup | |||
agent_installed_images: dict[AgentId, set[str]] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we are caching image info to redis, should we cache locally again?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this is necessary to detect images that need to be removed.
4aed8b4
to
595644b
Compare
Resolves #3924 (BA-927, BA-932)
Checklist: (if applicable)