Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(BA-932): Unload purged docker images #3923

Open
wants to merge 9 commits into
base: main
Choose a base branch
from

Conversation

jopemachine
Copy link
Member

@jopemachine jopemachine commented Mar 8, 2025

Resolves #3924 (BA-927, BA-932)

Checklist: (if applicable)

  • Mention to the original issue

@github-actions github-actions bot added size:XS ~10 LoC comp:manager Related to Manager component labels Mar 8, 2025
@jopemachine jopemachine changed the title fix: Clear redis image cache when purging docker images fix(BA-932): Clear redis image cache when purging docker images Mar 8, 2025
@jopemachine jopemachine added this to the 25Q1 milestone Mar 8, 2025
@github-actions github-actions bot added size:S 10~30 LoC and removed size:XS ~10 LoC labels Mar 8, 2025
@jopemachine jopemachine force-pushed the fix/clear-redis-cache-when-purge-images branch from 5cbd26c to 8d540f1 Compare March 10, 2025 02:03
@jopemachine jopemachine marked this pull request as ready for review March 10, 2025 02:08

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Overview

This PR addresses BA-932 by ensuring that the Redis cache for docker images is cleared when images are purged.

  • Registers each agent's installed images using a new mapping.
  • Updates the Redis pipeline logic to add new images and remove old ones based on canonical image identifiers.

Reviewed Changes

File Description
src/ai/backend/manager/registry.py Introduces a new mapping (agent_installed_images) and updates Redis pipeline logic to manage image caches.
changes/3923.fix.md Documents the changes made to remove stale Redis image cache entries.

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

Comments suppressed due to low confidence (2)

src/ai/backend/manager/registry.py:240

  • [nitpick] Consider using typing.Dict and typing.Set (if supported by your project style) for consistency with the rest of the codebase's type hints.
agent_installed_images: dict[AgentId, set[str]]

src/ai/backend/manager/registry.py:3160

  • Verify that loaded_image_canonicals contains only valid canonical image strings. The previous implementation caught ValueError from parsing invalid image strings; ensure that the new set comprehension guarantees valid values.
for image in loaded_image_canonicals:
@jopemachine jopemachine changed the title fix(BA-932): Clear redis image cache when purging docker images fix(BA-932): Unload purged docker images Mar 10, 2025
Comment on lines -3156 to -3161
for image, _ in loaded_images:
try:
await pipe.sadd(ImageRef.parse_image_str(image, "*").canonical, agent_id)
except ValueError:
# Skip opaque (non-Backend.AI) image.
continue
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I intentionally omitted the logic to skip image caching when a ValueError occurs while parsing the image string since I think excluding the loaded images sent from the agent from only redis caching is meaningless.

@jopemachine
Copy link
Member Author

jopemachine commented Mar 10, 2025

I'm concerned that WebUI will display purged images as "Installation Complete" until the next scan_image is called.
However, if purge_images directly deletes the Redis cache, there could be a timing issue where the deleted image cache gets restored, so I will not remove the Redis cache in purge_images.

@github-actions github-actions bot added the comp:agent Related to Agent component label Mar 10, 2025
@@ -1240,7 +1240,6 @@ class DockerAgent(AbstractAgent[DockerKernel, DockerKernelCreationContext]):
monitor_docker_task: asyncio.Task
agent_sockpath: Path
agent_sock_task: asyncio.Task
scan_images_timer: asyncio.Task
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The timer is created in the line below, but this variable is not used.

self.timer_tasks.append(aiotools.create_timer(self._scan_images_wrapper, 20.0))

Comment on lines +3154 to +3155
images = msgpack.unpackb(zlib.decompress(agent_info["images"]))
image_canonicals = set(img_info[0] for img_info in images)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume the 0th value means canonical, but that's not going to change, right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, FYI,
Tuple's 0th value: image canonical.
Tuple's 1st value: image digest.

@@ -237,6 +237,7 @@ class AgentRegistry:
pending_waits: set[asyncio.Task[None]]
database_ptask_group: aiotools.PersistentTaskGroup
webhook_ptask_group: aiotools.PersistentTaskGroup
agent_installed_images: dict[AgentId, set[str]]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we are caching image info to redis, should we cache locally again?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is necessary to detect images that need to be removed.

@jopemachine jopemachine force-pushed the fix/clear-redis-cache-when-purge-images branch from 4aed8b4 to 595644b Compare March 10, 2025 10:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
comp:agent Related to Agent component comp:manager Related to Manager component size:S 10~30 LoC
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Unload purged docker images
2 participants