-
Notifications
You must be signed in to change notification settings - Fork 159
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(BA-932): Unload purged docker images #3923
base: main
Are you sure you want to change the base?
Changes from all commits
7f18f31
fec47cc
702389e
8d540f1
c0456dc
9da60da
2ca7171
d6dde75
595644b
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
Unload images that have been purged from the agents in the manager. |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -237,6 +237,7 @@ class AgentRegistry: | |
pending_waits: set[asyncio.Task[None]] | ||
database_ptask_group: aiotools.PersistentTaskGroup | ||
webhook_ptask_group: aiotools.PersistentTaskGroup | ||
agent_installed_images: dict[AgentId, set[str]] | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. we are caching image info to redis, should we cache locally again? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes, this is necessary to detect images that need to be removed. |
||
|
||
def __init__( | ||
self, | ||
|
@@ -287,6 +288,7 @@ def __init__( | |
hook_plugin_ctx, | ||
self, | ||
) | ||
self.agent_installed_images = {} | ||
|
||
async def init(self) -> None: | ||
self.heartbeat_lock = asyncio.Lock() | ||
|
@@ -3149,19 +3151,21 @@ async def _update() -> None: | |
) | ||
|
||
# Update the mapping of kernel images to agents. | ||
loaded_images = msgpack.unpackb(zlib.decompress(agent_info["images"])) | ||
images = msgpack.unpackb(zlib.decompress(agent_info["images"])) | ||
image_canonicals = set(img_info[0] for img_info in images) | ||
Comment on lines
+3154
to
+3155
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I assume the 0th value means canonical, but that's not going to change, right? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes, FYI, |
||
prev_image_canonicals = self.agent_installed_images.get(agent_id, set()) | ||
|
||
async def _pipe_builder(r: Redis): | ||
pipe = r.pipeline() | ||
for image, _ in loaded_images: | ||
try: | ||
await pipe.sadd(ImageRef.parse_image_str(image, "*").canonical, agent_id) | ||
except ValueError: | ||
# Skip opaque (non-Backend.AI) image. | ||
continue | ||
Comment on lines
-3156
to
-3161
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I intentionally omitted the logic to skip image caching when a ValueError occurs while parsing the image string since I think excluding the loaded images sent from the agent from only redis caching is meaningless. |
||
for image in image_canonicals: | ||
await pipe.sadd(image, agent_id) | ||
|
||
for image in prev_image_canonicals - image_canonicals: | ||
await pipe.srem(image, agent_id) | ||
return pipe | ||
|
||
await redis_helper.execute(self.redis_image, _pipe_builder) | ||
self.agent_installed_images[agent_id] = image_canonicals | ||
|
||
await self.hook_plugin_ctx.notify( | ||
"POST_AGENT_HEARTBEAT", | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The timer is created in the line below, but this variable is not used.
backend.ai/src/ai/backend/agent/agent.py
Line 755 in 1f77864