The CI jobs create a docker image for each build but never clean them up. This has filled our 1 TB hard drive with old images; the CI machines currently have over 1200 stale images.
Here's an example snippet of the docker images listing:
1571326815.950986947.ci_build_and_test.eloquent latest 8b5f6b41d911 4 months ago 2.7GB
1571325032.960296107.ci_build_and_install.eloquent latest 67d0fbc7ef09 4 months ago 2.54GB
1571324947.152392622.ci_create_workspace.eloquent latest d83854882fcf 4 months ago 309MB
1571324930.221470130.ci_task_generation.eloquent latest acdb2e2624c2 4 months ago 281MB
1571318040.914411982.ci_build_and_test.eloquent latest ce61562661aa 4 months ago 2.72GB
1571313602.525396075.ci_build_and_install.eloquent latest 9d5959d22963 4 months ago 2.55GB
1571313526.771109636.ci_create_workspace.eloquent latest c20eb58bee6c 4 months ago 309MB
1571313514.388269068.ci_task_generation.eloquent latest be03977a5b76 4 months ago 275MB
1571311906.261694088.ci_build_and_install.eloquent latest 7f9c03d96eca 4 months ago 2.22GB
1571311731.099785552.ci_create_workspace.eloquent latest 2c9c1e18c178 4 months ago 309MB
1571311718.936275092.ci_task_generation.eloquent latest 7e82fa50ea0f 4 months ago 275MB
1571308435.419666820.ci_build_and_test.dashing latest fc729a3d10f8 4 months ago 2.65GB
1571306767.958518816.ci_build_and_install.dashing latest f2e623b496ef 4 months ago 2.49GB
1571306603.344548550.ci_create_workspace.dashing latest 2ac15995be4c 4 months ago 309MB
1571306522.255241445.ci_task_generation.dashing latest c86fb3225478 4 months ago 275MB
1571301189.875613828.ci_build_and_test.dashing latest 5711e9135404 4 months ago 2.67GB
1571299263.698240192.ci_build_and_install.dashing latest fbbd943caf77 4 months ago 2.52GB
1571299129.766302587.ci_create_workspace.dashing latest ad82f4fe6925 4 months ago 309MB
1571299116.744838555.ci_task_generation.dashing latest 70774e9d346f 4 months ago 279MB
1571295260.944698940.ci_build_and_test.dashing latest c4c0ed464aa0 4 months ago 2.65GB
1571293057.369408693.ci_build_and_install.dashing latest 6c2d5064a06b 4 months ago 2.49GB
1571292916.802026789.ci_create_workspace.dashing latest 0b0057018fa6 4 months ago 309MB
1571292904.160600017.ci_task_generation.dashing latest cf2b6b915016 4 months ago 279MB
1571245909.733100761.ci_build_and_install.eloquent latest 6f224facb867 4 months ago 1.55GB
1571245894.654350501.ci_create_workspace.eloquent latest c2a6f81ab854 4 months ago 309MB
1571245881.618156839.ci_task_generation.eloquent latest 20ea14dccfd7 4 months ago 278MB
1571239005.478707058.ci_build_and_test.eloquent latest e4337522f6d5 4 months ago 1.99GB
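For context, the totals behind those numbers can be checked with standard docker commands; the grep pattern below is just an illustration matching the epoch-prefixed CI image names shown above, not part of our tooling:

```sh
# Count images whose repository name starts with a Unix timestamp
# (the naming pattern used by the CI job images listed above).
docker images --format '{{.Repository}}' | grep -cE '^[0-9]+\.'

# Overall docker disk usage summary (images, containers, volumes).
docker system df
```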
The old images are only pruned once the disk fills up, and that pruning is itself a heavy operation that slows down many other docker operations.
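As a stop-gap until the naming is fixed, something along these lines could periodically remove the epoch-prefixed CI images. This is only a sketch, assuming everything we want to drop matches that naming pattern:

```sh
#!/bin/sh
# Remove the epoch-prefixed CI images. Images still in use by a
# container will fail to remove and are skipped via `|| true`.
docker images --format '{{.Repository}}:{{.Tag}}' \
  | grep -E '^[0-9]+\.[0-9]+\.ci_' \
  | while read -r image; do
      docker image rm "$image" || true
    done
```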
This comes from the unique naming of the docker images in ros_buildfarm/ros_buildfarm/templates/ci/ci_job.xml.em (line 160 in 117f59a).
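I haven't quoted the template here, but the names in the listing suggest the tag is built from a nanosecond-resolution timestamp plus the job name and ROS distro, roughly like the following (a hypothetical reconstruction for illustration, not the actual template code):

```sh
# Hypothetical reconstruction of the tagging scheme implied by the
# listing above: a nanosecond-resolution timestamp prefix makes every
# build's image name unique, so old images are never overwritten.
TAG="$(date +%s.%N).ci_build_and_test.eloquent"
docker build -t "$TAG" .
docker run --rm "$TAG"
```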
All other images are named uniquely per job, so they are overwritten when the job is rerun and we only accumulate one image per job. Persisting those is reasonable: we may be able to reuse layers from run to run, and it doesn't accumulate images that will never be reused.
We're currently running the jobs with a single executor, but if we were to run them in parallel in the future we would risk a race condition between executors building images at the same time. To that end I believe using EXECUTOR_NUMBER (https://wiki.jenkins.io/display/JENKINS/Building+a+software+project) would avoid collisions while keeping the number of image names from growing.
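Concretely, the idea would be something like this (a sketch of the proposed scheme, not the actual template change): reruns on the same executor overwrite the previous image, while concurrent executors can't collide.

```sh
# Scope the image name by Jenkins' EXECUTOR_NUMBER instead of a
# timestamp: a rerun on the same executor overwrites the previous
# image, and concurrent builds on different executors get distinct
# names, so they cannot race on the same tag.
TAG="ci_build_and_test.eloquent.executor_${EXECUTOR_NUMBER}"
docker build -t "$TAG" .
docker run --rm "$TAG"
```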
I believe that the motivation for the unique ID is specifically to ensure that the image that was built is the same image that is run, given there could be another job building on the same worker.
More specifically, we're guarding against a race between two finishing build invocations and their corresponding run invocations. As far as I know, multiple container runs from the same image aren't a problem as long as the image is correct, but because we're using the image name to differentiate them, the only way to ensure that what we built is what we run is to make the name unique. Alternatively, if we could somehow pass the image ID from a completed docker build to the subsequent docker run, we'd be able to name the image the same way we do for the other jobs.
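For what it's worth, docker can already do that hand-off: `docker build --iidfile` writes the ID of the image it just produced, and that ID can be passed straight to `docker run`, so no unique tag is needed to tie the two together. A minimal sketch (the /tmp path is just for illustration):

```sh
# Capture the ID of the image we just built and run exactly that
# image, instead of relying on a tag to connect build and run.
docker build --iidfile /tmp/ci_image_id .
docker run --rm "$(cat /tmp/ci_image_id)"
```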
tl;dr - your change to use the EXECUTOR_NUMBER sounds reasonable to guard against the race.