[BUG] ECS connect container unhealthy during new deployments to ECS EC2 #91

Open
thiagoscodelerae opened this issue Sep 27, 2024 · 2 comments
Labels
bug Something isn't working

Comments

@thiagoscodelerae

Summary

The ECS Service Connect container randomly becomes unhealthy during new deployments.

Description

I have several ECS tasks running on EC2 using ECS service connect for internal communication. Sometimes, during new deployments, the ECS service connect container linked to these tasks becomes unhealthy, preventing the deployment from succeeding. This issue doesn't occur with every deployment.

These ECS tasks are GPU-based and take some time to start. I have not configured any health checks in the task definitions.
A capacity provider with auto scaling manages the EC2 instances.
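
For anyone trying to narrow this down, here is a minimal sketch of how the unhealthy container could be picked out of `aws ecs describe-tasks` JSON output. It relies only on the `tasks`, `taskArn`, `containers`, `name`, and `healthStatus` fields of that response. The sample payload, the ARN, and the container names are made up for illustration and are not taken from the affected cluster.

```python
def unhealthy_containers(describe_tasks_response):
    """Return (task ARN, container name) pairs whose healthStatus is UNHEALTHY."""
    found = []
    for task in describe_tasks_response.get("tasks", []):
        for container in task.get("containers", []):
            if container.get("healthStatus") == "UNHEALTHY":
                found.append((task.get("taskArn"), container.get("name")))
    return found


# Hypothetical, trimmed-down response; real output carries many more fields.
sample = {
    "tasks": [
        {
            "taskArn": "arn:aws:ecs:region:account:task/example-task",
            "containers": [
                {"name": "app", "healthStatus": "UNKNOWN"},
                {"name": "ecs-service-connect-example", "healthStatus": "UNHEALTHY"},
            ],
        }
    ]
}

for task_arn, name in unhealthy_containers(sample):
    print(f"{task_arn}: {name} is UNHEALTHY")
```

Running this against the tasks of a failing deployment should show whether only the Service Connect container is reported as unhealthy or whether the application container is affected too.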

Environment Details

docker info:

Client:
 Version:    25.0.5
 Context:    default
 Debug Mode: false
 Plugins:
  buildx: Docker Buildx (Docker Inc.)
    Version:  v0.0.0+unknown
    Path:     /usr/libexec/docker/cli-plugins/docker-buildx

Server:
 Containers: 26
  Running: 26
  Paused: 0
  Stopped: 0
 Images: 5
 Server Version: 25.0.6
 Storage Driver: overlay2
  Backing Filesystem: xfs
  Supports d_type: true
  Using metacopy: false
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Cgroup Version: 1
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local splunk syslog
 Swarm: inactive
 Runtimes: runc io.containerd.runc.v2 nvidia
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 8fc6bcff51318944179630522a095cc9dbf9f353
 runc version: 58aa9203c123022138b22cf96540c284876a7910
 init version: de40ad0
 Security Options:
  seccomp
   Profile: builtin
 Kernel Version: 4.14.352-267.564.amzn2.x86_64
 Operating System: Amazon Linux 2
 OSType: linux
 Architecture: x86_64
 CPUs: 192
 Total Memory: 1.457TiB
 Name: ip-10-0-2-132.us-west-2.compute.internal
 ID: 15f20bda-0351-45ab-ad53-c8d80dd58902
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

curl http://localhost:51678/v1/metadata:

{"Cluster":"my-ecs-cluster","ContainerInstanceArn":"arn:aws:ecs:us-east-2:0000000000:container-instance/my-ecs-cluster/cd84e75764231289b1bkd206fi72m258","Version":"Amazon ECSAgent - v1.86.3 (*78a2bf0c)"}
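
When comparing affected instances, the agent version can be pulled out of this response programmatically. A small sketch, assuming the `Version` field always contains a `vMAJOR.MINOR.PATCH` token as it does in the output above; the function name `agent_version` is only illustrative.

```python
import json
import re


def agent_version(metadata_json):
    """Extract the ECS agent version (e.g. '1.86.3') from the metadata JSON text."""
    meta = json.loads(metadata_json)
    # Assumption: the version appears as 'v<major>.<minor>.<patch>' in the string.
    match = re.search(r"v(\d+\.\d+\.\d+)", meta.get("Version", ""))
    return match.group(1) if match else None


# The response body reported above, copied verbatim.
raw = (
    '{"Cluster":"my-ecs-cluster",'
    '"ContainerInstanceArn":"arn:aws:ecs:us-east-2:0000000000:container-instance/'
    'my-ecs-cluster/cd84e75764231289b1bkd206fi72m258",'
    '"Version":"Amazon ECSAgent - v1.86.3 (*78a2bf0c)"}'
)

print(agent_version(raw))  # prints 1.86.3
```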

df -h:

Filesystem      Size  Used Avail Use% Mounted on
devtmpfs        747G     0  747G   0% /dev
tmpfs           747G  192K  747G   1% /dev/shm
tmpfs           747G  4.1M  747G   1% /run
tmpfs           747G     0  747G   0% /sys/fs/cgroup
/dev/nvme0n1p1  2.0T  180G  1.8T   9% /
overlay         2.0T  180G  1.8T   9% /var/lib/docker/overlay2/75908095d9049666eeb194f2acdebed201a37464d6b736451988d9b7788c7e7f/merged
shm              64M     0   64M   0% /var/lib/docker/containers/d051c2ff3b372b802f8345415f44c9cb854dea42fdb117242eddf1b4cf28d131/mounts/shm
overlay         2.0T  180G  1.8T   9% /var/lib/docker/overlay2/13354169b95381eb57886c3f57b66c3bdd7378d5e88674fe6c93be5df0341eae/merged
shm              64M     0   64M   0% /var/lib/docker/containers/c3cfb4a8e7c7ed26a7fe06cd07be006814a1a6754816554edc82ebfb4b964b70/mounts/shm
overlay         2.0T  180G  1.8T   9% /var/lib/docker/overlay2/adba0c57bef371f0b59333d991fcd9ec03259cf66316a6f0a7840e271859573d/merged
overlay         2.0T  180G  1.8T   9% /var/lib/docker/overlay2/0a55bbb1de653ba266bef399838e16d419e53cc3959e349196c24a40e5bd93fb/merged
shm              64M     0   64M   0% /var/lib/docker/containers/8cd6dc2f806ab1b28a7abaa7faffcf192c64fadcbd20a4bcd38cd782cf447323/mounts/shm
shm              64M     0   64M   0% /var/lib/docker/containers/b2b805eaa6b03557ab51e8718e0e358a80899472b7fa1ea25696260fdf1bae28/mounts/shm
overlay         2.0T  180G  1.8T   9% /var/lib/docker/overlay2/106e197ff6f8bca3186ffe79500f9d39118fe631a73227db4e9d9feda0c430fe/merged
overlay         2.0T  180G  1.8T   9% /var/lib/docker/overlay2/51ea09b880f730bd8401a02ba7e1ddcb1355e3a07217813678c9c6ab1d9b3916/merged
overlay         2.0T  180G  1.8T   9% /var/lib/docker/overlay2/3981bb1d5cd3cc3ceb9d93e7b8bac5ff3cd48a9a3dc5fe5111c6d0d9ddcb8c35/merged
overlay         2.0T  180G  1.8T   9% /var/lib/docker/overlay2/f0ff09d07dc268742c5cf822f25b4e2e808e927f970367bcd8891cfcf9edf845/merged
overlay         2.0T  180G  1.8T   9% /var/lib/docker/overlay2/205d2733fe0eaebefad382923d87724a7d030d3d4a97db462e3d5b43971c54b7/merged
shm              64M     0   64M   0% /var/lib/docker/containers/1b4c28b1c3c027f6b23257b079fc32502d8d7a5e9ad8a750b1c4b8d30fe2d359/mounts/shm
overlay         2.0T  180G  1.8T   9% /var/lib/docker/overlay2/804329014a4e65d8fde5434bbe45524871bb85e680e3858eab410d0803fbfe55/merged
overlay         2.0T  180G  1.8T   9% /var/lib/docker/overlay2/58d54b86009dc496217d00c6ad6868af3fb7b6a94d8d91b0f8a7a3f133d45524/merged
overlay         2.0T  180G  1.8T   9% /var/lib/docker/overlay2/96023af81fb1c64109215f8b72dd502d36cbf833caeb8eff5daf4ab211711a84/merged
shm              64M     0   64M   0% /var/lib/docker/containers/594a961052567e09a1fb5045753a084b4694393c83f04093a51d70673adb8af7/mounts/shm
overlay         2.0T  180G  1.8T   9% /var/lib/docker/overlay2/37fa80b308db379ef1c5f856ad7c3acd638723e8043a983b3cfd0cc783df4a48/merged
overlay         2.0T  180G  1.8T   9% /var/lib/docker/overlay2/ccca21342476e7748f59bc01ac62eb80ac240a3551af17acb8da5e8efb715b92/merged
overlay         2.0T  180G  1.8T   9% /var/lib/docker/overlay2/11929964a105c2563fc43162b8c533e38e380a7aaf4f246eb219c22ceed2aad7/merged
shm              64M     0   64M   0% /var/lib/docker/containers/615338fb533f8e6e5fcbd31faa28e9d214dec1d577ee194f30067c78bee928f4/mounts/shm
overlay         2.0T  180G  1.8T   9% /var/lib/docker/overlay2/037114f77c35a836c3a404d0afa119fd60bbb17e7928bd76808a98d40af90838/merged
shm              64M     0   64M   0% /var/lib/docker/containers/b08883992f0c40a6b0a26cf3ea778979fab30de3129e569200e4b0b09ef268e1/mounts/shm
overlay         2.0T  180G  1.8T   9% /var/lib/docker/overlay2/11158010fd59709c33f8aa3737f9b2bed99d88a8e71ccf1bff8572f13462994a/merged
overlay         2.0T  180G  1.8T   9% /var/lib/docker/overlay2/096723829a6e9422512296ec229c8390ac0be538240259cf73978d84342a9e1f/merged
overlay         2.0T  180G  1.8T   9% /var/lib/docker/overlay2/bc56d64d85f094938f9ece7484a06d9daae7ee95f39c61f16faf1a39bed6fc5a/merged
overlay         2.0T  180G  1.8T   9% /var/lib/docker/overlay2/28ab86ad867dc6f181b8470a500a8a131475902306b8e2845166724340b27cca/merged
overlay         2.0T  180G  1.8T   9% /var/lib/docker/overlay2/a5c164d50643790c9cf19bcf0c7f1a27ff6f446945ec8bc8ae0d8b6a65b6b0c8/merged
shm              64M     0   64M   0% /var/lib/docker/containers/2389eb4c023889900be4401b3386e64464354c71d220f8b38a32585777d43937/mounts/shm
overlay         2.0T  180G  1.8T   9% /var/lib/docker/overlay2/be3a5bf1df6251f1738d322f483a8459d7bbfde8e8253037bbfc99bd5e4bd4e3/merged
overlay         2.0T  180G  1.8T   9% /var/lib/docker/overlay2/d4113d8f1f4850f5ff2ebf02832b2ad0a9108282ce1f55a07bd5598accc5eb3d/merged
shm              64M     0   64M   0% /var/lib/docker/containers/bf73f5d1f0a289b6c2b006d7e2c75324985773159ba1e1456925c60fb0176c36/mounts/shm
overlay         2.0T  180G  1.8T   9% /var/lib/docker/overlay2/7a7896e24a77ff4cb9cec2043182c542ec740eec762f4cd92ec8ceda36b1c34c/merged
overlay         2.0T  180G  1.8T   9% /var/lib/docker/overlay2/4772f3fcda060c06dc89b20e5d56ccd8346577530a3a300a2609ab52df77618f/merged
overlay         2.0T  180G  1.8T   9% /var/lib/docker/overlay2/0b127fa447b2bca958b0f01cfa6cb5782bd2798466d7aef5c800d5fe45161ed7/merged
tmpfs           150G     0  150G   0% /run/user/0

thiagoscodelerae added the bug label on Sep 27, 2024

@Happylinzy
Contributor

Hi. Do you have any logs we can refer to?

@alessiogaldy

I suspect this is a duplicate of #41
