Synthetics - Bad request: unable to decode checkin request

Version: 8.18.2
Operating System: Ubuntu 24.04.2 LTS
Discuss Forum URL:
Steps to Reproduce:

We have 2 environments, Test & Production, with almost the same setup.
In each environment we have:
- 3 elastic agents that act as the fleet server
- 3 elastic agent complete containers that are used for synthetics, running both journeys and lightweight tests

In Test we have:
- a total of 1366 monitors
  - 187 synthetics
  - 1159 lightweight

In Production we have: 
- a total of 1595 monitors
  - 185 synthetics
  - 1410 lightweight

The issue we are facing is that the elastic agent complete containers lose their connection to the fleet server containers after running as healthy for a couple of minutes:
```
elastic-agent status
┌─ fleet
│  └─ status: (FAILED) status code: 400, fleet-server returned an error: BadRequest, message: Bad request: unable to decode checkin request
└─ elastic-agent
   └─ status: (HEALTHY) Running
```
```
cat Synthetics.stderr.359 | grep fleet
{"log.level":"warn","@timestamp":"2025-06-18T08:50:50.154Z","log.origin":{"function":"github.com/elastic/elastic-agent/internal/pkg/agent/application/gateway/fleet.(*FleetGateway).doExecute","file.name":"fleet/fleet_gateway.go","file.line":196},"message":"Possible transient error during checkin with fleet-server, retrying","log":{"source":"elastic-agent"},"error":{"message":"status code: 400, fleet-server returned an error: BadRequest, message: Bad request: unable to decode checkin request"},"request_duration_ns":313344452,"failed_checkins":1,"retry_after_ns":61474208694,"ecs.version":"1.6.0"}
{"log.level":"warn","@timestamp":"2025-06-18T08:51:52.285Z","log.origin":{"function":"github.com/elastic/elastic-agent/internal/pkg/agent/application/gateway/fleet.(*FleetGateway).doExecute","file.name":"fleet/fleet_gateway.go","file.line":196},"message":"Possible transient error during checkin with fleet-server, retrying","log":{"source":"elastic-agent"},"error":{"message":"status code: 400, fleet-server returned an error: BadRequest, message: Bad request: unable to decode checkin request"},"request_duration_ns":323635649,"failed_checkins":2,"retry_after_ns":193698292377,"ecs.version":"1.6.0"}
```
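To make the retry pattern easier to read across many log files, the warnings above can be summarized with a small script. This is only a sketch: it assumes every relevant line is a single JSON object with the same fields as the two lines shown (`@timestamp`, `error.message`, `failed_checkins`, `retry_after_ns`), and the function name `summarize_checkin_failures` is hypothetical, not part of elastic-agent.

```python
import json


def summarize_checkin_failures(lines):
    """Return (timestamp, failed_checkins, retry_after_seconds) tuples.

    Assumes one JSON object per line, shaped like the elastic-agent
    warnings quoted in this issue. Anything else is skipped.
    """
    needle = "unable to decode checkin request"
    rows = []
    for line in lines:
        line = line.strip()
        if not line.startswith("{"):
            continue  # not a JSON log line
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # truncated or malformed line
        error = event.get("error")
        if not isinstance(error, dict):
            continue
        if needle not in error.get("message", ""):
            continue
        # retry_after_ns is in nanoseconds; convert to seconds for readability
        retry_seconds = event.get("retry_after_ns", 0) / 1e9
        rows.append((event.get("@timestamp"), event.get("failed_checkins"), retry_seconds))
    return rows
```

It can be fed an open log file, for example `summarize_checkin_failures(open("Synthetics.stderr.359"))`, and the resulting tuples printed or compared between Test and Production.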
This tends to happen only in Production. While the elastic agent complete agents are in this state, we cannot push new monitors to them, because they cannot communicate with the fleet server.

The workaround is to do an elastic-agent restart (inside the elastic-agent complete container), but after ~5 minutes the elastic-agent complete goes into this unhealthy state again.

Please let me know what extra logs are needed to help debug this issue.



Synthetics - Bad request: unable to decode checkin request #8577
