fleet is no longer actively developed or maintained by CoreOS. CoreOS instead recommends Kubernetes for cluster orchestration.
fleet is a cluster manager that controls systemd at the cluster level. To run your services in the cluster, you must submit regular systemd units combined with a few fleet-specific properties. If you're not familiar with systemd units, check out our Getting Started with systemd guide.
This guide assumes you're running fleetctl locally from a Container Linux machine that's part of a Container Linux cluster. You can also control your cluster remotely. All of the units referenced in this guide are contained in the unit-examples repository. You can clone it onto your Container Linux box to make unit submission easier.
Two types of units can be run in your cluster: standard and global units. Standard units are long-running processes that are scheduled onto a single machine. If that machine goes offline, the unit will be migrated onto a new machine and started.
Global units will be run on all machines in the cluster. These are ideal for common services like monitoring agents or components of higher-level orchestration systems like Kubernetes, Mesos or OpenStack. There are two fleetctl commands to view units in the cluster: list-unit-files, which shows the units that fleet knows about and whether or not they are global, and list-units, which shows the current state of units actively loaded into machines in the cluster. Here's an example cluster with 3 machines, running both types of units:
$ fleetctl list-unit-files
UNIT HASH DSTATE STATE TMACHINE
global-unit.service 8ff68b9 launched launched 3 of 3
standard-unit.service 7710e8a launched launched 148a18ff.../10.10.1.1
You can view all of the machines in the cluster by running list-machines:
$ fleetctl list-machines
MACHINE IP METADATA
148a18ff-6e95-4cd8-92da-c9de9bb90d5a 10.10.1.1 -
491586a6-508f-4583-a71d-bfc4d146e996 10.10.1.2 -
c9de9451-6a6f-1d80-b7e6-46e996bfc4d1 10.10.1.3 -
Looking at the status of units, we should expect to see 3 copies of global-unit.service, one running on each machine:
$ fleetctl list-units
UNIT MACHINE ACTIVE SUB
global-unit.service 148a18ff.../10.10.1.1 active running
global-unit.service 491586a6.../10.10.1.2 active running
global-unit.service c9de9451.../10.10.1.3 active running
standard-unit.service 148a18ff.../10.10.1.1 active running
Running a single container is very easy. All you need to do is provide a regular unit file without an [Install] section. Let's run the same unit from the Getting Started with systemd guide. First save these contents as myapp.service on the Container Linux machine:
[Unit]
Description=MyApp
After=docker.service
Requires=docker.service
[Service]
TimeoutStartSec=0
ExecStartPre=-/usr/bin/docker kill busybox1
ExecStartPre=-/usr/bin/docker rm busybox1
ExecStartPre=/usr/bin/docker pull busybox
ExecStart=/usr/bin/docker run --name busybox1 busybox /bin/sh -c "trap 'exit 0' INT TERM; while true; do echo Hello World; sleep 1; done"
ExecStop=/usr/bin/docker stop busybox1
If you've been running docker commands manually, be sure you don't copy a docker run command that starts a container in detached mode (-d). Detached mode won't start the container as a child of the unit's PID, which will cause the unit to run for just a few seconds and then exit.
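The pitfall can be sketched without docker at all. This is a plain-shell analogy (the function names are just for illustration), not fleet-specific behavior:

```shell
# systemd tracks the process it launched. A foreground launcher (docker run
# without -d) lives as long as the work; a detached launcher (docker run -d)
# exits immediately, so the unit appears to finish right away.

foreground() { echo "work"; }    # the launcher IS the work
detached()   { (sleep 2 &); }    # the launcher hands work off and returns

foreground
fg_status=$?

detached
det_status=$?

# Both launchers "succeed", but only the foreground one's lifetime matches
# the work's lifetime, which is what the unit file above relies on.
echo "foreground=$fg_status detached=$det_status"
```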
Run the start command to start up the container on the cluster:
$ fleetctl start myapp.service
The unit should have been scheduled to a machine in your cluster:
$ fleetctl list-units
UNIT MACHINE ACTIVE SUB
myapp.service c9de9451.../10.10.1.3 active running
The main benefit of using Container Linux is to have your services run in a highly available manner. Let's walk through deploying a service that consists of two identical containers running the Apache web server. Afterwards, we'll walk through the failure of a machine and the steps fleet takes to relaunch our tasks on another machine.
First, let's write a unit file that we'll run two copies of. To do that, we'll use a template unit, named apache@.service. The template stays on disk and is used as a base to generate two instances, named apache@1.service and apache@2.service:
[Unit]
Description=My Apache Frontend
After=docker.service
Requires=docker.service
[Service]
TimeoutStartSec=0
ExecStartPre=-/usr/bin/docker kill apache1
ExecStartPre=-/usr/bin/docker rm apache1
ExecStartPre=/usr/bin/docker pull coreos/apache
ExecStart=/usr/bin/docker run --rm --name apache1 -p 80:80 coreos/apache /usr/sbin/apache2ctl -D FOREGROUND
ExecStop=/usr/bin/docker stop apache1
[X-Fleet]
Conflicts=apache@*.service
The Conflicts attribute tells fleet that these two services can't be run on the same machine, giving us high availability. A full list of options for this section can be found in the fleet units guide.
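For reference, a unit's [X-Fleet] section can combine several of these scheduling properties. This is a sketch using property names from the fleet units guide; the metadata value is hypothetical and not part of this walkthrough:

```ini
[X-Fleet]
# Never schedule this unit alongside another apache instance
Conflicts=apache@*.service
# Only consider machines whose metadata advertises SSD storage
MachineMetadata=disk=ssd
```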
Let's start both units and verify that they're on two different machines:
$ fleetctl start apache@1
$ fleetctl start apache@2
$ fleetctl list-units
UNIT MACHINE ACTIVE SUB
myapp.service c9de9451.../10.10.1.3 active running
apache@1.service 491586a6.../10.10.1.2 active running
apache@2.service 148a18ff.../10.10.1.1 active running
As you can see, the Apache units are now running on two different machines in our cluster.
How do we route requests to these containers? The best strategy is to run a "sidekick" container that performs duties related to our main container but that shouldn't be built directly into the application. Common examples of sidekick containers handle service discovery or control external services such as cloud load balancers or DNS.
Machines in your fleet cluster are constantly in communication with the rest of cluster and elect a leader to make scheduling decisions. The leader is responsible for parsing newly submitted/started units, finding a qualified machine to run them (via X-Fleet parameters), and then informing the machine(s) to start the unit.
When a machine fails to heartbeat back to the fleet leader, all units running on that machine are marked for rescheduling. During that process, qualified machines are found for each unit and they are started on the new machine. Units that can't be rescheduled will remain stopped until a qualified machine can be found. If the failed machine recovers, the fleet leader will tell it to cease operations of the old units, which have been rescheduled, and then the machine will be available for new work.
You can test out this process by stopping fleet (sudo systemctl stop fleet) on one of the machines running our Apache unit. The fleet logs (sudo journalctl -u fleet) will provide more clarity on what's going on under the hood.
The simplest sidekick example is for service discovery. This unit blindly announces that our container has been started. We'll run one of these for each Apache unit that's already running. Again, we'll use a template unit with two instances. Make a template unit called apache-discovery@.service:
[Unit]
Description=Announce Apache1
BindsTo=apache@%i.service
After=apache@%i.service
[Service]
ExecStart=/bin/sh -c "while true; do etcdctl set /services/website/apache@%i '{ \"host\": \"%H\", \"port\": 80, \"version\": \"52c7248a14\" }' --ttl 60;sleep 45;done"
ExecStop=/usr/bin/etcdctl rm /services/website/apache@%i
[X-Fleet]
MachineOf=apache@%i.service
This unit has a few interesting properties. First, it uses BindsTo to link the unit to our apache@%i.service unit. When the Apache unit is stopped, this unit will stop as well, causing it to be removed from our /services/website directory in etcd. A TTL of 60 seconds is also used here to remove the unit from the directory if our machine suddenly dies for some reason.
Second is %i, a variable built into systemd that represents the instance name of an instantiated unit (a unit launched from a template). This variable expands to any value after the @ in the unit's name. In our case, it will expand to 1 (for apache-discovery@1) and 2 (for apache-discovery@2).
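As a rough illustration, here is how the prefix and instance name fall out of an instantiated unit's name. This is plain shell string handling that mimics the expansion, not systemd's actual implementation:

```shell
# Split "apache-discovery@1.service" into the pieces systemd exposes as
# specifiers: %p (prefix) and %i (instance).
unit="apache-discovery@1.service"
prefix="${unit%%@*}"             # everything before the '@'  -> %p
instance="${unit#*@}"            # drop through the '@'
instance="${instance%.service}"  # drop the suffix            -> %i
echo "prefix=$prefix instance=$instance"   # prints "prefix=apache-discovery instance=1"
```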
Third is %H, a variable built into systemd that represents the hostname of the machine running this unit. Variable usage is covered in our Getting Started with systemd guide as well as in the systemd documentation.
The fourth is a fleet-specific property called MachineOf. This property causes the unit to be placed onto the same machine that the corresponding Apache service is running on (e.g., apache-discovery@1.service will be scheduled on the same machine as apache@1.service).
Let's verify that each discovery unit was placed onto the same machine as the Apache unit it's bound to:
$ fleetctl start apache-discovery@1
$ fleetctl start apache-discovery@2
$ fleetctl list-units
UNIT MACHINE ACTIVE SUB
myapp.service c9de9451.../10.10.1.3 active running
apache@1.service 491586a6.../10.10.1.2 active running
apache@2.service 148a18ff.../10.10.1.1 active running
apache-discovery@1.service 491586a6.../10.10.1.2 active running
apache-discovery@2.service 148a18ff.../10.10.1.1 active running
Now let's verify that the service discovery is working correctly:
$ etcdctl ls /services/ --recursive
/services/website
/services/website/apache@1
/services/website/apache@2
$ etcdctl get /services/website/apache@1
{ "host": "ip-10-182-139-116", "port": 80, "version": "52c7248a14" }
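Anything that consumes this directory (a proxy config generator, for instance) just needs the values out of the JSON. A minimal sketch, assuming the exact record shape shown above; a real deployment would typically use jq or a templating tool like confd:

```shell
# Extract host and port from an announced backend record with plain sed.
record='{ "host": "ip-10-182-139-116", "port": 80, "version": "52c7248a14" }'
host=$(printf '%s' "$record" | sed 's/.*"host": "\([^"]*\)".*/\1/')
port=$(printf '%s' "$record" | sed 's/.*"port": \([0-9]*\).*/\1/')
echo "$host:$port"   # prints "ip-10-182-139-116:80"
```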
If you're running in the cloud, many services have APIs that can be automated based on actions in the cluster. For example, you may update DNS records or add new containers to a cloud load balancer. Our Example Deployment with fleet contains a pre-made presence container that updates an Amazon Elastic Load Balancer with new backends.
As mentioned earlier, global units are useful for running a unit across all of the machines in your cluster. A global unit doesn't differ very much from a regular unit other than a new X-Fleet parameter called Global=true. Here's an example unit from a blog post about using Datadog with Container Linux. You'll need to set an etcd key ddapikey before this example will work; more details are in the post.
[Unit]
Description=Monitoring Service
[Service]
TimeoutStartSec=0
ExecStartPre=-/usr/bin/docker kill dd-agent
ExecStartPre=-/usr/bin/docker rm dd-agent
ExecStartPre=/usr/bin/docker pull datadog/docker-dd-agent
ExecStart=/usr/bin/bash -c \
"/usr/bin/docker run --privileged --name dd-agent -h `hostname` \
-v /var/run/docker.sock:/var/run/docker.sock \
-v /proc/mounts:/host/proc/mounts:ro \
-v /sys/fs/cgroup/:/host/sys/fs/cgroup:ro \
-e API_KEY=`etcdctl get /ddapikey` \
datadog/docker-dd-agent"
[X-Fleet]
Global=true
If we start this unit, it should be running on all 3 of our machines:
$ fleetctl start datadog.service
$ fleetctl list-units
UNIT MACHINE ACTIVE SUB
myapp.service c9de9451.../10.10.1.3 active running
apache@1.service 491586a6.../10.10.1.2 active running
apache@2.service 148a18ff.../10.10.1.1 active running
apache-discovery@1.service 491586a6.../10.10.1.2 active running
apache-discovery@2.service 148a18ff.../10.10.1.1 active running
datadog.service 148a18ff.../10.10.1.1 active running
datadog.service 491586a6.../10.10.1.2 active running
datadog.service c9de9451.../10.10.1.3 active running
Global units can be deployed to a subset of matching machines with the MachineMetadata parameter, which is explained in the next section.
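As a sketch of that combination, here is a global agent restricted to a metadata match; the metadata value is illustrative:

```ini
[X-Fleet]
# Run one copy on every machine that matches the metadata filter
Global=true
MachineMetadata=platform=metal
```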
Applications with complex and specific requirements can target a subset of the cluster for scheduling via machine metadata. Powerful deployment topologies can be achieved — schedule units based on the machine's region, rack location, disk speed or anything else you can think of.
Metadata can be provided via a Container Linux Config or a config file. Here's an example config file:
# Comma-delimited key/value pairs that are published to the fleet registry.
# This data can be referenced in unit files to affect scheduling decisions.
# An example could look like: metadata="region=us-west,az=us-west-1"
metadata="platform=metal,provider=rackspace,region=east,disk=ssd"
Metadata can be viewed in the machine list when configured:
$ fleetctl list-machines
MACHINE IP METADATA
29db5063... 172.17.8.101 disk=ssd,platform=metal,provider=rackspace,region=east
ebb97ff7... 172.17.8.102 disk=ssd,platform=cloud,provider=rackspace,region=east
f823e019... 172.17.8.103 disk=ssd,platform=cloud,provider=amazon,region=east
The unit file for a service that does a lot of disk I/O but doesn't care where it runs could look like:
[X-Fleet]
MachineMetadata=disk=ssd
If you wanted to ensure very high availability, you could have 3 unit files that must be scheduled across providers but within the same region:
[X-Fleet]
Conflicts=webapp*
MachineMetadata=provider=rackspace
MachineMetadata=platform=metal
MachineMetadata=region=east
[X-Fleet]
Conflicts=webapp*
MachineMetadata=provider=rackspace
MachineMetadata=platform=cloud
MachineMetadata=region=east
[X-Fleet]
Conflicts=webapp*
MachineMetadata=provider=amazon
MachineMetadata=platform=cloud
MachineMetadata=region=east
On Container Linux, only fleet 0.11.x ships under /usr/bin, which may be too old for users who want to try out more recent versions. In that case, you can define a custom Container Linux Config that runs whichever version of fleet you want. For example:
storage:
  files:
    - path: /opt/bin/fleet-wrapper
      filesystem: root
      mode: 0755
      contents:
        remote:
          url: https://raw.githubusercontent.com/coreos/fleet/master/scripts/fleet-wrapper
systemd:
  units:
    - name: fleet.service
      enable: true
      contents: |
        [Unit]
        After=etcd2.service etcd-member.service
        Wants=network.target fleet.socket
        Requires=etcd2.service

        [Service]
        Type=notify
        Restart=always
        RestartSec=10s
        LimitNOFILE=40000
        TimeoutStartSec=0
        ExecStartPre=/usr/bin/mkdir --parents /etc/fleet /run/dbus /run/fleet/units
        ExecStartPre=/usr/bin/rkt trust --prefix "quay.io/coreos/fleet" --skip-fingerprint-review
        ExecStart=/opt/bin/fleet-wrapper

        [Install]
        WantedBy=multi-user.target
More information:
Example Deployment with fleet
fleet Unit Specifications
fleet Configuration