The only Docker CE setup guide you will ever need
Kubernetes clusters are an awesome piece of technology when a highly automated and dynamic environment for tens or hundreds of applications is required.
But Kubernetes also brings huge complexity which is not always necessary. Not every project scales to the size of Netflix or Google, so a good old Docker CE server can also be an option.
This guide is a collection of custom configurations to improve the security, stability and isolation of Docker CE servers and their containers.
Scope
- This is not a beginner guide
- All operating system settings are based on Debian Bookworm
- Single Docker CE servers, no Docker Swarm
- Some parts of this document cover IPv6, most don’t
Operating System and Docker Daemon
Configuration for the Docker CE Daemon and the underlying operating system.
Installing Docker CE Daemon
Debian ships its own packages for Docker CE, but it is recommended to use the official package sources from Docker.
- Make sure that all Debian-provided Docker packages are uninstalled
- Follow the instructions for installing Docker CE using the apt repository (a condensed sketch follows below)
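The following sketch summarizes these steps; check the official Docker documentation for the up-to-date repository setup, as URLs and package names may change over time:
# remove the Debian-provided packages, if installed
for pkg in docker.io docker-doc docker-compose podman-docker containerd runc; do
  apt remove $pkg
done
# add Docker's official GPG key and apt repository
apt update && apt install ca-certificates curl
install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/debian/gpg -o /etc/apt/keyrings/docker.asc
chmod a+r /etc/apt/keyrings/docker.asc
echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] \
https://download.docker.com/linux/debian $(. /etc/os-release && echo $VERSION_CODENAME) stable" \
> /etc/apt/sources.list.d/docker.list
# install Docker CE and the common CLI plugins
apt update
apt install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin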
When apt is done, docker ps should output an empty container list.
Docker API access
The Docker CE Engine is controlled with an HTTP RESTful API accessible using the Unix socket /var/run/docker.sock. Whoever has access to that socket can run any action/query on the Docker CE Engine.
Example:
curl --unix-socket /var/run/docker.sock http://localhost/images/json | jq
(Shortened) output:
[{
"Created": 1699617762,
"Id": "sha256:3f32ac3a3d6f2c6778eb8ddb5924264e89c3673175e057caab233961b6eb3140",
"RepoDigests": [ "homeassistant/home-assistant@sha256:400f20c77f52ac31334c1e73a2f19b2d6e5820757d1d476f01960b1efed31949" ],
"RepoTags": [ "homeassistant/home-assistant:latest" ],
"Size": 1903993749,
}]
By default, root and all members of the Unix group docker can read and write to the socket.
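This can be checked by looking at the socket permissions (timestamp and size will differ on your system):
ls -l /var/run/docker.sock
# srw-rw---- 1 root docker 0 Nov 12 10:24 /var/run/docker.sock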
Docker and privilege escalation
As mentioned in the last chapter, any Unix account can get access to Docker CE by being a member of the group docker. But be aware that all users who can create containers have implicit root access to the whole server system.
There are a lot of examples of how host access can be gained via Docker. One of them is a combination of a privileged container and chroot:
# mount the host's root partition into the container
# use the host's network scope
# --privileged allows kernel access
# chroot into the host's root partition
# launch an interactive bash
docker run -it --rm -v /:/mnt/host --net=host --privileged debian:latest chroot /mnt/host /bin/bash -i
If this is no or only a low risk for you, a user can be granted access to Docker CE like so:
gpasswd -a christian docker
Docker CLI Plugins
The docker CLI command supports plugins. The most used one is probably docker compose, which is also available as a dedicated CLI command.
On Debian, the plugins can be installed like so:
apt install docker-compose-plugin docker-buildx-plugin
In a Dockerfile they can be installed like so:
FROM docker:latest
COPY --from=docker/buildx-bin:latest /buildx /usr/libexec/docker/cli-plugins/docker-buildx
COPY --from=docker/compose-bin:latest /docker-compose /usr/libexec/docker/cli-plugins/docker-compose
The plugins can be used like so:
docker compose --help
docker buildx --help
Dedicated Disk for Docker's Data Directory
Note:
This chapter assumes that an unformatted partition is already available and that it has the device name /dev/sda2 or /dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_drive-scsi0-part2.
To have better control over disk space consumption, it makes sense to move the Docker CE data directory to a dedicated partition. If the partition has an XFS filesystem, it is even possible to use disk quotas in Docker volumes and containers.
Install dependencies, format partition, move data:
# preparations
apt install xfsprogs
systemctl stop docker
# format and mount partition
mkfs.xfs /dev/sda2
mkdir /mnt/dockerdata
echo "/dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_drive-scsi0-part2 /mnt/dockerdata xfs discard,nofail,pquota,defaults 0 0" >> /etc/fstab
mount /mnt/dockerdata
# check if partition is mounted
mount | grep dockerdata
# move data
mv /var/lib/docker /mnt/dockerdata/docker
Add the data-root setting to /etc/docker/daemon.json:
{
"data-root": "/mnt/dockerdata/docker"
}
Start Docker again and verify that it has started correctly:
systemctl start docker
systemctl status docker
docker info | grep "Docker Root Dir:"
docker ps -a
Limit Log Filesize
Depending on the container, a log file can grow very fast. To prevent running out of disk space, the size and number of log files per container should be limited.
Add log driver options to /etc/docker/daemon.json:
{
"log-driver": "json-file",
"log-opts": {
"max-size": "32m",
"max-file": "5"
}
}
Restart Docker CE:
systemctl restart docker
Warning!
All containers need to be redeployed, as the log driver settings are set per container. The setting from daemon.json is just used as a default for each container deployment.
Log Drivers
There are also alternative log drivers available in Docker CE. A good option is, for example, the journald driver, which logs into systemd-journald.
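A minimal /etc/docker/daemon.json using the journald driver as default could look like this (the redeployment warning from the previous chapter applies here as well):
{
  "log-driver": "journald"
}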
Logs can then be accessed like so:
journalctl CONTAINER_NAME=webserver
This also makes it possible to forward Docker logs, including all system log messages, to applications like Graylog via Filebeat.
Details:
All Log Drivers
Journald log driver documentation
Filebeat Documentation
Container Quotas
All containers should have quotas set to prevent a faulty process from affecting other containers or even crashing the whole server system.
CPU Quotas
CPU Quotas are supported out of the box.
See Run containers for examples.
Memory + Swap Quotas
Memory quotas are supported by default; for swap quotas, kernel parameters need to be set in /etc/default/grub:
GRUB_CMDLINE_LINUX_DEFAULT="quiet cgroup_enable=memory swapaccount=1"
Update bootloader and reboot:
update-grub
reboot
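Whether swap accounting is active can be checked after the reboot; without it, docker info prints a warning (the exact wording may vary between Docker versions):
# prints "WARNING: No swap limit support" if the kernel parameters are missing
docker info 2>&1 | grep -i swap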
See Run containers for examples.
Disk Quotas
Disk quotas are supported if the Dedicated Disk for Docker's Data Directory chapter was applied correctly.
See Run containers for examples.
Networking
Connecting Docker Containers to each other and to the outside world.
Network Address Pools
By default, Docker CE uses the subnet 172.17.0.0/16 to assign IP addresses to containers and Docker networks. Sometimes this network is already in use by other services, which can make those services unreachable from the Docker server system and its containers.
Add the default-address-pools option to /etc/docker/daemon.json:
{
"default-address-pools": [
{ "base": "172.19.0.0/16", "size": 24 }
]
}
Now Docker CE will create subnets of size /24 from 172.19.0.0/16 for new Docker networks.
Warning!
Containers and Docker networks need to be recreated to activate this change.
Local Bridge Networks
The default way to connect Docker containers to each other is a bridge network. Containers connected to the same network can reach each other by their container names.
docker network create -o com.docker.network.bridge.name=somenetwork somenetwork
docker run -d --name backend --network somenetwork debian:latest sleep infinity
docker run -d --rm --name frontend --network somenetwork debian:latest \
/bin/bash -c "apt update && apt install --yes iputils-ping && ping backend"
The option -o com.docker.network.bridge.name sets the name of the Linux network interface, which can be helpful to keep an overview. The name has a maximum length of 15 characters.
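The custom interface name can be verified on the server host:
# without the option, the bridge would show up as br-<network id>
ip link show somenetwork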
IPv6 ULA and Bridge Networks
Docker CE is able to use IPv6 ULA (Unique Local Address) ranges and will create SNAT rules for outgoing communication. This is really helpful if the server only has a single IPv4 address. IPv4 and IPv6 can be configured and used in exactly the same way.
{
"ipv6": true,
"experimental": true,
"ip6tables": true,
"fixed-cidr-v6": "fd00::/49",
"default-address-pools": [
{ "base": "fd00:0:0:8000::/49", "size": 64 },
{ "base": "172.19.0.0/16", "size": 24 }
]
}
The IPv6 ranges are not routed and can be used as-is on multiple servers.
docker network create --ipv6 -o com.docker.network.bridge.name=somenetwork somenetwork
docker run -d --name backend --network somenetwork debian:latest \
/bin/bash -c "apt update && apt install --yes iputils-ping && ping6 google.com"
Incoming traffic should be handled with normal port publishings through a reverse proxy like NGINX or Traefik.
Warning!
Containers and Docker networks need to be recreated to activate this change.
Routed Bridge Networks
It is also possible to assign real LAN IP addresses directly to containers with the macvlan network driver. The following network is attached to the server network interface ens18, which provides the LAN subnet 192.168.32.0/24. Containers connected to the network will have IP addresses from the range 192.168.32.128/25 assigned. Services inside these containers are directly accessible from the server network without any port publishings (-p).
docker network create -d macvlan \
--subnet=192.168.32.0/24 \
--ip-range=192.168.32.128/25 \
--gateway=192.168.32.254 \
-o parent=ens18 publicservices
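A container can then be started directly in that network, optionally with a fixed address from the range (the address below is just an example):
docker run -d --name webserver \
--network publicservices \
--ip 192.168.32.130 \
nginx:latest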
Note:
For high throughput applications with huge amounts of connections, this can improve the performance significantly, as no NAT is necessary. See also this article.
Set outgoing IP Address by Docker Network
For outgoing communication, all containers in a bridge driver network use the primary IP address of the server system. This can be a problem if every application on the Docker host should have individual firewall rules to allow communication with other servers and services.
An individual outgoing IP address can be defined by managing the NAT rules ourselves.
Create a Docker network with disabled masquerading:
docker network create -d bridge \
-o com.docker.network.bridge.name=appnetwork \
-o com.docker.network.bridge.enable_ip_masquerade=false \
appnetwork
Then the following nftables rule needs to be applied:
table ip docker_network_appnetwork
delete table ip docker_network_appnetwork
table ip docker_network_appnetwork {
chain postrouting {
type nat hook postrouting priority srcnat; policy accept;
ip saddr 172.19.4.0/24 oifname != "appnetwork" snat to 192.168.99.226
}
}
172.19.4.0/24 is the subnet which was assigned to the Docker network, appnetwork is the interface name and 192.168.99.226 is the IP address used for the outgoing connection.
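Assuming the rule is stored in a file like /etc/nftables.d/docker-appnetwork.nft (a hypothetical path), it can be applied and persisted like so:
# apply the ruleset; the "delete table" header makes this idempotent
nft -f /etc/nftables.d/docker-appnetwork.nft
# include the file in the main nftables config so it survives a reboot
# (assumes the nftables service is enabled on the server)
echo 'include "/etc/nftables.d/docker-appnetwork.nft"' >> /etc/nftables.conf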
Warning!
The outgoing IP address must exist on the server interface.
On macvlan networks it is a lot easier, as macvlan traffic is never masqueraded and no NAT rule is necessary (the bridge-specific options do not apply here):
docker network create -d macvlan \
--subnet=192.168.32.0/24 \
--ip-range=192.168.32.128/25 \
--gateway=192.168.32.254 \
-o parent=ens18 appnetwork
Internal Transit Docker Networks
When two applications which both use custom outgoing IP addresses need to be connected to each other, a transit network must be used. Otherwise one of the applications would use the outgoing IP address of the other application, as it is not possible to control the routes inside the containers.
docker network create -d bridge --internal transitnetwork
The --internal option disables outgoing communication for that network. Containers can communicate with each other, but no LAN or internet access is possible.
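Existing application containers can then be connected to the transit network in addition to their own networks (app1 and app2 are placeholder names):
# attach both applications to the transit network
docker network connect transitnetwork app1
docker network connect transitnetwork app2
# app1 and app2 now reach each other via transitnetwork,
# while outgoing traffic still uses their own networks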
NAT and high traffic applications
When an application is processing thousands of connections per second, the default NATing of Docker can become the bottleneck. Multiple connections try to use the same port number for the NATing, which causes retries. Because of that, the initial connect can take multiple seconds.
The problem is described in more detail here and here.
Besides using macvlan networks, another option is to use the network stack of the server host:
# create a network with a known IP Subnet
docker network create --driver=bridge --subnet=10.0.0.0/24 containers0
# connect the existing application containers to the network
# and assign static IP addresses
docker network connect --ip 10.0.0.101 containers0 dingetun
docker network connect --ip 10.0.0.102 containers0 myip
# create the new webserver container
docker run --name webserver -d \
--add-host dingetun:10.0.0.101 \
--add-host myip:10.0.0.102 \
--network host \
nginx:latest
In this example the webserver container is directly attached to the network stack of the server host. Ports 80/tcp and 443/tcp are directly bound to the public IP of the server system.
Because of that, the container is unable to use Docker networks and the Docker DNS. This needs to be worked around by assigning static IP addresses to all other containers and adding static /etc/hosts entries to the webserver container.
With this, NAT is completely avoided, but the other containers can still be accessed by their names.
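Whether the static entries work can be checked from inside the webserver container (assuming getent is available in the image, which is the case for the Debian-based nginx image):
# resolve the static /etc/hosts entry
docker exec webserver getent hosts dingetun
# expected output: 10.0.0.101      dingetun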
Build Containers
Using the caching mechanics of Docker builds more effective.
Cache packages
Docker checks for changes on all used files before (re)building an image layer. If there are no changes, the layer from the previous build is used instead of building a new one.
This can be used to cache an npm install or a nuget restore run, even if the code of the app itself was changed. Just copy package.json and package-lock.json into the image first, run the npm install and copy the rest of the application afterwards.
As long as these two files have no changes, this build step is cached and will save multiple minutes in future CI pipeline runs.
FROM node:latest as packages
WORKDIR /src
COPY package.json package.json
COPY package-lock.json package-lock.json
RUN npm install
FROM packages as app
WORKDIR /src
COPY . .
RUN npm run-script build
BuildX and inline caches
The “new” docker buildx includes cache metadata in the container image, which allows using a previous image as a cache source. This makes the cache available even when two builds are done on different servers.
docker buildx build \
--cache-from "example.com/myapp/myapp:latest" \
--cache-to type=inline \
-t "example.com/myapp/myapp:latest" .
This also enables the use of caching in disposable build environments like GitLab runners with Docker-in-Docker.
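For the cache to be usable on other servers or fresh CI runners, the image has to be pushed to the registry after each build:
# make the image, including its inline cache metadata, available to the next build
docker push "example.com/myapp/myapp:latest"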
Labels in images
Labels can be defined at build time and at runtime. Adding information like build timestamp, version, maintainer or application name helps to keep an overview of the applications which are running on the Docker system.
The Open Container Initiative has defined a standard for adding metadata to a container image at build time. The format description can be found here.
Dockerfile labels:
LABEL org.opencontainers.image.created=$DATE
LABEL org.opencontainers.image.version=1.0.0
LABEL org.opencontainers.image.authors=Christian
LABEL org.opencontainers.image.url=https://serverless.industries
Build command:
docker buildx build \
--label "org.opencontainers.image.created=$(date --iso-8601=seconds)" \
--label "org.opencontainers.image.version=1.0.0" \
--label "org.opencontainers.image.authors=Christian" \
--label "org.opencontainers.image.url=https://serverless.industries" \
-t "example.com/myapp/myapp:latest" .
Run Containers
Container Quotas
The following example demonstrates all quota options:
docker volume create -o size=1G persistent-volume
docker run -d --name quota-demo \
--cpus 1.5 \
--memory 2GB \
--memory-swap 3GB \
--storage-opt size=1G \
-v persistent-volume:/mnt/data \
debian:latest sleep infinity
- Assign 1.5 CPUs
- Assign 2GB of memory
- Assign 1GB of swap (--memory-swap sets the total of memory plus swap)
- Assign 1GB of temporary disk space
- Mount a volume with 1GB of persistent storage at /mnt/data
Warning!
Storage quotas on folder-to-folder bind mounts are not supported.
Name + Hostname
In addition to the container name, the container hostname should also be set, as the hostname will appear in log files.
docker run -d \
--name webserver \
--hostname webserver \
-p 80:80 \
nginx:latest
Labels on containers
Labels can also be set when creating containers, for example to group containers by project or to add a responsible person to the container. Projects like Traefik, Watchtower or Ofelia also use labels for configuration and container discovery.
docker run -d \
--name webserver \
--label "applicationowner=John Doe" \
--label "project=some-multi-container-project" \
-p 80:80 \
nginx:latest
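Those labels can later be used to filter and format the container list, for example:
# list all containers of a project
docker ps --filter "label=project=some-multi-container-project"
# show the responsible person next to each container name
docker ps --format 'table {{.Names}}\t{{.Label "applicationowner"}}'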