Docker and Containers - From Fundamentals to Production
A practical guide covering container internals, Docker architecture, networking, storage, and production-grade practices. This isn’t a beginner tutorial - we’ll go deep into how things actually work.
Chapter 1: Container Fundamentals
What Are Containers, Really?
Containers are isolated processes running on a shared kernel. They’re not lightweight VMs - they’re a clever use of Linux kernel features to create isolated environments.
Containers vs Virtual Machines
┌─────────────────────────────────────────────────────────────────────┐
│ VIRTUAL MACHINES │
├─────────────────────────────────────────────────────────────────────┤
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ App A │ │ App B │ │ App C │ │
│ ├─────────────┤ ├─────────────┤ ├─────────────┤ │
│ │ Bins/Libs │ │ Bins/Libs │ │ Bins/Libs │ │
│ ├─────────────┤ ├─────────────┤ ├─────────────┤ │
│ │ Guest OS │ │ Guest OS │ │ Guest OS │ ← Full OS each │
│ └─────────────┘ └─────────────┘ └─────────────┘ │
│ ┌─────────────────────────────────────────────────┐ │
│ │ HYPERVISOR (VMware, KVM) │ │
│ └─────────────────────────────────────────────────┘ │
│ ┌─────────────────────────────────────────────────┐ │
│ │ HOST OS │ │
│ └─────────────────────────────────────────────────┘ │
│ ┌─────────────────────────────────────────────────┐ │
│ │ HARDWARE │ │
│ └─────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────────┐
│ CONTAINERS │
├─────────────────────────────────────────────────────────────────────┤
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ App A │ │ App B │ │ App C │ │
│ ├─────────────┤ ├─────────────┤ ├─────────────┤ │
│ │ Bins/Libs │ │ Bins/Libs │ │ Bins/Libs │ │
│ └─────────────┘ └─────────────┘ └─────────────┘ │
│ ┌─────────────────────────────────────────────────┐ │
│ │ CONTAINER RUNTIME (Docker) │ │
│ └─────────────────────────────────────────────────┘ │
│ ┌─────────────────────────────────────────────────┐ │
│ │ HOST OS (Shared Kernel) │ ← One kernel │
│ └─────────────────────────────────────────────────┘ │
│ ┌─────────────────────────────────────────────────┐ │
│ │ HARDWARE │ │
│ └─────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────────┘
| Aspect | Virtual Machines | Containers |
|---|---|---|
| Isolation Level | Hardware-level (hypervisor) | Process-level (kernel) |
| Boot Time | Minutes | Milliseconds to seconds |
| Size | GBs (full OS) | MBs (just app + deps) |
| Resource Overhead | High (each VM runs full OS) | Low (shared kernel) |
| Density | ~10-20 VMs per host | ~100s of containers per host |
| Security Isolation | Stronger (separate kernels) | Weaker (shared kernel) |
| Use Case | Multi-tenancy, different OS | Microservices, CI/CD |
The Linux Kernel Features Behind Containers
Containers leverage three key kernel features:
1. Namespaces - Isolation of system resources
# View namespaces for a process
ls -la /proc/$$/ns/
# Types of namespaces:
# - pid : Process IDs (container sees its own PID 1)
# - net : Network interfaces, routing tables
# - mnt : Mount points (filesystem)
# - uts : Hostname and domain name
# - ipc : Inter-process communication
# - user : User and group IDs
# - cgroup : Control group root directory
2. Control Groups (cgroups) - Resource limits
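# View cgroup limits for a container (cgroup v1 paths)
cat /sys/fs/cgroup/memory/docker/<container-id>/memory.limit_in_bytes
cat /sys/fs/cgroup/cpu/docker/<container-id>/cpu.shares
# On cgroup v2 hosts the equivalent files are memory.max and cpu.weight, typically
# under /sys/fs/cgroup/system.slice/docker-<container-id>.scope/ (systemd cgroup driver)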
# cgroups control:
# - Memory limits
# - CPU shares/quotas
# - Block I/O
# - Network bandwidth (with tc)
3. Union Filesystems - Layered storage
# Layers are stacked - each filesystem-changing Dockerfile instruction (RUN, COPY, ADD) creates a layer
# Read-only layers + thin writable layer on top = container filesystem
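The layering idea can be modeled in a few lines of Python. This is a toy sketch, not how overlay2 is implemented: the `LayeredFS` class, the `WHITEOUT` marker, and the file paths are all made up for illustration. It shows lookups falling through from the writable layer to the image layers, and writes and deletes touching only the container layer.

```python
from typing import Dict, List, Optional

WHITEOUT = object()  # marker meaning "deleted in this layer" (illustrative)


class LayeredFS:
    """Toy union filesystem: read-only image layers plus one writable layer."""

    def __init__(self, image_layers: List[Dict[str, str]]) -> None:
        # image_layers[0] is the base image; later entries sit on top of it.
        self.image_layers = image_layers
        self.container_layer: Dict[str, object] = {}

    def read(self, path: str) -> Optional[str]:
        # Search from the writable layer down to the base layer.
        for layer in [self.container_layer, *reversed(self.image_layers)]:
            if path in layer:
                value = layer[path]
                return None if value is WHITEOUT else value
        return None

    def write(self, path: str, data: str) -> None:
        # Copy-on-write: changes only ever land in the container layer.
        self.container_layer[path] = data

    def delete(self, path: str) -> None:
        # A whiteout hides the file without touching the image layers.
        self.container_layer[path] = WHITEOUT


base = {"/etc/os-release": "debian", "/app/app.py": "v1"}
deps = {"/usr/lib/flask": "flask"}
fs = LayeredFS([base, deps])
fs.write("/app/app.py", "v2")
fs.delete("/etc/os-release")
print(fs.read("/app/app.py"), fs.read("/etc/os-release"), base["/app/app.py"])
```

The image layers are never modified, which is why many containers can share them safely.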
Docker Architecture
Understanding Docker’s components helps when debugging issues.
┌─────────────────────────────────────────────────────────────────────┐
│ DOCKER CLIENT │
│ (docker CLI, Docker API) │
│ │ │
│ │ REST API │
│ ▼ │
│ ┌───────────────────────────────────────────────────────────────┐ │
│ │ DOCKER DAEMON (dockerd) │ │
│ │ │ │
│ │ ┌─────────────┐ ┌─────────────┐ ┌─────────────────────┐ │ │
│ │ │ Images │ │ Containers │ │ Networks │ │ │
│ │ └─────────────┘ └─────────────┘ └─────────────────────┘ │ │
│ │ ┌─────────────┐ ┌─────────────┐ ┌─────────────────────┐ │ │
│ │ │ Volumes │ │ Plugins │ │ Build Cache │ │ │
│ │ └─────────────┘ └─────────────┘ └─────────────────────┘ │ │
│ │ │ │ │
│ └──────────────────────────────┼─────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌───────────────────────────────────────────────────────────────┐ │
│ │ CONTAINERD (container runtime) │ │
│ │ │ │ │
│ └──────────────────────────────┼─────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌───────────────────────────────────────────────────────────────┐ │
│ │ RUNC (OCI runtime) │ │
│ │ Creates actual containers │ │
│ └───────────────────────────────────────────────────────────────┘ │
│ │
│ ┌───────────────────────────────────────────────────────────────┐ │
│ │ LINUX KERNEL │ │
│ │ (namespaces, cgroups, union fs) │ │
│ └───────────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────────┘
Component Breakdown
| Component | Purpose |
|---|---|
| Docker CLI | User interface - sends commands to daemon |
| Docker Daemon | Background service managing images, containers, networks, volumes |
| containerd | Industry-standard container runtime (manages container lifecycle) |
| runc | Low-level runtime that actually creates containers using kernel features |
| Registry | Stores and distributes images (Docker Hub, private registries) |
# Check Docker system info
docker info
# Check component versions
docker version
# Check daemon status
systemctl status docker
Images and Layers
Images are read-only templates made of stacked layers. Each filesystem-changing Dockerfile instruction (RUN, COPY, ADD) adds a layer; other instructions only record metadata.
How Layers Work
┌─────────────────────────────────────────────┐
│ CONTAINER (Running Instance) │
├─────────────────────────────────────────────┤
│ Thin Read-Write Layer (Container Layer) │ ← Changes go here
├─────────────────────────────────────────────┤
│ │
│ IMAGE LAYERS │
│ (Read-Only, Shared) │
│ │
│ ┌────────────────────────────────────────┐ │
│ │ Layer 4: COPY app.py /app │ │ ← Your code
│ ├────────────────────────────────────────┤ │
│ │ Layer 3: RUN pip install flask │ │ ← Dependencies
│ ├────────────────────────────────────────┤ │
│ │ Layer 2: RUN apt-get update && install │ │ ← System packages
│ ├────────────────────────────────────────┤ │
│ │ Layer 1: Base Image (python:3.11-slim) │ │ ← Base OS + runtime
│ └────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────┘
Inspect Image Layers
# View image history (each layer)
docker history python:3.11-slim
# Detailed layer info (layer digests)
docker inspect python:3.11-slim --format '{{json .RootFS.Layers}}' | jq
# Show the full, untruncated command behind each layer
docker history --no-trunc python:3.11-slim
Why Layers Matter
- Caching - Unchanged layers are cached, speeding up builds
- Sharing - Multiple containers can share base layers (saves disk space)
- Distribution - Only changed layers need to be pushed/pulled
# Force rebuild without cache
docker build --no-cache -t myapp .
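# Bust the cache from a chosen point (requires an ARG CACHEBUST line in the Dockerfile;
# RUN steps after that ARG are rebuilt whenever the value changes)
docker build --build-arg CACHEBUST=$(date +%s) -t myapp .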
Container Lifecycle
docker create
│
▼
┌──────────┐ ┌──────────┐ docker start ┌──────────┐
│ Image │ ────► │ Created │ ─────────────────────────► │ Running │
└──────────┘ └──────────┘ └────┬─────┘
│
┌───────────────────────────────────────┤
│ │
│ docker stop │ docker pause
▼ ▼
┌──────────┐ ┌──────────┐
│ Exited │ │ Paused │
└────┬─────┘ └──────────┘
│
│ docker rm
▼
┌──────────┐
│ Removed │
└──────────┘
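The diagram can also be read as a small state machine. The sketch below is illustrative only: the state names mirror the diagram, and the extra transitions (`unpause`, restarting an exited container, removing a created one) are additions based on the commands listed in this guide rather than part of the diagram.

```python
# (state, command) -> next state; a simplified model of the lifecycle diagram
TRANSITIONS = {
    ("image", "create"): "created",
    ("created", "start"): "running",
    ("running", "stop"): "exited",
    ("running", "pause"): "paused",
    ("paused", "unpause"): "running",
    ("exited", "start"): "running",
    ("exited", "rm"): "removed",
    ("created", "rm"): "removed",
}


def apply(state: str, command: str) -> str:
    """Return the next state, or raise if the command is invalid in this state."""
    try:
        return TRANSITIONS[(state, command)]
    except KeyError:
        raise ValueError(f"cannot '{command}' a container that is {state}") from None


def run_commands(commands, state: str = "image") -> str:
    """Apply a sequence of commands and return the final state."""
    for command in commands:
        state = apply(state, command)
    return state


print(run_commands(["create", "start", "pause", "unpause", "stop", "rm"]))
```

Invalid sequences, such as pausing a container that was only created, raise a `ValueError` in this model, much as the real CLI rejects them.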
Essential Container Commands
# Create and start (most common)
docker run -d --name myapp nginx
# Just create (don't start)
docker create --name myapp nginx
# Start existing container
docker start myapp
# Stop gracefully (SIGTERM, then SIGKILL after timeout)
docker stop myapp
# Stop immediately (SIGKILL)
docker kill myapp
# Pause (freeze all processes via the cgroup freezer)
docker pause myapp
docker unpause myapp
# Remove container
docker rm myapp
# Remove running container
docker rm -f myapp
# Remove all stopped containers
docker container prune
Container Inspection
# List running containers
docker ps
# List all containers (including stopped)
docker ps -a
# Detailed container info
docker inspect myapp
# Get specific info
docker inspect myapp --format '{{.State.Status}}'
docker inspect myapp --format '{{.NetworkSettings.IPAddress}}'
docker inspect myapp --format '{{json .Mounts}}' | jq
# Resource usage
docker stats myapp
# Processes inside container
docker top myapp
Docker Networking Deep Dive
Docker networking is critical to understand - it is a common source of hard-to-diagnose issues.
Network Drivers
┌─────────────────────────────────────────────────────────────────────┐
│ DOCKER NETWORK DRIVERS │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │ BRIDGE (default) │ │
│ │ • Private internal network on the host │ │
│ │ • Containers communicate via IP; names resolve only on user-defined bridges │ │
│ │ • Need port mapping (-p) for external access │ │
│ │ • Best for: Single-host container communication │ │
│ └─────────────────────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │ HOST │ │
│ │ • Container uses host's network stack directly │ │
│ │ • No network isolation │ │
│ │ • No port mapping needed (container binds to host ports) │ │
│ │ • Best for: Performance-critical apps, network tools │ │
│ └─────────────────────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │ NONE │ │
│ │ • No networking at all │ │
│ │ • Container is completely isolated │ │
│ │ • Best for: Batch jobs, security-sensitive workloads │ │
│ └─────────────────────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │ OVERLAY │ │
│ │ • Multi-host networking (Docker Swarm) │ │
│ │ • Containers on different hosts can communicate │ │
│ │ • Uses VXLAN encapsulation │ │
│ │ • Best for: Distributed applications, orchestration │ │
│ └─────────────────────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │ MACVLAN │ │
│ │ • Container gets its own MAC address │ │
│ │ • Appears as physical device on network │ │
│ │ • Direct L2 connectivity │ │
│ │ • Best for: Legacy apps that need direct network access │ │
│ └─────────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────┘
Bridge Network (Default)
┌─────────────────────────────────────────────────────────────────────┐
│ HOST MACHINE │
│ │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ docker0 bridge │ │
│ │ (172.17.0.1) │ │
│ │ │ │ │
│ │ ┌───────────────┼───────────────┐ │ │
│ │ │ │ │ │ │
│ │ ┌────┴────┐ ┌────┴────┐ ┌────┴────┐ │ │
│ │ │ veth │ │ veth │ │ veth │ │ │
│ │ └────┬────┘ └────┬────┘ └────┬────┘ │ │
│ └─────────┼──────────────┼──────────────┼───────────────────┘ │
│ │ │ │ │
│ ┌────┴────┐ ┌────┴────┐ ┌────┴────┐ │
│ │Container│ │Container│ │Container│ │
│ │ A │ │ B │ │ C │ │
│ │.17.0.2 │ │.17.0.3 │ │.17.0.4 │ │
│ └─────────┘ └─────────┘ └─────────┘ │
│ │
│ eth0 (Host NIC) ─────────── Internet │
│ NAT (iptables) │
└─────────────────────────────────────────────────────────────────────┘
Network Commands
# List networks
docker network ls
# Create custom bridge network
docker network create --driver bridge my-network
# Create network with specific subnet
docker network create \
--driver bridge \
--subnet 192.168.100.0/24 \
--gateway 192.168.100.1 \
my-custom-network
# Connect container to network
docker network connect my-network mycontainer
# Disconnect container from network
docker network disconnect my-network mycontainer
# Inspect network
docker network inspect my-network
# Remove network
docker network rm my-network
# Remove unused networks
docker network prune
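Before creating a network with an explicit subnet, it can help to sanity-check the numbers. The following sketch uses Python's standard `ipaddress` module; the function name `describe_subnet` is made up here, and the arithmetic assumes an IPv4 subnet of /30 or larger.

```python
import ipaddress


def describe_subnet(subnet: str, gateway: str) -> dict:
    """Validate a subnet/gateway pair like the one passed to `docker network create`."""
    network = ipaddress.ip_network(subnet)
    gw = ipaddress.ip_address(gateway)
    if gw not in network:
        raise ValueError(f"gateway {gateway} is outside {subnet}")
    # Exclude the network and broadcast addresses (IPv4, /30 or larger).
    usable_hosts = network.num_addresses - 2
    return {
        "usable_hosts": usable_hosts,
        # One usable address is taken by the gateway itself.
        "container_addresses": usable_hosts - 1,
    }


print(describe_subnet("192.168.100.0/24", "192.168.100.1"))
```

For the /24 example above, that leaves 253 addresses for containers.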
Port Mapping
# Map host port 8080 to container port 80
docker run -p 8080:80 nginx
# Map to specific interface
docker run -p 127.0.0.1:8080:80 nginx
# Map random host port
docker run -p 80 nginx
docker port <container> # See assigned port
# Map UDP port
docker run -p 53:53/udp dns-server
# Map multiple ports
docker run -p 80:80 -p 443:443 nginx
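The `-p` forms above all follow one pattern: an optional host IP, an optional host port, a container port, and an optional protocol. Here is a minimal parser sketch covering just those forms; `PortMapping` and `parse_publish` are hypothetical names, and IPv6 addresses and port ranges are deliberately not handled.

```python
from typing import NamedTuple, Optional


class PortMapping(NamedTuple):
    host_ip: Optional[str]      # None means all host interfaces
    host_port: Optional[int]    # None means Docker picks a free host port
    container_port: int
    protocol: str


def parse_publish(spec: str) -> PortMapping:
    """Parse [HOST_IP:][HOST_PORT:]CONTAINER_PORT[/PROTOCOL] (IPv4 only)."""
    ports, _, protocol = spec.partition("/")
    parts = ports.split(":")
    if len(parts) == 1:
        host_ip, host_port, container_port = None, None, parts[0]
    elif len(parts) == 2:
        host_ip, (host_port, container_port) = None, parts
    elif len(parts) == 3:
        host_ip, host_port, container_port = parts
    else:
        raise ValueError(f"unsupported port spec: {spec}")
    return PortMapping(
        host_ip or None,
        int(host_port) if host_port else None,
        int(container_port),
        protocol or "tcp",
    )


for spec in ["8080:80", "127.0.0.1:8080:80", "80", "53:53/udp"]:
    print(spec, "->", parse_publish(spec))
```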
Container DNS and Service Discovery
# On custom networks, containers can reach each other by name
docker network create app-network
docker run -d --name db --network app-network postgres
docker run -d --name api --network app-network myapi
# From 'api' container, can reach postgres at hostname 'db'
docker exec api ping db # Works!
# On default bridge network, must use IP addresses
# Container names don't resolve on default bridge
Network Debugging
# Check container's network settings
docker inspect mycontainer --format '{{json .NetworkSettings}}' | jq
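# Get container IP (default bridge network only)
docker inspect mycontainer --format '{{.NetworkSettings.IPAddress}}'
# Get container IP on any network, including user-defined ones
docker inspect mycontainer --format '{{range .NetworkSettings.Networks}}{{.IPAddress}} {{end}}'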
# Check what ports are exposed
docker inspect mycontainer --format '{{json .NetworkSettings.Ports}}' | jq
# Test connectivity from inside container
docker exec mycontainer ping google.com
docker exec mycontainer curl -v http://other-container:8080
# Check host iptables rules (port forwarding)
sudo iptables -t nat -L -n -v
# Check if port is listening
docker exec mycontainer netstat -tlnp
docker exec mycontainer ss -tlnp
Network Comparison Table
| Driver | Isolation | Multi-Host | Port Mapping | Use Case |
|---|---|---|---|---|
| bridge | Yes | No | Required | Default, single-host |
| host | No | No | Not needed | Performance, network tools |
| none | Complete | No | N/A | Security, offline tasks |
| overlay | Yes | Yes | Optional | Swarm clusters |
| macvlan | Yes | No | Not needed | Direct LAN access |
Docker Volumes and Storage
Containers are ephemeral - when they’re removed, their data is gone. Volumes solve this.
Storage Types
┌─────────────────────────────────────────────────────────────────────┐
│ DOCKER STORAGE OPTIONS │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │ VOLUMES (Recommended) │ │
│ │ • Managed by Docker (/var/lib/docker/volumes/) │ │
│ │ • Best for persistent data │ │
│ │ • Can be shared between containers │ │
│ │ • Works on Linux, macOS, Windows │ │
│ │ • Supports volume drivers (NFS, cloud storage) │ │
│ │ │ │
│ │ docker run -v myvolume:/data nginx │ │
│ └─────────────────────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │ BIND MOUNTS │ │
│ │ • Maps host directory to container path │ │
│ │ • Good for development (code sync) │ │
│ │ • With --mount the host path must exist; -v creates it if missing │ │
│ │ • Performance varies by host OS │ │
│ │ │ │
│ │ docker run -v /host/path:/container/path nginx │ │
│ └─────────────────────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │ TMPFS MOUNTS │ │
│ │ • Stored in host memory only │ │
│ │ • Never written to host filesystem │ │
│ │ • Fast, but data lost on container stop │ │
│ │ • Good for sensitive data, caches │ │
│ │ │ │
│ │ docker run --tmpfs /app/cache nginx │ │
│ └─────────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────┘
Volume Commands
# Create a volume
docker volume create mydata
# List volumes
docker volume ls
# Inspect volume
docker volume inspect mydata
# Use volume in container
docker run -v mydata:/app/data myapp
# Remove volume
docker volume rm mydata
# Remove unused anonymous volumes (named volumes are kept on Docker 23.0+)
docker volume prune
# Remove ALL unused volumes, including named ones (careful!)
docker volume prune -a
Bind Mounts vs Volumes
# VOLUME - Docker manages the location
docker run -v myvolume:/data nginx
# Data stored at /var/lib/docker/volumes/myvolume/_data
# BIND MOUNT - You specify exact host path
docker run -v /home/user/data:/data nginx
# Data stored at /home/user/data
# Modern syntax (--mount) - more explicit
docker run --mount type=volume,source=myvolume,target=/data nginx
docker run --mount type=bind,source=/home/user/data,target=/data nginx
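The rule Docker applies to `-v` is simple: a source that is an absolute path is a bind mount, anything else is a volume name. The sketch below converts the short `-v` form into the equivalent `--mount` string. `to_mount_flag` is a hypothetical helper, and it assumes Linux-style paths and handles only the `ro` option.

```python
def to_mount_flag(spec: str) -> str:
    """Convert SOURCE:TARGET[:OPTIONS] (the -v form) into a --mount value."""
    parts = spec.split(":")
    if len(parts) < 2:
        raise ValueError("expected SOURCE:TARGET[:OPTIONS]")
    source, target = parts[0], parts[1]
    options = parts[2].split(",") if len(parts) > 2 else []
    # Absolute host path means bind mount; a bare name means named volume.
    kind = "bind" if source.startswith("/") else "volume"
    flag = f"type={kind},source={source},target={target}"
    if "ro" in options:
        flag += ",readonly"
    return flag


print(to_mount_flag("myvolume:/data"))
print(to_mount_flag("/home/user/data:/data"))
print(to_mount_flag("shared:/data:ro"))
```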
Volume Use Cases
# Database persistence
docker run -d \
--name postgres \
-v pgdata:/var/lib/postgresql/data \
-e POSTGRES_PASSWORD=secret \
postgres
# Share data between containers
docker run -d --name writer -v shared:/data alpine sh -c "while true; do date >> /data/log.txt; sleep 1; done"
docker run -d --name reader -v shared:/data:ro alpine tail -f /data/log.txt
# Development with live reload (bind mount)
docker run -d \
-v $(pwd)/src:/app/src \
-p 3000:3000 \
node-dev
# Read-only mount (security)
docker run -v myconfig:/etc/app/config:ro myapp
Storage Debugging
# See what's using disk space
docker system df
docker system df -v # Verbose
# Find where volume data is stored
docker volume inspect myvolume --format '{{.Mountpoint}}'
# Check mount points inside container
docker exec mycontainer df -h
docker exec mycontainer mount | grep /data
Docker Compose Fundamentals
Compose defines multi-container applications in a single YAML file. Essential for local development.
Basic Structure
# docker-compose.yml
# The top-level 'version' key is optional: Compose v2 ignores it
version: '3.8'
services:
  web:
    build: ./web
    ports:
      - "8080:80"
    depends_on:
      - api
      - db
    environment:
      - API_URL=http://api:3000
    networks:
      - frontend
      - backend
  api:
    build: ./api
    ports:
      - "3000:3000"
    depends_on:
      - db
    environment:
      - DATABASE_URL=postgres://user:pass@db:5432/app
    volumes:
      - ./api/src:/app/src # Dev: live reload
    networks:
      - backend
  db:
    image: postgres:15-alpine
    volumes:
      - pgdata:/var/lib/postgresql/data
    environment:
      - POSTGRES_USER=user
      - POSTGRES_PASSWORD=pass
      - POSTGRES_DB=app
    networks:
      - backend
volumes:
  pgdata:
networks:
  frontend:
  backend:
Compose Commands
# Start all services
docker compose up
# Start in background
docker compose up -d
# Build and start
docker compose up --build
# Stop services
docker compose down
# Stop and remove volumes
docker compose down -v
# View logs
docker compose logs
docker compose logs -f api # Follow specific service
# Scale a service
docker compose up -d --scale api=3
# Execute command in running service
docker compose exec api sh
# Run one-off command
docker compose run --rm api npm test
# View running services
docker compose ps
# Rebuild specific service
docker compose build api
Compose Networking
# Services on the same network can reach each other by service name
services:
  api:
    networks:
      - backend
  db:
    networks:
      - backend
# api can reach db at hostname 'db'
# No need to expose db port to host
Environment Variables
services:
  api:
    # From .env files (later files override earlier ones)
    env_file:
      - .env
      - .env.local
    environment:
      # Direct values (these override anything loaded via env_file)
      - NODE_ENV=production
      - DEBUG=false
      # From host environment
      - API_KEY # Takes value from host's $API_KEY
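When the same variable appears in several places, `environment` entries win over `env_file`, and later env files override earlier ones. The sketch below models that merge with plain dictionaries. `resolve_environment` is a made-up name, and the case where a pass-through variable is missing on the host is an assumption here: the model simply leaves any value from an env file in place.

```python
from typing import Dict, List, Optional


def resolve_environment(
    env_files: List[Dict[str, str]],
    environment: Dict[str, Optional[str]],
    host_env: Dict[str, str],
) -> Dict[str, str]:
    """Merge variable sources in a simplified model of Compose precedence."""
    resolved: Dict[str, str] = {}
    for file_vars in env_files:          # later files override earlier ones
        resolved.update(file_vars)
    for key, value in environment.items():
        if value is None:                # bare "API_KEY" entry: pass through from host
            if key in host_env:          # assumption: do nothing if the host lacks it
                resolved[key] = host_env[key]
        else:
            resolved[key] = value        # explicit value beats env_file
    return resolved


result = resolve_environment(
    env_files=[{"DEBUG": "true", "PORT": "3000"}, {"PORT": "4000"}],
    environment={"NODE_ENV": "production", "DEBUG": "false", "API_KEY": None},
    host_env={"API_KEY": "from-host"},
)
print(result)
```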
Health Checks in Compose
services:
  api:
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:3000/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s
  db:
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U user -d app"]
      interval: 10s
      timeout: 5s
      retries: 5
Dependency Management
services:
  api:
    depends_on:
      db:
        condition: service_healthy # Wait for health check
      redis:
        condition: service_started # Just wait for start
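Start order follows from the dependency graph: a service starts only after everything it depends on. As a rough illustration, the sketch below feeds the `depends_on` relationships from the Basic Structure example (web needs api and db, api needs db) into Python's standard `graphlib` (3.9+). The function name `start_order` is made up, and health-check conditions are not modeled.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Dict, List, Set


def start_order(depends_on: Dict[str, Set[str]]) -> List[str]:
    """Return services in an order that starts dependencies first."""
    return list(TopologicalSorter(depends_on).static_order())


depends_on = {"web": {"api", "db"}, "api": {"db"}, "db": set()}
print(start_order(depends_on))  # ['db', 'api', 'web']

# Circular dependencies cannot be ordered at all.
try:
    start_order({"a": {"b"}, "b": {"a"}})
except CycleError:
    print("circular dependency")
```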
Chapter 2: Production-Grade Docker
Multi-Stage Builds
Multi-stage builds create minimal production images by separating build-time dependencies from runtime.
The Problem with Single-Stage Builds
# BAD: Single stage - image is huge!
FROM node:18
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build
# This image contains:
# - Node.js runtime
# - npm and all dev dependencies (node_modules)
# - Source code
# - Build tools
# Result: 1+ GB image
Multi-Stage Solution
# GOOD: Multi-stage build
# Stage 1: Build
FROM node:18-alpine AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
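RUN npm run build
# Drop devDependencies so only runtime packages are copied into the final stage
RUN npm prune --omit=dev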
# Stage 2: Production
FROM node:18-alpine AS production
WORKDIR /app
# Only copy what we need
COPY --from=builder /app/dist ./dist
COPY --from=builder /app/node_modules ./node_modules
COPY --from=builder /app/package.json ./
# Create non-root user
RUN addgroup -g 1001 -S appgroup && \
adduser -u 1001 -S appuser -G appgroup
USER appuser
EXPOSE 3000
CMD ["node", "dist/index.js"]
# Result: ~150MB image (vs 1GB+)
Multi-Stage for Compiled Languages
# Go application - even smaller final image
# Stage 1: Build
FROM golang:1.21-alpine AS builder
WORKDIR /app
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN CGO_ENABLED=0 GOOS=linux go build -a -installsuffix cgo -o main .
# Stage 2: Minimal runtime
FROM alpine:3.18 AS production
# Add CA certificates for HTTPS
RUN apk --no-cache add ca-certificates
# Use a directory the non-root user can access (/root is not readable by appuser)
WORKDIR /app
COPY --from=builder /app/main .
# Run as non-root
RUN adduser -D -g '' appuser
USER appuser
EXPOSE 8080
CMD ["./main"]
# Result: ~15MB image!
Python Multi-Stage Build
# Python with virtual environment
# Stage 1: Build dependencies
FROM python:3.11-slim AS builder
WORKDIR /app
# Install build dependencies
RUN apt-get update && apt-get install -y --no-install-recommends \
gcc \
libpq-dev \
&& rm -rf /var/lib/apt/lists/*
# Create virtual environment
RUN python -m venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"
# Install Python dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Stage 2: Production
FROM python:3.11-slim AS production
# Install only runtime dependencies
RUN apt-get update && apt-get install -y --no-install-recommends \
libpq5 \
&& rm -rf /var/lib/apt/lists/*
# Copy virtual environment from builder
COPY --from=builder /opt/venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"
WORKDIR /app
COPY . .
# Create non-root user
RUN useradd -m -u 1000 appuser && chown -R appuser:appuser /app
USER appuser
EXPOSE 8000
CMD ["gunicorn", "--bind", "0.0.0.0:8000", "app:app"]
Image Optimization
Base Image Selection
| Base Image | Size | Use Case |
|---|---|---|
| ubuntu:22.04 | ~77MB | When you need apt and full tooling |
| debian:bookworm-slim | ~74MB | Smaller Debian, most packages available |
| alpine:3.18 | ~7MB | Minimal, uses musl libc (some compatibility issues) |
| distroless/base | ~20MB | No shell, minimal attack surface |
| scratch | 0MB | For statically compiled binaries only |
# Alpine-based images are smallest (~50MB vs ~150MB for slim)
# Note: Dockerfiles do not support trailing comments on an instruction line
FROM python:3.11-alpine
# But watch for compatibility issues with musl libc
# Some Python packages need compilation fixes
Layer Optimization
# BAD: Many layers, poor caching
RUN apt-get update
RUN apt-get install -y curl
RUN apt-get install -y git
RUN apt-get clean
# GOOD: Single layer, cleanup in same layer
RUN apt-get update && \
apt-get install -y --no-install-recommends \
curl \
git \
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/*
.dockerignore
# .dockerignore - exclude from build context
# Version control
.git
.gitignore
# Dependencies (will be installed in container)
node_modules
vendor
__pycache__
*.pyc
venv/
.venv/
# Build outputs
dist
build
*.egg-info
# IDE and editor files
.idea
.vscode
*.swp
*.swo
# Test and docs
tests
test
*.md
docs
coverage
.coverage
# CI/CD
.github
.gitlab-ci.yml
Jenkinsfile
# Docker files (not needed in image)
Dockerfile*
docker-compose*
.docker
# Environment files (security!)
.env
.env.*
*.pem
*.key
# Logs
*.log
logs
Caching Best Practices
# Order matters! Put things that change least at top
# 1. Base image (changes rarely)
FROM python:3.11-slim
# 2. System dependencies (change occasionally)
RUN apt-get update && apt-get install -y --no-install-recommends \
libpq-dev \
&& rm -rf /var/lib/apt/lists/*
# 3. Application dependencies (change sometimes)
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# 4. Application code (changes frequently)
COPY . .
# This way, code changes don't invalidate dependency cache
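To see why ordering matters, consider a simplified cache model in which each step's cache key is a hash of its parent's key, its instruction text, and the content of any files it copies. This is not BuildKit's actual algorithm, only an illustration of the chaining effect; the function names and sample file contents are invented.

```python
import hashlib
from typing import List, Tuple

Step = Tuple[str, bytes]  # (instruction text, bytes of files the step copies)


def layer_cache_keys(steps: List[Step]) -> List[str]:
    """Chain hashes so that each key depends on every earlier step."""
    keys, parent = [], ""
    for instruction, content in steps:
        digest = hashlib.sha256()
        digest.update(parent.encode())
        digest.update(instruction.encode())
        digest.update(content)
        parent = digest.hexdigest()
        keys.append(parent)
    return keys


def cached_steps(old: List[Step], new: List[Step]) -> int:
    """Count leading steps whose cache keys are unchanged between two builds."""
    count = 0
    for a, b in zip(layer_cache_keys(old), layer_cache_keys(new)):
        if a != b:
            break
        count += 1
    return count


def good(code: bytes) -> List[Step]:
    return [("FROM python:3.11-slim", b""), ("COPY requirements.txt .", b"flask"),
            ("RUN pip install -r requirements.txt", b""), ("COPY . .", code)]


def bad(code: bytes) -> List[Step]:
    return [("FROM python:3.11-slim", b""), ("COPY . .", code),
            ("RUN pip install -r requirements.txt", b"")]


print(cached_steps(good(b"v1"), good(b"v2")))  # dependency install stays cached
print(cached_steps(bad(b"v1"), bad(b"v2")))    # only the base image stays cached
```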
Security Hardening
1. Run as Non-Root User
# Create user and group
RUN groupadd -r appgroup && useradd -r -g appgroup appuser
# Or on Alpine
RUN addgroup -S appgroup && adduser -S appuser -G appgroup
# Set ownership
COPY --chown=appuser:appgroup . /app
# Switch to non-root user
USER appuser
# Alternative: Use numeric UID (more portable)
USER 1000:1000
2. Use Read-Only Filesystem
# Run container with read-only root filesystem
docker run --read-only myapp
# If app needs to write, use tmpfs for specific directories
docker run --read-only \
--tmpfs /tmp \
--tmpfs /app/cache \
myapp
3. Drop Capabilities
# Drop all capabilities, add only what's needed
docker run --cap-drop=ALL --cap-add=NET_BIND_SERVICE myapp
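# Capabilities worth keeping out of containers:
# - CAP_NET_RAW (in Docker's default set; drop it to prevent packet spoofing)
# - CAP_SYS_ADMIN (not granted by default; adding it back enables container escapes)
# - CAP_SETUID/CAP_SETGID (in the default set; drop them to limit privilege escalation)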
4. No New Privileges
# Prevent privilege escalation via setuid binaries
docker run --security-opt=no-new-privileges myapp
5. Scan Images for Vulnerabilities
# Docker Scout (built into Docker Desktop)
docker scout quickview myimage:latest
docker scout cves myimage:latest
# Trivy (open source)
trivy image myimage:latest
# Snyk
snyk container test myimage:latest
# Grype
grype myimage:latest
6. Use Specific Image Tags
# BAD: Using 'latest' - unpredictable
FROM python:latest
# BETTER: Use specific version
FROM python:3.11-slim
# BEST: Use digest for reproducibility
FROM python:3.11-slim@sha256:abc123...
7. Don’t Store Secrets in Images
# BAD: Secrets in image layers
ENV API_KEY=secret123
COPY credentials.json /app/
# GOOD: Use runtime environment variables
# or Docker secrets/Kubernetes secrets
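# Build-time secrets (Docker BuildKit)
# The syntax directive below must be the very first line of the Dockerfile
# syntax=docker/dockerfile:1.4
# Build with: docker build --secret id=api_key,src=<path-to-secret-file> .
RUN --mount=type=secret,id=api_key \
API_KEY=$(cat /run/secrets/api_key) ./configure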
8. Minimal Attack Surface
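# Remove unnecessary tools (do this in the same RUN that installed them;
# removing them in a later layer hides the files but they remain in earlier layers)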
RUN apt-get remove --purge -y \
curl \
wget \
&& apt-get autoremove -y
# Use distroless for minimal surface
FROM gcr.io/distroless/base-debian11
# No shell, no package manager - just your app
Security Checklist
# Security best practices checklist:
✓ Non-root user (USER instruction)
✓ Specific base image tags (not :latest)
✓ Multi-stage builds (minimize attack surface)
✓ No secrets in images (use env vars or secrets management)
✓ Image vulnerability scanning in CI/CD
✓ Read-only filesystem where possible
✓ Drop unnecessary capabilities
✓ Resource limits (--memory, --cpus)
✓ No privileged mode (--privileged=false)
✓ Network segmentation (custom networks)
Health Checks
Health checks tell Docker whether your container is actually working, not just running.
Dockerfile Health Check
# HTTP health check
HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
CMD curl -f http://localhost:8080/health || exit 1
# For containers without curl
HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
CMD wget --no-verbose --tries=1 --spider http://localhost:8080/health || exit 1
# Using Python
HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:8080/health')" || exit 1
# Database health check
HEALTHCHECK --interval=10s --timeout=5s --start-period=30s --retries=5 \
CMD pg_isready -U postgres || exit 1
# Redis health check
HEALTHCHECK --interval=10s --timeout=3s --start-period=5s --retries=3 \
CMD redis-cli ping || exit 1
Health Check Parameters
| Parameter | Description | Default |
|---|---|---|
| --interval | Time between checks | 30s |
| --timeout | Max time for check to complete | 30s |
| --start-period | Grace period before checks count | 0s |
| --retries | Failures needed to mark unhealthy | 3 |
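How these parameters interact is easier to see in a simulation. The sketch below is a simplified model, not Docker's implementation: it assumes the first probe runs one interval after start and ignores probe timeouts. Failures inside the start period are not counted unless a probe has already succeeded, and `retries` consecutive counted failures mark the container unhealthy.

```python
from typing import Iterable


def health_status(
    results: Iterable[bool],
    interval: float = 30.0,
    start_period: float = 0.0,
    retries: int = 3,
) -> str:
    """Return 'starting', 'healthy', or 'unhealthy' after a series of probes."""
    status, failures, started = "starting", 0, False
    for i, ok in enumerate(results, start=1):
        elapsed = i * interval  # assumption: probes run at fixed intervals
        if ok:
            status, failures, started = "healthy", 0, True
            continue
        if not started and elapsed <= start_period:
            continue  # failure inside the grace period is ignored
        failures += 1
        if failures >= retries:
            status = "unhealthy"
    return status


print(health_status([False, False, False]))                   # unhealthy
print(health_status([False, False, False], start_period=90))  # starting
print(health_status([False, False, True]))                    # healthy
```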
Check Health Status
# View health status
docker ps
# CONTAINER ID IMAGE STATUS NAMES
# abc123 myapp Up 5 min (healthy) web
# Detailed health info
docker inspect myapp --format '{{json .State.Health}}' | jq
# Health check logs
docker inspect myapp --format '{{json .State.Health.Log}}' | jq
Resource Limits
Without limits, a single container can consume all host resources.
Memory Limits
# Hard memory limit (OOM killed if exceeded)
docker run --memory=512m myapp
# Memory + swap limit (--memory-swap is the combined total: 512m RAM + 512m swap here)
docker run --memory=512m --memory-swap=1g myapp
# Soft limit (reservation)
docker run --memory=512m --memory-reservation=256m myapp
# Disable OOM killer (processes stall instead of being killed; cgroup v1 only, always pair with --memory)
docker run --memory=512m --oom-kill-disable myapp
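The size suffixes are binary multiples, and `--memory-swap` is a total rather than an extra allowance. A small sketch makes the arithmetic explicit. `parse_size` and `swap_allowance` are hypothetical helpers that handle only the `b`, `k`, `m`, and `g` suffixes, not special values such as `-1`.

```python
UNITS = {"b": 1, "k": 1024, "m": 1024**2, "g": 1024**3}


def parse_size(text: str) -> int:
    """Convert a size such as '512m' or '1g' into bytes."""
    text = text.strip().lower()
    if text[-1] in UNITS:
        return int(text[:-1]) * UNITS[text[-1]]
    return int(text)  # a bare number is already in bytes


def swap_allowance(memory: str, memory_swap: str) -> int:
    """Swap available to the container: total (memory + swap) minus memory."""
    return parse_size(memory_swap) - parse_size(memory)


print(parse_size("512m"))
print(swap_allowance("512m", "1g"))
```

With `--memory=512m --memory-swap=1g`, the container can use 512 MiB of swap on top of its 512 MiB of RAM.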
CPU Limits
# Limit to 1.5 CPUs
docker run --cpus=1.5 myapp
# CPU shares (relative weight, default 1024; only enforced when CPUs are contended)
docker run --cpu-shares=512 myapp # Half the default weight
# Pin to specific CPUs
docker run --cpuset-cpus="0,1" myapp # Use CPU 0 and 1
# CPU quota (microseconds per 100ms period)
docker run --cpu-quota=50000 myapp # 50% of one CPU
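`--cpus` is shorthand for a quota over the default 100ms period, so the two flags can be converted into each other with simple arithmetic. The helper names below are made up for this illustration.

```python
DEFAULT_PERIOD_US = 100_000  # default CFS period: 100ms, in microseconds


def cpus_to_quota(cpus: float, period_us: int = DEFAULT_PERIOD_US) -> int:
    """Quota in microseconds per period for a given --cpus value."""
    return round(cpus * period_us)


def quota_to_cpus(quota_us: int, period_us: int = DEFAULT_PERIOD_US) -> float:
    """Fraction of CPUs a quota allows per period."""
    return quota_us / period_us


print(cpus_to_quota(1.5))    # 150000
print(quota_to_cpus(50000))  # 0.5
```

So `--cpus=1.5` corresponds to a quota of 150000 microseconds per 100000-microsecond period.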
Docker Compose Resources
services:
  api:
    deploy:
      resources:
        limits:
          cpus: '0.5'
          memory: 512M
        reservations:
          cpus: '0.25'
          memory: 256M
Monitoring Resource Usage
# Real-time stats
docker stats
# Stats for specific containers
docker stats api db redis
# One-time snapshot
docker stats --no-stream
# Format output
docker stats --format "table {{.Name}}\t{{.CPUPerc}}\t{{.MemUsage}}"
Logging Best Practices
Log to stdout/stderr
# Application logs should go to stdout/stderr
# Docker captures these automatically
# Example: Configure app to log to stdout (-u gives unbuffered output)
# Keep the comment on its own line: a trailing comment breaks exec-form CMD parsing
CMD ["python", "-u", "app.py"]
# Symlink log files to stdout/stderr
RUN ln -sf /dev/stdout /var/log/app/access.log \
&& ln -sf /dev/stderr /var/log/app/error.log
Logging Drivers
# View container logs
docker logs myapp
docker logs -f myapp # Follow
docker logs --tail 100 myapp # Last 100 lines
docker logs --since 1h myapp # Last hour
# Use different logging driver
docker run --log-driver=json-file \
--log-opt max-size=10m \
--log-opt max-file=3 \
myapp
# Available drivers:
# - json-file (default)
# - syslog
# - journald
# - fluentd
# - awslogs
# - gcplogs
Structured Logging
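# Use JSON logging for easier parsing
import json
import logging
import sys

class JsonFormatter(logging.Formatter):
    def format(self, record):
        log_obj = {
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "message": record.getMessage(),
            "module": record.module,
        }
        return json.dumps(log_obj)

# Send JSON lines to stdout so Docker's logging driver captures them
handler = logging.StreamHandler(sys.stdout)
handler.setFormatter(JsonFormatter())
logging.basicConfig(level=logging.INFO, handlers=[handler])
logging.info("Request processed")
# Example output (from a file named api.py):
# {"timestamp": "2024-01-15 10:30:00,123", "level": "INFO", "message": "Request processed", "module": "api"}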
Production Troubleshooting
Issue 1: Container Exits Immediately
Symptoms:
docker run myapp
# Container starts and exits immediately
docker ps -a
# STATUS: Exited (1) 2 seconds ago
Diagnosis:
# Check exit code
docker inspect myapp --format '{{.State.ExitCode}}'
# Check logs
docker logs myapp
# Common exit codes:
# 0 - Normal exit
# 1 - General error
# 137 - SIGKILL (OOM or docker kill)
# 139 - SIGSEGV (segmentation fault)
# 143 - SIGTERM (docker stop)
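Exit codes above 128 follow a convention: the code is 128 plus the number of the signal that ended the process. The sketch below decodes them with Python's standard `signal` module. `explain_exit_code` is a hypothetical helper, and the signal names assume a POSIX host.

```python
import signal


def explain_exit_code(code: int) -> str:
    """Describe a container exit code using the 128 + signal convention."""
    if code == 0:
        return "normal exit"
    if code > 128:
        try:
            name = signal.Signals(code - 128).name
        except ValueError:
            return f"exit code {code}"
        return f"killed by {name}"
    return f"application error (exit code {code})"


for code in (0, 1, 137, 139, 143):
    print(code, "->", explain_exit_code(code))
```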
Common Causes:
1. Process runs in background - the container exits when the main process ends
# BAD: nginx daemonizes into the background, so the container exits
CMD ["nginx"]
# GOOD: Keep nginx in the foreground
CMD ["nginx", "-g", "daemon off;"]
2. Command fails
# Debug by running a shell (--entrypoint also bypasses any ENTRYPOINT in the image)
docker run -it --entrypoint /bin/sh myapp
# Then manually run your command
3. Missing environment variables
docker run -e REQUIRED_VAR=value myapp
Issue 2: OOM Killed (Exit Code 137)
Symptoms:
docker inspect myapp --format '{{.State.OOMKilled}}'
# true
Diagnosis:
# Check memory usage before kill
docker stats myapp --no-stream
# Check container memory limit
docker inspect myapp --format '{{.HostConfig.Memory}}'
# Check host memory
free -h
Solutions:
# Increase memory limit
docker run --memory=1g myapp
# Or fix the memory leak in your application
# Add profiling to find the leak
Issue 3: Cannot Connect to Container
Symptoms:
curl http://localhost:8080
# Connection refused
Diagnosis:
# 1. Is container running?
docker ps
# 2. Is port mapping correct?
docker port myapp
# 8080/tcp -> 0.0.0.0:8080
# 3. Is app listening on correct interface?
docker exec myapp netstat -tlnp
# App must listen on 0.0.0.0, not 127.0.0.1
# 4. Check container logs for errors
docker logs myapp
# 5. Test from inside container
docker exec myapp curl localhost:8080
Common Causes:
# BAD: Only listens on localhost (inside container)
app.run(host='127.0.0.1', port=8080)
# GOOD: Listens on all interfaces
app.run(host='0.0.0.0', port=8080)
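The same check can be reproduced without a web framework. This sketch swaps in Python's standard-library `http.server` for the `app.run` call above; `start_server` and `port_open` are hypothetical helper names. The server binds to 0.0.0.0 so a published port can reach it, and the probe mirrors what `curl` does from the host.

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class Health(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"ok")

    def log_message(self, *args):
        pass  # keep the example quiet


def start_server(host="0.0.0.0", port=8080):
    """Start the server in a background thread; 0.0.0.0 accepts traffic on all interfaces."""
    server = HTTPServer((host, port), Health)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    server = start_server()
    print("reachable:", port_open("127.0.0.1", 8080))
    server.shutdown()
    server.server_close()
```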
Issue 4: Slow Container Startup
Symptoms:
# Container takes minutes to become healthy
docker ps
# STATUS: Up 2 minutes (health: starting)
Diagnosis:
# Check what's happening during startup
docker logs -f myapp
# Check health check timing
docker inspect myapp --format '{{json .State.Health}}' | jq
Solutions:
# Increase start period for slow apps
HEALTHCHECK --start-period=60s --interval=30s \
CMD curl -f http://localhost:8080/health || exit 1
# Optimize startup:
# - Lazy load dependencies
# - Defer non-critical initialization
# - Use connection pooling (don't wait for DB on startup)
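Deferring initialization can be as simple as creating an expensive resource on first use instead of at import time. A minimal sketch, assuming a hypothetical `LazyResource` wrapper and a `factory` callable that stands in for whatever opens your database connection or pool:

```python
import threading


class LazyResource:
    """Create a resource on first access rather than during startup."""

    def __init__(self, factory):
        self._factory = factory
        self._value = None
        self._lock = threading.Lock()

    def get(self):
        # Double-checked locking: only the first caller pays the creation cost
        if self._value is None:
            with self._lock:
                if self._value is None:
                    self._value = self._factory()
        return self._value


if __name__ == "__main__":
    def open_connection():
        print("connecting...")  # hypothetical slow step
        return {"connected": True}

    db = LazyResource(open_connection)
    print("startup finished")   # printed before any connection is made
    print(db.get())
```

The health endpoint can then report ready as soon as the process is up, while the first request that needs the resource triggers the connection.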
Issue 5: “No Space Left on Device”
Symptoms:
docker build -t myapp .
# Error: write /var/lib/docker/...: no space left on device
Diagnosis:
# Check Docker disk usage
docker system df
# Detailed breakdown
docker system df -v
# Check host disk
df -h
Solutions:
# Remove unused containers
docker container prune
# Remove unused images
docker image prune
docker image prune -a # Remove all unused (not just dangling)
# Remove unused volumes
docker volume prune
# Remove unused networks
docker network prune
# Nuclear option - remove everything unused
docker system prune -a --volumes
# Clean build cache
docker builder prune
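To catch the problem before a build fails, a scheduled job can check free space on the Docker data directory and only then trigger a prune. A rough sketch using the standard library; `free_ratio` and `needs_cleanup` are hypothetical names, the 10% threshold is an arbitrary assumption, and the default path is taken from the error message above:

```python
import shutil


def free_ratio(path):
    """Return the fraction of free space on the filesystem that holds `path`."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        return 0.0
    return usage.free / usage.total


def needs_cleanup(path="/var/lib/docker", min_free=0.10):
    """True when free space falls below the chosen threshold."""
    return free_ratio(path) < min_free


if __name__ == "__main__":
    # "." is used so the example runs on machines without Docker installed
    print(f"free: {free_ratio('.'):.1%}, cleanup needed: {needs_cleanup('.')}")
```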
Issue 6: DNS Resolution Failing
Symptoms:
docker exec myapp ping google.com
# ping: bad address 'google.com'
Diagnosis:
# Check DNS configuration
docker exec myapp cat /etc/resolv.conf
# Test with IP (bypass DNS)
docker exec myapp ping 8.8.8.8
# Check Docker daemon DNS settings (if any are configured)
cat /etc/docker/daemon.json
Solutions:
# Specify DNS servers
docker run --dns 8.8.8.8 --dns 8.8.4.4 myapp
# Or configure in daemon.json
# /etc/docker/daemon.json
{
"dns": ["8.8.8.8", "8.8.4.4"]
}
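When the image ships Python but no `ping` or `nslookup`, the same diagnosis can be run from the interpreter. A small sketch; `resolve` is a hypothetical helper that returns an empty list when name resolution fails, which separates a DNS problem from a general connectivity problem:

```python
import socket


def resolve(hostname):
    """Return the sorted IP addresses for `hostname`, or [] if resolution fails."""
    try:
        infos = socket.getaddrinfo(hostname, None)
    except socket.gaierror:
        return []
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP
    return sorted({info[4][0] for info in infos})


if __name__ == "__main__":
    for name in ("google.com", "8.8.8.8"):
        print(name, "->", resolve(name) or "resolution failed")
```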
Issue 7: Permission Denied on Volume
Symptoms:
docker run -v mydata:/data myapp
# Error: Permission denied: /data/file.txt
Diagnosis:
# Check file ownership inside container
docker exec myapp ls -la /data
# Check what user container runs as
docker exec myapp id
# uid=1000(appuser) gid=1000(appuser)
# Check volume ownership on host
sudo ls -la /var/lib/docker/volumes/mydata/_data
Solutions:
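# Option 1: Create the directory and set ownership in the Dockerfile
# (a new, empty named volume picks up this ownership on first mount)
RUN mkdir -p /data && chown -R appuser:appuser /data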
# Option 2: Run as root (not recommended for production)
docker run --user root -v mydata:/data myapp
# Option 3: Use init container to fix permissions
docker run --rm -v mydata:/data alpine chown -R 1000:1000 /data
# Option 4: Set permissions in entrypoint
# entrypoint.sh
#!/bin/sh
chown -R appuser:appuser /data
exec gosu appuser "$@"
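An application can also report the mismatch itself at startup, which makes the failure obvious in `docker logs`. A minimal sketch for Unix-like containers; `describe_access` is a hypothetical helper and `/data` is the mount point from the example above:

```python
import os


def describe_access(path):
    """Summarize who owns `path` and whether the current process can write to it."""
    st = os.stat(path)
    return {
        "owner_uid": st.st_uid,
        "owner_gid": st.st_gid,
        "mode": oct(st.st_mode & 0o777),
        "process_uid": os.getuid(),
        "writable": os.access(path, os.W_OK),
    }


if __name__ == "__main__":
    target = "/data" if os.path.exists("/data") else "."
    print(target, describe_access(target))
```

If `owner_uid` and `process_uid` differ and `writable` is false, one of the options above is needed.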
Issue 8: Container Cannot Reach Other Containers
Symptoms:
docker exec api curl http://db:5432
# curl: (6) Could not resolve host: db
Diagnosis:
# Check if containers are on same network
docker network inspect bridge
# List container networks
docker inspect api --format '{{json .NetworkSettings.Networks}}' | jq
docker inspect db --format '{{json .NetworkSettings.Networks}}' | jq
Solutions:
# The default bridge network doesn't resolve container names via DNS
# Use a user-defined network instead
docker network create app-network
docker run -d --name db --network app-network postgres
docker run -d --name api --network app-network myapi
# Now api can reach db by name
docker exec api ping db # Works!
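The comparison between the two containers can be scripted. This sketch assumes you have captured the JSON printed for `.NetworkSettings.Networks` of each container, which is an object keyed by network name. `shared_user_networks` is a hypothetical helper, and the sample strings are illustrative, not real output.

```python
import json

# Built-in networks; name-based discovery only works on user-defined ones
DEFAULT_NETWORKS = {"bridge", "host", "none"}


def shared_user_networks(networks_a, networks_b):
    """Return user-defined networks that both containers are attached to."""
    a = set(json.loads(networks_a))
    b = set(json.loads(networks_b))
    return sorted((a & b) - DEFAULT_NETWORKS)


if __name__ == "__main__":
    api = '{"app-network": {}, "bridge": {}}'  # illustrative sample
    db = '{"app-network": {}}'                 # illustrative sample
    print(shared_user_networks(api, db))
```

An empty result means the containers cannot find each other by name, even if both sit on the default bridge.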
Production Docker Compose Example
Complete production-ready docker-compose setup:
# docker-compose.yml
version: '3.8'

services:
  # Application
  api:
    build:
      context: ./api
      dockerfile: Dockerfile
      target: production
    image: mycompany/api:${VERSION:-latest}
    restart: unless-stopped
    ports:
      - "8080:8080"
    environment:
      - NODE_ENV=production
      - DATABASE_URL=postgres://user:${DB_PASSWORD}@db:5432/app
      - REDIS_URL=redis://redis:6379
    env_file:
      - .env
    depends_on:
      db:
        condition: service_healthy
      redis:
        condition: service_healthy
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s
    deploy:
      resources:
        limits:
          cpus: '1'
          memory: 512M
        reservations:
          cpus: '0.5'
          memory: 256M
    networks:
      - frontend
      - backend
    logging:
      driver: json-file
      options:
        max-size: "10m"
        max-file: "3"

  # Database
  db:
    image: postgres:15-alpine
    restart: unless-stopped
    volumes:
      - pgdata:/var/lib/postgresql/data
      - ./init.sql:/docker-entrypoint-initdb.d/init.sql:ro
    environment:
      - POSTGRES_USER=user
      - POSTGRES_PASSWORD=${DB_PASSWORD}
      - POSTGRES_DB=app
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U user -d app"]
      interval: 10s
      timeout: 5s
      retries: 5
    deploy:
      resources:
        limits:
          cpus: '0.5'
          memory: 256M
    networks:
      - backend

  # Cache
  redis:
    image: redis:7-alpine
    restart: unless-stopped
    command: redis-server --appendonly yes --maxmemory 128mb --maxmemory-policy allkeys-lru
    volumes:
      - redisdata:/data
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 10s
      timeout: 5s
      retries: 3
    deploy:
      resources:
        limits:
          cpus: '0.25'
          memory: 128M
    networks:
      - backend

  # Reverse proxy
  nginx:
    image: nginx:alpine
    restart: unless-stopped
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - ./nginx/nginx.conf:/etc/nginx/nginx.conf:ro
      - ./nginx/ssl:/etc/nginx/ssl:ro
    depends_on:
      - api
    healthcheck:
      test: ["CMD", "nginx", "-t"]
      interval: 30s
      timeout: 10s
      retries: 3
    networks:
      - frontend

volumes:
  pgdata:
  redisdata:

networks:
  frontend:
  backend:
    internal: true # No external access to backend network
Quick Reference Commands
# === IMAGES ===
docker build -t name:tag .
docker images
docker rmi image:tag
docker image prune -a
# === CONTAINERS ===
docker run -d --name app -p 8080:80 image
docker ps -a
docker stop app      # likewise: docker start app, docker restart app
docker rm app
docker logs -f app
docker exec -it app /bin/sh
# === INSPECT & DEBUG ===
docker inspect app
docker stats
docker top app
docker diff app
# === NETWORKS ===
docker network ls
docker network create mynet
docker network connect mynet app
docker network inspect mynet
# === VOLUMES ===
docker volume ls
docker volume create mydata
docker volume inspect mydata
docker volume prune
# === CLEANUP ===
docker system df
docker system prune -a --volumes
docker builder prune
# === COMPOSE ===
docker compose up -d
docker compose down -v   # -v also removes named volumes
docker compose logs -f
docker compose exec app sh
docker compose ps
Best Practices Summary
Building Images
- Use multi-stage builds
- Choose minimal base images (Alpine, distroless)
- Order Dockerfile for optimal caching
- Use .dockerignore
- Tag with specific versions, not latest
Security
- Run as non-root user
- Scan images for vulnerabilities
- Don’t store secrets in images
- Use read-only filesystems where possible
- Drop unnecessary capabilities
Runtime
- Set resource limits (CPU, memory)
- Use health checks
- Log to stdout/stderr
- Use custom networks for service discovery
- Use volumes for persistent data
Operations
- Automate image builds in CI/CD
- Implement image retention policies
- Monitor container metrics
- Have a cleanup strategy for unused resources