Sysctl Kernel Parameters for Ubuntu 26.04 Container VMs

2026-05-25

Bottom line

Ubuntu defaults are tuned for general-purpose hosts, not for VMs that run dense container workloads. Based on eight sources, including the official Kubernetes docs, kernel documentation, and four independent technical guides, a focused set of sysctl and PAM changes improves container networking reliability, file descriptor capacity, conntrack headroom, PID space, and memory behavior. The highest-impact additions to a baseline config are vm.max_map_count, net.netfilter.nf_conntrack_max, net.ipv4.ip_local_port_range, vm.swappiness, kernel.pid_max, and net.ipv4.tcp_fin_timeout. These recommendations are source-backed and avoid unnecessary or risky tuning.

Key findings

vm.max_map_count (262144): Default 65530 is too low for Elasticsearch, databases, and JVM/Go mmap-heavy workloads in containers. Every container-tuning source flags this. (Sources: Binadit, Kawin, kernel docs)
net.netfilter.nf_conntrack_max (1048576): Conntrack table exhaustion is cited as the most common production failure in K8s clusters under load. Container NAT fills the default ~65K conntrack table; once it is full, new connections are dropped and the kernel logs nf_conntrack: table full. (Sources: Goel Academy, Kawin, SumGuy)
net.ipv4.ip_local_port_range (1024 65535): Widens ephemeral port range to prevent port exhaustion on high-outbound-connection container hosts. The default ~28K port range is insufficient. (Sources: Goel Academy, Binadit, Kawin, SumGuy)
vm.swappiness (10): Default 60 swaps too eagerly. Near-universal server recommendation to favor keeping application pages in RAM. (Sources: all eight)
kernel.pid_max (4194304): Default 32768 is too low for dense container hosts. Host tracks container PIDs even with PID namespaces. (Sources: Binadit, Kawin)
TCP keepalive tuning: The default 2-hour idle timeout means dead connections linger far too long. Reducing tcp_keepalive_time to 600 with faster probes (5 probes, 30s apart) declares a dead connection after ~12.5min (600s idle plus 150s of probing) instead of ~2h11min (7200s idle plus ~11min of probing). (Sources: Goel Academy, Binadit, Kawin, K8s safe-sysctl list)
net.ipv6.conf.default.* siblings: Setting only all.* leaves future interfaces at distribution defaults. Setting default.* as well ensures that interfaces created later inherit the policy. (Sources: Kawin, Scaleway docs)

Background

Linux distributions ship with conservative kernel defaults designed for general-purpose desktop and server workloads circa the early 2000s. The sysctl interface (/proc/sys/) exposes tunable kernel parameters that can be adjusted at runtime without a reboot. Persistent settings live in /etc/sysctl.d/*.conf.
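
The mapping between a dotted sysctl key and its file under /proc/sys/ can be sketched as below. This is a minimal illustration: sysctl_path is a hypothetical helper name, not part of any package, and it assumes the key contains no interface name that itself includes a dot (a VLAN interface such as eth0.100 would need different handling).

```shell
#!/bin/sh
# Map a dotted sysctl key to its file under /proc/sys.
# sysctl_path is a hypothetical helper, not a procps command.
sysctl_path() {
    printf '/proc/sys/%s\n' "$(printf '%s' "$1" | tr '.' '/')"
}

sysctl_path vm.swappiness        # prints /proc/sys/vm/swappiness
sysctl_path net.ipv4.ip_forward  # prints /proc/sys/net/ipv4/ip_forward
```

Reading the resulting file with cat shows the live value, which is a quick way to confirm that a drop-in in /etc/sysctl.d/ was actually applied.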

Container workloads - Docker, Podman, containerd, Kubernetes - stress the kernel in ways that desktop defaults don't anticipate: high connection counts through NAT, many short-lived socket pairs, dense file descriptor usage, per-process mmap limits, bridge firewall traversal, and connection tracking table pressure.

Ubuntu 26.04 (Noble Numbat successor) inherits the same conservative Linux kernel defaults. For VMs that are treated as disposable container workers, host-level kernel tuning is the right place to set policy rather than relying on per-container or per-application overrides.

Current state (as of May 2026)

The research drew on eight fetched sources from 2024–2026:

Primary: Kubernetes official sysctl documentation (September 2024), Linux kernel /proc/sys/vm/ documentation (ongoing)
Technical guides: Binadit container sysctl tutorial (April 2026), Goel Academy SRE tuning guide (August 2025)
Independent configs: Kawin's Pages general/PVE/K8s sysctl configurations, Overcast Blog K8s kernel tuning guide (March 2024)
Community/educational: SumGuy's Ramblings production config, GoLinuxCloud high-performance server guide

The Kubernetes project categorizes namespaced sysctls as safe (properly isolated between pods and allowed by default) or unsafe (disabled by default and requiring explicit opt-in on the kubelet). Sysctls that are not namespaced are node-level and must be set by the host OS. Our role targets host-level baseline configuration.

Recommended sysctl configuration

Already covered in a baseline container-host role

These sysctls form the minimum viable container-host config and are confirmed by all sources:

# Bridge firewall - container CNI requires bridged traffic visible to host iptables
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1

# File descriptors - host-wide pool and per-process ceiling
fs.file-max = 2097152
fs.nr_open = 2097152

# Inotify - build tools, reloaders, volume watchers
fs.inotify.max_user_watches = 524288
fs.inotify.max_user_instances = 512

# Panic recovery - cattle VMs auto-reboot
kernel.panic = 10
kernel.panic_on_oops = 1

# Observability - per-task delay accounting for noisy-neighbor diagnostics
kernel.task_delayacct = 1

# Network backlog - burst absorption for container ingress
net.core.netdev_max_backlog = 65536
net.core.somaxconn = 65536

# Forwarding - container bridges and gateways
net.ipv4.ip_forward = 1
net.ipv6.conf.all.forwarding = 1

# Rootless service ports - bind low ports without capabilities
net.ipv4.ip_unprivileged_port_start = 0

# SYN backlog - proxy/edge burst absorption
net.ipv4.tcp_max_syn_backlog = 65536

# IPv6 SLAAC with forwarding
net.ipv6.conf.all.accept_ra = 2

# Memory - always overcommit for container workloads that reserve optimistically
vm.overcommit_memory = 1
vm.panic_on_oom = 0

Additions recommended by research

These additions are supported by the sources cited for each item under Key findings. Values are source-consensus recommendations.

Memory management

# Raise mmap area limit for Elasticsearch, databases, JVM apps (default 65530)
vm.max_map_count = 262144

# Prefer keeping application pages in RAM (default 60)
vm.swappiness = 10

Connection tracking

# Prevent silent connection drops under container NAT load (default ~65536)
net.netfilter.nf_conntrack_max = 1048576

Ephemeral ports

# Widen outbound port range to prevent exhaustion (default 32768-60999)
net.ipv4.ip_local_port_range = 1024 65535
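
As a rough check on how much headroom the wider range buys, the number of ports in an inclusive range can be computed directly. The helper name port_range_size is an illustrative assumption.

```shell
#!/bin/sh
# port_range_size LOW HIGH: number of ports in an inclusive range.
# Hypothetical helper for illustration only.
port_range_size() {
    echo $(( $2 - $1 + 1 ))
}

port_range_size 32768 60999   # default range: prints 28232
port_range_size 1024 65535    # widened range: prints 64512
```

The widened range gives a little more than twice as many ephemeral ports per source address and destination tuple.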

Connection lifecycle

# Free FIN-WAIT-2 sockets faster on high-churn services (default 60)
net.ipv4.tcp_fin_timeout = 15

# Safe TIME_WAIT reuse for outgoing connections (recent kernels default to 2, loopback only; older kernels to 0)
net.ipv4.tcp_tw_reuse = 1

TCP keepalive

# Declare dead connections after ~12.5min instead of ~2h11min (defaults: 7200/75/9)
net.ipv4.tcp_keepalive_time = 600
net.ipv4.tcp_keepalive_intvl = 30
net.ipv4.tcp_keepalive_probes = 5
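
The detection time implied by these three values is the idle time before the first probe plus one interval per unanswered probe. A minimal sketch of that arithmetic, with keepalive_detect_seconds as a hypothetical helper name:

```shell
#!/bin/sh
# keepalive_detect_seconds TIME INTVL PROBES
# Worst-case seconds from the last traffic until the kernel gives up on a
# silent peer: idle time plus (interval x probes).
keepalive_detect_seconds() {
    echo $(( $1 + $2 * $3 ))
}

keepalive_detect_seconds 7200 75 9   # kernel defaults: prints 7875
keepalive_detect_seconds 600 30 5    # values above: prints 750
```

That is roughly 2 hours 11 minutes with the defaults versus 12.5 minutes with the tuned values. These settings only affect sockets that enable SO_KEEPALIVE.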

Process limits

# Raise PID ceiling for dense container hosts (default 32768)
kernel.pid_max = 4194304

IPv6 default inheritance

# Ensure future interfaces inherit container-forwarding policy
net.ipv6.conf.default.forwarding = 1
net.ipv6.conf.default.accept_ra = 2

DDoS resistance

# SYN cookies - explicit enable (with value 1, cookies are sent only when the SYN backlog overflows)
net.ipv4.tcp_syncookies = 1

PAM limits

# /etc/security/limits.d/99-container.conf
*  soft  nofile  65536
*  hard  nofile  65536
*  soft  nproc   65536
*  hard  nproc   65536

Note: fs.file-max is the system-wide kernel ceiling; PAM limits are the per-session/per-user ceiling. Both must be raised. Systemd services do not pass through PAM, so they also need LimitNOFILE= in their unit files.

Modules

# Load at boot via /etc/modules-load.d/
overlay
br_netfilter

Bridge netfilter (br_netfilter) is required for net.bridge.bridge-nf-call-* sysctls to take effect. Without it, container CNI firewall rules are silently ignored.
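
One way to confirm that both modules are present before applying the bridge sysctls is to look for them in /proc/modules. The sketch below assumes the modules are built as loadable modules (a module compiled into the kernel does not appear in that file); module_loaded is a hypothetical helper name.

```shell
#!/bin/sh
# module_loaded NAME [FILE]: succeed if NAME is listed in FILE.
# FILE defaults to /proc/modules; the optional argument exists for testing.
module_loaded() {
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }' \
        "${2:-/proc/modules}" 2>/dev/null
}

for mod in overlay br_netfilter; do
    if module_loaded "$mod"; then
        echo "$mod: loaded"
    else
        echo "$mod: not loaded"
    fi
done
```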

Transparent hugepages

Set THP to madvise via a systemd oneshot service to avoid latency spikes from kernel-initiated THP compaction while still allowing applications to opt in:

echo madvise > /sys/kernel/mm/transparent_hugepage/enabled
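
A possible shape for that oneshot unit is sketched below, together with a small check of the active mode. The unit name thp-madvise.service and both helper names are assumptions made for illustration; the source does not specify them.

```shell
#!/bin/sh
# thp_unit: print a candidate oneshot unit (hypothetical name: thp-madvise.service).
thp_unit() {
    cat <<'EOF'
[Unit]
Description=Set transparent hugepages to madvise

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/bin/sh -c 'echo madvise > /sys/kernel/mm/transparent_hugepage/enabled'

[Install]
WantedBy=multi-user.target
EOF
}

# thp_active_mode LINE: extract the bracketed (active) mode from the sysfs line.
thp_active_mode() {
    printf '%s\n' "$1" | sed -n 's/.*\[\(.*\)\].*/\1/p'
}

thp_unit
thp_active_mode "always [madvise] never"   # prints madvise
```

The printed unit could be saved under /etc/systemd/system/ and enabled with systemctl enable --now. Afterwards, passing the contents of /sys/kernel/mm/transparent_hugepage/enabled to thp_active_mode should report madvise.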

What was excluded and why

These commonly recommended sysctls were intentionally excluded from the baseline:

vm.dirty_ratio / vm.dirty_background_ratio: Changes writeback behavior. Ubuntu 26.04 cloud VMs are often on unpredictable block storage. Leave at distribution default; tune per-workload.
BBR congestion control (tcp_congestion_control = bbr + default_qdisc = fq): Changes global TCP behavior and requires modprobe tcp_bbr. Better as an explicit opt-in per workload profile.
tcp_fastopen = 3: Requires application support; some middleboxes silently drop TFO packets.
rp_filter: Default on Ubuntu cloud images is often 0, which disables source validation (loose mode is 2). Setting to 1 (strict) can break multi-NIC or asymmetric routing setups.
kernel.kptr_restrict, kernel.dmesg_restrict, kernel.yama.ptrace_scope: Security hardening that belongs in a dedicated security section, not kernel runtime tuning.
vm.min_free_kbytes: Worth tuning but requires scaling with host RAM. Better as a conditional or documented variable.
ARP gc_thresh tuning: Useful on dense bare-metal hosts but less critical on cloud VMs with limited neighbor tables.

Limitations and critiques

Source consensus, not benchmarks: All eight sources are guides and documentation, not peer-reviewed performance studies. Values are community consensus, not workload-benchmarked.
Conntrack memory cost: Each conntrack entry uses ~300 bytes, so nf_conntrack_max = 1048576 costs ~300MB RAM at full table. This is acceptable on a container VM but must be accounted for.
tcp_tw_reuse caveat: Safe in modern kernels (4.x+) for client-side (outgoing) connections. On the server side behind NAT, it is less relevant but still harmless.
Ubuntu 26.04 specifics: No Ubuntu 26.04-specific server guide or release notes were found during research (the distro is very new as of May 2026). Recommendations are based on kernel versions expected in 26.04 (6.11+), which are well-tested in prior Ubuntu releases.
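ip_local_port_range starting at 1024: This extends the ephemeral pool down into the registered port range (1024–49151), just above the well-known ports (0–1023). The kernel won't allocate ports already in use, but a service that starts late can find its listening port taken by an outbound connection, so some administrators prefer a higher starting point for cleaner separation.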
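
The conntrack memory figure above follows from simple multiplication. A small sketch of the estimate, taking the ~300 bytes per entry from the text as an approximation (the real per-entry size varies by kernel build); conntrack_mem_mib is a hypothetical helper name.

```shell
#!/bin/sh
# conntrack_mem_mib ENTRIES BYTES_PER_ENTRY
# Approximate table size in MiB at full occupancy (integer division).
conntrack_mem_mib() {
    echo $(( $1 * $2 / 1024 / 1024 ))
}

conntrack_mem_mib 1048576 300   # recommended table size: prints 300
conntrack_mem_mib 65536 300     # ~65K table: prints 18
```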

Practical takeaways

Start with the baseline config, then add the research-backed additions. Test each group (memory, network, connection tracking) separately.
Monitor conntrack usage: compare conntrack -C (current entry count) against nf_conntrack_max, and check conntrack -S for drops. A rising drop count means nf_conntrack_max must increase.
Monitor file descriptor usage: cat /proc/sys/fs/file-nr returns <allocated> <free> <max>. If allocated approaches max, increase fs.file-max.
Monitor PID usage: cat /proc/sys/kernel/pid_max and compare to ps -eLf | wc -l, which counts threads as well as processes (each thread consumes an ID from the PID space). Dense container hosts benefit from raising this early.
Settings in /etc/sysctl.d/ are applied at boot. Run sysctl --system to apply them immediately without a reboot.
Use a high-numbered drop-in file (e.g., 99-ubuntu-2604.conf) so that your settings are read last and override distribution defaults.
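
The monitoring checks above can be combined into a small script that reports how close each limit is to being reached. This is a sketch under stated assumptions: usage_percent is a hypothetical helper, and the conntrack files under /proc/sys/net/netfilter/ are assumed to exist only while the nf_conntrack module is loaded, so the script skips them when they are absent.

```shell
#!/bin/sh
# usage_percent USED MAX: integer percentage of a limit in use.
usage_percent() {
    echo $(( $1 * 100 / $2 ))
}

# File handles: the three fields are allocated, allocated-but-unused, maximum.
if [ -r /proc/sys/fs/file-nr ]; then
    read -r allocated unused max < /proc/sys/fs/file-nr
    echo "file handles: $(usage_percent "$allocated" "$max")% of fs.file-max"
fi

# Conntrack: current entry count against the table limit.
ct=/proc/sys/net/netfilter
if [ -r "$ct/nf_conntrack_count" ] && [ -r "$ct/nf_conntrack_max" ]; then
    count=$(cat "$ct/nf_conntrack_count")
    limit=$(cat "$ct/nf_conntrack_max")
    echo "conntrack: $(usage_percent "$count" "$limit")% of nf_conntrack_max"
fi
```

Running this from a timer or a metrics exporter and alerting at a chosen threshold (for example 80%) gives warning before drops begin; the threshold is a judgment call, not a value taken from the sources.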

Sources used

Using sysctls in a Kubernetes Cluster - Kubernetes official documentation, September 2024
Documentation for /proc/sys/vm/ - Linux kernel documentation, ongoing
Configure Linux Kernel Parameters for Containers - Binadit Tutorials, April 2026
Linux Kernel Parameters (sysctl) Every SRE Should Tune - Goel Academy Blog, August 2025
Linux Sysctl Tuning - Kawin's Pages, 2025+
Kernel Tuning and Optimization for Kubernetes: A Guide - Overcast Blog, March 2024
Sysctl Tuning: The Linux Kernel Settings Nobody Told You About - SumGuy's Ramblings
Sysctl Configuration for High Performance Servers - GoLinuxCloud