%% BSDCon Europe 2002 Presentation
%deffont "standard" tfont "standard.ttf", tmfont "kochi-mincho.ttf"
%deffont "thick" tfont "thick.ttf", tmfont "goth.ttf"
%deffont "typewriter" tfont "typewriter.ttf", tmfont "goth.ttf"
%%
%% Default settings for each line number.
%%
%default 1 area 90 90, leftfill, size 2, fore "white", back "black", font "thick"
%default 2 size 7, vgap 30, prefix " ", fore "blue"
%default 3 size 2, bar "gray90", vgap 180
%default 4 size 4, fore "gray", vgap 30, font "standard"
%%
%% Default settings that are applied to TAB-indented lines.
%%
%tab 1 size 4, vgap 40, prefix " ", icon box "blue" 50
%tab 2 size 4, vgap 40, prefix " ", icon arc "skyblue" 50
%tab 3 size 4, vgap 40, prefix " ", icon delta3 "blue" 40
%% PAGE 1 (TITLE)
%page
%charset "iso8859-1"
%nodefault
%bgrad 45 45 256 45 1 "gray" "black" "black" "black" "black" "black" "black" "gray"
%bar "skyblue" 6 6 88
%center, fore "Blue", font "standard", hgap 10, size 5.5
Running and tuning of OpenBSD network servers in a production environment
%size 2
%bar "skyblue" 6 6 88
%size 4, fore "gray80"
Philipp Bühler
Henning Brauer
%% PAGE 2
%page
%bgrad 45 45 256 45 1 "gray" "black" "black" "black" "black" "black" "black" "gray"

Overview

	Motivation
	Resource Exhaustion
		I/O Exhaustion
			Disk, NIC, IRQ
		CPU Exhaustion
		Memory Exhaustion
			VM, KVM
	Resource Allocation
		mbuf, pool
	Tools
	Countermeasures
	Reallife
%% PAGE 3
%page
%bgrad 45 45 256 45 1 "gray" "black" "black" "black" "black" "black" "black" "gray"

Motivation

	Lack of comprehensive documentation
		Memory usage of the networking code
	Pointers into the source code
	Deeper understanding of the userland<->kernel interaction
%% PAGE 4
%page
%bgrad 45 45 256 45 1 "gray" "black" "black" "black" "black" "black" "black" "gray"

Resource Exhaustion

	Typical reasons for resource exhaustion
		Low budget
		Peaks ("/.'ed", special occasions)
		(D)DoS
	Different resources can suffer from such a situation
%% PAGE 5
%page
%bgrad 45 45 256 45 1 "gray" "black" "black" "black" "black" "black" "black" "gray"

I/O Exhaustion: Disk, NIC

	Typical countermeasure for slow servers
		more CPU
		more RAM
	Doesn't help with typical I/O exhaustion
	Disk
		CPU could run the process, but the process does not get enough data
		disk processing needs CPU power itself (e.g. IDE)
		not enough filesystem cache
	NIC
		"dumb" cards (generate too many interrupts)
		line noise handling (PHY) -> retransmits
		not designed for high packet rates -> Gbit
%% PAGE 6
%page
%bgrad 45 45 256 45 1 "gray" "black" "black" "black" "black" "black" "black" "gray"

I/O Exhaustion: IRQ

	IRQ processing is "expensive"
		every IRQ needs a context switch (csw)
		every csw needs significant CPU time
	Countermeasures
		PCI supports interrupt sharing
		polling instead of interrupt handling
%% PAGE 7
%page
%bgrad 45 45 256 45 1 "gray" "black" "black" "black" "black" "black" "black" "gray"

CPU Exhaustion

	CPU usage can often be reduced; a CPU upgrade is not always possible
	CGI
		mod_perl
		mod_php / Zope
		compiled (non-interpreted) languages
	RDBMS
		indexing
		connection behaviour (persistence)
	offloading SSL
		crypto accelerators
%% PAGE 8
%page
%bgrad 45 45 256 45 1 "gray" "black" "black" "black" "black" "black" "black" "gray"

Memory Exhaustion: VM

	Virtual Memory (VM) is RAM + possible swap space
		used by processes for their data structures
	exhaustion of RAM leads to swapping
		general slowdown (disk vs. RAM)
		competing disk I/O
	complete exhaustion
		panic
		slow allocator
		poor overall performance
	Monitoring is always a good idea
		swap usage as an indicator
%% PAGE 9
%page
%bgrad 45 45 256 45 1 "gray" "black" "black" "black" "black" "black" "black" "gray"

Memory Exhaustion: KVM

	Kernel Virtual Memory (KVM)
		768MB (on i386), reserved from the 4GB address space
	used for
		managing hardware
		storing kernel data
			syscalls
			filesystem cache
			network data (mbuf)
		managing VM
	segmented into fixed-size "maps" (locking)
		kernel_map, kmem_map
		mb_map, exec_map, pager_map
	usually not subject to swapping (wired pages)
		limited by RAM
	exhaustion of KVM
		panic
%% PAGE 10
%page
%bgrad 45 45 256 45 1 "gray" "black" "black" "black" "black" "black" "black" "gray"

Resource Allocation: KVM

	KVM is needed for virtually everything
	KVM is protected against direct access from/to userland
		access only indirectly via syscalls
			socket(2)
			send(2)
			recv(2)
	It's important that KVM is not "wasted", so network data has to move fast
		from kernel to "wire"
			fast processing by the NIC
		from kernel to userland
			freeing buffers of incoming data
%% PAGE 11
%page
%bgrad 45 45 256 45 1 "gray" "black" "black" "black" "black" "black" "black" "gray"

Resource Allocation: mbuf(9)

	Network data is stored in memory buffers (mbufs)
		fixed size (MSIZE, usually 256 bytes)
		overhead for headers (20-40 bytes)
		chained to queues
			*mh_next: next mbuf of this chain
			*mh_nextpkt: next chain in queue
		historically allocated via malloc(9)
			allocated from kmem_map (interrupt safe)
		/usr/include/sys/mbuf.h
	Bigger chunks of data go to "clusters"
		fixed size (MCLBYTES, usually 2048 bytes)
		referenced by an mbuf (m_ext{})
%% PAGE 12
%page
%bgrad 45 45 256 45 1 "gray" "black" "black" "black" "black" "black" "black" "gray"

Resource Allocation: pool(9)

	Used in OpenBSD since 3.0; manages constructed, fixed-size objects
		faster than malloc by caching objects
		cache coloring (offsets)
		less fragmentation
		watermarks, callbacks
	not necessarily in kmem_map
		supports different backend allocators
	VM can reclaim memory (down to Maxpg)
		kernel_map for vnodes, inodes, ..; freeing space in kmem_map \
for network data
%% PAGE 13
%page
%bgrad 45 45 256 45 1 "gray" "black" "black" "black" "black" "black" "black" "gray"

Tools: top, ps

	top(1) for an overview of memory usage, process states, interrupt usage
# show top 'by hand'; somehow this does not really work with %system
	ps(1) for more details
# same for ps
%% PAGE 14
%page
%bgrad 45 45 256 45 1 "gray" "black" "black" "black" "black" "black" "black" "gray"

Tools: vmstat, netstat, systat

	vmstat
		procs memory page disks faults cpu
	vmstat -m (memory/pool usage)
		Size
		Pgreq / Pgrel
		Npage
		Maxpg
	vmstat -i / -s (interrupt / swap)
	netstat -f inet
	systat vmstat 1
%% PAGE 15
%page
%bgrad 45 45 256 45 1 "gray" "black" "black" "black" "black" "black" "black" "gray"

Tools: symon

	symon
		written by Willem Dijkstra
		long-term monitoring
		stores data in rrdtool format
		data collector: 'symon'
		central data storage via 'symux'
		web interface: 'symon-web'
%% PAGE 16
%page
%bgrad 45 45 256 45 1 "gray" "black" "black" "black" "black" "black" "black" "gray"

Tools: pftop, kvmspy

	pftop
		written by Can E. Acar
		curses-based, like netstat, but for pf(4)
	kvmspy
		written by Daniel Lucq
		example of how to monitor data
		uses kvm(3) routines
		sysctl(3) planned
%% PAGE 17
%page
%bgrad 45 45 256 45 1 "gray" "black" "black" "black" "black" "black" "black" "gray"

Countermeasures

	KVM tuning is possible via three kernel options
	Can be set at compile time or via config(8)
	NMBCLUSTERS/nmbclust
		maximum number of mbuf clusters
	NKMEMPAGES/nkmempg
		total size of kmem_map (populated by mbufs)
		if unspecified, a sane value is calculated
	MAX_KMAPENT
		maximum number of static entries in kmem_map
		should only be raised if a specific panic occurs (usually caused by \
high fragmentation)
%size 5
Rule of thumb: Think twice and do NOT touch if in doubt!
%% PAGE 18
%page
%bgrad 45 45 256 45 1 "gray" "black" "black" "black" "black" "black" "black" "gray"

Reallife: chat4free.de

The site consists of static pages and public forums.
The unusual problem here is both the overall load and the enormous peaks which \
happen when large numbers of users are disconnected from the chat server due \
to external network problems or crashes of the server itself.
Unlike many web applications, this server handles a huge volume of small packets, \
which shows that load depends more on the number of users and packets than on raw data transfer.
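%% EXAMPLE (mbuf cost of small packets)
%page
%bgrad 45 45 256 45 1 "gray" "black" "black" "black" "black" "black" "black" "gray"

Example: mbuf cost of small packets

A rough sketch, not kernel code, of why packet counts matter more than bandwidth.
It assumes the numbers from the mbuf(9) slide (MSIZE 256, MCLBYTES 2048) and a header overhead of 40 bytes.
Bigger packets are simplified to a chain of mbufs with one cluster each.
The EX_* constants and ex_* functions are hypothetical; real mbuf layouts differ per architecture and mbuf type.

```c
#include <assert.h>

/*
 * Hypothetical model of mbuf(9) memory use. The constants are taken from
 * the mbuf slide; EX_MHDR assumes the upper end of the 20-40 byte header
 * overhead. Real kernels differ per architecture and per mbuf type.
 */
#define EX_MSIZE     256UL   /* bytes per mbuf (MSIZE) */
#define EX_MCLBYTES  2048UL  /* bytes per cluster (MCLBYTES) */
#define EX_MHDR      40UL    /* assumed header overhead per mbuf */

/* Payload bytes that fit directly into one mbuf under this model. */
static unsigned long ex_mbuf_data(void)
{
    return EX_MSIZE - EX_MHDR;
}

/* Kernel memory (bytes) used to hold one packet of len bytes. */
unsigned long ex_packet_kmem(unsigned long len)
{
    unsigned long nclusters;

    assert(len > 0);
    if (len <= ex_mbuf_data())
        return EX_MSIZE;    /* small packet: one mbuf, no cluster */

    /* bigger packet: chain of mbufs, each referencing one cluster */
    nclusters = (len + EX_MCLBYTES - 1) / EX_MCLBYTES;
    return nclusters * (EX_MSIZE + EX_MCLBYTES);
}

/* Total kernel memory for npkts packets of len bytes each. */
unsigned long ex_traffic_kmem(unsigned long npkts, unsigned long len)
{
    return npkts * ex_packet_kmem(len);
}
```

Under this model, 10000 packets of 64 bytes occupy 2560000 bytes of kernel memory for 640000 bytes of payload.
By comparison, 1000 packets of 1500 bytes occupy 2304000 bytes for 1500000 bytes of payload.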
%% PAGE 19
%page
%bgrad 45 45 256 45 1 "gray" "black" "black" "black" "black" "black" "black" "gray"

Reallife: Firewall at BSWS

The firewall that protects a number of the servers at BSWS is under \
rather heavy load, not really due to total bandwidth, but due to the large number \
of small packets involved. It is running on a 700MHz Duron with 128MB RAM \
and three DEC/Intel 21143-based NICs (one is currently not in use). \
It boots from a small IDE hard disk, which is quite unimportant to \
this application.
%% PAGE 20
%page
%bgrad 45 45 256 45 1 "gray" "black" "black" "black" "black" "black" "black" "gray"

Conclusions

	In the end: Think twice!
		Leave stuff untouched if in doubt
		Calculate; do not just change things because someone said so (HOWTOs...)
		Monitor and identify the "real" bottleneck, tune there FIRST
		Use the given tools and your BRAIN
	and...
		DO NOT PANIC ;-)
%% PAGE 21
%page
%bgrad 45 45 256 45 1 "gray" "black" "black" "black" "black" "black" "black" "gray"

Acknowledgments

	Big "Thank you!" to:
		Nick Holland for correcting our crappy English and a lot of input \
on how to explain things better
		Artur Grabowski for implementing pool(9) and explaining KVM
	Also thanks to all the proof-readers, especially Daniel Lucq (KVMspy)
	Thanks also to Torsten Blum for providing download space for the paper on
%size 5
http://guests.vmunix.org/pb/
	And, of course, to the OpenBSD developer team!