LMBENCH(8) LMBENCH LMBENCH(8)
NAME
lmbench - system benchmarks
DESCRIPTION
lmbench is a series of micro benchmarks intended to measure basic
operating system and hardware system metrics. The benchmarks fall into
three general classes: bandwidth, latency, and ``other''.
Most of the lmbench benchmarks use a standard timing harness described
in timing(3) and have a few standard options: parallelism, warmup, and
repetitions. Parallelism specifies the number of benchmark processes
to run in parallel. This is primarily useful when measuring the per‐
formance of SMP or distributed computers and can be used to evaluate
the system's performance scalability. Warmup is the minimum number of
microseconds the benchmark should execute the benchmarked capability
before it begins measuring performance. Again, this is primarily useful
for SMP or distributed systems, and it is intended to give the process
scheduler time to "settle" and migrate processes to other processors.
By measuring performance over various warmup periods, users may
evaluate the scheduler's responsiveness. Repetitions is the
number of measurements that the benchmark should take. This allows
lmbench to provide greater or lesser statistical strength to the
results it reports. The default number of repetitions is 11.
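The warmup and repetitions options can be illustrated with a minimal
sketch. This is not the harness from timing(3); the function name
measure and its parameters are hypothetical, and parallelism is left
out for brevity. Reporting the median of the samples is likewise an
assumption made for the example.

```python
import statistics
import time


def measure(operation, warmup_us=0, repetitions=11, iterations=1000):
    """Return one per-call latency sample (in microseconds) per repetition.

    Hypothetical helper for illustration; not the lmbench timing(3) API.
    """
    # Warmup: keep running the operation until at least `warmup_us`
    # microseconds have passed, without recording anything.
    deadline = time.perf_counter() + warmup_us / 1e6
    while time.perf_counter() < deadline:
        operation()

    # Take `repetitions` measurements. Each one averages over
    # `iterations` calls so that clock resolution does not dominate.
    samples = []
    for _ in range(repetitions):
        start = time.perf_counter()
        for _ in range(iterations):
            operation()
        elapsed = time.perf_counter() - start
        samples.append(elapsed / iterations * 1e6)
    return samples


if __name__ == "__main__":
    samples = measure(lambda: sum(range(100)), warmup_us=10_000)
    print(f"median {statistics.median(samples):.3f} us "
          f"over {len(samples)} repetitions")
```

Raising repetitions gives more samples to summarize; varying warmup_us
and comparing the results is one way to see how long the system takes
to settle.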
BANDWIDTH MEASUREMENTS
Data movement is fundamental to the performance of most computer
systems. The bandwidth measurements are intended to show how quickly the system
can move data. The results of the bandwidth metrics can be compared
but care must be taken to understand what it is that is being compared.
The bandwidth benchmarks can be reduced to two main components: operat‐
ing system overhead and memory speeds. The bandwidth benchmarks report
their results as megabytes moved per second but please note that the
data moved is not necessarily the same as the memory bandwidth used to
move the data. Consult the individual man pages for more information.
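As a rough illustration of "megabytes moved per second", the sketch
below times a memory copy and divides the bytes copied by the elapsed
time. It is not the bw_mem_cp implementation: the function name, the
buffer size, the choice to report the best run, and the use of 10^6
bytes per megabyte are all assumptions made for the example.

```python
import time


def copy_bandwidth_mb_per_s(size_bytes=8 * 1024 * 1024, repetitions=11):
    """Return the best observed copy rate in megabytes moved per second.

    Hypothetical helper; 1 MB is taken as 10**6 bytes (an assumption).
    """
    src = bytearray(size_bytes)
    dst = bytearray(size_bytes)
    best = 0.0
    for _ in range(repetitions):
        start = time.perf_counter()
        dst[:] = src  # copy size_bytes from src into dst
        elapsed = time.perf_counter() - start
        if elapsed > 0:
            # Only the bytes moved are counted, not the memory traffic.
            best = max(best, size_bytes / elapsed / 1e6)
    return best


if __name__ == "__main__":
    print(f"{copy_bandwidth_mb_per_s():.0f} MB/s moved")
```

The copy reads size_bytes and writes size_bytes, so the memory system
handles at least twice the traffic that the reported figure counts.
This is one way the data moved can differ from the memory bandwidth
used.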
Each of the bandwidth benchmarks is listed below with a brief overview
of the intent of the benchmark.
bw_file_rd  reading and summing of a file via the read(2) interface.
bw_mem_cp   memory copy.
bw_mem_rd   memory reading and summing.
bw_mem_wr   memory writing.
bw_mmap_rd  reading and summing of a file via the memory mapping
            mmap(2) interface.
bw_pipe     reading of data via a pipe.
bw_tcp      reading of data via a TCP/IP socket.
bw_unix     reading data from a UNIX socket.
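The difference between the bw_file_rd and bw_mmap_rd entries above can
be sketched as follows: both sum every byte of a file, one through
read(2) and one through an mmap(2) mapping. The function names, the
chunk size, and the 10^6-byte megabyte are assumptions made for this
example; the real benchmarks are not written this way.

```python
import mmap
import os
import tempfile
import time


def read_and_sum(path, chunk=64 * 1024):
    """Sum every byte of the file, fetching the data with read(2)."""
    total = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        while True:
            data = os.read(fd, chunk)
            if not data:  # end of file
                break
            total += sum(data)
    finally:
        os.close(fd)
    return total


def mmap_and_sum(path, chunk=64 * 1024):
    """Sum every byte of a non-empty file through a read-only mapping."""
    total = 0
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            for offset in range(0, len(m), chunk):
                total += sum(m[offset:offset + chunk])
    return total


def mb_per_s(func, path):
    """Time one call of func(path); return megabytes (10**6 bytes) per second."""
    size = os.path.getsize(path)
    start = time.perf_counter()
    func(path)
    elapsed = time.perf_counter() - start
    return size / elapsed / 1e6


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(os.urandom(4 * 1024 * 1024))
    try:
        print(f"read(2): {mb_per_s(read_and_sum, tmp.name):.1f} MB/s")
        print(f"mmap(2): {mb_per_s(mmap_and_sum, tmp.name):.1f} MB/s")
    finally:
        os.unlink(tmp.name)
```

In Python the summing loop dominates the run time, so the figures
printed here say little about the operating system itself; the sketch
only shows the shape of the two access paths.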
LATENCY MEASUREMENTS
Control messages are also fundamental to the performance of most
computer systems. The latency measurements are intended to show how fast
a system can be told to do some operation. The results of the latency
metrics can be compared to each other for the most part. In particu‐
lar, the pipe, rpc, tcp, and udp transactions are all identical bench‐
marks carried out over different system abstractions.
Latency numbers here should mostly be in microseconds per operation.
lat_connect    the time it takes to establish a TCP/IP connection.
lat_ctx        context switching; the number and size of processes is
               varied.
lat_fcntl      fcntl file locking.
lat_fifo       ``hot potato'' transaction through a UNIX FIFO.
lat_fs         creating and deleting small files.
lat_pagefault  the time it takes to fault in a page from a file.
lat_mem_rd     memory read latency (accurate to the ~2-5 nanosecond
               range, reported in nanoseconds).
lat_mmap       time to set up a memory mapping.
lat_ops        basic processor operations, such as integer XOR, ADD,
               SUB, MUL, DIV, and MOD, and float ADD, MUL, DIV, and
               double ADD, MUL, DIV.
lat_pipe       ``hot potato'' transaction through a UNIX pipe.
lat_proc       process creation times (various sorts).
lat_rpc        ``hot potato'' transaction through Sun RPC over UDP or
               TCP.
lat_select     select latency.
lat_sig        signal installation and catch latencies. Also protection
               fault signal latency.
lat_syscall    nontrivial entry into the system.
lat_tcp        ``hot potato'' transaction through TCP.
lat_udp        ``hot potato'' transaction through UDP.
lat_unix       ``hot potato'' transaction through UNIX sockets.
lat_unix_connect
               the time it takes to establish a UNIX socket connection.
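A ``hot potato'' transaction, as used by several of the entries above,
passes a small token back and forth between two processes and reports
the time per round trip. The sketch below does this over a pair of
pipes. It is a minimal illustration for Unix-like systems, not the
lat_pipe source; the function name and the default round-trip count
are assumptions.

```python
import os
import time


def pipe_round_trip_us(round_trips=10_000):
    """Return mean microseconds per round trip of a one-byte token.

    Hypothetical helper; requires os.fork, so Unix-like systems only.
    """
    to_child_r, to_child_w = os.pipe()
    to_parent_r, to_parent_w = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: echo each token straight back, then exit without
        # running any of the parent's cleanup code.
        try:
            os.close(to_child_w)
            os.close(to_parent_r)
            for _ in range(round_trips):
                token = os.read(to_child_r, 1)
                os.write(to_parent_w, token)
        finally:
            os._exit(0)

    # Parent: send the token, wait for it to come back, repeat.
    os.close(to_child_r)
    os.close(to_parent_w)
    start = time.perf_counter()
    for _ in range(round_trips):
        os.write(to_child_w, b"x")
        os.read(to_parent_r, 1)
    elapsed = time.perf_counter() - start
    os.close(to_child_w)
    os.close(to_parent_r)
    os.waitpid(pid, 0)
    return elapsed / round_trips * 1e6


if __name__ == "__main__":
    print(f"{pipe_round_trip_us():.2f} us per round trip")
```

Swapping the pipes for a FIFO, a UNIX socket, or a TCP or UDP socket
keeps the loop unchanged, which is why results from these transactions
can be compared with one another.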
OTHER MEASUREMENTS
mhz      processor cycle time
tlb      TLB size and TLB miss latency
line     cache line size (in bytes)
cache    cache statistics, such as line size, cache sizes, memory
         parallelism.
stream   John McCalpin's stream benchmark
par_mem  memory subsystem parallelism. How many requests can the
         memory subsystem service in parallel, which may depend on
         the location of the data in the memory hierarchy.
par_ops  basic processor operation parallelism.
SEE ALSO
bargraph(1), graph(1), lmbench(3), results(3), timing(3),
bw_file_rd(8), bw_mem_cp(8), bw_mem_wr(8), bw_mmap_rd(8), bw_pipe(8),
bw_tcp(8), bw_unix(8), lat_connect(8), lat_ctx(8), lat_fcntl(8),
lat_fifo(8), lat_fs(8), lat_http(8), lat_mem_rd(8), lat_mmap(8),
lat_ops(8), lat_pagefault(8), lat_pipe(8), lat_proc(8), lat_rpc(8),
lat_select(8), lat_sig(8), lat_syscall(8), lat_tcp(8), lat_udp(8),
lmdd(8), par_ops(8), par_mem(8), mhz(8), tlb(8), line(8), cache(8),
stream(8)
ACKNOWLEDGEMENT
Funding for the development of these tools was provided by Sun
Microsystems Computer Corporation.
A large number of people have contributed to the testing and develop‐
ment of lmbench.
COPYING
The benchmarking code is distributed under the GPL with additional
restrictions; see the COPYING file.
AUTHOR
Carl Staelin and Larry McVoy
Comments, suggestions, and bug reports are always welcome.
(c)1994-2000 Larry McVoy and Carl Staelin LMBENCH(8)