"Linux Gazette...making Linux just a little more fun!"


GNU/Linux Benchmarking - Practical Aspects

by André D. Balsa

v0.4, 26 November 1997


This is the second in a series of four articles on GNU/Linux benchmarking, to be published by the Linux Gazette. The first article presented some basic benchmarking concepts and analyzed the Whetstone benchmark in more detail. The present article deals with practical issues in GNU/Linux benchmarking: what benchmarks already exist, where to find them, what they effectively measure and how to run them. If you are not happy with the available benchmarks, it also gives some guidelines for writing your own. Finally, an application benchmark (Linux kernel 2.0.0 compilation) is analyzed in detail.


1. The DOs and DON'Ts of GNU/Linux benchmarking

2. A roundup of benchmarks for Linux

3. Devising or writing a new Linux benchmark

4. An application benchmark: Linux 2.0.0 kernel compilation with gcc

5. Next month


1. The DOs and DON'Ts of GNU/Linux benchmarking

GNU/Linux is a great OS in terms of performance, and we can hope it will only get better over time. But that is a very vague statement: we need figures to prove it. What information can benchmarks effectively provide us with? What aspects of microcomputer performance can we measure under GNU/Linux?

Kurt Fitzner reminded me of an old saying: "When performance is measured, performance increases."

Let's list some general benchmarking rules (not necessarily in order of decreasing priority) that should be followed to obtain accurate and meaningful benchmarking data, resulting in real GNU/Linux performance gains:

  1. Use GPLed source code for the benchmarks, preferably easily available on the Net.
  2. Use standard tools. Avoid benchmarking tools that have been optimized for a specific system/equipment/architecture.
  3. Use Linux/Unix/Posix benchmarks. Mac, DOS and Windows benchmarks will not help much.
  4. Don't quote your results to three decimal figures. A resolution of 0.1% is more than adequate, and a precision of 1% is more than enough.
  5. Report your results in standard format/metric/units/report forms.
  6. Completely describe the configuration being tested.
  7. Don't include irrelevant data.
  8. If the variance in your results is significant, report it alongside the results and try to explain where it comes from.
  9. Comparative benchmarking is more informative. When doing comparative benchmarking, modify a single test variable at a time. Report results for each combination.
  10. Decide beforehand what characteristic of a system you want to benchmark. Use the right tools to measure this characteristic.
  11. Check your results. Repeat each benchmark once or twice before publicly reporting your results.
  12. Don't set out to benchmark trying to prove that equipment A is better than equipment B; you may be in for a surprise...
  13. Avoid benchmarking one-of-a-kind or proprietary equipment. This may be very interesting for experimental purposes, but the information resulting from such benchmarks is absolutely useless to other Linux users.
  14. Share any meaningful information you may have come up with. If there is a lesson to be learned from the Linux style of development, it's that sharing information is paramount.
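
Rules 4, 8 and 11 can be combined in a small reporting helper. The Python sketch below is only an illustration: the function name `summarize_runs` and the sample timings are made up for this example and are not measurements from this article.

```python
import statistics


def summarize_runs(timings_s):
    """Summarize repeated benchmark timings (in seconds).

    Returns (mean, sample standard deviation, spread as a % of the mean).
    """
    if len(timings_s) < 2:
        raise ValueError("repeat the benchmark at least twice")
    mean = statistics.mean(timings_s)
    stdev = statistics.stdev(timings_s)
    return mean, stdev, 100.0 * stdev / mean


if __name__ == "__main__":
    # Hypothetical timings from three runs of the same benchmark.
    runs = [221.0, 219.5, 222.3]
    mean, stdev, spread = summarize_runs(runs)
    # Whole seconds and a one-decimal spread; more digits would be noise.
    print(f"mean {mean:.0f} s, std. dev. {stdev:.1f} s ({spread:.1f}% of mean)")
```

If the spread is well under 1%, the mean alone is probably enough; if it is larger, it belongs in the report next to the result.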


2. A roundup of benchmarks for Linux

These are some benchmarks I have collected over the Net. A few are Linux-specific, others are portable across a wide range of Unix-compatible systems, and some are even more generic.

All the benchmarks listed above are available by ftp or http from the Linux Benchmarking Project server in the download directory: www.tux.org/pub/bench or from the Links page.


3. Devising or writing a new Linux benchmark

We saw last month that (nearly) all benchmarks are based on one of two simple algorithms, or on combinations/variations of them:

  1. Measuring the number of iterations of a given task executed over a fixed, predetermined time interval.
  2. Measuring the time needed for the execution of a fixed, predetermined number of iterations of a given task.

We also saw that the Whetstone benchmark uses a combination of these two procedures to "calibrate" itself for optimum resolution, effectively providing a workaround for the low-resolution timer available on PC-type machines.
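
As a rough illustration of algorithm 1, and of how it can be used to choose a loop count for algorithm 2, here is a Python sketch. It is not the Whetstone code: the names `sample_task`, `count_iterations` and `calibrate` are invented for this example, and `time.perf_counter()` stands in for whatever clock a real benchmark would use.

```python
import time


def sample_task():
    """Hypothetical stand-in for the task being benchmarked."""
    sum(range(1000))


def count_iterations(task, interval_s):
    """Algorithm 1: count how many times task() runs in a fixed interval."""
    count = 0
    deadline = time.perf_counter() + interval_s
    while time.perf_counter() < deadline:
        task()
        count += 1
    return count


def calibrate(task, target_s, probe_s=0.05):
    """Estimate the loop count that makes a fixed-count run last ~target_s."""
    probed = count_iterations(task, probe_s)
    return max(1, round(probed * target_s / probe_s))


if __name__ == "__main__":
    loops = calibrate(sample_task, target_s=1.0)
    print(f"use about {loops} iterations for a 1 s run")
```

The short probe run gives only an estimate, so the loop count it suggests should be treated as a starting point rather than an exact figure.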

Note that some newer benchmarks use new, exotic algorithms to estimate system performance, e.g. the Hint benchmark. I'll get back to Hint in a future article.

Right now, let's see what algorithm 2 would look like:

    initialize loop_count
    start_time = time()
    repeat
        benchmark_kernel()
        decrement loop_count
    until loop_count = 0
    duration = time() - start_time
    report_results()

Here, time() is a system library call which returns, for example, the elapsed wall-clock time since the last system boot, and benchmark_kernel() exercises the system feature or characteristic we are trying to measure.
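
For readers who prefer something they can run, the same fixed-count loop might look like this in Python. This is a minimal sketch: the body of `benchmark_kernel` is a placeholder workload chosen for the example, and `time.perf_counter()` is used as the wall-clock source.

```python
import time


def benchmark_kernel():
    """Hypothetical workload; replace with the feature you want to measure."""
    total = 0
    for i in range(10_000):
        total += i * i
    return total


def run_benchmark(kernel, loop_count):
    """Algorithm 2: time a fixed number of iterations of kernel()."""
    start_time = time.perf_counter()  # wall-clock time, arbitrary origin
    for _ in range(loop_count):
        kernel()
    return time.perf_counter() - start_time


if __name__ == "__main__":
    loops = 200
    duration = run_benchmark(benchmark_kernel, loops)
    print(f"{loops} iterations in {duration:.2f} s "
          f"({loops / duration:.0f} iterations/s)")
```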

Even this trivial benchmarking algorithm makes some basic assumptions about the system being tested and will report totally erroneous results if some precautions are not taken:

  1. If the benchmark kernel executes so quickly that the looping instructions account for a significant share of the clock cycles spent in each iteration, results will be skewed. Preferably, benchmark_kernel() should last more than 100 times as long as the looping instructions.
  2. Depending on system hardware, one will have to adjust loop_count so that the total duration is > 100 x clock resolution (for 1% benchmark precision) or > 1000 x clock resolution (for 0.1% benchmark precision). On PC hardware, clock resolution is 10 ms, so this means runs of at least 1 s or 10 s respectively.
  3. We mentioned above that we used a straightforward wall-clock time() function. If the system load is high and our benchmark gets only 3% of the CPU time, we will get completely erroneous results! And of course on a multi-user, pre-emptive, multi-tasking OS like GNU/Linux, it's impossible to guarantee exclusive use of the CPU by our benchmark.
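
The three precautions above can be sketched in a few lines of Python. All function names here (`min_duration_s`, `timed_run`, `net_cpu_time`) are hypothetical, and using `time.process_time()` to obtain the CPU time actually consumed by the process is one possible answer to point 3, not the only one.

```python
import time


def min_duration_s(clock_resolution_s, precision):
    """Shortest run for which one clock tick is at most `precision` of it."""
    return clock_resolution_s / precision


def timed_run(kernel, loop_count):
    """Return (wall-clock seconds, CPU seconds) for loop_count calls."""
    wall0, cpu0 = time.perf_counter(), time.process_time()
    for _ in range(loop_count):
        kernel()
    return time.perf_counter() - wall0, time.process_time() - cpu0


def net_cpu_time(kernel, loop_count):
    """CPU time of the kernel minus the cost of the bare loop and call."""
    _, with_kernel = timed_run(kernel, loop_count)
    _, empty_loop = timed_run(lambda: None, loop_count)
    return max(0.0, with_kernel - empty_loop)


if __name__ == "__main__":
    # With a 10 ms clock: 1% precision needs 1 s, 0.1% needs 10 s.
    print(min_duration_s(0.010, 0.01), min_duration_s(0.010, 0.001))
    wall, cpu = timed_run(lambda: sum(range(1000)), 20_000)
    # On a loaded machine, wall is noticeably larger than cpu.
    print(f"wall {wall:.2f} s, cpu {cpu:.2f} s")
```

Comparing the two figures returned by `timed_run` is a quick way to see whether other processes were competing for the CPU during the run.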

You can substitute the benchmark "kernel" with whatever computing task interests you more or comes closer to your specific benchmarking needs.

Examples of such kernels would be:

For good examples of actual C source code, see the UnixBench and Whetstone benchmark sources.


4. An application benchmark: Linux 2.0.0 kernel compilation with gcc

The more one gets to use and know GNU/Linux, the more often one compiles the Linux kernel. Very quickly it becomes a habit: as soon as a new kernel version comes out, we download the tar.gz source file and recompile it a few times, fine-tuning the new features.

This is the main reason for proposing kernel compilation as an application benchmark: it is a very common task for all GNU/Linux users. Note that the application being directly tested is not the Linux kernel itself; it's gcc. I guess most GNU/Linux users use gcc every day.

The Linux kernel is being used here as a (large) standard data set. Since this is a large program (gcc) with a wide variety of instructions, processing a large data set (the Linux kernel) with a wide variety of data structures, we assume it will exercise a good subset of OS functions such as file I/O, swapping, etc., and a good subset of the hardware too: CPU, memory, caches, hard disk, hard disk controller/driver combination, PCI or ISA I/O bus. Obviously this is not a test for X server performance, even if you launch the compilation from an xterm window! And the FPU is not exercised either (but we already tested our FPU with Whetstone, didn't we?). Now, I have noticed that test results are almost independent of hard disk performance, at least on the various systems I had available. The real bottleneck for this test is CPU/cache performance.

Why specify the Linux kernel version 2.0.0 as our standard data set? Because it is widely available, as most GNU/Linux users have an old CD-ROM distribution with the Linux kernel 2.0.0 source, and also because it is quite close in size and structure to present-day kernels. So it's not exactly an out-of-anybody's-hat data set: it's a typical real-world data set.

Why not let users compile any Linux 2.x kernel and report results? Because then we wouldn't be able to compare results anymore. Aha, you say, but what about the different gcc and libc versions in the various systems being tested? Answer: they are part of your GNU/Linux system and so also get their performance measured by this benchmark, and this is exactly the behaviour we want from an application benchmark. Of course, gcc and libc versions must be reported, just like CPU type, hard disk, total RAM, etc. (see the Linux Benchmarking Toolkit Report Form).

4.1 General benchmark features

Basically what goes on during a gcc kernel compilation (make zImage) is that:

  1. Gcc is loaded in memory,
  2. Gcc gets fed sequentially the various Linux kernel pieces that make up the kernel, and finally
  3. The linker is called to create the zImage file (a compressed image file of the Linux kernel).

Step 2 is where most of the time is spent.

This test is quite stable between different runs. It is also relatively insensitive to small loads (e.g. it can be run in an xterm window) and completes in less than 15 minutes on most recent machines.

4.2 Benchmarking procedure

Getting the source.

Do I really have to tell you where to get the kernel 2.0.0 source? OK, then: ftp://sunsite.unc.edu/pub/Linux/kernel/source/2.0.x or any of its mirrors, or any recent GNU/Linux CD-ROM set with a copy of sunsite.unc.edu. Download the 2.0.0 kernel, gunzip and untar under a test directory (tar zxvf linux-2.0.tar.gz will do the trick).

Compiling and running

Cd to the linux directory you just created and type make config. Press <Enter> to answer all questions with their default value. Now type make dep ; make clean ; sync ; time make zImage. Depending on your machine, you can go and have lunch or just an espresso. You can't (yet) blink and be done with it, even on a 600 MHz Alpha. By the way, if you are going to run this test on an Alpha, you will have to cross-compile the kernel targeting the i386 architecture so that your results are comparable to those from the more ubiquitous x86 machines.
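
If you want to script repeated runs instead of typing the time command by hand, something along the following lines could work. This Python sketch is an assumption-laden convenience, not part of the benchmark definition: `time_command` is a made-up helper, and it relies on `os.times()` reporting the CPU time of finished child processes, which is the case on Unix-like systems.

```python
import os
import subprocess
import sys
import time


def time_command(argv, cwd=None):
    """Run argv and return (elapsed, user, system) seconds, like time(1)."""
    before = os.times()
    start = time.perf_counter()
    subprocess.run(argv, cwd=cwd, check=True)
    elapsed = time.perf_counter() - start
    after = os.times()
    user = after.children_user - before.children_user
    system = after.children_system - before.children_system
    return elapsed, user, system


if __name__ == "__main__":
    # Assumption: started from the top of the unpacked linux/ tree, after
    # "make config", "make dep" and "make clean" have already been run.
    argv = sys.argv[1:] or ["make", "zImage"]
    os.sync()  # flush pending disk writes, like the sync step
    elapsed, user, system = time_command(argv)
    print(f"{user:.2f}user {system:.2f}system {elapsed:.2f}elapsed")
```

Passing another command on the command line lets you try the script on something quicker than a full kernel build.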

4.3 Examining the results

Example 1

This is what I get on my test GNU/Linux box:

186.90user 19.30system 3:40.75elapsed 93%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (147838major+170260minor)pagefaults 0swaps

The most important figure here is the total elapsed time: 3 min 41 s (there is no need to report fractions of seconds).
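
The other figures on that line are related by simple arithmetic: the CPU percentage is the user plus system time divided by the elapsed time. The Python sketch below checks this on the output of Example 1; the helper names `parse_time_line` and `cpu_share` are invented for the illustration, and the regular expression only covers the fields used here.

```python
import re

# First line of the time output shown in Example 1.
EXAMPLE = "186.90user 19.30system 3:40.75elapsed 93%CPU"


def parse_time_line(line):
    """Return (user_s, system_s, elapsed_s) from a time summary line."""
    match = re.search(
        r"([\d.]+)user\s+([\d.]+)system\s+"
        r"(?:(\d+):)?(\d+):([\d.]+)elapsed", line)
    if match is None:
        raise ValueError("unrecognized time output")
    user, system = float(match.group(1)), float(match.group(2))
    hours = int(match.group(3) or 0)
    elapsed = (hours * 3600 + int(match.group(4)) * 60
               + float(match.group(5)))
    return user, system, elapsed


def cpu_share(user, system, elapsed):
    """Fraction of the elapsed time the CPU spent on the job."""
    return (user + system) / elapsed


if __name__ == "__main__":
    user, system, elapsed = parse_time_line(EXAMPLE)
    minutes, seconds = divmod(round(elapsed), 60)
    print(f"elapsed: {minutes} min {seconds} s")    # elapsed: 3 min 41 s
    print(f"CPU share: {100 * cpu_share(user, system, elapsed):.1f}%")  # 93.4%
```

Here 186.90 s + 19.30 s = 206.20 s of CPU time over 220.75 s of elapsed time, which matches the 93% reported by time.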

Hardware setup description

If you were to complain that the above benchmark is useless without a description of the machine being tested, you'd be 100% correct! So, here is the LBT Report Form for this machine:

LINUX BENCHMARKING TOOLKIT REPORT FORM

CPU
===
Vendor: AMD
Model: K6-200
Core clock: 208 MHz (2.5 x 83 MHz)
Motherboard vendor: ASUS
Mbd. model: P55T2P4
Mbd. chipset: Intel HX
Bus type: PCI
Bus clock: 41.5 MHz
Cache total: 512 KB
Cache type/speed: Pipeline burst 6 ns
SMP (number of processors): 1

RAM
===
Total: 32 MB
Type: EDO SIMMs
Speed: 60 ns

Disk
====
Vendor: IBM
Model: IBM-DCAA-34430
Size: 4.3 GB
Interface: EIDE
Driver/Settings: Bus Master DMA mode 2

Video board
===========
Vendor: Generic S3
Model: Trio64-V2
Bus: PCI
Video RAM type: 60 ns EDO DRAM
Video RAM total: 2 MB
X server vendor: XFree86
X server version: 3.3
X server chipset choice: S3 accelerated
Resolution/vert. refresh rate: 1152x864 @ 70 Hz
Color depth: 16 bits

Kernel
======
Version: 2.0.29
Swap size: 64 MB

gcc
===
Version: 2.7.2.1
Options: -O2
libc version: 5.4.23

Test notes
==========
Very light system load.

RESULTS
=======
Linux kernel 2.0.0 Compilation Time: 3 m 41 s
Whetstone Double Precision (FPU) INDEX: N/A
UnixBench 4.10 system INDEX: N/A
Xengine: N/A
BYTEmark integer INDEX: N/A
BYTEmark memory INDEX: N/A

Comments
========
Just tested kernel 2.0.0 compilation.

General comments

Again, you will want to compare your results to those obtained on different machines/configurations. You will find some results on my Web site about 6x86s/Linux, in the November News page.

This of course is pure GNU/Linux benchmarking, unless you want to go ahead and try to cross-compile the Linux kernel on a Windows 95 box!? ;-)


5. Next month

I expect that by next month you will have downloaded and tested a few benchmarks, or even started writing your own. So, in the next article: Collecting and Interpreting Linux Benchmarking Data


Copyright © 1997, André D. Balsa
Published in Issue 23 of the Linux Gazette, December 1997

