IO-Schedulers on Linux

hannes_kuehnemund · ‎10-25-2006

Scheduler on Linux

One might argue that schedulers are not needed in an operating system, as they introduce another level of complexity into planning and into understanding how things work. This view is outdated: without an IO scheduler, the kernel would issue every write request in the order it receives it, which would thrash the disk or disk subsystem. Furthermore, if the kernel has to read data from a completely different part of the disk between two write requests, the disk head has to seek to that location and back to fulfill the read request in between the two writes. Optimizing disk requests is therefore the main purpose of IO schedulers.

Further and more technical information can be found in the Linux kernel source in the folder Documentation/block. Several of the text documents in this folder describe current or outdated schedulers.

Overview of currently available schedulers on Linux:

as:

setting antic_expire to 0 turns as into the deadline scheduler
based on the notion that a disk device has only one physical seeking head
implements 5 layers of scheduler policies

cfq:

cfq version 3 (cfqv3) is the current version, but only cfqv2 was available for this test
implements three generic scheduling classes
priority can be set according to process nice level

deadline:

favors reads over writes
requests have an expiration time
reorders requests to improve I/O performance

noop:

perfect scheduler for e.g. flash memory cards
requests are queued in FIFO order
only the last requests added may be merged into one request
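To see which of these schedulers a given kernel offers and which one is active for a disk, the kernel exposes one line per block device in /sys/block/<device>/queue/scheduler, with the active scheduler in square brackets. The following is only a minimal sketch of reading that line. The function names are made up for illustration, the device name sda is just an example, and the sample line is what a kernel with all four schedulers might print (note that as shows up as "anticipatory" there).

```python
from pathlib import Path


def parse_scheduler_line(line):
    """Split a sysfs scheduler line into (available schedulers, active one)."""
    available = []
    active = None
    for token in line.split():
        # The active scheduler is the one wrapped in square brackets.
        if token.startswith("[") and token.endswith("]"):
            token = token[1:-1]
            active = token
        available.append(token)
    return available, active


def read_schedulers(device):
    """Read the scheduler line of one block device, e.g. 'sda' (example name)."""
    path = Path("/sys/block") / device / "queue" / "scheduler"
    return parse_scheduler_line(path.read_text())


if __name__ == "__main__":
    # Sample line only; on a real system call read_schedulers("sda") instead.
    sample = "noop anticipatory deadline [cfq]"
    print(parse_scheduler_line(sample))
```

Writing one of the listed names into the same file as root switches the scheduler for that device at runtime, which is convenient when comparing schedulers on otherwise identical setups.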

The noop scheduler was not tested during the benchmark!

Setup of benchmark

The software setup I used to test the three most common schedulers (as, cfq and deadline) is the following. The operating system is SLES10, which runs on a dual-socket 3.6 GHz Xeon EM64T machine with Hyper-Threading enabled. Three 36GB SCSI disks are installed and partitioned as follows: sda1 (8GB) is the root filesystem, sda2 (4GB) is swap, and the remaining sda3 (24GB), sdb1 (36GB) and sdc1 (36GB) are put into one big LVM volume group. This volume group is used exclusively by one big logical volume mounted at /usr/sap.

On top of the SLES10 operating system a new SAP system is installed manually. This means installing the MaxDB database binaries, creating the SAP instance folders, extracting the SAP kernel binaries, copying the R3load files onto the logical volume and creating the instance profiles. Before the measurement starts, six data volumes of 15GB each and a log volume of 2GB are created on the logical volume. After all these steps, the next one is the import of all R3load export files with R3load.

What is measured

During the load phase five R3load processes load data into the database simultaneously. The import starts with the five biggest of the 34 available export files. When an R3load process is done with its file, it proceeds to the next one until all files are loaded into the database. These 34 files contain more than 44000 tables, which are all imported into the database and use approximately 73GB there. After the import of every table a MaxDB savepoint is triggered, which writes all data currently in the data cache to the data volumes. This ensures that there are several concurrent writes on the disk. After all files are processed, the initialization reports RSWBOINS and RADDBDIF are executed and the import is done.
The time from the start of the import to the end of the import phase, including the execution of the reports, is measured and compared in the end. One might argue that the time needed to create the data and log volumes should also be measured, but creating volumes by writing zeros sequentially to the disk is probably not affected much by the scheduler. Furthermore, the LVM configuration introduces another level of complexity which may influence the results. However, I like LVM, and using LVM for benchmarks is not forbidden! Needless to say, after each run the system was cleaned up and rebooted to have a fresh environment for the next benchmark.

Benchmark Results with DBLoad

I leave the following numbers uncommented. They may leave room for discussion, so I draw no conclusion myself. The first column shows the scheduler used (cfq, as or deadline), the second column shows the time in seconds it took to load the database.

scheduler	load time
cfq	25908 s
as	26645 s
deadline	28217 s
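For readers who find raw seconds hard to compare, the numbers can be converted into hours and into a percentage over the fastest run. This is plain arithmetic on the three values in the table and implies no conclusion. The helper names are made up for this sketch.

```python
# Load times in seconds, copied from the table above.
load_times = {"cfq": 25908, "as": 26645, "deadline": 28217}


def format_duration(seconds):
    """Render a duration given in seconds as h:mm:ss."""
    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"


def relative_to_fastest(times):
    """Return each run's extra time over the fastest run, in percent."""
    fastest = min(times.values())
    return {name: (t - fastest) / fastest * 100 for name, t in times.items()}


if __name__ == "__main__":
    overhead = relative_to_fastest(load_times)
    for name, seconds in load_times.items():
        print(f"{name:<9}{format_duration(seconds)}  +{overhead[name]:.1f}%")
    # cfq      7:11:48  +0.0%
    # as       7:24:05  +2.8%
    # deadline 7:50:17  +8.9%
```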
