What about tuning file systems on Linux

My past investigations into Linux I/O schedulers showed that the standard SUSE SLES9/10 setting, the cfq scheduler, is the best choice for an SAP database load with MaxDB. Besides the I/O scheduler, there are other layers between the database and the physical disks. The layer this blog focuses on is the file system. For the tests I used the latest SAP-certified Linux distribution, SLES10, and of course the cfq scheduler.

Available file systems on SLES10

There are several file systems available on SLES10; ext2, ext3, reiserfs and XFS are just some of them. For this test I picked two of them, both of which have to handle huge database files. The first file system tested is ext3. One test was performed on an ext3 file system that was formatted by yast2 without any additional options. The other ext3 test was performed on the same physical devices, but this time I formatted the disks using special options. The command was:

mke2fs -j -O dir_index,sparse_super,filetype -T largefile

Additionally, the following mount option was added to the corresponding fstab entry:

data=writeback
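To show how the format command and the mount option fit together, here is a minimal sketch. The device name /dev/sdX1 and the mount point /sapdb/data1 are hypothetical placeholders, not the ones used in the test, and the script only prints the command and the fstab line instead of executing anything:

```shell
#!/bin/sh
# Hypothetical names for illustration only; replace with your own.
DEVICE="/dev/sdX1"
MOUNTPOINT="/sapdb/data1"

# The format command from above, with the device appended.
format_cmd="mke2fs -j -O dir_index,sparse_super,filetype -T largefile $DEVICE"

# fstab fields: device, mount point, type, options, dump flag, fsck pass.
fstab_line="$DEVICE $MOUNTPOINT ext3 data=writeback 0 2"

# Print only; nothing is formatted or mounted by this sketch.
echo "$format_cmd"
echo "$fstab_line"
```

Printing first makes it easy to double-check the device name before running a command that destroys the data on it.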

One might ask why I did not add the ‘noatime’ option to the entry. The answer is pretty simple: I just forgot it. The second file system I use for testing is XFS. Please keep in mind that some database vendors may not support XFS for their database. In our case, MaxDB is not that strict, so let's give XFS a try. As I did not find any suitable XFS format or mount options, I let yast2 format the physical devices with XFS.

This leads us to three different setups to test. The question that now arises is: what can we test?

Software used

The software I used for testing is a special version of an SAP ERP2005 SR1 Unicode system. What makes this system special is the installation method. For testing purposes, any user input which is normally required during the installation phase is already predefined. No user interaction is needed; you just kick off the installation and wait for the outcome.

Furthermore, this special installation method implements time measurements for several parts of the database load. A normal database load for MaxDB consists of three phases. The first phase is creating the data volumes, which consists only of writes to the physical device. The second phase is loading these data volumes with content, including index creation. This phase has lots of concurrent reads and writes (the reads are needed for index creation). During the third and last phase the database statistics are updated. This phase consists only of reads and very tiny writes.

This allows us to test the file systems for write, read/write and read performance. The software is placed on an NFS file server, which shares the installation packages via NFS. Now that we know everything about the software used, here is a small overview of the hardware used for the test.

Hardware used

The hardware used for this test is a 4-socket, dual-core Intel Xeon (Paxville) machine with 16GB of installed memory. The storage is connected via a QLogic FC controller and a 2Gbit FC switch. The storage has 28x73GB 15k SCSI disks in total. On the storage, lots of 36GB devices are created, six of which are assigned to the test machine. During the tests, the storage was used exclusively to avoid interference from other tests.

All six physical devices use the same file system and mount options for a given test. One physical device holds all SAP binaries and log files. It’s mounted to /usr/sap. The second device holds the MaxDB binaries and log files. It’s mounted to /opt/sapdb. The database volumes are placed on the remaining four devices. One device holds the log volume, and the other three devices hold two data volumes each (a total of six data volumes on three devices). As MaxDB configures two writer threads per volume, there are two threads writing to the device where the log volume resides and four threads writing to each device where data volumes reside. In the worst case, a total of 14 threads are writing to the storage.
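The thread count can be double-checked with a bit of shell arithmetic. This is just a sketch of the calculation described above; the variable names are made up for illustration:

```shell
#!/bin/sh
threads_per_volume=2        # MaxDB writer threads per volume
log_volumes=1               # one log volume on its own device
data_devices=3              # devices that hold data volumes
data_volumes_per_device=2   # two data volumes on each of them

# Threads writing to the log device.
log_threads=$((log_volumes * threads_per_volume))
# Threads writing to each data device.
threads_per_data_device=$((data_volumes_per_device * threads_per_volume))
# Worst case: all writer threads are active at the same time.
total_threads=$((log_threads + data_devices * threads_per_data_device))

echo "log device:       $log_threads threads"
echo "each data device: $threads_per_data_device threads"
echo "total:            $total_threads threads"
```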

With exactly 14 physical disks behind the six devices assigned to the machine, we get a perfect 1:1 ratio between disks and writer threads. This configuration gives us enough raw I/O performance to check which of the file systems performs best.

Outcome of the testings

The first column shows the file system used, with the format variant in brackets. The second column shows the time in seconds needed to create the 2GB log volume and afterwards the 6x15GB data volumes in parallel. The third column shows the time in seconds needed to load the content and create the indexes for all tables. The last column shows the time in seconds needed to update the database statistics. Lower values are better!

File system    write only (s)   read/write (s)   read only (s)
ext3 (tuned)   840              5229             2371
ext3 (std)     794              5126             2294
XFS            760              5091             2260
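To put these numbers into perspective, the following sketch computes the relative difference of each setup against the yast2-formatted ext3 (std) baseline. The timings are copied from the table above; the row labels without spaces and the choice of baseline are my own assumptions for this illustration:

```shell
#!/bin/sh
# Timings in seconds from the table above:
# name, write only, read/write, read only
results="ext3-tuned 840 5229 2371
ext3-std 794 5126 2294
XFS 760 5091 2260"

# Relative difference in percent against the ext3-std baseline.
# Negative values mean faster than the baseline, positive values slower.
report=$(printf '%s\n' "$results" | awk '
    { name[NR] = $1; w[NR] = $2; rw[NR] = $3; r[NR] = $4 }
    $1 == "ext3-std" { bw = $2; brw = $3; br = $4 }
    END {
        for (i = 1; i <= NR; i++)
            printf "%-11s %+6.1f%% %+6.1f%% %+6.1f%%\n", name[i],
                (w[i] - bw) / bw * 100,
                (rw[i] - brw) / brw * 100,
                (r[i] - br) / br * 100
    }')
echo "$report"
```

The three percentage columns follow the same order as the table: write only, read/write and read only.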

The most interesting outcome is that a “tuned” ext3 file system is slower during a MaxDB database load than a yast2-formatted one. It may be the case that the options meant for tuning weren’t the best, but I haven’t found much information on the Internet about tuning the ext3 file system. Anyway, XFS won the race, but the advantage is small: about 4% in the write-only phase and under 2% in the other two phases. During normal daily work, such workloads may not happen on every database server. A new SDN article is on its way, covering more file systems, more format and mount options and, last but not least, sar and iostat data. It will take a while to finish the article, so stay tuned and drop some comments about useful format/mount options for ext3, reiserfs and XFS.


3 Comments


  1. Former Member
    I think the dir_index option is not needed for file systems with just a small number of files, however it won't hurt (that much) either.

    The largefile option should only affect the inode/size ratio.. hmm.. maybe blocksize?

  2. Former Member
    oh… and maybe the journal was too small… (don't know what the installer used as options, you might want to run dumpe2fs -h on the file systems)
    1. Former Member
      Post author
      I’ll try to look into this when doing the next tests, as the hardware is used for something else at the moment.

      But thanks for the hint anyway

