Recently we were refreshing our recovery system infrastructure, moving automatic recoveries to new servers, each with a big bunch of disks directly attached. Everything went fine until we started to run recoveries - they were much slower than before, even though they were running on more powerful hardware. We started investigating and found some misconfigurations, but even after correcting them the performance gain was still too small.
Finally, I found the real problem - we were not using direct I/O, which made the recovery phase in particular very slow. To give you some background: on our production (and old recovery) servers we use NAS as the storage backend, with Direct NFS enabled. As the MOS note Initialization Parameter 'filesystemio_options' Setting With Direct NFS (dNFS) (Doc ID 1479539.1) confirms, this means that direct and async I/O are always enabled, no matter what value the filesystemio_options parameter is set to. Our recovery system uses template initialization files in which this parameter is not specified at all, so it gets the default value, which on Linux is none. Of course ;) we had forgotten about this "little" fact while moving to the new servers, where we use directly attached storage instead of NAS. Since we needed a big filesystem to provide space for our recoveries, we chose XFS to create an 88 TB filesystem with RAID10 underneath.
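Without dNFS forcing the behaviour, the parameter has to be set explicitly in the init file. A minimal fragment (parameter name and values as documented by Oracle - setall enables both direct and async I/O, while directio and asynch each enable only one of the two):

```
# init.ora fragment - enable both direct and asynchronous I/O
# (the Linux default is none, i.e. buffered, synchronous I/O)
filesystemio_options = setall
```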
To make things more complicated, XFS for Oracle Database seems to be unsupported on RHEL6 - not even tested: Oracle Database - Filesystem & I/O Type Supportability on Oracle Linux 6 (Doc ID 1601759.1). According to another MOS note, Certification Information for Oracle Database on Linux x86-64 (Doc ID 1304727.1), it is certified on RHEL7, but we would need to wait at least a few weeks before that could be introduced into our infrastructure. Of course we don't need certification to use it only for recovery servers - as long as it works... The problem was that after enabling direct and async I/O by setting filesystemio_options to setall, we were getting errors like this during the archived log restore phase:
$ rman @test.rcv
Recovery Manager: Release 11.2.0.4.0 - Production on Wed Sep 9 15:57:58 2015
Copyright (c) 1982, 2011, Oracle and/or its affiliates. All rights reserved.

RMAN> connect target *
2> run {
3> allocate channel d0 device type disk;
4> restore archivelog sequence 88182 thread 2;
5> }
6> delete noprompt archivelog sequence 88182 thread 2;
7>
connected to target database: DBNAME (DBID=41348668, not open)
using target database control file instead of recovery catalog

allocated channel: d0
channel d0: SID=331 device type=DISK

Starting restore at 09-SEP-2015 15:58:00

channel d0: starting archived log restore to default destination
channel d0: restoring archived log
archived log thread=2 sequence=88182
channel d0: reading from backup piece /backup/dbs02/DBNAME/DBNAME_20150906_97qebtd5_1_1_arch
channel d0: ORA-19870: error while restoring backup piece /backup/dbs02/DBNAME/DBNAME_20150906_97qebtd5_1_1_arch
ORA-19504: failed to create file "/ORA/dbs03/oradata/DBNAME/arch/DBNAME-t2-726673761-88182.arch"
ORA-27044: unable to write the header block of file
Linux-x86_64 Error: 22: Invalid argument
Additional information: 3
failover to previous backup
released channel: d0
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of restore command at 09/09/2015 15:58:03
RMAN-06026: some targets not found - aborting restore
RMAN-06025: no backup of archived log for thread 2 with sequence 88182 and starting SCN of 6722894093907 found to restore

Recovery Manager complete.
During the investigation I tried many things, like mounting the XFS filesystem through loopback NFS or moving log_archive_dest out of the XFS filesystem, but with no luck - if I remember correctly it was also failing during the datafile switch-to-copy phase, so it did not even reach the archived log restore. Fortunately, I found an excellent series of blog posts written by Chris Buckel about problems with Oracle's support for 4k devices (please have a look there, as it seems to be the most exhaustive one and also points to the others on the same topic). I started to think that this could be related to our case as well, especially since we are still using a 512-byte block size for our redo. So let's check the details of our configuration:
$ fdisk -l /dev/mapper/datavg-dbs03

Disk /dev/mapper/datavg-dbs03: 96015.6 GB, 96015604514816 bytes
255 heads, 63 sectors/track, 11673232 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 524288 bytes / 12582912 bytes
Disk identifier: 0x00000000

$ xfs_info /ORA/dbs03

meta-data=/dev/mapper/datavg-dbs03 isize=256    agcount=88, agsize=268435328 blks
         =                         sectsz=4096  attr=2, projid32bit=0
data     =                         bsize=4096   blocks=23441309696, imaxpct=1
         =                         sunit=128    swidth=3072 blks
naming   =version 2                bsize=4096   ascii-ci=0
log      =internal                 bsize=4096   blocks=521728, version=2
         =                         sectsz=4096  sunit=1 blks, lazy-count=1
realtime =none                     extsz=4096   blocks=0, rtextents=0
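The key numbers above are the 4096-byte physical sector size and XFS's matching sectsz=4096: with direct I/O, every request must be a multiple of the filesystem's sector size, while Oracle writes redo and archived log headers in 512-byte units. A toy shell check of that rule, with the sizes hardcoded from the output above:

```shell
# Toy illustration: a direct I/O (O_DIRECT) write is accepted only when its
# size is a multiple of the filesystem sector size. Values hardcoded from
# the fdisk/xfs_info output above - not queried from a live system.
write_size=512      # Oracle redo/archived log block size
fs_sectsz=4096      # XFS sector size (sectsz=4096)

if [ $(( write_size % fs_sectsz )) -eq 0 ]; then
  echo "aligned - direct write allowed"
else
  echo "misaligned - write fails with EINVAL (Linux error 22)"
fi
```

That EINVAL is exactly the "Linux-x86_64 Error: 22: Invalid argument" behind ORA-27044 in the RMAN output.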
After reading the above output, I started to see light at the end of the tunnel. And guess what happened after recreating the XFS filesystem with 512-byte block and sector sizes... it worked:
$ mkfs.xfs -f -b size=512 -s size=512 /dev/mapper/datavg-dbs03
$ xfs_info /ORA/dbs03

meta-data=/dev/mapper/datavg-dbs03 isize=256    agcount=88, agsize=2147482624 blks
         =                         sectsz=512   attr=2, projid32bit=0
data     =                         bsize=512    blocks=187530477568, imaxpct=1
         =                         sunit=1024   swidth=24576 blks
naming   =version 2                bsize=4096   ascii-ci=0
log      =internal                 bsize=512    blocks=1048576, version=2
         =                         sectsz=512   sunit=64 blks, lazy-count=1
realtime =none                     extsz=4096   blocks=0, rtextents=0

$ rman @test.rcv

Recovery Manager: Release 11.2.0.4.0 - Production on Wed Sep 9 16:36:06 2015
Copyright (c) 1982, 2011, Oracle and/or its affiliates. All rights reserved.

RMAN> connect target *
2> run {
3> allocate channel d0 device type disk;
4> restore archivelog sequence 88182 thread 2;
5> }
6> delete noprompt archivelog sequence 88182 thread 2;
7>
connected to target database: DBNAME (DBID=41348668, not open)
using target database control file instead of recovery catalog

allocated channel: d0
channel d0: SID=331 device type=DISK

Starting restore at 09-SEP-2015 16:36:08

channel d0: starting archived log restore to default destination
channel d0: restoring archived log
archived log thread=2 sequence=88182
channel d0: reading from backup piece /backup/dbs02/DBNAME/DBNAME_20150906_97qebtd5_1_1_arch
channel d0: piece handle=/backup/dbs02/DBNAME/DBNAME_20150906_97qebtd5_1_1_arch tag=DBNAME_RAC_20150906T195010_A
channel d0: restored backup piece 1
channel d0: restore complete, elapsed time: 00:00:03
Finished restore at 09-SEP-2015 16:36:12
released channel: d0

allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=331 device type=DISK
allocated channel: ORA_DISK_2
channel ORA_DISK_2: SID=346 device type=DISK
deleted archived log
archived log file name=/ORA/dbs03/oradata/DBNAME/arch/DBNAME-t2-726673761-88182.arch RECID=159049 STAMP=889979770
Deleted 1 objects

Recovery Manager complete.
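A pre-flight check along these lines could catch the mismatch before any restore is started. This is a hypothetical sketch, not something from our tooling - the xfs_info line is pasted in as sample data so the snippet is self-contained:

```shell
# Hypothetical pre-flight check: extract sectsz from xfs_info output and
# verify that 512-byte database writes are a multiple of it. A sample
# xfs_info line is embedded instead of running the real command.
xfs_out='meta-data=/dev/mapper/datavg-dbs03 isize=256 agcount=88
         =                         sectsz=512   attr=2, projid32bit=0'

sectsz=$(printf '%s\n' "$xfs_out" | grep -o 'sectsz=[0-9]*' | head -1 | cut -d= -f2)

if [ $(( 512 % sectsz )) -eq 0 ]; then
  echo "OK: sectsz=$sectsz works for 512-byte redo writes"
else
  echo "WARNING: sectsz=$sectsz will break direct 512-byte writes"
fi
```

On the repaired filesystem it reports OK; against the original sectsz=4096 output it would print the warning.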
Of course it not only succeeded, but it was also much faster - the datafile restore phase by about 20%, while the recovery phase (in our test case no incremental backups, just about 1.47 TB of archived logs to apply) was... 5x faster. By recovery phase I mean restoring archived logs and applying them sequentially, which means that the real redo apply rate was boosted even more by enabling direct and async I/O.
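To see why the apply rate must have improved by even more than 5x, here is some back-of-the-envelope arithmetic. Only the 5x overall speedup is a real measurement; the split between restore and apply time is made up for the illustration:

```shell
# Hypothetical split of the old recovery phase into restore and apply time.
# Only the 5x overall speedup is measured; everything else is illustrative.
restore_old=20; apply_old=80                  # arbitrary time units
total_old=$(( restore_old + apply_old ))      # 100
total_new=$(( total_old / 5 ))                # 5x overall speedup -> 20
restore_new=17                                # assume restore sped up only mildly
apply_new=$(( total_new - restore_new ))      # 3 units left for redo apply
echo "apply speedup: $(( apply_old / apply_new ))x"   # 80 / 3 -> 26x
```

Whatever the real split was, as long as the restore part sped up by less than 5x, the sequential structure forces the apply part to have sped up by more than 5x.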