Recently we were refreshing our recovery system infrastructure, moving automatic recoveries to new servers, each with a big bunch of disks directly attached. Everything went fine until we started to run recoveries - they were much slower than before, even though they were running on more powerful hardware. We started investigating and found some misconfigurations, but even after correcting them the performance gain was still too small.
Finally, I found the real problem - we were not using direct I/O, which made the recovery phase in particular very slow. To give you some background: on our production (and old recovery) servers we use NAS as the storage backend, with Direct NFS enabled. As the MOS note Initialization Parameter 'filesystemio_options' Setting With Direct NFS (dNFS) (Doc ID 1479539.1) confirms, this means that direct and async I/O are always enabled, no matter what value the filesystemio_options parameter is set to. Our recovery system uses template initialization files in which this parameter is not specified at all, so it gets the default value, which on Linux is none. Of course ;) we had forgotten about this "little" fact while moving to the new servers, where we use directly attached storage instead of NAS. Since we needed a big filesystem to provide space for our recoveries, we chose XFS to create an 88 TB filesystem with RAID10 underneath.
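Without dNFS forcing the behaviour, the parameter has to be set explicitly in the init file. A minimal fragment (parameter name and values as documented by Oracle - setall enables both direct and async I/O, while directio and asynch each enable only one of the two):

```
# init.ora fragment - enable both direct and asynchronous I/O
# (the Linux default is none, i.e. buffered, synchronous I/O)
filesystemio_options = setall
```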
To make things more complicated, XFS for Oracle Database seems to be unsupported on RHEL6 - not even tested: Oracle Database - Filesystem & I/O Type Supportability on Oracle Linux 6 (Doc ID 1601759.1). According to another MOS note, Certification Information for Oracle Database on Linux x86-64 (Doc ID 1304727.1), it is certified on RHEL7, but we would need to wait at least a few weeks before that could be introduced into our infrastructure. Of course we don't need certification to use it only for recovery servers - as long as it works... The problem was that after enabling direct and async I/O by setting filesystemio_options to setall, we were getting errors like this during the archived log restore phase:
$ rman @test.rcv
Recovery Manager: Release 11.2.0.4.0 - Production on Wed Sep 9 15:57:58 2015
Copyright (c) 1982, 2011, Oracle and/or its affiliates. All rights reserved.

RMAN> connect target *
2> run {
3> allocate channel d0 device type disk;
4> restore archivelog sequence 88182 thread 2;
5> }
6> delete noprompt archivelog sequence 88182 thread 2;
7>
connected to target database: DBNAME (DBID=41348668, not open)
using target database control file instead of recovery catalog

allocated channel: d0
channel d0: SID=331 device type=DISK

Starting restore at 09-SEP-2015 15:58:00

channel d0: starting archived log restore to default destination
channel d0: restoring archived log
archived log thread=2 sequence=88182
channel d0: reading from backup piece /backup/dbs02/DBNAME/DBNAME_20150906_97qebtd5_1_1_arch
channel d0: ORA-19870: error while restoring backup piece /backup/dbs02/DBNAME/DBNAME_20150906_97qebtd5_1_1_arch
ORA-19504: failed to create file "/ORA/dbs03/oradata/DBNAME/arch/DBNAME-t2-726673761-88182.arch"
ORA-27044: unable to write the header block of file
Linux-x86_64 Error: 22: Invalid argument
Additional information: 3
failover to previous backup
released channel: d0
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of restore command at 09/09/2015 15:58:03
RMAN-06026: some targets not found - aborting restore
RMAN-06025: no backup of archived log for thread 2 with sequence 88182 and starting SCN of 6722894093907 found to restore

Recovery Manager complete.
During the investigation I tried many things, like mounting the XFS filesystem through loopback NFS or moving log_archive_dest out of the XFS filesystem, but with no luck - if I remember correctly it was also failing during the datafile switch-to-copy phase, so it did not even reach the archived log restore. Fortunately, I found an excellent series of blog posts written by Chris Buckel about problems with Oracle's support for 4k devices (please have a look there, as it seems to be the most exhaustive one and also points to the others on the same topic). I started to think that this could be related to our case as well, especially since we are still using a 512-byte block size for our redo. So let's check the details of our configuration:
$ fdisk -l /dev/mapper/datavg-dbs03

Disk /dev/mapper/datavg-dbs03: 96015.6 GB, 96015604514816 bytes
255 heads, 63 sectors/track, 11673232 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 524288 bytes / 12582912 bytes
Disk identifier: 0x00000000

$ xfs_info /ORA/dbs03

meta-data=/dev/mapper/datavg-dbs03 isize=256    agcount=88, agsize=268435328 blks
         =                         sectsz=4096  attr=2, projid32bit=0
data     =                         bsize=4096   blocks=23441309696, imaxpct=1
         =                         sunit=128    swidth=3072 blks
naming   =version 2                bsize=4096   ascii-ci=0
log      =internal                 bsize=4096   blocks=521728, version=2
         =                         sectsz=4096  sunit=1 blks, lazy-count=1
realtime =none                     extsz=4096   blocks=0, rtextents=0
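The key numbers above are the 4096-byte physical sector size and XFS's matching sectsz=4096: with direct I/O, every request must be a multiple of the filesystem's sector size, while Oracle writes redo and archived log headers in 512-byte units. A toy shell check of that rule, with the sizes hardcoded from the output above:

```shell
# Toy illustration: a direct I/O (O_DIRECT) write is accepted only when its
# size is a multiple of the filesystem sector size. Values hardcoded from
# the fdisk/xfs_info output above - not queried from a live system.
write_size=512      # Oracle redo/archived log block size
fs_sectsz=4096      # XFS sector size (sectsz=4096)

if [ $(( write_size % fs_sectsz )) -eq 0 ]; then
  echo "aligned - direct write allowed"
else
  echo "misaligned - write fails with EINVAL (Linux error 22)"
fi
```

That EINVAL is exactly the "Linux-x86_64 Error: 22: Invalid argument" behind ORA-27044 in the RMAN output.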
After reading the above output, I started to see light at the end of the tunnel. And guess what happened after recreating the XFS filesystem with 512-byte block and sector sizes... it worked:
$ mkfs.xfs -f -b size=512 -s size=512 /dev/mapper/datavg-dbs03
$ xfs_info /ORA/dbs03

meta-data=/dev/mapper/datavg-dbs03 isize=256    agcount=88, agsize=2147482624 blks
         =                         sectsz=512   attr=2, projid32bit=0
data     =                         bsize=512    blocks=187530477568, imaxpct=1
         =                         sunit=1024   swidth=24576 blks
naming   =version 2                bsize=4096   ascii-ci=0
log      =internal                 bsize=512    blocks=1048576, version=2
         =                         sectsz=512   sunit=64 blks, lazy-count=1
realtime =none                     extsz=4096   blocks=0, rtextents=0

$ rman @test.rcv

Recovery Manager: Release 11.2.0.4.0 - Production on Wed Sep 9 16:36:06 2015
Copyright (c) 1982, 2011, Oracle and/or its affiliates. All rights reserved.

RMAN> connect target *
2> run {
3> allocate channel d0 device type disk;
4> restore archivelog sequence 88182 thread 2;
5> }
6> delete noprompt archivelog sequence 88182 thread 2;
7>
connected to target database: DBNAME (DBID=41348668, not open)
using target database control file instead of recovery catalog

allocated channel: d0
channel d0: SID=331 device type=DISK

Starting restore at 09-SEP-2015 16:36:08

channel d0: starting archived log restore to default destination
channel d0: restoring archived log
archived log thread=2 sequence=88182
channel d0: reading from backup piece /backup/dbs02/DBNAME/DBNAME_20150906_97qebtd5_1_1_arch
channel d0: piece handle=/backup/dbs02/DBNAME/DBNAME_20150906_97qebtd5_1_1_arch tag=DBNAME_RAC_20150906T195010_A
channel d0: restored backup piece 1
channel d0: restore complete, elapsed time: 00:00:03
Finished restore at 09-SEP-2015 16:36:12
released channel: d0

allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=331 device type=DISK
allocated channel: ORA_DISK_2
channel ORA_DISK_2: SID=346 device type=DISK
deleted archived log
archived log file name=/ORA/dbs03/oradata/DBNAME/arch/DBNAME-t2-726673761-88182.arch RECID=159049 STAMP=889979770
Deleted 1 objects

Recovery Manager complete.
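A pre-flight check along these lines could catch the mismatch before any restore is started. This is a hypothetical sketch, not something from our tooling - the xfs_info line is pasted in as sample data so the snippet is self-contained:

```shell
# Hypothetical pre-flight check: extract sectsz from xfs_info output and
# verify that 512-byte database writes are a multiple of it. A sample
# xfs_info line is embedded instead of running the real command.
xfs_out='meta-data=/dev/mapper/datavg-dbs03 isize=256 agcount=88
         =                         sectsz=512   attr=2, projid32bit=0'

sectsz=$(printf '%s\n' "$xfs_out" | grep -o 'sectsz=[0-9]*' | head -1 | cut -d= -f2)

if [ $(( 512 % sectsz )) -eq 0 ]; then
  echo "OK: sectsz=$sectsz works for 512-byte redo writes"
else
  echo "WARNING: sectsz=$sectsz will break direct 512-byte writes"
fi
```

On the repaired filesystem it reports OK; against the original sectsz=4096 output it would print the warning.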
Of course it not only succeeded, but it was also much faster - the datafile restore phase by about 20%, while the recovery phase (in our test case no incremental backups, just about 1.47 TB of archived logs to apply) was... 5x faster. By recovery phase I mean restoring archived logs and applying them sequentially, which means that the real redo apply rate was boosted even more by enabling direct and async I/O.
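To see why the apply rate must have improved by even more than 5x, here is some back-of-the-envelope arithmetic. Only the 5x overall speedup is a real measurement; the split between restore and apply time is made up for the illustration:

```shell
# Hypothetical split of the old recovery phase into restore and apply time.
# Only the 5x overall speedup is measured; everything else is illustrative.
restore_old=20; apply_old=80                  # arbitrary time units
total_old=$(( restore_old + apply_old ))      # 100
total_new=$(( total_old / 5 ))                # 5x overall speedup -> 20
restore_new=17                                # assume restore sped up only mildly
apply_new=$(( total_new - restore_new ))      # 3 units left for redo apply
echo "apply speedup: $(( apply_old / apply_new ))x"   # 80 / 3 -> 26x
```

Whatever the real split was, as long as the restore part sped up by less than 5x, the sequential structure forces the apply part to have sped up by more than 5x.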