Highly Available NFS (HANFS) over ACFS
An Oracle White Paper
May 2013
Highly Available NFS over Oracle ASM
Cluster File System (ACFS)
Introduction
High Availability NFS (HANFS) provides uninterrupted service of NFS V2/V3
exported paths by exposing NFS exports on Highly Available Virtual IPs (HAVIP)
and using Oracle Clusterware agents to ensure that the HAVIPs and NFS exports
are always online. This VIP will ensure that the export is available to clients, even
if the node currently hosting it goes down.
Benefits of HANFS:
• HANFS is built on top of the ACFS cluster file system allowing NFS files
to leverage advantages of ACFS.
• NFS files exported through HANFS are highly available because they are
supported by the ACFS cluster file system.
• HANFS performance and capacity scales through the dynamic addition of
ASM disk group storage and cluster nodes.
Supported Platforms:
• AIX – AIX v6.1 or later
• Solaris – Solaris 11 GA or later, X64 and Sparc64
• Linux – Red Hat Enterprise Linux v5.3 and later or v6.0 and later
(requires nfs-utils-1.0.9-60 or later)
• Linux – Oracle Enterprise Linux v5.3 and later or v6.0 and later, with the
Unbreakable Enterprise Kernel or the Red Hat Compatible Kernel
(requires nfs-utils-1.0.9-60 or later)
• Linux – SUSE Linux Enterprise Server v11 or later (requires nfs-kernel-server-1.2.1-2.24.1 or later)
Since HANFS relies on the base operating system to provide all NFS related
functionality (such as the NFS server and NFS client utilities), it is highly
recommended that the admin update all NFS related tools to the latest revision
before beginning HANFS implementation.
Oracle ACFS HANFS 12.1 only supports NFS v3 over IPv4, with no NFS locking.
Resources:
In addition to ACFS/ADVM and ASM, HANFS also relies on new Oracle 12.1
Clusterware (CRS) resources, namely the HAVIP and the ExportFS.
The HAVIP resource is a special class of the standard Oracle node VIP resource.
Each HAVIP resource is responsible for managing a single IP address in the
cluster, on one and only one node at a time. It will move around the cluster as
necessary to provide the client facing interface for the export file system. The
HAVIP requires one or more configured exports in order to successfully run on a
node.
The ExportFS resource is responsible for ensuring that the NFS server operating
system exports one or more designated ACFS file systems over NFS. This
resource requires that the specified ACFS file systems are configured to be
mounted on every server cluster node. If an exported ACFS file system becomes
unavailable on a given cluster node, the ExportFS resource will migrate to
another node in the cluster where the file systems are available and export the
file systems from that node. An ExportFS resource is associated with an HAVIP
resource and together they manage exporting and presenting server cluster
ACFS file systems for access by NFS clients.
HANFS Configuration Rules:
A given set of associated HAVIP and ExportFS resources is managed as a group
by Oracle Clusterware. A simple rule of thumb can help when setting up a highly
available export: the HAVIP resource will execute on the server cluster node
where the largest number of ACFS file systems identified with that resource
group are currently mounted, and where the fewest other HAVIP services are
executing, in order to load balance across the cluster.
The following are guidelines for ensuring that HANFS provides the maximum
scalability and availability:
- Exports should be grouped into two categories: those that require
maximum throughput, and those that require maximum availability.
a. HAVIPs should be configured so that the estimated throughput on
all attached ExportFS resources is roughly similar to that of the other
HAVIPs.
b. Exports that require maximum availability should be configured to
have their own HAVIP.
Client Usage:
The HANFS cluster service is configured via an HAVIP and associated ExportFS
resources. A client node can issue a mount request referencing the HAVIP and
ExportFS path. While the client node has this file system mounted, applications
executing on the client node can access files from the exported ACFS file
system. During node transition events, the client may see a momentary pause in
the data stream for reads and writes, but will shortly resume operation as if there
were no interruption; no client-side interaction is required.
Under what situations will the HAVIP and Exports move to other
nodes?
1. Server cluster membership change events (such as a node leaving or
joining the cluster) will force the HAVIPs to reevaluate their distribution,
potentially moving HAVIP and ExportFS resources to different cluster
nodes. During this time, the client will see a momentary pause in service,
but as soon as the export is reestablished (usually under 3s), the client
will continue to operate as if there were no interruption in service.
2. In the rare event of a storage failure that leaves a file system inaccessible
on a particular server node, the HAVIP will evaluate whether another node
is better able to provide all of its required file systems. If so, the HAVIP will
move to that node, thus ensuring that it is located on the node with the
maximum number of available file systems. This ensures that client nodes
will have access to the most file systems at any time.
3. When the server cluster admin requests a move. Using the new 12.1
commands, an admin can do a planned relocation, forcing the HAVIP and
its associated ExportFS resources to move to another node of the cluster.
This can be useful for planned node outages.
4. In the case of cluster-member-specific network connectivity issues, the
affected cluster member will be removed from the cluster, and the HAVIP
with its associated ExportFS resources will move to a connected node.
Command Explanation:
Usage: srvctl add havip -id <id> -address {<name>|<ip>} [-netnum
<network_number>] [-description <text>]
-id <id> unique ID for havip
-address(A) <ip|name> IP address or host name
-netnum(k) <net_num> Network number (default number is 1)
-description "<text>" HAVIP description
-id The ID is a unique identifier generated by the server cluster administrator for
each HAVIP. It is used when assigning an ExportFS to an HAVIP and will be
displayed by other commands.
-address The IP address or host name on which this HAVIP will host the exports.
Allowable values are non-DHCP and non-round-robin DNS values.
Hostnames must be resolvable in DNS. No IPv6 addresses are currently
allowed.
-netnum The Oracle network number this HAVIP should attach itself to. This will
define later characteristics of the exports, such as the default subnet that
they export to.
-description This field allows the admin to set a text description for a particular HAVIP. It
will later be displayed by various status commands.
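As a purely hypothetical illustration of the syntax (the ID, address and description below are invented; a full worked example appears later in this paper):
bash-3.2# srvctl add havip -id PAY1 -address payroll-vip.example.com -netnum 1 -description "Payroll exports"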
Usage: srvctl add exportfs -name <expfs_name> -id <id> -path <exportpath> [-clients <export_clients>] [-options <export_options>]
-name <name> unique name for the export file system
-path <export_path> ACFS file system path to be exported
-id <id> unique ID for havip
-options <export_options> export options for the exportfs file system
-clients <export_clients> export clients(hosts) for the exportfs file system
-name This is a unique identifier generated by the server cluster administrator for
the ExportFS. It will show up in later status commands.
-path The path to be exported. Certain operating systems interpret the NFS
specification differently, so valid paths may be OS dependent. For instance,
Solaris will not allow the export of a sub-directory of an already exported
directory; in this case, the start of the ExportFS will display an error
message to the user. An additional Solaris consideration is exporting an
ACFS snapshot: the parent of all snapshots will be exported.
-id This is the HAVIP ID that the ExportFS will be attached to.
-options These are the options that will be passed through to the NFS server. For
instance, RW or RO settings, various security settings, the file system ID,
and other OS specific attributes can be placed here. This is also the place to
specify your client lists on Solaris and AIX.
-clients (Linux Only) On Linux, various client specifiers can be placed here: subnet,
IP, hostname, etc. No validation is done of these clients. Also valid is '*',
meaning all clients.
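As a hypothetical sketch of the Solaris/AIX style, where the client list is passed through -options rather than -clients (the name, path and option string below are invented, and Solaris share_nfs option syntax is assumed):
bash-3.2# srvctl add exportfs -name PAY1 -id PAY1 -path /payroll -options "rw=payclient1:payclient2"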
Note:
The following scenarios are illustrated using Enterprise Linux 5. A similar setup
applies on all supported operating systems – AIX, Solaris and Linux.
Pre-setup for all scenarios:
HANFS requires NFS in order to run. It is assumed that NFS (and its associated
services) will be started by init scripts at node boot time. NFS needs to be
running on all nodes of the cluster.
We can check this on each node by using:
bash-3.2# /etc/init.d/portmap status
portmap (pid 1639) is running...
bash-3.2# /etc/init.d/nfs status
rpc.mountd (pid 2228) is running...
nfsd (pid 2225 2224 2223 2222 2221 2220 2219 2218) is running...
rpc.rquotad (pid 2204) is running...
If one of these services is not running, you can start it with:
/etc/init.d/<service> start
The 'chkconfig' command can be used to ensure that these services are started
at boot time:
bash-3.2# /sbin/chkconfig nfs on
bash-3.2# /sbin/chkconfig portmap on
Although SELinux is not supported on ACFS mount points, ensure that any
SELinux configuration on the system itself allows NFS access. Generally this
means running in 'enforcing – targeted' or 'permissive' mode.
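On Linux, a quick way to verify the current SELinux mode is the getenforce command, for example:
bash-3.2# /usr/sbin/getenforce
Permissive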
Now that we have our initial setup out of the way, we can configure our file
systems. Remember that HANFS requires an ACFS file system that is
configured to be mounted on all nodes via a single file system resource. There
are several ways to achieve this:
• Using ASMCA:
- Right click the disk group
- Select 'Create ACFS for DB use'
- Follow prompts
• Command line:
- Create a volume device using asmcmd
- Format the volume device using 'mkfs'
• Use 'srvctl add filesystem -device <device> -path <mount path>' to register the file system with CRS
• Use 'srvctl start filesystem -device <device>' to mount the path
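The 'create a volume device using asmcmd' step above might look like the following minimal sketch (the DATA disk group name and 5 GB size are illustrative; the resulting /dev/asm device name will vary):
bash-3.2# asmcmd volcreate -G DATA -s 5G hr1
bash-3.2# asmcmd volinfo -G DATA hr1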
File System Creation Example using srvctl commands
bash-3.2# mkfs -t acfs /dev/asm/hr1-194
mkfs.acfs: version = 12.1.0.0.2
mkfs.acfs: on-disk version = 39.0
mkfs.acfs: volume = /dev/asm/hr1-194
mkfs.acfs: volume size = 5368709120
mkfs.acfs: Format complete.
bash-3.2# mkdir /hr1
bash-3.2# srvctl add filesystem -path /hr1 -device /dev/asm/hr1-194
bash-3.2# srvctl start filesystem -device /dev/asm/hr1-194
bash-3.2# mount -t acfs
/dev/asm/ag1-194 on /db1 type acfs (rw)
/dev/asm/dbfiles-194 on /db1/dbfiles type acfs (rw)
/dev/asm/dbfiles2-194 on /db1/dbfiles2 type acfs (rw)
/dev/asm/hr1-194 on /hr1 type acfs (rw)
bash-3.2# srvctl status filesystem -device /dev/asm/hr1-194
ACFS file system /hr1 is mounted on nodes agraves-vm1,agraves-vm2
Recommended Options for Client Mounts (on client):
Use the following client mount options with HANFS. In order for HANFS to work
well, hard mounts should be used.
• hard – this tells the NFS client to continue retrying the operation by
attempting to contact the server. This differs from soft mounting – a soft
mount will return an error as soon as the file system is unavailable. A
hard mount will wait until the file system is available again before making
determinations on the state of the file system.
• intr – when used in conjunction with hard, this allows NFS operations on
the client to be interrupted (such as by ^C). This allows the user to
terminate operations that appear to be hanging.
• nolock – HANFS supports NFSv3. This version of NFS does not
appropriately handle locks and lock recovery on all platforms and in all
scenarios. Thus, for safety's sake, it is better to disallow lock operations
rather than have an application think locks are working correctly. This is
specific to the NFS server.
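Putting these options together, a Linux client could mount an HANFS export along these lines (a minimal sketch; the HAVIP host name and export path are taken from the example later in this paper, and the local mount point is arbitrary):
bash-3.2# mkdir -p /mnt/hr1
bash-3.2# mount -t nfs -o hard,intr,nolock,vers=3 agraves-clu4.us.oracle.com:/hr1 /mnt/hr1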
Configuring a simple Oracle HANFS scenario
Let's assume a simple 2 node cluster configuration. The following file systems
will be exported from this cluster:
• /hr1 – expected average throughput around 500KB/s
• /hr2 – expected average throughput around 500KB/s
Each file system is hosted in the same ASM disk group, which uses external
redundancy (no failure groups). The admin does not expect any availability
issues for these file systems, and since they are in the same disk group, it is
likely that a storage outage would affect both of them equally. The combined
throughput is low enough that there are unlikely to be any network bandwidth
issues if a single node hosts both exports.
Thus, for simplicity, the admin has chosen to create only a single HAVIP and to
attach both export file systems to this VIP. This gives all clients a single HAVIP
address to access both mount points. This has the downside that adding new
nodes to the Oracle HANFS cluster will not automatically scale the throughput of
the entire cluster.
Registering the HAVIP:
bash-3.2# srvctl add havip -id HR1 -address agraves-clu4.us.oracle.com -netnum 1 -description "HR specific exports for the Omega Project"
bash-3.2# srvctl status havip -id HR1
HAVIP ora.hr1.havip is enabled
HAVIP ora.hr1.havip is not running
bash-3.2# srvctl start havip -id HR1
PRCR-1079 : Failed to start resource ora.hr1.havip
CRS-2805: Unable to start 'ora.hr1.havip' because it has a 'hard' dependency on
resource type 'ora.HR1.export.type' and no resource of that type can satisfy the
dependency
Why the failure to start? Recall that an HAVIP requires one or more configured
ExportFS resources. Without an ExportFS, the HAVIP will not start. If a client
had mounted the ExportFS and the HAVIP started without the ExportFS
available, the client would receive an ESTALE error. This resource dependency
ensures that NFS service is not resumed on the client until the server side file
system is actually available for access.
Registering the ExportFS:
bash-3.2# srvctl add exportfs -path /hr1 -id HR1 -name HR1 -options "rw,no_root_squash" -clients agraves-vm5,agraves-vm6
bash-3.2# srvctl status exportfs -name HR1
export file system hr1 is enabled
export file system hr1 is not exported
bash-3.2# srvctl add exportfs -path /hr2 -id HR1 -name HR2 -options "ro"
At this point, starting either resource will bring up the other: starting the HAVIP
starts all ExportFS resources configured on it, and starting an ExportFS starts its
associated HAVIP.
We've chosen to export the second ExportFS, HR2, only to the subnet of the
network resource:
bash-3.2# srvctl config exportfs -name HR2
export file system hr2 is configured
Exported path: /hr2
Export Options: ro
Configured Clients: 10.149.236.0/22
bash-3.2# srvctl config exportfs -name HR1
export file system hr1 is configured
Exported path: /hr1
Export Options: rw,no_root_squash
Configured Clients: agraves-vm5,agraves-vm6
Compare that with HR1, which is only available to 2 clients: agraves-vm5 and
agraves-vm6.
We can see the configured dependencies from the HAVIP to the other resources:
START_DEPENDENCIES=hard(ora.net1.network,uniform:type:ora.HR1.export.type) attraction(ora.data.hr1.acfs,ora.data.hr2.acfs) dispersion:active(type:ora.havip.type) pullup(ora.net1.network) pullup:always(type:ora.HR1.export.type)
STOP_DEPENDENCIES=hard(intermediate:ora.net1.network,uniform:intermediate:type:ora.HR1.export.type)
These dependencies ensure that the HAVIP is started after the ExportFS and
ACFS resources, and is stopped before them.
There are several ways that we could start the exports:
• srvctl start exportfs -id <ID> - will start all exports attached to the HAVIP
with the id <ID>
• srvctl start havip -id <ID> - will start all exports attached to the HAVIP with
the id <ID>
• srvctl start exportfs -name <NAME> - will start just the <NAME> ExportFS
and its HAVIP
bash-3.2# srvctl start exportfs -id HR1
bash-3.2# srvctl status exportfs
export file system hr1 is enabled
export file system hr1 is exported on node agraves-vm2
export file system hr2 is enabled
export file system hr2 is exported on node agraves-vm2
bash-3.2# /usr/sbin/exportfs -v
/hr1 agraves-vm5.us.oracle.com(rw,wdelay,no_root_squash,no_subtree_check,fsid=128850576,anonuid=65534,anongid=65534)
/hr1 agraves-vm6.us.oracle.com(rw,wdelay,no_root_squash,no_subtree_check,fsid=128850576,anonuid=65534,anongid=65534)
/hr2 10.149.236.0/22(ro,wdelay,root_squash,no_subtree_check,fsid=1573414370,anonuid=65534,anongid=65534)
Here we can clearly see that the exports are exported to the proper clients with
the proper options.
Node Relocation:
Now let’s say we want to relocate our exported file system to another node in the
cluster. The command for this is 'srvctl relocate havip':
bash-3.2# srvctl relocate havip -id HR1 -node agraves-vm1
bash-3.2# srvctl status havip -id HR1
HAVIP ora.hr1.havip is enabled
HAVIP ora.hr1.havip is running on node agraves-vm1
Using any of the commands available to determine resource state (crsctl, srvctl),
we can see that they are now running on the new node:
bash-3.2# crsctl stat res -w "TYPE = ora.HR1.export.type" -t
--------------------------------------------------------------------------------
Name Target State Server State details
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.hr1.export
1 ONLINE ONLINE agraves-vm1 STABLE
ora.hr2.export
1 ONLINE ONLINE agraves-vm1 STABLE
--------------------------------------------------------------------------------
The same principle would hold true in the case of an unplanned outage – such as
a network failure, storage failure, or complete node failure.
A More Complex Scenario
Oracle HANFS has very flexible features to allow more complex configurations
for maximizing your NFS availability and performance.
Let’s examine a case where the admin has chosen 6 mount points for HANFS:
• /hr1 – 500Kb/s
• /hr – 10Mb/s – must be always available
• /source – 100Kb/s – must be always available
• /PDF – 10Mb/s
• /Games – 100Mb/s (heavily used file system)
• /World_Domination_Plans – 1Mb/s
Due to high availability requirements, the best configuration for this would be:
• /hr1, /PDF and /World_Domination_Plans configured on one HAVIP address
• /Games on one HAVIP address
• /source on one HAVIP address
• /hr on one HAVIP address
Rationale:
• Placing /Games on its own HAVIP address isolates its intense throughput
from other HAVIPs, allowing CRS to potentially place this HAVIP and its
associated ExportFS on its own server, away from the other HAVIPs.
(Assuming you have enough servers in your cluster.)
• Placing /source on its own HAVIP address allows CRS to move it to a
cluster member that can serve the file system, should there be a storage
connection issue. Since there is only one ExportFS on this HAVIP, CRS
needs only find a node where the ACFS file system is available, with no
policy decision or trade-off necessary.
• Placing /hr on its own HAVIP allows for the same logic to apply to /hr as
applies to /source.
• Placing the remaining exports together on a single HAVIP reduces the
number of IP addresses necessary. In the unlikely event that these file
systems are not all available on the same node of our cluster, CRS will
place the HAVIP on the node that it determines is best. This could cause
one file system to be unavailable.
CRS Policy Illustration – Choosing the Best Node:
Consider the following cluster:
Node 1 – available file systems: /fs1 /fs2 /fs3
Node 2 – available file systems: /fs1 /fs3
Node 3 – available file systems: /fs2 /fs3 /fs4
If we consider a single HAVIP, exporting all 4 file systems, CRS will make a policy
decision as to the best place to export the file systems from.
No node truly satisfies the desired intent of having all file systems available from
a single cluster node.
So, CRS will determine that either Node 1 or Node 3 is the best place for our
HAVIP and associated ExportFS. Either of these choices will result in 1 file
system being unavailable due to storage connection issues.
HAVIP Failover Times:
There are two cases to consider for node failover:
• Planned relocation
• Node failure
Planned Relocation:
When the admin forcibly moves the HAVIP and associated ExportFS resources
from node to node, the following steps are taken:
1) The HAVIP on the first node is shut down.
2) Associated ExportFS resources on the first node are shut down.
3) Associated ExportFS resources are started on the second node.
4) The HAVIP is started on the second node.
When an HAVIP is configured with a large number of associated ExportFS
resources, the time taken for this planned failover may become large. Each
individual ExportFS should take no more than 1 or 2 seconds to shut down on
the first node. If there is sufficient processing power that CRS can stop all of
them in parallel, then this may happen quickly. However, if CRS must stop each
ExportFS sequentially, due to CPU processing limitations, the worst case is 2s
* <Number of ExportFS resources>.
The same scenario applies in reverse for startup time. Thus, the worst case is
TotalTimeToRelocate = (2s * <Number of ExportFS resources>) * 2 + 5s (where
5s is the time to stop and start the HAVIP).
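As a concrete illustration of this formula: with 10 ExportFS resources on a single HAVIP and fully sequential processing, the worst case planned relocation time is (2s * 10) * 2 + 5s = 45s.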
The exported file systems are unavailable until the HAVIP is started, which would
happen at the end of this process.
So, while maintaining a single HAVIP may make management easier, it may
increase the time before the exported file systems are again available to clients.
Node Failure:
Node failure is similar, although simpler. Once CRS notices that the node is
gone, the only steps taken are starting the associated ExportFS resources and
then the HAVIP on a surviving node; there is no shutdown phase. Thus, the time
taken is roughly half the relocation time.
NFS Considerations:
Since HANFS relies on NFS, standard NFS configuration and performance
tuning apply to the Oracle RAC HANFS product. Client options such as
rsize and wsize may make a dramatic difference in the speed of data access.
Additionally, the location of your file servers relative to your clients will affect NFS
performance – the best performance is usually gained by co-locating NFS
servers with your clients.
Oracle RAC HANFS Scalability:
Let's discuss for a moment the performance of the Oracle RAC HANFS cluster.
Assume that the backing network fabric supports 1 Gb/s throughput. If each
cluster member is connected to this network, the theoretical maximum throughput
is 1 Gb/s * # of nodes in the Oracle RAC HANFS cluster. However,
remember that CRS will move the HAVIPs around, so if multiple HAVIPs are
hosted on a single node, the maximum throughput is (1 Gb/s * # of
nodes in the Oracle RAC HANFS cluster) / (# of HAVIPs on a single node).
We can quickly see a performance benefit to keeping the number of
HAVIPs in a single cluster equal to the number of nodes in the Oracle RAC HANFS
cluster, assuming that this meets our high availability needs (as discussed earlier). If
each node hosts 2 HAVIPs, then we could double our Oracle RAC HANFS
cluster's combined throughput by simply doubling the number of nodes in the
cluster.
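As an illustration of the formula above: a 4-node cluster on a 1 Gb/s fabric hosting 8 HAVIPs (2 per node) has a maximum throughput of (1 Gb/s * 4) / 2 = 2 Gb/s, whereas the same cluster with one HAVIP per node would have the full 4 Gb/s available.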
Consider also the case where a single node is hosting 2 HAVIPs and the
performance of those HAVIPs' exported file systems is 50% of what is expected.
Assuming that the backing storage and network fabric are appropriate for the
expected throughput, we can remedy this situation by simply adding a new node
to the Oracle RAC HANFS cluster. This causes CRS to move the HAVIPs in such
a manner that each node now hosts only 1 HAVIP, instead of having 2 on one
node.
ExportFS Resource Behavior:
Alternative methods of starting the resource:
• It is possible to start an export resource via Transparent High Availability,
like many other resources. In this case, the ExportFS resource monitors
whether or not the system reports that the file system is exported. Thus,
it is possible to use the 'exportfs' (Linux) command to bring an export
online. When this is done, the ExportFS resources may end up running
on different nodes. This state is allowable, but not recommended. As
soon as the associated HAVIP is started, the ExportFS resources will
move to the node hosting that resource.
- This method is not recommended for the following reasons: If the
admin manually removes an export via an OS-provided command line
tool, such as exportfs, the associated ExportFS resource will also go
offline. The use of a command line tool triggers a transparent action
on the part of the ExportFS resource. Since the OS-provided tools
are not integrated with the CRS stack, this relies on the ExportFS
resource to notice the change, which introduces a delay before the
true state of the system is reflected. Since the CRS resource
stack may also evaluate other conditions, it is recommended to use
Oracle-provided tools, such as srvctl, to administer ExportFS
resources.
Modifying Export Options using system tools:
• When the admin uses a system tool to modify the export options (such as
changing rw to ro), the ExportFS resource will reflect this via a 'PARTIAL'
state. This tells the admin that the ExportFS is running, but that the options
it was configured with are different from the options currently exported on
that server.
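For illustration only, on Linux this could arise from re-exporting with different options via the OS tool and then checking the resource (the host and path reuse the earlier example; the exact status reported may vary):
bash-3.2# /usr/sbin/exportfs -o ro agraves-vm5.us.oracle.com:/hr1
bash-3.2# srvctl status exportfs -name HR1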
Stopping the ExportFS using THA:
• If the admin manually removes an export via a command line tool such as
exportfs, the associated ExportFS resource will also go offline.
Removal of exported directory:
• If the directory that is configured to be exported is removed via 'rm', the
associated ExportFS will go offline.
Controlling the Location of Exports in the Cluster:
For the admin who prefers more control over the location of exports, an HAVIP
can be configured to run only on certain nodes by use of the 'disable' and
'enable' commands.
For instance, assume that the admin wants HR1 and HR2 (exported over HAVIP
HR1) to run on only one node of a 2-node cluster. After adding the resources,
the admin could run the following to restrict them:
bash-3.2# /scratch/crs_home/bin/srvctl disable havip -node agraves-vm2 -id HR1
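To allow the HAVIP to run on that node again, the corresponding enable command can be used (shown here with the same node and ID as above):
bash-3.2# /scratch/crs_home/bin/srvctl enable havip -node agraves-vm2 -id HR1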
Further Thoughts:
Oracle HANFS may be used in many different ways with other Oracle ACFS
features. The following are just a few examples:
• Taking an ACFS snapshot of an ACFS file system exported via Oracle RAC
HANFS. This would allow for a backup to be made.
• Exporting an Oracle Home over Oracle RAC HANFS, and ensuring that it
was always available.
• Configuring an Oracle RAC HANFS ExportFS with Oracle ACFS Security
Realms so that it was read-only during certain time periods to prevent
unauthorized access.
• Using Oracle RAC HANFS with ASM disk groups configured with more than
the default of 1 failure group to ensure that the underlying storage would be
extremely fault tolerant. This would effectively remove the possibility of
storage failure for a single disk group, while Oracle RAC HANFS would allow
the export itself to be always available, creating a single extremely highly
available file server.
Oracle CloudFS
January 2013 Update
Author: Allan Graves
Contributing Authors: Ara Shakian
Oracle Corporation
World Headquarters
500 Oracle Parkway
Redwood Shores, CA 94065
U.S.A.
Worldwide Inquiries:
Phone: +1.650.506.7000
Fax: +1.650.506.7200
oracle.com
Copyright © 2011, Oracle and/or its affiliates. All rights reserved. This document is provided for information purposes only and
the contents hereof are subject to change without notice. This document is not warranted to be error-free, nor subject to any other
warranties or conditions, whether expressed orally or implied in law, including implied warranties and conditions of merchantability or
fitness for a particular purpose. We specifically disclaim any liability with respect to this document and no contractual obligations are
formed either directly or indirectly by this document. This document may not be reproduced or transmitted in any form or by any
means, electronic or mechanical, for any purpose, without our prior written permission.
Oracle is a registered trademark of Oracle Corporation and/or its affiliates. Other names may be trademarks of their respective
owners.
More Related Content

What's hot (20)

PDF
Anatomy of the loadable kernel module (lkm)
Adrian Huang
 
PDF
Xen and the art of embedded virtualization (ELC 2017)
Stefano Stabellini
 
PDF
Kdump and the kernel crash dump analysis
Buland Singh
 
PDF
qemu + gdb: The efficient way to understand/debug Linux kernel code/data stru...
Adrian Huang
 
PDF
RDMA on ARM
inside-BigData.com
 
PDF
The Linux Block Layer - Built for Fast Storage
Kernel TLV
 
PDF
Memory Mapping Implementation (mmap) in Linux Kernel
Adrian Huang
 
PDF
Qemu Pcie
The Linux Foundation
 
PPTX
Introduction to Kernel and Device Drivers
RajKumar Rampelli
 
KEY
Handling Redis failover with ZooKeeper
ryanlecompte
 
PPTX
Cache memory
Ahsan Ashfaq
 
PPTX
Ceph Tech Talk -- Ceph Benchmarking Tool
Ceph Community
 
PDF
Embedded Virtualization applied in Mobile Devices
National Cheng Kung University
 
PPT
Container security
Anthony Chow
 
PDF
Xen Hypervisor
Susheel Thakur
 
PDF
Page cache in Linux kernel
Adrian Huang
 
PDF
IBM general parallel file system - introduction
IBM Danmark
 
PDF
Physical Memory Management.pdf
Adrian Huang
 
PDF
Linux Kernel - Virtual File System
Adrian Huang
 
PPTX
COSCUP 2020 RISC-V 32 bit linux highmem porting
Eric Lin
 
Anatomy of the loadable kernel module (lkm)
Adrian Huang
 
Xen and the art of embedded virtualization (ELC 2017)
Stefano Stabellini
 
Kdump and the kernel crash dump analysis
Buland Singh
 
qemu + gdb: The efficient way to understand/debug Linux kernel code/data stru...
Adrian Huang
 
RDMA on ARM
inside-BigData.com
 
The Linux Block Layer - Built for Fast Storage
Kernel TLV
 
Memory Mapping Implementation (mmap) in Linux Kernel
Adrian Huang
 
Introduction to Kernel and Device Drivers
RajKumar Rampelli
 
Handling Redis failover with ZooKeeper
ryanlecompte
 
Cache memory
Ahsan Ashfaq
 
Ceph Tech Talk -- Ceph Benchmarking Tool
Ceph Community
 
Embedded Virtualization applied in Mobile Devices
National Cheng Kung University
 
Container security
Anthony Chow
 
Xen Hypervisor
Susheel Thakur
 
Page cache in Linux kernel
Adrian Huang
 
IBM general parallel file system - introduction
IBM Danmark
 
Physical Memory Management.pdf
Adrian Huang
 
Linux Kernel - Virtual File System
Adrian Huang
 
COSCUP 2020 RISC-V 32 bit linux highmem porting
Eric Lin
 

Similar to ORACLE HA NFS over Oracle ASM (20)

PPTX
Oracle ACFS High Availability NFS Services (HANFS)
Anju Garg
 
PDF
Oracle ACFS High Availability NFS Services (HANFS) Part-I
Anju Garg
 
PDF
EBS on ACFS white paper
Andrejs Karpovs
 
PDF
E-Business Suite Rapid Provisioning Using Latest Features Of Oracle Database 12c
Andrejs Karpovs
 
PPTX
Andrew File System
Ashish KC
 
PPTX
distributed files in parallel computonglec 7.pptx
tahakhan699813
 
PPTX
Using ACFS as a Storage for EBS
Andrejs Karpovs
 
PPT
Distributed file system
Naza hamed Jan
 
PPT
Nfs
tmavroidis
 
ODP
Introduction to highly_availablenfs_server_on_scale-out_storage_systems_based...
Gluster.org
 
PDF
OSDC 2010 | Use Distributed Filesystem as a Storage Tier by Fabrizio Manfred
NETWAYS
 
PPT
NFS.ppt shshsjsjsjssjsjsksksksksksisisisisi
mukul narayana
 
PDF
Rha cluster suite wppdf
projectmgmt456
 
PDF
Assume you have decided to implement DFS so remote sites can access .pdf
ivylinvaydak64229
 
PDF
Tr 4067 nfs-best practice_and_implementation
narit_ton
 
PDF
file-storage-100.pdf
Abhi850745
 
PPTX
Oracle cloud storage and file system
Andrejs Karpovs
 
ODP
Cl306
Juliette Ponnet
 
PPT
Presentation on nfs,afs,vfs
Prakriti Dubey
 
PPTX
a distributed implementation of the classical time-sharing model of a file sy...
Manonmani40
 
Oracle ACFS High Availability NFS Services (HANFS)
Anju Garg
 
Oracle ACFS High Availability NFS Services (HANFS) Part-I
Anju Garg
 
EBS on ACFS white paper
Andrejs Karpovs
 
E-Business Suite Rapid Provisioning Using Latest Features Of Oracle Database 12c
Andrejs Karpovs
 
Andrew File System
Ashish KC
 
distributed files in parallel computonglec 7.pptx
tahakhan699813
 
Using ACFS as a Storage for EBS
Andrejs Karpovs
 
Distributed file system
Naza hamed Jan
 
Introduction to highly_availablenfs_server_on_scale-out_storage_systems_based...
Gluster.org
 
OSDC 2010 | Use Distributed Filesystem as a Storage Tier by Fabrizio Manfred
NETWAYS
 
NFS.ppt shshsjsjsjssjsjsksksksksksisisisisi
mukul narayana
 
Rha cluster suite wppdf
projectmgmt456
 
Assume you have decided to implement DFS so remote sites can access .pdf
ivylinvaydak64229
 
Tr 4067 nfs-best practice_and_implementation
narit_ton
 
file-storage-100.pdf
Abhi850745
 
Oracle cloud storage and file system
Andrejs Karpovs
 
Presentation on nfs,afs,vfs
Prakriti Dubey
 
a distributed implementation of the classical time-sharing model of a file sy...
Manonmani40
 
Ad

Recently uploaded (20)

PDF
POV_ Why Enterprises Need to Find Value in ZERO.pdf
darshakparmar
 
DOCX
Cryptography Quiz: test your knowledge of this important security concept.
Rajni Bhardwaj Grover
 
PDF
Staying Human in a Machine- Accelerated World
Catalin Jora
 
PDF
Newgen 2022-Forrester Newgen TEI_13 05 2022-The-Total-Economic-Impact-Newgen-...
darshakparmar
 
PPTX
WooCommerce Workshop: Bring Your Laptop
Laura Hartwig
 
PPTX
The Project Compass - GDG on Campus MSIT
dscmsitkol
 
PPTX
AI Penetration Testing Essentials: A Cybersecurity Guide for 2025
defencerabbit Team
 
PDF
Bitcoin for Millennials podcast with Bram, Power Laws of Bitcoin
Stephen Perrenod
 
PDF
Smart Trailers 2025 Update with History and Overview
Paul Menig
 
PDF
Exolore The Essential AI Tools in 2025.pdf
Srinivasan M
 
PDF
[Newgen] NewgenONE Marvin Brochure 1.pdf
darshakparmar
 
PPTX
Q2 FY26 Tableau User Group Leader Quarterly Call
lward7
 
PDF
Jak MŚP w Europie Środkowo-Wschodniej odnajdują się w świecie AI
dominikamizerska1
 
PDF
Newgen Beyond Frankenstein_Build vs Buy_Digital_version.pdf
darshakparmar
 
PDF
“NPU IP Hardware Shaped Through Software and Use-case Analysis,” a Presentati...
Edge AI and Vision Alliance
 
PDF
LOOPS in C Programming Language - Technology
RishabhDwivedi43
 
PDF
"Beyond English: Navigating the Challenges of Building a Ukrainian-language R...
Fwdays
 
DOCX
Python coding for beginners !! Start now!#
Rajni Bhardwaj Grover
 
PPTX
AUTOMATION AND ROBOTICS IN PHARMA INDUSTRY.pptx
sameeraaabegumm
 
PPTX
Webinar: Introduction to LF Energy EVerest
DanBrown980551
 
POV_ Why Enterprises Need to Find Value in ZERO.pdf
darshakparmar
 
Cryptography Quiz: test your knowledge of this important security concept.
Rajni Bhardwaj Grover
 
Staying Human in a Machine- Accelerated World
Catalin Jora
 
Newgen 2022-Forrester Newgen TEI_13 05 2022-The-Total-Economic-Impact-Newgen-...
darshakparmar
 
WooCommerce Workshop: Bring Your Laptop
Laura Hartwig
 
The Project Compass - GDG on Campus MSIT
dscmsitkol
 
AI Penetration Testing Essentials: A Cybersecurity Guide for 2025
defencerabbit Team
 
Bitcoin for Millennials podcast with Bram, Power Laws of Bitcoin
Stephen Perrenod
 
Smart Trailers 2025 Update with History and Overview
Paul Menig
 
Exolore The Essential AI Tools in 2025.pdf
Srinivasan M
 
[Newgen] NewgenONE Marvin Brochure 1.pdf
darshakparmar
 
Q2 FY26 Tableau User Group Leader Quarterly Call
lward7
 
Jak MŚP w Europie Środkowo-Wschodniej odnajdują się w świecie AI
dominikamizerska1
 
Newgen Beyond Frankenstein_Build vs Buy_Digital_version.pdf
darshakparmar
 
“NPU IP Hardware Shaped Through Software and Use-case Analysis,” a Presentati...
Edge AI and Vision Alliance
 
LOOPS in C Programming Language - Technology
RishabhDwivedi43
 
"Beyond English: Navigating the Challenges of Building a Ukrainian-language R...
Fwdays
 
Python coding for beginners !! Start now!#
Rajni Bhardwaj Grover
 
AUTOMATION AND ROBOTICS IN PHARMA INDUSTRY.pptx
sameeraaabegumm
 
Webinar: Introduction to LF Energy EVerest
DanBrown980551
 
Ad

ORACLE HA NFS over Oracle ASM

  • 1. Highly Available NFS (HANFS) over ACFS An Oracle White Paper May 2013 Highly Available NFS over Oracle ASM Cluster File System (ACFS)
  • 2. Highly Available NFS (HANFS) over ACFS 1 Introduction High Availability NFS (HANFS) provides uninterrupted service of NFS V2/V3 exported paths by exposing NFS exports on Highly Available Virtual IPs (HAVIP) and using Oracle Clusterware agents to ensure that the HAVIPs and NFS exports are always online. This VIP will ensure that the export is available to clients, even if the node currently hosting it goes down. Benefits of HANFS: • HANFS is built on top of the ACFS cluster file system allowing NFS files to leverage advantages of ACFS. • NFS files exported through HANFS are highly available because they are supported by the ACFS cluster file system. • HANFS performance and capacity scales through the dynamic addition of ASM disk group storage and cluster nodes. Supported Platforms: • AIX – AIX v6.1 or later • Solaris – Solaris 11 GA or later, X64 and Sparc64 • Linux – Red Hat Enterprise Linux v5.3 and later or v6.0 and later (requires nfs-utils-1.0.9-60 or later) • Linux – Oracle Enterprise Linux v5.3 and later or v6.0 and later, with the Unbreakable Enterprise Kernel or the Red Hat Compatible Kernel
  • 3. Highly Available NFS (HANFS) over ACFS 2 (requires nfs-utils-1.0.9-60 or later) • Linux – SUSE Enterprise Server v11 or later (requires nfs-kernel-server- 1.2.1-2.24.1 or later) Since HANFS relies on the base Operating System to provide all NFS related functionality (such as the NFS server and NFS client utilities) it is highly recommended that the admin update all NFS related tools to the latest revision before beginning HANFS implementation. Oracle ACFS HANFS 12.1 only supports NFS v3 over Ipv4, with no NFS locking. Resources: In addition to ACFS/ADVM and ASM, HANFS also relies on new Oracle 12.1 Clusterware (CRS) resources, namely the HAVIP and the ExportFS. The HAVIP resource is a special class of the standard Oracle node VIP resource. Each HAVIP resource is responsible for managing a single IP address in the cluster, on one and only one node at a time. It will move around the cluster as necessary to provide the client facing interface for the export file system. The HAVIP requires one or more configured exports in order to successfully run on a node. The ExportFS resource is responsible for ensuring that the NFS server operating system exports one or more designated ACFS file systems over NFS. This resource requires that the specified ACFS file systems are configured to be mounted on every server cluster node. If an exported ACFS file system becomes unavailable on a given cluster node, the ExportFS resource will migrate to another node in the cluster where the file systems are available and export the file systems from that node. An ExportFS resource is associated with an HAVIP resource and together they manage exporting and presenting server cluster ACFS file systems for access by NFS clients. HANFS Configuration Rules: A given set of associated HAVIP and ExportFS resources is managed as a group by Oracle Clusterware. A simple rule of thumb can help when setting up a highly available export: The HAVIP resource will execute on a node in the server cluster where the largest number of ACFS file systems identified with that resource group are currently mounted and the least number of other HAVIP services are executing in order to load balance across the cluster. The following are guidelines for ensuring that HANFS provides the maximum scalability and availability: - Exports should be grouped into two categories: those that require
  • 4. Highly Available NFS (HANFS) over ACFS 3 maximum throughput, and those that require maximum availability. a. HAVIPs should be configured so that the estimated throughput on all attached ExportFS resources is roughly similar to the other HAVIPs. b. Exports that require maximum availability should be configured to have their own HAVIP. Client Usage: The HANFS cluster service is configured via an HAVIP and associated ExportFS resources. A client node can issue a mount request referencing the HAVIP and ExportFS path. While the client node has this file system mounted, applications executing on the client node can access files from the exported ACFS file system. During node transition events, the client may see a momentary pause in the data stream for reads and writes, but will shortly resume operation as if there was no interruption; no client side interaction is required. Under what situations will the HAVIP and Exports move to other nodes? 1. Server cluster membership change events (such as a node leaving or joining the cluster) will force the HAVIP to reevaluate their distribution, potentially moving the HAVIP and ExportFS resources to different cluster nodes. During this time, the client will see a momentary pause in service, but as soon as the export is reestablished (usually under 3s), the client will continue to operate as if there was no interruption in services. 2. In the rare event of a storage failure, resulting in a file system that is no longer accessible on a particular node of the server, the HAVIP will evaluate if a particular node is able to better provide for all its required file systems. If so, the HAVIP will move to that node, thus ensuring that the node with the maximum available file systems is where it is located. This ensures that client nodes will have access to the most file systems at any time. 3. When the server cluster admin requests a move. Using the new 12.1 commands, an admin can do a planned relocation, forcing the HAVIP and its associated ExportFS resources to move to another node of the cluster. This can be useful for planned node outages. 4. In the case of cluster member specific network connectivity issues, the cluster member will be removed from the cluster namespace, and the HAVIP with its associated ExportFS resources will move to a connected node.
  • 5. Highly Available NFS (HANFS) over ACFS 4 Command Explanation: Usage: srvctl add havip -id <id> -address {<name>|<ip>} [-netnum <network_number>] [-description <text>] -id <id> unique ID for havip -address(A) <ip|name> IP address or host name -netnum(k) <net_num> Network number (default number is 1) -description "<text>" HAVIP description -id The ID is a unique identifier generated by the server cluster administrator for each HAVIP. It is used when assigning an ExportFS to an HAVIP and will be displayed by other commands. -address The IP address or host name that this HAVIP will host the exports on. Allowable values are non-DHCP and non-round robin DNS values. Hostnames must be resolvable in DNS. No Ipv6 addresses are currently allowed. -netnum The oracle network number this HAVIP should attach itself to. This will define later characteristics of the exports, such as the default sub-net that they export to. -description This field allows the admin to set a text description for a particular HAVIP. It will later be displayed by various status commands. Usage: srvctl add exportfs -name <expfs_name> -id <id> -path <exportpath> [- clients <export_clients>] [-options <export_options>] -name <name> unique name for the export file system -path <export_path> ACFS file system path to be exported -id <id> unique ID for havip -options <export_options> export options for the exportfs file system -clients <export_clients> export clients(hosts) for the exportfs file system -name This is a unique identifier generated by the server cluster administrator for the ExportFS. It will show up in later status commands. -path The path to be exported. Certain operating systems interpret the NFS spec differently. Thus, valid paths may be OS dependent. For instance – Solaris will not allow the export of a sub-directory of an already exported directory. In this case, the start of the ExportFS will display an error message to the user. Additional considerations on Solaris include exporting an ACFS snapshot - the parent of all snapshots will be exported. -id This is the HAVIP ID that the ExportFS will be attached to. -options This is the options that will be passed through to the NFS Server. For instance, setting RW or RO, various security settings, file system ID, and other OS specific attributes can be placed here. This is also the place to specify your client lists on Solaris and AIX. -clients (Linux Only) On Linux, various client specifiers can be placed here: subnet, IP, hostname, etc. No validation is done of these clients. Also valid is '*', meaning all clients.
  • 6. Highly Available NFS (HANFS) over ACFS 5 Note: The following scenarios are illustrated using Enterprise Linux 5. Similar setup will take place on all supported OS versions – AIX, Solaris and Linux. Pre-setup for all scenarios: HANFS requires NFS in order to run. It is assumed that NFS (and its associated needs) will be started by init scripts at node boot time. NFS will need to be running on all nodes of the cluster. We can check this on each node by using: bash-3.2# /etc/init.d/portmap status portmap (pid 1639) is running... bash-3.2# /etc/init.d/nfs status rpc.mountd (pid 2228) is running... nfsd (pid 2225 2224 2223 2222 2221 2220 2219 2218) is running... rpc.rquotad (pid 2204) is running... If one of these is not running, you could start it by using: /etc/init.d/<service> start The 'chkconfig' command can be used to ensure that these services are started at boot time: bash-3.2# /sbin/chkconfig nfs on bash-3.2# /sbin/chkconfig portmap on Although SELinux is not supported on ACFS mount points, ensure that any SELinux setup on the system itself is correctly configured for NFS file systems – you should be running in a mode that allows NFS access to be allowed. Generally this is via 'enforcing – targeted' or 'permissive'. Now that we have our initial setup out of the way, we can configure our file systems. Remember that HANFS requires an ACFS file system that is configured to be mounted on all nodes via a single file system resource. There are several ways to achieve this: • Using ASMCA: - Right click the disk group - Select 'Create ACFS for DB use' - Follow prompts • Command line: - Create a volume device using asmcmd - Format the volume device using 'mkfs'
  • 7. Highly Available NFS (HANFS) over ACFS 6 • Use 'srvctl add filesystem -device <device> -path <mount path>' to register the file system with crs • Use 'srvctl start filesystem -device <device>' to mount the path File System Creation Example using srvctl commands bash-3.2# mkfs -t acfs /dev/asm/hr1-194 mkfs.acfs: version = 12.1.0.0.2 mkfs.acfs: on-disk version = 39.0 mkfs.acfs: volume = /dev/asm/hr1-194 mkfs.acfs: volume size = 5368709120 mkfs.acfs: Format complete. bash-3.2# mkdir /hr1 bash-3.2# srvctl add filesystem -path /hr1 -device /dev/asm/hr1-194 bash-3.2# srvctl start filesystem -device /dev/asm/hr1-194 bash-3.2# mount -t acfs /dev/asm/ag1-194 on /db1 type acfs (rw) /dev/asm/dbfiles-194 on /db1/dbfiles type acfs (rw) /dev/asm/dbfiles2-194 on /db1/dbfiles2 type acfs (rw) /dev/asm/hr1-194 on /hr1 type acfs (rw) bash-3.2# srvctl status filesystem -device /dev/asm/hr1-194 ACFS file system /hr1 is mounted on nodes agraves-vm1,agraves-vm2 Recommended Options for Client Mounts (on client): Use the following client mount options with HANFS. In order for HANFS to work well, hard mounts should be used. • hard – this tells the NFS client to continue retrying the operation by attempting to contact the server. This differs from soft mounting – a soft mount will return an error as soon as the file system is unavailable. A hard mount will wait until the file system is available again before making determinations on the state of the file system. • intr – when used in conjunction with hard, this allows NFS operations on the client to be interrupted (such as by ^C). This allows the user to terminate operations that appear to be hanging. • nolock – HANFS supports NFSv3. This version of NFS does not appropriately handle locks and lock recovery on all platforms and in all scenarios. Thus, for safety sake, it is better to disallow lock operations rather than having an application think locks are working correctly. This is specific to the NFS Server. Configuring a simple Oracle HANFS scenario Let's assume a simple 2 node cluster configuration. The following file systems will be exported from this cluster:
  • 8. Highly Available NFS (HANFS) over ACFS 7 • /hr1 – expected average throughput around 500KB/s • /hr2 – expected average throughput around 500KB/s Each file system is hosted in the same ASM disk group, with no failover and external redundancy. The admin does not expect any availability issues for these file systems, and since they are in the same disk group, it is likely that a storage outage will affect both of them equally. The combined throughput is low enough that there won't likely be any network bandwidth issues if a single node hosted both exports. Thus, for simplicity, the admin has chosen to create only a single HAVIP and to attach both export file systems to this VIP. This gives all clients a single HAVIP address to access both mount points. This has the downside that adding new nodes to the Oracle HANFS cluster will not automatically scale the throughput of the entire cluster. Registering the HAVIP: bash-3.2# srvctl add havip -id HR1 -address agraves-clu4.us.oracle.com -netnum 1 - description "HR specific exports for the Omega Project" bash-3.2# srvctl status havip -id HR1 HAVIP ora.hr1.havip is enabled HAVIP ora.hr1.havip is not running bash-3.2# srvctl start havip -id HR1 PRCR-1079 : Failed to start resource ora.hr1.havip CRS-2805: Unable to start 'ora.hr1.havip' because it has a 'hard' dependency on resource type 'ora.HR1.export.type' and no resource of that type can satisfy the dependency Why the failure to start? Recall earlier that we mentioned that an HAVIP requires 1 or more ExportFS configured. Without an ExportFS, the HAVIP will not start. If a client had mounted the ExportFS and the HAVIP started without the ExportFS available, the client would receive an ESTALE error. This resource dependency prevents the resumption of NFS services on the client until the server side file system is available for access. Registering the ExportFS: bash-3.2# srvctl add exportfs -path /hr1 -id HR1 -name HR1 -options "rw,no_root_squash" -clients agraves-vm5,agraves-vm6 bash-3.2# srvctl status exportfs -name HR1 export file system hr1 is enabled export file system hr1 is not exported bash-3.2# srvctl add exportfs -path /hr2 -id HR1 -name HR2 -options "ro"
  • 9. Highly Available NFS (HANFS) over ACFS 8 At this point, starting either the ExportFS or the HAVIP will start all configured ExportFS on that HAVIP, or will start the associated HAVIP. We've chosen to export the second ExportFS, HR2, to only the sub-net of the network resource: bash-3.2# srvctl config exportfs -name HR2 export file system hr2 is configured Exported path: /hr2 Export Options: ro Configured Clients: 10.149.236.0/22 bash-3.2# srvctl config exportfs -name HR1 export file system hr1 is configured Exported path: /hr1 Export Options: rw,no_root_squash Configured Clients: agraves-vm5,agraves-vm6 Compare that with HR1, which is only available to 2 clients: agraves-vm5 and agraves-vm6. We can see the configured dependencies from the HAVIP to the other resources: START_DEPENDENCIES=hard(ora.net1.network,uniform:type:ora.HR1.export.type) attraction(ora.data.hr1.acfs,ora.data.hr2.acfs) dispersion:active(type:ora.havip.type) pullup(ora.net1.network) pullup:always(type:ora.HR1.export.type) STOP_DEPENDENCIES=hard(intermediate:ora.net1.network,uniform:intermediate:type: ora.HR1.export.type) These dependencies ensure that the HAVIP is started after the ExportFS and ACFS resources, and is stopped before them. There are several ways that we could start the exports: • srvctl start exportfs -id <ID> - will start all exports attached to the HAVIP with the id <ID> • srvctl start havip -id <ID> - will start all exports attached to the HAVIP with the id <ID> • srvctl start exportfs -name <NAME> - will start just the <NAME> ExportFS and its HAVIP bash-3.2# srvctl start exportfs -id HR1 bash-3.2# srvctl status exportfs export file system hr1 is enabled export file system hr1 is exported on node agraves-vm2 export file system hr2 is enabled
  • 10. Highly Available NFS (HANFS) over ACFS 9 export file system hr2 is exported on node agraves-vm2 bash-3.2# /usr/sbin/exportfs -v /hr1 agraves- vm5.us.oracle.com(rw,wdelay,no_root_squash,no_subtree_check,fsid=128850576,anonu id=65534,anongid=65534) /hr1 agraves- vm6.us.oracle.com(rw,wdelay,no_root_squash,no_subtree_check,fsid=128850576,anonu id=65534,anongid=65534) /hr2 10.149.236.0/22(ro,wdelay,root_squash,no_subtree_check,fsid=1573414370,anonuid=65 534,anongid=65534) Here we can clearly see that the exports are exported to the proper clients with the proper options. Node Relocation: Now let’s say we want to relocate our exported file system to another node in the cluster. The command for this is 'srvctl relocate havip': bash-3.2# srvctl relocate havip -id HR1 -node agraves-vm1 bash-3.2# srvctl status havip -id HR1 HAVIP ora.hr1.havip is enabled HAVIP ora.hr1.havip is running on node agraves-vm1 Using any of the commands available to determine resource state (crsctl, srvctl), we can see that they are now running on the new node: bash-3.2# crsctl stat res -w "TYPE = ora.HR1.export.type" -t -------------------------------------------------------------------------------- Name Target State Server State details -------------------------------------------------------------------------------- Cluster Resources -------------------------------------------------------------------------------- ora.hr1.export 1 ONLINE ONLINE agraves-vm1 STABLE ora.hr2.export 1 ONLINE ONLINE agraves-vm1 STABLE -------------------------------------------------------------------------------- The same principle would hold true in the case of a unplanned outage – such as network failure, storage failure, or complete node failure. A More Complex Scenario Oracle HANFS has very flexible features to allow more complex configurations for maximizing your NFS availability and performance.
  • 11. Highly Available NFS (HANFS) over ACFS 10 Let’s examine a case where the admin has chosen 6 mount points for HANFS:  /hr1 – 500Kb/s  /hr – 10Mb/s – must be always available  /source – 100Kb/s – must be always available  /PDF – 10Mb/s  /Games – 100Mb/s (heavily used file system)  /World_Domination_Plans – 1Mb/s Due to high availability requirements, the best configuration for this would be: • /hr1, /PDF and /World_Domination_Plans configured on one HAVIP address • /Games on one HAVIP address • /source on one HAVIP address • /hr on one HAVIP address Rationale: • Placing /Games on its own HAVIP address isolates its intense throughput from other HAVIPs, allowing CRS to potentially place this HAVIP and its associated ExportFS on its own server, away from the other HAVIPs. (Assuming you have enough servers in your cluster.) • Placing /source on its own HAVIP address allows CRS to move it to a cluster member that can serve the file system, should there be a storage connection issue. Since there is only one ExportFS on this HAVIP, CRS needs only find a node where the ACFS file system is available, with no policy decision or trade-off necessary. • Placing /hr on its own HAVIP allows for the same logic to apply to /hr as applies to /source. • Placing the rest on their own HAVIP simplifies the number of IP addresses necessary. In the unlikely event that a different file system is available on each node of our cluster, CRS will place the HAVIP on the node that it determines is the best. This could cause 1 file system to be unavailable. CRS Policy Illustration – Choosing the Best Node: Consider the following cluster: Node 1 – available file systems: /fs1 /fs2 /fs3 Node 2 – available file systems: /fs1 /fs3 Node 3 – available file systems: /fs2 /fs3 /fs4 If we consider a single HAVIP, exporting all 4 file systems, CRS will make a policy decision as to the best place to export the file systems from. No node truly satisfies the desired intent of all file systems available from a single cluster node. So, CRS will determine that either Node 1 or Node 3 is the best place for our
  • 12. Highly Available NFS (HANFS) over ACFS 11 HAVIP and associated ExportFS. Either of these choices will result in 1 file system being unavailable due to storage connection issues. HAVIP Failover Times: There are 2 cases to consider when considering node failover: • Planned relocation • Node failure Planned Relocation: When the admin forcibly moves the HAVIP and associated ExportFS resources from node to node, the following steps are taken: 1) HAVIP on first node is shutdown. 2) Associated ExportFS resources on first node are shutdown. 3) Associated ExportFS resources are started on second node. 4) HAVIP is started on first node. When an HAVIP is configured with a large number of associated ExportFS resources, the time taken for this planned failover may become large. Each individual ExportFS should take no more than 1 or 2 seconds to shutdown on Node 1. If there is sufficient processing power that CRS can stop all of them in parallel, then this may happen quickly. However, if CRS must sequentially stop each ExportFS, due to CPU processing limitations, the worst case scenario is 2s * <Number of ExportFS resources>. The same scenario applies in reverse for startup time. Thus, worst case scenario TotalTimeToRelocate= (2s * <Number of ExportFS resources>) * 2 + 5s (where 5s is the time to stop and start the havip). The exported file systems are unavailable until the HAVIP is started, which would happen at the end of this process. So, while maintaining a single HAVIP may make management easier, it may increase the time for file systems to again begin processing and be available for clients. Node Failure: Node failure is similar – although easier. From the time that CRS notices the node is gone, the only steps that will be taken are the HAVIP startup, which will happen after all associated ExportFS resources are started. Thus, time taken is ½ the relocation time. NFS Considerations: Since HANFS relies on NFS, standard NFS configuration and performance tuning is applicable to the Oracle RAC HANFS product. Client options such as rsize and wsize may have a dramatic difference in the speed of data access.
Oracle RAC HANFS Scalability:

Consider the throughput of the Oracle RAC HANFS cluster. Assume that the backend network fabric supports 1 Gb/s of throughput. If each cluster member is connected to this network, the theoretical maximum throughput is 1 Gb/s * the number of nodes in the Oracle RAC HANFS cluster. However, remember that CRS will move HAVIPs around, so if multiple HAVIPs are hosted on a single node, the maximum throughput becomes (1 Gb/s * number of nodes) / (number of HAVIPs on a single node). For example, a four-node cluster on a 1 Gb/s fabric offers a theoretical maximum of 4 Gb/s, but with two HAVIPs on a single node the effective maximum drops to 2 Gb/s.

There is therefore a clear performance benefit to keeping the number of HAVIPs in the cluster equal to the number of nodes, assuming this meets the high availability needs discussed earlier. If each node hosts two HAVIPs, the cluster's combined throughput could be doubled simply by doubling the number of nodes. Consider also the case where a single node hosts two HAVIPs and the file systems exported over those HAVIPs achieve only 50% of the expected performance. Assuming the backend storage and network fabric are sized for the expected throughput, the situation can be remedied by simply adding a node to the Oracle RAC HANFS cluster; CRS then redistributes the HAVIPs so that each node hosts only one.

ExportFS Resource Behavior:

Alternative methods of starting the resource:

• It is possible to start an ExportFS resource via Transparent High Availability, like many other resources. In this case, the ExportFS resource monitors whether the operating system reports that the file system is exported, so the 'exportfs' command (Linux) can be used to bring an export online. When this is done, the ExportFS resources may end up running on different nodes. This state is allowable but not recommended; as soon as the associated HAVIP is started, the ExportFS resources will move to the node hosting that resource.
• This method is not recommended for the following reasons: if the admin manually removes an export via an OS-provided command line tool, such as exportfs, the associated ExportFS resource will also go offline. The use of a command line tool triggers a transparent action on the part of the ExportFS resource. Since the OS-provided tools are not integrated with the CRS stack, this relies on the ExportFS resource to notice the change, which introduces a delay before CRS reflects the true state of the system. Since the CRS resource stack may also evaluate other conditions, it is recommended to use Oracle-provided tools, such as srvctl, to administer ExportFS resources.
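By contrast, the following is a minimal sketch of administering an export entirely through the Oracle-provided tooling. The resource names are hypothetical, and the exact options should be confirmed against the srvctl documentation for the installed release.

# Check the state of an ExportFS resource and its associated HAVIP
srvctl status exportfs -name hr_exp
srvctl status havip -id HR

# Stop and start the export through CRS instead of the OS exportfs tool
srvctl stop exportfs -name hr_exp
srvctl start exportfs -name hr_exp

# Review the export options that the resource was configured with
srvctl config exportfs -name hr_exp

Because these commands act through the CRS stack, the resource state is updated immediately rather than waiting for the ExportFS resource to notice an out-of-band change.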
Modifying Export Options using system tools:

• When the admin uses a system tool to modify the export options (such as changing rw to ro), the ExportFS resource reflects this via a 'PARTIAL' state. This tells the admin that the ExportFS is running, but that the options it was configured with differ from the options currently exported on that server.

Stopping the ExportFS using THA:

• If the admin manually removes an export via a command line tool such as exportfs, the associated ExportFS resource will also go offline.

Removal of exported directory:

• If the directory configured to be exported is removed via 'rm', the associated ExportFS resource will go offline.

Controlling the Location of Exports in the Cluster:

For the admin who prefers more control over the location of exports, an HAVIP can be restricted to run only on certain nodes by using the 'disable' and 'enable' commands. For instance, assume that the admin wants the exports HR2 and HR1 (exported over HAVIP HR1) to run on only one node of a two-node cluster. After adding the resource, the admin could run the following to limit it:

bash-3.2# /scratch/crs_home/bin/srvctl disable havip -node agraves-vm2 -id HR1

Further Thoughts:

Oracle HANFS may be combined with other Oracle ACFS features in many different ways. The following are just a few examples:

• Taking an ACFS snapshot of an ACFS file system exported via Oracle RAC HANFS, allowing a backup to be made (a sketch follows this list).
• Exporting an Oracle Home over Oracle RAC HANFS and ensuring that it is always available.
• Configuring an Oracle RAC HANFS ExportFS with Oracle ACFS Security realms so that it is read-only during certain time periods, preventing unauthorized access.
• Using Oracle RAC HANFS with ASM disk groups configured with more than the default of one failure group, so that the underlying storage is extremely fault tolerant. This effectively removes the possibility of storage failure for a single disk group, while Oracle RAC HANFS keeps the export itself always available, creating a single, extremely highly available file server.
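As a sketch of the first example, an ACFS snapshot of an exported file system could be taken as follows on a node where the file system is mounted. The snapshot name is hypothetical, and the acfsutil options should be confirmed for the installed release.

# Create a snapshot (read-only by default) of the exported /hr file system
acfsutil snap create hr_backup_snap /hr

# The snapshot contents appear under the mount point's .ACFS/snaps directory
ls /hr/.ACFS/snaps/hr_backup_snap

# Remove the snapshot once the backup has been taken
acfsutil snap delete hr_backup_snap /hr

NFS clients continue to read and write the live file system while the backup is taken from the stable snapshot image.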
Oracle CloudFS January 2013 Update
Author: Allan Graves
Contributing Authors: Ara Shakian

Oracle Corporation
World Headquarters
500 Oracle Parkway
Redwood Shores, CA 94065
U.S.A.

Worldwide Inquiries:
Phone: +1.650.506.7000
Fax: +1.650.506.7200
oracle.com

Copyright © 2011, Oracle and/or its affiliates. All rights reserved. This document is provided for information purposes only and the contents hereof are subject to change without notice. This document is not warranted to be error-free, nor subject to any other warranties or conditions, whether expressed orally or implied in law, including implied warranties and conditions of merchantability or fitness for a particular purpose. We specifically disclaim any liability with respect to this document and no contractual obligations are formed either directly or indirectly by this document. This document may not be reproduced or transmitted in any form or by any means, electronic or mechanical, for any purpose, without our prior written permission.

Oracle is a registered trademark of Oracle Corporation and/or its affiliates. Other names may be trademarks of their respective owners.