Wednesday, May 26, 2010

OCFS2 self-fencing during cluster failover on SUSE Linux Enterprise



During a clustered failover or network disruption, all paths to the storage system can become unavailable. As a result, the nodes might fence themselves by panicking. To prevent this, set O2CB_HEARTBEAT_THRESHOLD in /etc/sysconfig/o2cb to a value large enough that the nodes do not self-fence before the paths come back.

Oracle Cluster File System (OCFS2) has a fencing mechanism that relies on each node being able to write a heartbeat value to a specific area on each OCFS2 volume. Every two seconds, each node writes a new heartbeat value and reads the values written by the other nodes. If a node cannot write or read the heartbeat value, it retries the operation a set number of times and then, because it cannot obtain quorum, self-fences by panicking the Linux kernel.
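The retry-then-fence behavior described above can be sketched as a small simulation. This is only a simplified model and not the actual OCFS2 code: the function name `first_fence_time`, the `max_missed` parameter, and the assumption that only consecutive failed heartbeats count are all hypothetical choices made for illustration.

```python
from typing import Iterable, Optional

HEARTBEAT_INTERVAL_SECONDS = 2  # from the text: one heartbeat every two seconds


def first_fence_time(io_ok: Iterable[bool], max_missed: int) -> Optional[int]:
    """Return the elapsed seconds at which a node would self-fence, or None.

    io_ok holds one entry per heartbeat interval: True if the heartbeat
    I/O succeeded, False if it failed. max_missed is a hypothetical limit
    on consecutive failed intervals (an assumption of this sketch).
    """
    missed = 0
    for tick, ok in enumerate(io_ok, start=1):
        if ok:
            missed = 0  # a successful heartbeat resets the counter
            continue
        missed += 1
        if missed >= max_missed:
            return tick * HEARTBEAT_INTERVAL_SECONDS
    return None  # the node never reached the limit


# One good heartbeat, then the storage paths disappear for three intervals.
print(first_fence_time([True, False, False, False], max_missed=3))
# A short glitch that recovers in time does not trigger a fence.
print(first_fence_time([False, True, False], max_missed=2))
```

In this model, a larger limit simply lets the node tolerate a longer run of failed heartbeats before it panics, which is the effect of raising the threshold.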

During a clustered failover or network disruption, all paths to the storage system can become unavailable. When that happens, no node in the OCFS2 cluster can write or read the heartbeat values, so the nodes might fence themselves by panicking.

With a dedicated TCP/IP network for the iSCSI link between the Linux hosts and the storage system, a reasonable value for O2CB_HEARTBEAT_THRESHOLD is usually 181. The timeout is (threshold - 1) x 2 seconds, so this value gives a node a total of 360 seconds before it self-fences.
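The relationship between the threshold and the fencing timeout is simple arithmetic: (threshold - 1) multiplied by the two-second heartbeat interval. A minimal sketch of the conversion in both directions follows; the function names are hypothetical and exist only for this example.

```python
HEARTBEAT_INTERVAL_SECONDS = 2  # from the text: one heartbeat every two seconds


def fence_timeout_seconds(threshold: int) -> int:
    """Seconds before a node self-fences for a given O2CB_HEARTBEAT_THRESHOLD."""
    if threshold < 2:
        raise ValueError("threshold must be at least 2")
    return (threshold - 1) * HEARTBEAT_INTERVAL_SECONDS


def threshold_for_timeout(seconds: int) -> int:
    """Smallest threshold that tolerates an outage of at least `seconds`."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    intervals = -(-seconds // HEARTBEAT_INTERVAL_SECONDS)  # ceiling division
    return intervals + 1


for value in (91, 181, 201):
    print(value, "->", fence_timeout_seconds(value), "seconds")
# 91 -> 180 seconds
# 181 -> 360 seconds
# 201 -> 400 seconds
```

Going the other way, `threshold_for_timeout(360)` gives back 181, the value suggested above.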

If the iSCSI network is heavily loaded, you might need to raise the value. For a configuration with OCFS2 on multipath and iSCSI devices, it is best to use a value in the range of 91 to 201.
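To check which value a node is currently configured with, something like the following could be used. It is only a sketch: it assumes the setting appears in /etc/sysconfig/o2cb as a plain `NAME=value` line, optionally quoted, and the helper names and the sample file content are hypothetical.

```python
import re
from typing import Optional

RECOMMENDED_RANGE = range(91, 202)  # 91 to 201 inclusive, as suggested in the text

# Assumed line format: O2CB_HEARTBEAT_THRESHOLD=181 or O2CB_HEARTBEAT_THRESHOLD="181"
_SETTING = re.compile(r'^\s*O2CB_HEARTBEAT_THRESHOLD\s*=\s*"?(\d+)"?\s*$')


def read_heartbeat_threshold(sysconfig_text: str) -> Optional[int]:
    """Return the configured threshold, or None if it is not set."""
    value = None
    for line in sysconfig_text.splitlines():
        match = _SETTING.match(line)  # commented-out lines do not match
        if match:
            value = int(match.group(1))  # the last assignment wins
    return value


def in_recommended_range(threshold: int) -> bool:
    """True if the threshold falls within the range suggested in the text."""
    return threshold in RECOMMENDED_RANGE


# Hypothetical file content, for illustration only.
sample = '# O2CB_HEARTBEAT_THRESHOLD=7\nO2CB_HEARTBEAT_THRESHOLD="181"\n'
threshold = read_heartbeat_threshold(sample)
print(threshold, in_recommended_range(threshold))
```

On a real node, the text would come from reading /etc/sysconfig/o2cb instead of the sample string.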
