CentOS Cluster Configuration

Revision as of 11:51, 5 April 2009

cluster.conf configuration file

The cluster.conf file holds the configuration for:

cman - Cluster configuration
fence - Fence configuration, used to cut off (disable) nodes that have failed
dlm - Distributed Lock Manager configuration. Rules for access to shared resources.
gfs - Global File System configuration. File systems shared among nodes.
rgmanager - Resource Group Manager configuration, e.g. setting up an Apache service on the cluster.
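As a rough orientation, the sketch below shows how some of these sections might be laid out in /etc/cluster/cluster.conf for the three-node example used on this page. It is a minimal skeleton, not a tested configuration: the cluster name, the node names, the APC power switch used for fencing (agent fence_apc) and its address and credentials are all hypothetical placeholders. dlm and gfs settings are omitted, i.e. their defaults are assumed.

```xml
<?xml version="1.0"?>
<!-- Hypothetical skeleton: all names, addresses and credentials are placeholders -->
<cluster name="webcluster" config_version="1">
  <!-- cman: basic cluster settings (defaults assumed) -->
  <cman/>
  <clusternodes>
    <clusternode name="node1.example.com" nodeid="1" votes="1">
      <!-- fence: how this node is cut off if it fails (outlet 1 on the power switch) -->
      <fence>
        <method name="1">
          <device name="apc-switch" port="1"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="node2.example.com" nodeid="2" votes="1">
      <fence>
        <method name="1">
          <device name="apc-switch" port="2"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="node3.example.com" nodeid="3" votes="1">
      <fence>
        <method name="1">
          <device name="apc-switch" port="3"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <fencedevice name="apc-switch" agent="fence_apc" ipaddr="10.0.0.50" login="apc" passwd="secret"/>
  </fencedevices>
  <!-- rgmanager: failover domains, resources and services go here -->
  <rm>
  </rm>
</cluster>
```

Whenever the file is changed, config_version has to be increased before the new version is propagated to the other nodes (on CentOS 5, for example with ccs_tool update /etc/cluster/cluster.conf).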

See RedHAT 5 cluster scheme

cman - Basic cluster config

fence - Fencing nodes

dlm - lock management

gfs - global file system

rgmanager - resource config

Resource Group Manager (rgmanager) is a high-availability service. rgmanager can start and stop services on the nodes: if a service fails on one node, it is started on another node. rgmanager monitors the services and makes sure they are actually running.
The rgmanager service must run on all nodes participating in a Service Group.
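In cluster.conf, rgmanager's settings live inside the rm element. The fragment below is a hedged sketch of a single service; the service name "web" and the init script path are hypothetical. The recovery attribute controls what rgmanager does when its status checks find the service failed: restart tries to restart it on the same node, relocate moves it to another node, and disable stops it.

```xml
<rm>
  <!-- Hypothetical service; rgmanager periodically runs the script's status check -->
  <service name="web" autostart="1" recovery="relocate">
    <!-- The init script must support start, stop and status -->
    <script name="httpd" file="/etc/init.d/httpd"/>
  </service>
</rm>
```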

Service Groups

A Service Group is a group of nodes on which rgmanager can start or stop a specified service. Not all nodes in a cluster need to be members of a Service Group, and a cluster can contain many Service Groups. If a service fails, a script is called to restart it automatically. If a node fails, the service may be relocated to a different node in the Service Group.
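In cluster.conf, what this page calls a Service Group appears to correspond to a failover domain. A minimal sketch, assuming hypothetical node names and a domain called "web-frontends": with restricted="1", services bound to the domain may only run on the listed nodes, so any other cluster node would never run them.

```xml
<rm>
  <failoverdomains>
    <!-- Only these three nodes may run services bound to this domain.
         ordered="0": no preference among them, so the priorities carry no meaning -->
    <failoverdomain name="web-frontends" ordered="0" restricted="1">
      <failoverdomainnode name="node1.example.com" priority="1"/>
      <failoverdomainnode name="node2.example.com" priority="1"/>
      <failoverdomainnode name="node3.example.com" priority="1"/>
    </failoverdomain>
  </failoverdomains>
  <!-- The service is tied to the domain through its domain attribute -->
  <service name="web" domain="web-frontends">
    <script name="httpd" file="/etc/init.d/httpd"/>
  </service>
</rm>
```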

What is a Cluster-service

A Cluster-service is a resource that is shared among nodes, for example an Apache web service. This service can be run in two different ways.

active-passive Cluster-service

An active-passive Cluster-service runs on only one node at a time. If the node running the service fails, the service is started on another node in the Service Group.

active-passive example

Three front-end nodes are responsible for delivering a high-availability web service. In the pictures below, the service is made up of three resources:

Picture 1 - Normal operation

Filesystem: The filesystem is an ext3 filesystem, which can only be mounted on one node at a time. In Picture 1, the left node has mounted the shared filesystem on the SCSI RAID controllers. The other servers have not mounted it.
Service: The Apache web server runs on the left node and uses the files on the shared SCSI filesystem to deliver web content to the Internet. The Apache service on the other nodes is stopped.
IP address: The left server answers requests for the shared IP address 80.1.2.3. The other nodes ignore 80.1.2.3.
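The three resources in this example could be declared roughly as follows. This is a sketch under assumptions: the device /dev/sdb1, the mount point /var/www/html and the init script path are hypothetical; only the address 80.1.2.3 comes from the example. Because ext3 is not a cluster file system, it is kept safe by running the whole service, file system included, on a single node at a time.

```xml
<rm>
  <resources>
    <!-- Filesystem: ext3 on the shared SCSI RAID (hypothetical device and mount point) -->
    <fs name="web-fs" device="/dev/sdb1" mountpoint="/var/www/html" fstype="ext3" force_unmount="1"/>
    <!-- IP address: the floating address that clients connect to -->
    <ip address="80.1.2.3" monitor_link="1"/>
    <!-- Service: Apache, controlled through its init script -->
    <script name="httpd" file="/etc/init.d/httpd"/>
  </resources>
  <!-- The service groups the three resources so they always run on the same node -->
  <service name="web" autostart="1">
    <fs ref="web-fs"/>
    <ip ref="80.1.2.3"/>
    <script ref="httpd"/>
  </service>
</rm>
```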

Picture 2 - Fault in left node

When rgmanager on the failing node discovers an error, it communicates with rgmanager on the other nodes, and together they decide which node should transition from passive to active. In the example in Picture 2, the middle node becomes active and continues to serve web requests to 80.1.2.3.
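Which node takes over can be steered with an ordered failover domain. In the hedged sketch below (node names hypothetical), a lower priority number means a more preferred node, so if node1 (the left node) fails, rgmanager would relocate the service to node2 (the middle node), matching Picture 2.

```xml
<rm>
  <failoverdomains>
    <!-- ordered="1": the service runs on the most preferred available node -->
    <failoverdomain name="web-frontends" ordered="1" restricted="1">
      <failoverdomainnode name="node1.example.com" priority="1"/> <!-- left -->
      <failoverdomainnode name="node2.example.com" priority="2"/> <!-- middle -->
      <failoverdomainnode name="node3.example.com" priority="3"/> <!-- right -->
    </failoverdomain>
  </failoverdomains>
  <service name="web" domain="web-frontends" recovery="relocate">
    <script name="httpd" file="/etc/init.d/httpd"/>
  </service>
</rm>
```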

Transition steps in example

When rgmanager on the left node discovers an error, it will:

Stop the Apache server, unmount the shared file system and release the virtual IP address 80.1.2.3. The order is configurable.
Communicate with the rgmanagers on the other nodes to decide which node should become active.
Optionally have fenced shut down the failing node, possibly by cutting its power. (This guarantees that all resources, i.e. the IP address and the filesystem, are released.)
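One way to make the start and stop order explicit is to nest the resources inside the service: rgmanager starts a parent resource before its children and stops the children before the parent. With the hypothetical nesting below, stopping follows the order in the list above (Apache, then the file system, then the IP address), and starting runs in the opposite direction. The device, mount point and script path are placeholders.

```xml
<rm>
  <resources>
    <ip address="80.1.2.3" monitor_link="1"/>
    <fs name="web-fs" device="/dev/sdb1" mountpoint="/var/www/html" fstype="ext3" force_unmount="1"/>
    <script name="httpd" file="/etc/init.d/httpd"/>
  </resources>
  <service name="web" autostart="1" recovery="relocate">
    <!-- Start order: IP address, file system, Apache.
         Stop order (reverse): Apache, file system, IP address. -->
    <ip ref="80.1.2.3">
      <fs ref="web-fs">
        <script ref="httpd"/>
      </fs>
    </ip>
  </service>
</rm>
```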

Picture 1: Active-passive example

Picture 2: Active failed example

active-active Cluster-service

An active-active Cluster-service runs on all nodes at the same time. If a node fails, the other nodes take over its load.

active-active example

Three front-end nodes are responsible for delivering a high-availability, high-load web service.
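A hedged sketch of how this might be expressed with rgmanager, under several assumptions that are not in the original: the web content lives on a GFS file system that is mounted on every node outside rgmanager (GFS, unlike ext3, can be mounted on all nodes at once), Apache runs on every node listening on all addresses, and client load is spread over three hypothetical addresses (80.1.2.3 to 80.1.2.5), for example by round-robin DNS. Each address is its own service with an ordered failover domain that prefers a different node, so when a node fails, its address moves to a surviving node, which then carries the extra load.

```xml
<rm>
  <failoverdomains>
    <!-- Each domain prefers a different node and falls back to the others -->
    <failoverdomain name="prefer-node1" ordered="1" restricted="1">
      <failoverdomainnode name="node1.example.com" priority="1"/>
      <failoverdomainnode name="node2.example.com" priority="2"/>
      <failoverdomainnode name="node3.example.com" priority="3"/>
    </failoverdomain>
    <failoverdomain name="prefer-node2" ordered="1" restricted="1">
      <failoverdomainnode name="node2.example.com" priority="1"/>
      <failoverdomainnode name="node3.example.com" priority="2"/>
      <failoverdomainnode name="node1.example.com" priority="3"/>
    </failoverdomain>
    <failoverdomain name="prefer-node3" ordered="1" restricted="1">
      <failoverdomainnode name="node3.example.com" priority="1"/>
      <failoverdomainnode name="node1.example.com" priority="2"/>
      <failoverdomainnode name="node2.example.com" priority="3"/>
    </failoverdomain>
  </failoverdomains>
  <!-- One floating IP per node; GFS and Apache are assumed to run on every node -->
  <service name="web-ip1" domain="prefer-node1" autostart="1" recovery="relocate">
    <ip address="80.1.2.3" monitor_link="1"/>
  </service>
  <service name="web-ip2" domain="prefer-node2" autostart="1" recovery="relocate">
    <ip address="80.1.2.4" monitor_link="1"/>
  </service>
  <service name="web-ip3" domain="prefer-node3" autostart="1" recovery="relocate">
    <ip address="80.1.2.5" monitor_link="1"/>
  </service>
</rm>
```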


files and programs

/usr/share/cluster - location of the rgmanager resource scripts
/etc/cluster/cluster.conf - rgmanager configuration
clustat - shows cluster and service status, e.g. clustat -s SERVICE_NAME -l
RedHAT rgmanager FAQ
