About Corosync and Heartbeat

Corosync and Heartbeat are easy to confuse. This post describes what each of them does.

It should help you compare Corosync with Heartbeat and answer two questions:
Where should you use Corosync, and where Heartbeat?
Which cluster engines are supported by Corosync and Heartbeat?

Corosync

The Corosync Cluster Engine is a Group Communication System with additional features for implementing high availability within applications. The project provides four C Application Programming Interface features:

• A closed process group communication model with virtual synchrony guarantees for creating replicated state machines.
• A simple availability manager that restarts the application process when it has failed.
• A configuration and statistics in-memory database that provides the ability to set, retrieve, and receive change notifications of information.
• A quorum system that notifies applications when quorum is achieved or lost.

Corosync is used as a High Availability framework by projects such as Apache Qpid and Pacemaker.

Heartbeat

Heartbeat is a daemon that provides cluster infrastructure (communication and membership) services to its clients. This allows clients to know about the presence (or disappearance!) of peer processes on other machines and to easily exchange messages with them.

In order to be useful to users, the Heartbeat daemon needs to be combined with a cluster resource manager (CRM), which has the task of starting and stopping the services (IP addresses, web servers, etc.) that the cluster will make highly available. Pacemaker is the preferred cluster resource manager for clusters based on Heartbeat.

 


Resolve cib digest error

Sometimes the Pacemaker cluster configuration file (the CIB, stored as cib.xml on most installations) accumulates stray whitespace. While the cluster is running from the live copy, this has no visible impact. The real headache starts when the file is closed and reopened, that is, when the cluster service is stopped completely on all nodes and then started again.

During startup, the cluster service calculates an MD5 checksum of the configuration and compares it with the value stored on the system. If the two do not match, a digest mismatch error is reported and the service fails to start.
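To see why a few stray spaces are enough to break the check, here is a small illustration using plain md5sum. It is only a sketch: the XML snippets are made up for the example, and Pacemaker computes its digest with its own code rather than by calling md5sum, but the principle (any byte that changes also changes the checksum) is the same.

```shell
#!/bin/sh
# Illustration only: the same XML content, once clean and once with
# trailing spaces. Both strings are invented for this example.
clean='<cib><configuration/></cib>'
padded='<cib><configuration/></cib>   '

# md5sum prints "<hash>  -" for stdin input; keep only the hash field.
sum_clean=$(printf '%s' "$clean" | md5sum | cut -d' ' -f1)
sum_padded=$(printf '%s' "$padded" | md5sum | cut -d' ' -f1)

echo "clean : $sum_clean"
echo "padded: $sum_padded"

# Compare the two digests the way a startup check would.
if [ "$sum_clean" = "$sum_padded" ]; then
    result="match"
else
    result="mismatch"
fi
echo "digest $result"
```

The two hashes differ even though a human reader would call the two snippets identical, which is the kind of mismatch the startup check trips over.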

As a temporary fix, remove the whitespace from the CIB with the command below.

# cibadmin -Q -o configuration | sed 's/^\s*//' | sed 's/\s*$//' | tr -d '\n' | sed 's/ /\\n/g' | xmllint --copy - | cibadmin -R -o configuration -p

For safe cluster operation, you must then recreate the MD5 checksum value so that it matches the cleaned configuration.
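As a starting point for checking the checksum, the sketch below wraps cibadmin's --md5-sum option, which prints the digest for an XML file passed with -x. The function name cib_digest is hypothetical, and the file path in the commented example is an assumed default that may differ on your distribution; verify both options against the cibadmin man page for your Pacemaker version before relying on this.

```shell
#!/bin/sh
# Sketch: print the digest cibadmin computes for an on-disk CIB file.
# cib_digest is a hypothetical helper name, not part of Pacemaker.
cib_digest() {
    file=$1
    # Bail out early if the Pacemaker tools are not installed.
    if ! command -v cibadmin >/dev/null 2>&1; then
        echo "cibadmin not found" >&2
        return 127
    fi
    # Refuse to continue if the file is missing or unreadable.
    if [ ! -r "$file" ]; then
        echo "cannot read $file" >&2
        return 1
    fi
    # --md5-sum calculates the on-disk digest; -x names the XML file to read.
    cibadmin --md5-sum -x "$file"
}

# Example call (run as root; the path is an assumption, adjust as needed):
# cib_digest /var/lib/pacemaker/cib/cib.xml
```

Comparing the printed value with the signature file stored next to the CIB shows whether the two still disagree after the cleanup.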