Avoid cluster filesystem entering read-only mode


The Linux kernel remounts a filesystem read-only whenever it cannot complete I/O on it. This can happen for various reasons, such as disk failure, SAN connectivity problems or disks with bad blocks. In virtual machine and SAN-based storage environments, even high latency may cause I/O to hang and result in read-only mode.

What is read-only mode, and why does it happen?

All data written to or read from a filesystem is processed by the kernel. Suppose the kernel is unable to read from or write to a particular filesystem for any of the reasons above. In that state, to avoid further loss of cached data that is still waiting to be written, it remounts the filesystem read-only. This also prevents the filesystem from entering an inconsistent state.

At this stage, if the underlying disk is still accessible, data can only be read; no writes are possible.
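To confirm whether a filesystem has actually gone read-only, you can look at the mount options the kernel reports. The sketch below is a hypothetical helper (`read_only_mounts` is my own name, not a standard tool) that parses text in the /proc/mounts format and reports entries whose options include `ro`. It assumes mount points contain no spaces (the kernel escapes those as `\040`). Some filesystems, such as ISO images, are legitimately read-only, so the optional `watch` set limits the check to mount points you expect to be writable.

```python
def read_only_mounts(mounts_text, watch=None):
    """Hypothetical helper: return (mountpoint, fstype) pairs mounted read-only.

    mounts_text uses the /proc/mounts layout:
    "device mountpoint fstype options dump pass".
    If watch is given, only mount points in that collection are reported.
    """
    result = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        mountpoint, fstype = fields[1], fields[2]
        options = fields[3].split(",")
        if watch is not None and mountpoint not in watch:
            continue
        # Match the exact option "ro"; "errors=remount-ro" must not count.
        if "ro" in options:
            result.append((mountpoint, fstype))
    return result
```

On a live system, pass it the contents of /proc/mounts, for example `read_only_mounts(open("/proc/mounts").read(), watch={"/data1"})`.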

Solution

Step 1

If this is the first time the issue has occurred, you may not need to take any action other than restarting the resource, at least until it starts recurring.

    #crm resource restart <resource name>

Note: all dependent resources (for example, the application) will be restarted as well.
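The decision in Step 1 can be sketched as follows. This is a hypothetical helper (`step1_restart` and its `occurrences` counter are my own illustration, not part of Pacemaker or crmsh): it builds the same `crm resource restart` command shown above and only runs it when the issue is seen for the first time. For repeats it returns `None`, signalling that it is time for Step 2. It defaults to a dry run so nothing is restarted by accident.

```python
import subprocess


def step1_restart(resource, occurrences, dry_run=True):
    """Hypothetical helper: restart a resource only on the first occurrence.

    occurrences counts how many times the read-only issue has been seen,
    including the current one. Returns the crm command, or None when the
    issue is repeating and should be investigated instead (Step 2).
    """
    if occurrences > 1:
        return None
    argv = ["crm", "resource", "restart", resource]
    if not dry_run:
        # Dependent resources (e.g. the application) restart as well.
        subprocess.run(argv, check=True)
    return argv
```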

Step 2

Identify the source of the problem. The read-only issue occurs only when disk I/O processing has failed, so check the following:

  • Whether any multipath path failure has been reported, and whether the path has since recovered.
  • Whether the “dmesg” output reports any bad blocks or other disk inconsistencies.
  • Finally, whether all disks are visible to the system and in a good state.
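The dmesg and multipath checks can be partly automated with a text scan like the one below. This is a sketch, not an exhaustive detector: the message patterns are examples of common kernel messages, and the `multipath -ll` path-line layout assumed here (`H:C:T:L dev major:minor dm-state path-state online-state`) varies between multipath-tools versions, so adjust both for your distribution. Feed it the output of `dmesg` and `multipath -ll`; the function names are hypothetical.

```python
import re

# Example patterns only; exact kernel messages vary by kernel version and driver.
DMESG_PATTERNS = [
    re.compile(r"I/O error", re.IGNORECASE),
    re.compile(r"EXT[34]-fs error"),
    re.compile(r"remounting filesystem read-only", re.IGNORECASE),
    re.compile(r"Medium Error|Unrecovered read error", re.IGNORECASE),
    re.compile(r"multipath: Failing path"),
]

# Assumed modern `multipath -ll` path line:
# "H:C:T:L dev major:minor dm-state path-state online-state"
PATH_LINE = re.compile(r"(\d+:\d+:\d+:\d+)\s+(\S+)\s+(\d+:\d+)\s+(\S+)\s+(\S+)\s+(\S+)")


def suspicious_dmesg_lines(dmesg_text):
    """Return dmesg lines that match any of the example error patterns."""
    return [line for line in dmesg_text.splitlines()
            if any(p.search(line) for p in DMESG_PATTERNS)]


def failed_paths(multipath_text):
    """Return device names whose path is reported as failed or faulty."""
    bad = []
    for line in multipath_text.splitlines():
        m = PATH_LINE.search(line)
        if m and (m.group(4) == "failed" or m.group(5) == "faulty"):
            bad.append(m.group(2))
    return bad
```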

Step 3

On clustered systems, how long it takes before the filesystem is declared failed and goes read-only depends on the resource's monitor interval and timeout values. In many cases, commonly in virtual machine environments, disk latency is what leads to the read-only issue. Increasing the timeout and interval values fixes the issue in such cases, because the cluster (with the kernel as the actual worker in the backend) then waits longer before taking action.

The interval and timeout can be changed on the fly, as in the example command below.

    #crm configure edit <resource name>

Here, data1_fs is the resource name. Change the monitor interval and timeout to the desired values.

    primitive data1_fs ocf:heartbeat:Filesystem \
            params device="/dev/mapper/appvg-app1" directory="/data1" fstype="ext3" \
            op monitor interval="60" timeout="60" \
            meta target-role="Started"

Check the configuration using this command.

    #crm configure show data1_fs
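If you want to check the values in a script rather than by eye, the monitor attributes can be pulled out of the `crm configure show` output. The helper below is hypothetical and assumes the layout shown above: backslash-continued lines, with `op monitor` followed by `key=value` pairs whose values may or may not be quoted. It returns the attributes as strings, exactly as written in the configuration.

```python
import re


def monitor_settings(crm_show_text):
    """Hypothetical helper: return the key=value attributes of the first
    'op monitor' in `crm configure show` output, e.g. {"interval": "60"}."""
    # Join backslash-continued lines into one logical line.
    text = crm_show_text.replace("\\\n", " ")
    # Capture everything after "op monitor" up to the next op/meta/params keyword or the end.
    m = re.search(r"\bop\s+monitor\b(.*?)(?=\bop\s|\bmeta\s|\bparams\s|$)", text, re.S)
    if m is None:
        return {}
    pairs = re.findall(r'([\w-]+)=("[^"]*"|\S+)', m.group(1))
    # Drop surrounding quotes so quoted and unquoted styles compare equal.
    return {key: value.strip('"') for key, value in pairs}
```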

With my example settings, the system will not go read-only for 60 seconds. But make sure the application can withstand 60 seconds without any I/O operations.

Step 4

For non-clustered systems, how long the kernel waits before marking the filesystem read-only is influenced by the file “/sys/class/scsi_generic/*/device/timeout”. In virtual machines, it is advisable to set this to 180 seconds.

On VMware VMs, installing VMware Tools (“vmtools”) sets this to 180 seconds for you automatically. Check this VMware KB article for more details; it also describes how to configure a 180-second disk timeout using udev rules.
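To see which devices still use a shorter timeout, the sysfs files named above can be read directly. The sketch below takes the sysfs root as a parameter (default `/sys`) so it can be tried against a copy of the tree; `scsi_timeouts`, `too_short` and `set_timeout` are hypothetical names. Writing the file needs root, and a value written this way does not survive a reboot, which is why the udev rule from the KB article is the durable option.

```python
import glob
import os


def scsi_timeouts(sysfs_root="/sys"):
    """Map each scsi_generic device (sg0, sg1, ...) to its timeout in seconds."""
    pattern = os.path.join(sysfs_root, "class", "scsi_generic", "*", "device", "timeout")
    result = {}
    for path in sorted(glob.glob(pattern)):
        sg_name = path.split(os.sep)[-3]  # .../scsi_generic/<sgN>/device/timeout
        with open(path) as f:
            result[sg_name] = int(f.read().strip())
    return result


def too_short(timeouts, minimum=180):
    """Return only the devices whose timeout is below the recommended minimum."""
    return {name: t for name, t in timeouts.items() if t < minimum}


def set_timeout(sysfs_root, sg_name, seconds=180):
    """Write a new timeout; needs root on a real system and is lost on reboot."""
    path = os.path.join(sysfs_root, "class", "scsi_generic", sg_name, "device", "timeout")
    with open(path, "w") as f:
        f.write(f"{seconds}\n")
```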

Cause

The cause of this issue depends on many factors, such as:

  1. Disk latency
  2. SAN connectivity stability
  3. Heavy I/O load
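Of these causes, disk latency is the easiest to quantify. The sketch below computes the average time per completed I/O between two snapshots of /proc/diskstats, the same idea as the await column of `iostat -x`. It assumes the standard field layout (counting from 1: field 4 is reads completed, 7 is milliseconds spent reading, 8 is writes completed, 11 is milliseconds spent writing), and the function names are hypothetical.

```python
def parse_diskstats(text):
    """Return {device: (ios_completed, ms_spent)} from /proc/diskstats text."""
    stats = {}
    for line in text.splitlines():
        f = line.split()
        if len(f) < 14:
            continue  # not a full diskstats line
        reads, ms_reading = int(f[3]), int(f[6])
        writes, ms_writing = int(f[7]), int(f[10])
        stats[f[2]] = (reads + writes, ms_reading + ms_writing)
    return stats


def average_await_ms(before, after):
    """Average milliseconds per completed I/O between two snapshots."""
    result = {}
    for dev, (ios1, ms1) in after.items():
        if dev not in before:
            continue
        ios0, ms0 = before[dev]
        completed = ios1 - ios0
        result[dev] = (ms1 - ms0) / completed if completed > 0 else 0.0
    return result
```

Take one snapshot, wait a few seconds, take another and compare them; values that climb sharply while the problem is occurring suggest that latency is a factor.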

Applicable to

This solution may suit other cluster frameworks as well, but I have only tested it on a Pacemaker cluster.

Clustered filesystem

With Pacemaker CRM, you have multiple options for filesystem management:

  1. dlm + o2cb + OCFS2
  2. dlm + cLVM2 + (ext3, ext4, etc.)
  3. dlm + gfs2

When any of the above is configured, the Pacemaker cluster is responsible for mounting, unmounting and monitoring the filesystem.

Why monitor?

It is recommended to configure a monitor operation on each resource, so that resource failures are detected in near real time. The monitor operation is added to the resource with an interval and a timeout.

Interval=20s – how often the resource is monitored (once every 20 seconds).

Timeout=40s – how long to wait for the monitor command before aborting it and returning a failure status (the timer counts up to 40 seconds from when the monitor command is issued).
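To get a feel for how these two values combine, here is a simplified model of how long a stalled filesystem can go unnoticed. It assumes that a fault starting right after a successful monitor is only probed at the next run (up to one interval later), and that the probe then hangs until the timeout expires. It ignores the duration of the successful run, restart delays and any `on-fail` or `failure-timeout` settings. `detection_window` and `parse_seconds` are hypothetical helpers, and `parse_seconds` only handles a few common Pacemaker time units.

```python
import re

# A small subset of Pacemaker time units (an assumption, not the full list).
UNITS = {"": 1, "s": 1, "sec": 1, "ms": 0.001, "m": 60, "min": 60, "h": 3600}


def parse_seconds(value):
    """Convert values like "20s", "1m" or 60 to seconds (plain numbers are seconds)."""
    m = re.fullmatch(r"\s*(\d+)\s*([a-z]*)\s*", str(value).lower())
    if m is None or m.group(2) not in UNITS:
        raise ValueError(f"unsupported time value: {value!r}")
    return float(int(m.group(1)) * UNITS[m.group(2)])


def detection_window(interval, timeout, monitor_hangs=True):
    """Return (best, worst) seconds from fault onset to a failed monitor.

    Simplified model: the fault is first probed 0..interval seconds after it
    starts. If the probe hangs on stalled I/O, it is declared failed only when
    the timeout expires; otherwise it is assumed to fail immediately.
    """
    wait_for_result = parse_seconds(timeout) if monitor_hangs else 0.0
    return (wait_for_result, parse_seconds(interval) + wait_for_result)
```

With Interval=20s and Timeout=40s this gives a window of 40 to 60 seconds; with the 60/60 values used for data1_fs it gives 60 to 120 seconds, consistent with the 60-second figure in Step 3.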

Have a great day! Any questions? Please do not hesitate to write here.
