A Linux server stops responding to application queries and its performance is very poor. The underlying disks or SAN LUNs look normal. The top output shows that I/O wait is very high, yet no I/O-intensive tasks are running. So how do you fix a high I/O wait issue?
Cause
Have you removed any SAN LUNs or disks from the server recently (a day or a week ago)?
Even after a LUN or disk is deleted, the SCSI subsystem keeps pointing at and querying the old device until the next reboot. The resulting error and warning messages can be seen in the system log. This leads to high I/O wait, which in turn reduces server performance drastically. Sometimes the server may even refuse to execute a command because of CPU resource starvation.
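As a rough way to spot such messages, the sketch below counts kernel I/O error lines per device. The function name io_error_devices is made up for illustration, and it assumes the messages contain the phrase "I/O error, dev sdX"; the exact wording varies between kernel versions, so treat the result as a hint only.

```shell
#!/bin/sh
# Sketch: count kernel "I/O error, dev <name>" messages per device.
# Reads log text on stdin, so it works with dmesg or a saved log file.
# Assumption: the messages contain "I/O error, dev sdX"; other wordings
# (for example "Buffer I/O error on dev ...") are not counted.
io_error_devices() {
    grep -o 'I/O error, dev [a-z0-9]*' \
        | awk '{ count[$4]++ } END { for (d in count) print d, count[d] }' \
        | sort
}

# Usage (as root): dmesg | io_error_devices
```

Each output line is a device name followed by the number of matching messages. A device that appears here but is no longer in use is a likely leftover.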
Fix high I/O wait issue
Avoid this performance degradation by wiping out the traces of the deleted LUN/disk from the kernel. Simply rebooting the server should work. If a reboot is not possible, follow these steps to wipe out the deleted device traces without rebooting.
Step1: Identify deleted disk devices
If you have the names of the previously deleted devices handy, proceed to Step2. Otherwise, any of the commands below may help. I do not have a straightforward method to get the deleted device name. If I come across one, I will definitely update the article.
#iostat --->Shows the disks known to the kernel with their I/O transfer rates
#fdisk -l
#pvs
#swapon -s
#df -h |grep -e "/dev/sd" -e "/dev/xsd" -e "/dev/cc"
#dmesg |grep -i -e error -e i/o --->Look for any I/O errors on specific devices. If there are any, compare with the fdisk -l output to check whether the device is in use or not.
In SUSE Linux,
#lsscsi -d --->This command lists the disk device names known to the kernel
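Another hint can come from sysfs, where each SCSI disk exposes a state file. The sketch below prints that state for every sd* device. The function name scsi_disk_states and its optional directory argument are assumptions added for illustration. A state other than running (for example offline) makes a device a candidate, not a confirmed stale device.

```shell
#!/bin/sh
# Sketch: print each sd* block device with the state the SCSI layer reports.
# $1 is the sysfs block directory (defaults to /sys/block); the parameter
# exists only so the function can be tried against a test directory.
scsi_disk_states() {
    sysblock=${1:-/sys/block}
    for dev in "$sysblock"/sd*; do
        # Skip entries without a SCSI state file (or an unmatched glob).
        [ -e "$dev/device/state" ] || continue
        printf '%s %s\n' "${dev##*/}" "$(cat "$dev/device/state")"
    done
}

# Usage: scsi_disk_states | grep -v ' running$'
```

Cross-check any device it reports against the fdisk, pvs, swapon and df output before touching it.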
Step2: Wipe out the deleted LUN/disk traces from kernel
#kpartx -d /dev/sdc
If multipath is enabled, flush out the paths.
#multipath -f mapthc
Flush the I/O that is still pending write to the disk that was forcefully unplugged.
# blockdev --flushbufs /dev/sdc
Finally remove the SCSI subsystem path to the unplugged disk device.
# echo 1 > /sys/block/sdc/device/delete
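Because the delete is destructive, it can help to wrap the last two commands in a guard. The sketch below refuses to proceed if the device or one of its partitions appears in /proc/mounts or /proc/swaps. The function names device_in_use and remove_stale_disk are hypothetical. The check does not cover LVM physical volumes or multipath members, so it is a minimal safety net and not a complete one.

```shell
#!/bin/sh
# Sketch: refuse to delete a SCSI disk that still backs a mount or swap area.
# The optional file arguments default to /proc/mounts and /proc/swaps and
# exist only so the check can be exercised against sample files.
device_in_use() {
    dev=$1 mounts=${2:-/proc/mounts} swaps=${3:-/proc/swaps}
    # Match /dev/sdc itself or a partition such as /dev/sdc1, not /dev/sdca.
    grep -Eq "^/dev/${dev}[0-9]*[[:space:]]" "$mounts" "$swaps"
}

remove_stale_disk() {
    dev=$1
    if device_in_use "$dev"; then
        echo "refusing: /dev/$dev is mounted or used as swap" >&2
        return 1
    fi
    blockdev --flushbufs "/dev/$dev"          # flush buffered writes
    echo 1 > "/sys/block/$dev/device/delete"  # drop the SCSI device
}

# Usage (as root): remove_stale_disk sdc
```

The functions only run when called, so sourcing the file changes nothing on the system.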
The I/O wait utilization will drop gradually. Monitor the top command output for a few minutes.
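If you prefer a number over watching top, the I/O wait share can be estimated from two samples of the aggregate cpu line in /proc/stat. The sketch below is one way to do that. The function name iowait_pct is made up for illustration, and it assumes the usual field order (user, nice, system, idle, iowait, irq, softirq, steal).

```shell
#!/bin/sh
# Sketch: percentage of CPU time spent in iowait between two samples of the
# aggregate "cpu" line from /proc/stat. Fields 2-9 (user .. steal) make up
# the total; guest fields are skipped because user/nice already include them.
iowait_pct() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        { total[NR] = 0; for (i = 2; i <= 9; i++) total[NR] += $i; iow[NR] = $6 }
        END {
            dt = total[2] - total[1]
            if (dt <= 0) { print "0.0"; exit }
            printf "%.1f\n", 100 * (iow[2] - iow[1]) / dt
        }'
}

# Usage: sample twice, five seconds apart, and compare.
# a=$(head -n 1 /proc/stat); sleep 5; b=$(head -n 1 /proc/stat)
# iowait_pct "$a" "$b"
```

Repeating the measurement over a few minutes should show the value falling once the stale device is gone.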
If you cannot identify the suspicious disk device name with certainty, simply reboot the server. That should fix the issue. Applying the above commands to a working disk by mistake will delete the associated device, and any data waiting to be written to that disk will be lost.
Hope this helps. Please share your comments!