On a RedHat 6 server, we ran into an issue with online resizing of an ext4 filesystem.
With only /dev/sda we had 13GB available in the volume group, but needed 20GB more on one logical volume, which was 36GB. We added /dev/sdb to the volume group, extended the logical volume (lvextend) and resized the file system (resize2fs) to 56GB.
There were no error messages during the resize, and the OS reported the new size.
The logical volume in question hosts an installation of IBM HTTP Server (Apache 2.2), along with config and log files for some 8 different web servers.
This morning the file system usage grew beyond 36GB.
The first thing that happened was that the web servers stopped logging (discovered afterwards), although they kept running without issues.
2.5 hours later, in connection with log rotation and some other writes to the file system, things started to freeze up.
Meaning: the web servers stopped taking traffic, although the processes stayed up, and trying to "tail" a log file would hang and could not be interrupted.
The server load went from 0.10 to 4000 (yes...), mostly related to iowait (it would seem).
The solution was to shut down the web servers (kill -9 was the only way) and reboot the server. We then unmounted the file system, ran fsck (no errors), and started things up again.
No issues since.
We can time the point where logging stopped exactly to the moment the disk (LV) usage grew above its previous size of 36GB.
Services on other file systems seemed to work fine, among others the operating system.
In /var/log/messages we saw, for example:
kernel: INFO: task httpd: blocked for more than 120 seconds.
kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kernel: httpd D 0000000000000001 0 6889 6865 0x00000080
kernel: ffff88023aa99c88 0000000000000086 0000000000000000 0000000000006102
kernel: ffff88010aebaa80 ffff880105dd0ae0 000000003aa99c08 ffff880105dd0ae0
kernel: ffff880105dd1098 ffff88023aa99fd8 000000000000fb88 ffff880105dd1098
kernel: Call Trace:
kernel: [] __mutex_lock_slowpath+0x13e/0x180
kernel: [] mutex_lock+0x2b/0x50
kernel: [] generic_file_aio_write+0x71/0x100
kernel: [] ext4_file_write+0x61/0x1e0 [ext4]
kernel: [] do_sync_write+0xfa/0x140
kernel: [] ? autoremove_wake_function+0x0/0x40
kernel: [] ? security_file_permission+0x16/0x20
kernel: [] vfs_write+0xb8/0x1a0
kernel: [] sys_write+0x51/0x90
kernel: [] ? __audit_syscall_exit+0x265/0x290
kernel: [] system_call_fastpath+0x16/0x1b
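To see at a glance which processes were stuck, hung-task reports like the one above can be counted per process name. This is only a sketch: the regular expression is based on the excerpt shown here, and tolerating a PID right after the colon is an assumption about how unabridged log lines look.

```python
import re
from collections import Counter

# Matches e.g. "kernel: INFO: task httpd: blocked for more than 120 seconds."
# The optional digits after the colon (a PID) are an assumption, not taken from the excerpt.
HUNG_TASK = re.compile(
    r"INFO: task (?P<name>[^:\s]+):(?P<pid>\d*) blocked for more than (?P<secs>\d+) seconds"
)


def count_hung_tasks(lines):
    """Return a Counter mapping process name -> number of hung-task reports."""
    counts = Counter()
    for line in lines:
        match = HUNG_TASK.search(line)
        if match:
            counts[match.group("name")] += 1
    return counts


if __name__ == "__main__":
    # Sample lines copied from the excerpt; real input could be the lines of /var/log/messages.
    sample = [
        "kernel: INFO: task httpd: blocked for more than 120 seconds.",
        "kernel: Call Trace:",
        "kernel: [] ext4_file_write+0x61/0x1e0 [ext4]",
    ]
    for name, count in count_hung_tasks(sample).items():
        print(name, count)
```

A large count for a single process name, combined with ext4 write functions in the call traces, points at one file system rather than the whole machine.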
Versions:
Kernel: 2.6.32-358.2.1.el6.x86_64
lvm2-2.02.98-9.el6.x86_64
e2fsprogs-1.41.12-14.el6.x86_64
No issues were found with the underlying hardware.
Answer
The answer is:
The filesystem was created with mke2fs, without specifying a filesystem type.
The default behaviour is then to create an ext2 filesystem.
However, it was mounted as an ext4 filesystem, without any error messages, and later perceived as an ext4 filesystem.
So no wonder online resizing worked, and no wonder the extended portion was recognized after an unmount/mount or reboot.
It took some time to discover, since a long time had passed between the creation and the resizing. It was finally discovered when running blkid, which said "ext2". tune2fs -l also said "not clean".
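A check like the following might have caught the mismatch earlier. It is a minimal sketch that compares the on-disk type in a line of blkid output with the type the device is mounted as according to /proc/mounts. The device name /dev/mapper/vg00-lvweb and the mount point /opt/web are hypothetical, and the sample strings stand in for real command output.

```python
import re


def parse_blkid_type(blkid_line):
    """Extract the TYPE="..." value from one line of blkid output, or None."""
    # \b keeps the pattern from matching inside SEC_TYPE="..."
    match = re.search(r'\bTYPE="([^"]+)"', blkid_line)
    return match.group(1) if match else None


def mounted_type(proc_mounts_text, device):
    """Return the type `device` is mounted as, given the text of /proc/mounts."""
    for line in proc_mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts fields: device, mount point, fs type, options, ...
        if len(fields) >= 3 and fields[0] == device:
            return fields[2]
    return None


def type_mismatch(blkid_line, proc_mounts_text, device):
    """True if both types are known and the on-disk type differs from the mounted type."""
    on_disk = parse_blkid_type(blkid_line)
    mounted = mounted_type(proc_mounts_text, device)
    return on_disk is not None and mounted is not None and on_disk != mounted


if __name__ == "__main__":
    device = "/dev/mapper/vg00-lvweb"  # hypothetical device name
    blkid_line = device + ': UUID="0000" TYPE="ext2"'  # illustrative, not real output
    mounts = device + " /opt/web ext4 rw 0 0\n"  # illustrative /proc/mounts line
    if type_mismatch(blkid_line, mounts, device):
        print("on-disk type and mounted type differ for", device)
```

On a live system the two inputs could come from running blkid on the device and from reading /proc/mounts.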