Prevent data corruption on ext4/Linux drive on power loss

Sunday, May 22, 2016

I have some embedded boards running an American Megatrends BIOS with embedded Linux as the OS. The problem is that the industrial IDE flash modules get corrupted on power loss. I have them formatted as ext4. Whenever this happens, I can usually fix the flash with fsck, but that will not be possible in our deployments. I have heard that disabling write caching should help, but I can't figure out how to do it. Is there anything else I should do?

More Info

The drive is a 4 GB IDE flash module.
I have one partition, formatted as ext4. The OS is installed on that partition, and GRUB is my bootloader.

fdisk -l shows /dev/sda as my flash module with /dev/sda1 as my primary partition.

After a power loss, I usually cannot make it all the way through the boot init scripts.

When I attach the drive to another PC and run fsck /dev/sda1, it always shows messages like

"zero datetime on node 1553 ... fix (y)?"

I fix them and it boots fine until the next power loss.

When I get to the office tomorrow, I will post the actual output of fdisk -l.

This is all I know about how the system works. I am not a systems guy; I am a software engineer who has a habit of getting into predicaments outside his job description. I know how to format drives, install a bootloader, write software, and hack on an operating system.

Here is the output from dumpe2fs:

#sudo dumpe2fs /dev/sda1
dumpe2fs 1.41.12 (17-May-2010)

Filesystem volume name:   VideoServer
Last mounted on:          /
Filesystem UUID:          9cba62b0-8038-4913-be30-8eb211b23d78
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      ext_attr resize_inode dir_index filetype extent flex_bg sparse_super large_file huge_file uninit_bg dir_nlink extra_isize
Filesystem flags:         signed_directory_hash 
Default mount options:    (none)
Filesystem state:         not clean
Errors behavior:          Continue
Filesystem OS type:       Linux
Inode count:              245760
Block count:              977949
Reserved block count:     48896
Free blocks:              158584
Free inodes:              102920
First block:              0
Block size:               4096
Fragment size:            4096
Reserved GDT blocks:      239
Blocks per group:         32768
Fragments per group:      32768
Inodes per group:         8192
Inode blocks per group:   512
Flex block group size:    16
Filesystem created:       Fri Feb  4 15:12:00 2011
Last mount time:          Sun Oct  2 23:48:37 2011
Last write time:          Mon Oct  3 16:34:01 2011
Mount count:              2
Maximum mount count:      26
Last checked:             Tue Oct  4 07:44:50 2011
Check interval:           15552000 (6 months)
Next check after:         Sun Apr  1 07:44:50 2012
Lifetime writes:          21 GB
Reserved blocks uid:      0 (user root)
Reserved blocks gid:      0 (group root)
First inode:              11
Inode size:               256
Required extra isize:     28
Desired extra isize:      28
Default directory hash:   half_md4
Directory Hash Seed:      249d2b79-1e20-49a3-b324-6cb631294a63
Journal backup:           inode blocks

Answer

The write cache usually has nothing to do with the BIOS; most BIOSes have no option for switching disk cache settings at all. On Linux, hdparm -W 0 /dev/sda should help.

If the drive keeps this setting across power cycles (not all drives do, so check yours), you do not even need hdparm on your production systems: disable the disk write cache on a different system and replug the disk.
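A minimal sketch of the hdparm step, assuming the flash module is /dev/sda (as in the question's fdisk -l output) and that the script runs as root. The function names and the DRY_RUN switch are hypothetical helpers for illustration; with DRY_RUN=1 the script only prints the hdparm commands it would run.

```shell
#!/bin/sh
# Sketch: query and disable the drive's write cache with hdparm.
# Assumptions (not from the original answer): device defaults to /dev/sda,
# run as root; DRY_RUN=1 prints the commands instead of executing them.

run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

disable_write_cache() {
    dev="${1:-/dev/sda}"
    # -W without a value reports the current write-caching setting.
    run hdparm -W "$dev"
    # -W 0 turns the drive's write cache off.
    run hdparm -W 0 "$dev"
    # Read the setting back to confirm the drive accepted the change.
    run hdparm -W "$dev"
}
```

For example, DRY_RUN=1 disable_write_cache /dev/sda prints the three commands; without DRY_RUN it queries the setting, turns the cache off, and reads it back. Running hdparm -W /dev/sda again after a power cycle is a simple way to find out whether your module keeps the setting.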

BTW: I'd second the idea of a non-writable root filesystem, so that your system can boot into a kind of "recovery mode" and allow remote access even if the writable filesystem cannot be mounted for some reason. And if you can change the hardware design, consider using MTD devices instead of IDE/SATA disks, together with a flash-aware filesystem like JFFS2. We've been using this combination in several embedded devices (mostly VPN router solutions in the field) for several years with good results.

Update: the root of your problem seems to be that you are running an ext4 filesystem with journaling disabled; has_journal is missing from the Filesystem features list. To fix this, shut down all services and check whether anything still has open files using lsof +f -- /. Then remount your root partition read-only with mount -o remount,ro /, enable the journal with tune2fs -O has_journal /dev/sda1, and set "ordered" journal mode as the default mount option with tune2fs -o journal_data_ordered /dev/sda1. Afterwards you will have to re-run fsck (preferably from a rescue system) and remount root or reboot.
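The procedure above can be sketched as a shell script. This is a sketch under stated assumptions, not a tested recovery tool: it assumes the root filesystem is /dev/sda1, that services are already stopped, and that it runs as root. The names has_journal, enable_journal and DRY_RUN are hypothetical. has_journal reads dumpe2fs -h output on standard input and succeeds only if has_journal appears in the feature list.

```shell
#!/bin/sh
# Sketch of the journal-enabling steps. Assumptions: root fs on /dev/sda1,
# services stopped, run as root; DRY_RUN=1 prints commands only.

run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Reads `dumpe2fs -h` output on stdin; exit status 0 if has_journal is listed.
has_journal() {
    grep '^Filesystem features:' | grep -qw 'has_journal'
}

enable_journal() {
    dev="${1:-/dev/sda1}"
    # Show anything that still holds files open on the root filesystem.
    run lsof +f -- /
    # Remount root read-only before changing the filesystem.
    run mount -o remount,ro /
    # Add a journal, then store "ordered" as the default data mode.
    run tune2fs -O has_journal "$dev"
    run tune2fs -o journal_data_ordered "$dev"
}
```

For example, dumpe2fs -h /dev/sda1 2>/dev/null | has_journal || DRY_RUN=1 enable_journal /dev/sda1 prints the commands only when the journal is missing. The fsck run and the reboot are deliberately left out, since fsck is best done from a rescue system.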

With these settings in place, the filesystem metadata is guaranteed to be recoverable from the journal even after a sudden power failure, provided the drive actually honors cache flushes or has its write cache disabled. File data is also written to disk consistently, although data written in the last few seconds before the power outage may be lost on the next boot. If this is not acceptable, consider setting the journal_data default mount option with tune2fs -o journal_data /dev/sda1. This sends all data written to disk through the journal, which gives you better data consistency at the cost of a performance penalty and higher wear on your flash module.
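To check which data mode is currently stored as the default, you can inspect the "Default mount options" line of dumpe2fs -h. The journal_mode function below is a hypothetical helper, sketched on the assumption that dumpe2fs prints the same option names tune2fs accepts (journal_data, journal_data_ordered, journal_data_writeback). "unset" means no default is stored in the superblock, so the kernel default (ordered, for ext4) applies unless mount options say otherwise.

```shell
#!/bin/sh
# Sketch: report the default ext4 data-journaling mode from `dumpe2fs -h`
# output read on stdin. Prints journal, ordered, writeback, or unset.
journal_mode() {
    # Keep only the text after "Default mount options:".
    opts=$(grep '^Default mount options:' | cut -d: -f2)
    # Pad with spaces so each option name is matched as a whole word.
    case " $opts " in
        *" journal_data "*)           echo journal ;;
        *" journal_data_ordered "*)   echo ordered ;;
        *" journal_data_writeback "*) echo writeback ;;
        *)                            echo unset ;;
    esac
}
```

After tune2fs -o journal_data /dev/sda1, running dumpe2fs -h /dev/sda1 2>/dev/null | journal_mode should print journal.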
