ext4 - How does SSD meta-data corruption on power-loss happen? And can I minimize it?

Monday, December 28, 2015

ext4 - How does SSD meta-data corruption on power-loss happen? And can I minimize it?

Note: This is a follow-up question to Is there a way to protect SSD from corruption due to power loss?. I got good info there but it basically centered in three area, "get a UPS", "get better drives", or how to deal with Postgres reliability.

But what I really want to know is whether there is anything I can do to protect the SSD against meta-data corruption especially in old writes. To recap the problem. It's an ext4 filesystem on Kingston consumer-grade SSDs with write-cache enabled and we're seeing these kinds of problems:

files with the wrong permissions

files that have become directories (for example, toggle.wav is now a directory with files in it)

directories that have become files (not sure of content..)

files with scrambled data

The problem is less with these things happening on data that's being written while the drive goes down, or shortly before. It's a problem but it's expected and I can handle that in other ways.

The bigger surprise and problem is that there is meta-data corruption happening on the disk in areas that were not recently written to (ie, a week or more before).

I'm trying to understand how such a thing can happen at the disk/controller level. What's going on? Does the SSD periodically "rebalance" and move blocks around so even though I'm writing somewhere else? Like this:

And then there is a power loss when D is being rewritten. There may be pieces left on block 1 and some on block 2. But I don't know if it works this way. Or maybe there is something else happening..?

In summary - I'd like to understand how this can happen and if there anything I can do to mitigate the problem at the OS level.

Note: "get better SSDs" or "use a UPS" are not valid answers here - we are trying to move in that direction but I have to live with the reality on the ground and find the best outcome with what we have now. If there is no solution with these disks and without a UPS, then I guess that's the answer.

References:

Is post-sudden-power-loss filesystem corruption on an SSD drive's ext3 partition "expected behavior"?
This is similar but it's not clear if he was experiencing the kinds of problems we are.

EDIT:
I've also been reading issues with ext4 that might have problems with power-loss. Ours are journaled, but I don't know about anything else.

Prevent data corruption on ext4/Linux drive on power loss

http://www.pointsoftware.ch/en/4-ext4-vs-ext3-filesystem-and-why-delayed-allocation-is-bad/

Answer

For how metadata corruption can happen after an unexpected power failure, give a look at my other answer here.

Disabling cache can significantly reduce the likehood of in-flight data loss; however, based on your SSDs, data-at-rest remain at risk of being corrupted. Moreover, it commands a massive performance loss (I saw 500+ MB/s SSDs to write at a mere 5 MB/s after disabling the private DRAM cache).

If you can't trust your SSDs, the only "solution" (or, rather, workaround) is to use an end-to-end checksumming filesystem as ZFS or BTRFS and a RAID1/mirror setup: in this manner, any eventual single-device (meta)data corruption can be recovered from the other mirror side by running a check/scrub.

Blog

Monday, December 28, 2015

ext4 - How does SSD meta-data corruption on power-loss happen? And can I minimize it?

No comments:

Post a Comment

linux - How to SSH to ec2 instance in VPC private subnet via NAT server