Friday, June 29, 2018

How to determine number of write cycles or expected life for SSD under Linux?



We've been running an SSD (Intel X25-M) in a Linux (RHEL 5) server for a while, but never made any effort to figure out how much write load it was under for the past year. Is there any tool under Linux to tell us approximately how much has been written to the disk over time or (even better) how much wear it has accumulated? Just looking for a hint to see if it's near death or not...


Answer



Intel SSDs do keep statistics on total writes and on how far through its likely lifespan the drive is.



The following is from an Intel X25-M G2 160GB (SSDSA2M160G2GC):



# smartctl -d ata -A /dev/sda
smartctl 5.40 2010-10-16 r3189 [x86_64-redhat-linux-gnu] (local build)

Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 5
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  3 Spin_Up_Time            0x0020   100   100   000    Old_age   Offline      -       0
  4 Start_Stop_Count        0x0030   100   100   000    Old_age   Offline      -       0
  5 Reallocated_Sector_Ct   0x0032   100   100   000    Old_age   Always       -       1
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       6855
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       68
192 Unsafe_Shutdown_Count   0x0032   100   100   000    Old_age   Always       -       30
225 Host_Writes_32MiB       0x0030   200   200   000    Old_age   Offline      -       148487
226 Workld_Media_Wear_Indic 0x0032   100   100   000    Old_age   Always       -       3168
227 Workld_Host_Reads_Perc  0x0032   100   100   000    Old_age   Always       -       1
228 Workload_Minutes        0x0032   100   100   000    Old_age   Always       -       1950295543
232 Available_Reservd_Space 0x0033   099   099   010    Pre-fail  Always       -       0
233 Media_Wearout_Indicator 0x0032   098   098   000    Old_age   Always       -       0
184 End-to-End_Error        0x0033   100   100   099    Pre-fail  Always       -       0



The Host_Writes_32MiB raw value shows how many 32 MiB units of data have been written to this drive.
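As a rough worked example, the raw value can be turned into a total and an average write rate. This is only a sketch: the two constants are copied from the smartctl output above, the function names are made up for illustration, and the average is taken over powered-on hours rather than calendar time.

```python
# Raw values copied from the smartctl output above.
HOST_WRITES_32MIB = 148487   # attribute 225, RAW_VALUE
POWER_ON_HOURS = 6855        # attribute 9, RAW_VALUE

MIB_PER_UNIT = 32  # per the attribute name, one unit is 32 MiB


def host_writes_gib(raw_units: int) -> float:
    """Convert a Host_Writes_32MiB raw value to GiB."""
    return raw_units * MIB_PER_UNIT / 1024


def average_gib_per_day(raw_units: int, power_on_hours: int) -> float:
    """Average host writes per 24 powered-on hours (hypothetical helper)."""
    return host_writes_gib(raw_units) / power_on_hours * 24


total_gib = host_writes_gib(HOST_WRITES_32MIB)
rate = average_gib_per_day(HOST_WRITES_32MIB, POWER_ON_HOURS)
print(f"Total host writes: {total_gib:.1f} GiB ({total_gib / 1024:.2f} TiB)")
print(f"Average: {rate:.1f} GiB per powered-on day")
```

For this drive that works out to roughly 4.5 TiB written in total, or about 16 GiB for every 24 hours of power-on time.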



The Media_Wearout_Indicator normalised value (the VALUE column, not RAW_VALUE) shows how much of the drive's useful wear-lifespan is left; the drive above reads 098. It starts at 100 (or 099, I forget which) and counts down to 001, at which point Intel consider the drive to have exceeded its useful life. Intel use the MWI as part of warranty claims too: once the MWI reaches 001, the warranty has expired.
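If you want to check this from a script, the attribute table is easy to parse. The following is a minimal sketch, not a full smartctl parser: `SAMPLE` holds three rows copied from the output above, `parse_attributes` is a hypothetical helper, and the "percent used" figure assumes the counter started at 100.

```python
# Three rows copied from the smartctl output above (whitespace-separated).
SAMPLE = """\
225 Host_Writes_32MiB 0x0030 200 200 000 Old_age Offline - 148487
232 Available_Reservd_Space 0x0033 099 099 010 Pre-fail Always - 0
233 Media_Wearout_Indicator 0x0032 098 098 000 Old_age Always - 0
"""


def parse_attributes(text: str) -> dict:
    """Map attribute name -> (normalised VALUE as int, RAW_VALUE as str)."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows have at least 10 columns and start with a numeric ID;
        # anything else (banner, header, blank lines) is skipped.
        if len(fields) >= 10 and fields[0].isdigit():
            attrs[fields[1]] = (int(fields[3]), fields[9])
    return attrs


attrs = parse_attributes(SAMPLE)
mwi, _raw = attrs["Media_Wearout_Indicator"]

# Assumption: the indicator started at 100, so 100 - mwi approximates wear used.
print(f"Media_Wearout_Indicator: {mwi} (about {100 - mwi}% of rated wear used)")
if mwi <= 1:
    print("Rated wear life exhausted; plan a replacement.")
```

In practice you would feed the function the text printed by `smartctl -A` for your device instead of the embedded sample.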



The MWI reaching 001 does not mean the drive will fail immediately, however! Intel will have built in some tolerance to allow for variance between flash units. I've seen drives last well past this point, and I'm actively wear-testing some Intel 320 series SSDs to see how much longer they last.



However, as the warranty expires when the MWI reaches 001, I'd replace any drives at that point.

