Monday, July 11, 2016

linux - input/output error at clean ext3 partition - how to check what wrong with data block



I have a problem with a file at a ext3 partition on CentOS 5 server (kernel version 2.6.18-164.15.1.el5) with a HP Raid Controller:



hpacucli ctrl all show detail

Smart Array P410 in Slot 1
Bus Interface: PCI
...



HP tool doesn't report any problems.



It's normal partition ext3 with the block size set to 2k, and it is fine - fsck output:



fsck 1.39 (29-May-2006)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts

Pass 5: Checking group summary information


File inode is OK as well:



File: `name.xxx'
Size: 3126962 Blocks: 6124 IO Block: 4096 regular file
Device: 6851h/26705d Inode: 64579729 Links: 1
Access: (0644/-rw-r--r--) Uid: ( 0/ root) Gid: ( 0/ root)
Access: 2014-07-28 09:02:59.000000000 -0400

Modify: 2014-07-28 09:02:59.000000000 -0400
Change: 2014-07-28 09:02:59.000000000 -0400


One of the operation I cannot performance is file copy:



> cp /long_path/name.xxx .
cp: reading `/long_path.name.xxx': Input/output error



To pinpoint where is the problem I run dd to copy file:



> dd if=/long_path/name.xxx bs=2048 of=test
dd: reading `/long_path/name.xxx': Input/output error
222+0 records in
222+0 records out
454656 bytes (455 kB) copied, 0.042867 seconds, 10.6 MB/s


So I guess that problem is in the 223 file block.




Debugfs should help with locating that block on the disk



debugfs  -R "stat name.xxx" /dev/sdf
debugfs 1.39 (29-May-2006)
Inode: 64579729 Type: regular Mode: 0644 Flags: 0x0 Generation: 2900468317
User: 0 Group: 0 Size: 3126962
File ACL: 0 Directory ACL: 0
Links: 1 Blockcount: 6124
Fragment: Address: 0 Number: 0 Size: 0

ctime: 0x53d64a03 -- Mon Jul 28 09:02:59 2014
atime: 0x53d64a03 -- Mon Jul 28 09:02:59 2014
mtime: 0x53d64a03 -- Mon Jul 28 09:02:59 2014
BLOCKS:
(0):130402311, (1-4):130402844-130402847, (5-6):130484033-130484034, (7):130484036,
(8-10):130484049-130484051, (11):130484055, (IND):130761221, (12-13):130761222-130761223,
(14):130763791, (15):130763942, (16):130765268, (17-23):130838937-130838943,
(24-46):130853946-130853968, (47-48):130855126-130855127, (49):130855215,
(50-53):130856428-130856431, (54-104):130856533-130856583, (105-341):130856748-130856984,
...

[MORE BLOCKS]
....
TOTAL: 1531


So I guess the problematic data are in the block 130856866.



How can I get more information about that block? I ran badblocks, and have a list of bad blocks. My guess is that I have to multiply above block number by 2 (file system block size is 2K and badblocks uses 1K by default). Also badblocks checks a disk, not a partition, so maybe I should to add some offset (there is one partition on that disk, so probably no).



> fdisk -l /dev/sdf


Disk /dev/sdf: 2000.3 GB, 2000365379584 bytes
255 heads, 63 sectors/track, 243197 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Device Boot Start End Blocks Id System
/dev/cciss/c0d5p1 * 1 243197 1953479871 83 Linux


I also thought of using smartd. What should I look for?




Error counter log:
Errors Corrected by Total Correction Gigabytes Total
ECC rereads/ errors algorithm processed uncorrected
fast | delayed rewrites corrected invocations [10^9 bytes] errors
read: 0 1457 0 2887405961 0 65948.712 18
write: 0 0 0 0 0 15056.493 0
verify: 0 1 0 361901613 0 3591.720 0

Non-medium error count: 226


SMART Self-test log
Num Test Status segment LifeTime LBA_first_err [SK ASC ASQ]
Description number (hours)
# 1 Background long Failed in segment --> - 34479 16845361 [0x3 0x11 0x0]
# 2 Background short Completed - 44 - [- - -]
# 3 Background short Completed - 39 - [- - -]
# 4 Background long Completed - 6 - [- - -]

Long (extended) Self Test duration: 18500 seconds [308.3 minutes]


Background scan results log
Status: scan is active
Accumulated power on time, hours:minutes 34541:56 [2072516 minutes]
Number of background scans performed: 1139, scan progress: 38.18%
Number of background medium scans performed: 1139

# when lba(hex) [sk,asc,ascq] reassign_status
1 19215:06 0000000000014c61 [3,11,0] Recovered via rewrite in-place
2 19215:07 0000000000014c66 [3,11,0] Recovered via rewrite in-place
3 19413:28 0000000001010a31 [3,11,0] Require Write or Reassign Blocks command

4 19943:24 000000000001ea99 [3,11,0] Recovered via rewrite in-place
5 20152:23 00000000000232b8 [3,11,0] Recovered via rewrite in-place
6 31229:34 810000004087f984 [3,11,0] Require Write or Reassign Blocks command
7 33021:51 810000004087ba85 [3,11,0] Require Write or Reassign Blocks command
8 33021:51 000000004087ba9f [3,11,0] Require Write or Reassign Blocks command
9 33021:52 000000004087bad6 [3,11,0] Require Write or Reassign Blocks command
10 33029:43 000000004087baa5 [3,11,0] Require Write or Reassign Blocks command
11 33055:27 000000004087bac3 [3,11,0] Require Write or Reassign Blocks command
12 33244:40 810000004087f9d6 [3,11,0] Require Write or Reassign Blocks command
13 33431:58 990000004087f105 [0,0,0] Reassignment by disk failed

14 33480:17 00000000463d7713 [3,11,0] Require Write or Reassign Blocks command
15 33480:19 00000000463d7723 [3,11,0] Require Write or Reassign Blocks command
16 33480:20 00000000463d7725 [3,11,0] Require Write or Reassign Blocks command
17 33480:28 81000000463d774e [3,11,0] Require Write or Reassign Blocks command
18 33686:17 8100000044e50edc [3,11,0] Require Write or Reassign Blocks command
19 34154:17 81000000432bef27 [3,11,0] Require Write or Reassign Blocks command
20 34463:43 810000001f32decd [3,11,0] Require Write or Reassign Blocks command
21 34463:43 0000000030080000 [3,11,0] Require Write or Reassign Blocks command



How should I marry above smartctl output (or any other output from smartd run) with my initial problem.



Also shouldn't such issue be address by HDD software?



BTW. I found following link useful to understand 'debugs -R' output. Maybe that link will be useful for one else.



UPDATE



Doing further research I found that action related to problematic inodes (like above cp command) trigger following line in kernel log:




kernel: cciss: cmd ffff810037e00000 has CHECK CONDITION sense key = 0x3 


'sense key' is a 'status' and part of SCSI standard (list here and more description here).


Answer



So, to figure this out I did the following.



Take your block number, multiply by four and add one



(130856866 * 4) + 1 = 523427465



This represents the sector reported as producing an I/O error. The block size being 2k, sectors being 512 bytes. The additional one extra accounts for the starting sector offset for the partition.



To correlate with SMART, we'll need to convert the value we now have into hexadecimal.



$ printf "0x%x\n" 523427465
0x1f32de89



Now, when you correlate that with what SMART shows, a line comes up suspiciously close..



20 34463:43  810000001f32decd  [3,11,0]   Require Write or Reassign Blocks command


How far away?



$ bc -l
bc 1.06.95
Copyright 1991-1994, 1997, 1998, 2000, 2004, 2006 Free Software Foundation, Inc.

This is free software with ABSOLUTELY NO WARRANTY.
For details type `warranty'.
obase=16
ibase=16
1F32DECD-1F32DE89
44


That works out as being only between 34816 and 32768 bytes away but we cant say which sector is damaged out of the four that comprise the block.




If I had to hazard a guess, I'd say that theres probably a whole slew of blocks around the same address which will report I/O errors (assuming the raid striping is say 32k in size or whatever).



Additionally a read may not pick up the problem if the RAID is fetching the block chunk from a different disk. A write must propagate to all the disks in a RAID1 setup anyhow so this could make writes fail but reads succeed. Additionally, if we assume that the chunk size of the RAID card is 32k we can also assume that the damaged block plus the one reported by SMART are both damaged by whatever happened on that platter. Its just the SMART test read from the good disk for the first 32k and the bad disk for the next 32k.



Modern hard disks keep 'reserve sectors' to replace damaged sectors like this with a new sector location. Seeing as you are now getting this, and the that Reassign by disk failed message from smart I'd say a disk has ran out.



In terms of doing something about it; thats a bit more trickier. The LBA addressing is an abstraction against the real disk underneath. You'd need to identify which disk it is that is causing this issue, fail it in the RAID array and replace it.



In any case, you have a bad disk and you should look to replace it ASAP.


No comments:

Post a Comment

linux - How to SSH to ec2 instance in VPC private subnet via NAT server

I have created a VPC in aws with a public subnet and a private subnet. The private subnet does not have direct access to external network. S...