linux - input/output error at clean ext3 partition - how to check what wrong with data block

I have a problem with a file at a ext3 partition on CentOS 5 server (kernel version 2.6.18-164.15.1.el5) with a HP Raid Controller:

hpacucli ctrl all show detail

Smart Array P410 in Slot 1
   Bus Interface: PCI
   ...

HP tool doesn't report any problems.

It's normal partition ext3 with the block size set to 2k, and it is fine - fsck output:

fsck 1.39 (29-May-2006)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts

Pass 5: Checking group summary information

File inode is OK as well:

File: `name.xxx'
Size: 3126962       Blocks: 6124       IO Block: 4096   regular file
Device: 6851h/26705d    Inode: 64579729    Links: 1
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2014-07-28 09:02:59.000000000 -0400

Modify: 2014-07-28 09:02:59.000000000 -0400
Change: 2014-07-28 09:02:59.000000000 -0400

One of the operation I cannot performance is file copy:

> cp /long_path/name.xxx .
 cp: reading `/long_path.name.xxx': Input/output error

To pinpoint where is the problem I run dd to copy file:

> dd if=/long_path/name.xxx bs=2048 of=test
 dd: reading `/long_path/name.xxx': Input/output error
 222+0 records in
 222+0 records out
 454656 bytes (455 kB) copied, 0.042867 seconds, 10.6 MB/s

So I guess that problem is in the 223 file block.

Debugfs should help with locating that block on the disk

debugfs  -R "stat name.xxx" /dev/sdf
debugfs 1.39 (29-May-2006)
Inode: 64579729   Type: regular    Mode:  0644   Flags: 0x0   Generation: 2900468317
User:     0   Group:     0   Size: 3126962
File ACL: 0    Directory ACL: 0
Links: 1   Blockcount: 6124
Fragment:  Address: 0    Number: 0    Size: 0

ctime: 0x53d64a03 -- Mon Jul 28 09:02:59 2014
atime: 0x53d64a03 -- Mon Jul 28 09:02:59 2014
mtime: 0x53d64a03 -- Mon Jul 28 09:02:59 2014
BLOCKS:
(0):130402311, (1-4):130402844-130402847, (5-6):130484033-130484034, (7):130484036,
(8-10):130484049-130484051, (11):130484055, (IND):130761221, (12-13):130761222-130761223,   
(14):130763791, (15):130763942, (16):130765268, (17-23):130838937-130838943,  
(24-46):130853946-130853968, (47-48):130855126-130855127, (49):130855215, 
(50-53):130856428-130856431, (54-104):130856533-130856583, (105-341):130856748-130856984, 
...

[MORE BLOCKS]     
....
TOTAL: 1531

So I guess the problematic data are in the block 130856866.

How can I get more information about that block? I ran badblocks, and have a list of bad blocks. My guess is that I have to multiply above block number by 2 (file system block size is 2K and badblocks uses 1K by default). Also badblocks checks a disk, not a partition, so maybe I should to add some offset (there is one partition on that disk, so probably no).

> fdisk -l /dev/sdf


Disk /dev/sdf: 2000.3 GB, 2000365379584 bytes
255 heads, 63 sectors/track, 243197 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
       Device Boot      Start         End      Blocks   Id  System
/dev/cciss/c0d5p1   *       1      243197  1953479871   83  Linux

I also thought of using smartd. What should I look for?

Error counter log:
       Errors Corrected by           Total   Correction     Gigabytes    Total
           ECC          rereads/    errors   algorithm      processed    uncorrected
       fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:          0     1457         0  2887405961          0      65948.712          18
write:         0        0         0         0          0      15056.493           0
verify:        0        1         0  361901613          0       3591.720           0

Non-medium error count:      226


SMART Self-test log
Num  Test              Status                 segment  LifeTime  LBA_first_err [SK ASC ASQ]
   Description                              number   (hours)
# 1  Background long   Failed in segment -->       -   34479          16845361 [0x3 0x11 0x0]
# 2  Background short  Completed                   -      44                 - [-   -    -]
# 3  Background short  Completed                   -      39                 - [-   -    -]
# 4  Background long   Completed                   -       6                 - [-   -    -]

Long (extended) Self Test duration: 18500 seconds [308.3 minutes]


Background scan results log
Status: scan is active
  Accumulated power on time, hours:minutes 34541:56 [2072516 minutes]
  Number of background scans performed: 1139,  scan progress: 38.18%
  Number of background medium scans performed: 1139

 #  when        lba(hex)    [sk,asc,ascq]    reassign_status
 1 19215:06  0000000000014c61  [3,11,0]   Recovered via rewrite in-place
 2 19215:07  0000000000014c66  [3,11,0]   Recovered via rewrite in-place
 3 19413:28  0000000001010a31  [3,11,0]   Require Write or Reassign Blocks command

 4 19943:24  000000000001ea99  [3,11,0]   Recovered via rewrite in-place
 5 20152:23  00000000000232b8  [3,11,0]   Recovered via rewrite in-place
 6 31229:34  810000004087f984  [3,11,0]   Require Write or Reassign Blocks command
 7 33021:51  810000004087ba85  [3,11,0]   Require Write or Reassign Blocks command
 8 33021:51  000000004087ba9f  [3,11,0]   Require Write or Reassign Blocks command
 9 33021:52  000000004087bad6  [3,11,0]   Require Write or Reassign Blocks command
10 33029:43  000000004087baa5  [3,11,0]   Require Write or Reassign Blocks command
11 33055:27  000000004087bac3  [3,11,0]   Require Write or Reassign Blocks command
12 33244:40  810000004087f9d6  [3,11,0]   Require Write or Reassign Blocks command
13 33431:58  990000004087f105  [0,0,0]   Reassignment by disk failed

14 33480:17  00000000463d7713  [3,11,0]   Require Write or Reassign Blocks command
15 33480:19  00000000463d7723  [3,11,0]   Require Write or Reassign Blocks command
16 33480:20  00000000463d7725  [3,11,0]   Require Write or Reassign Blocks command
17 33480:28  81000000463d774e  [3,11,0]   Require Write or Reassign Blocks command
18 33686:17  8100000044e50edc  [3,11,0]   Require Write or Reassign Blocks command
19 34154:17  81000000432bef27  [3,11,0]   Require Write or Reassign Blocks command
20 34463:43  810000001f32decd  [3,11,0]   Require Write or Reassign Blocks command
21 34463:43  0000000030080000  [3,11,0]   Require Write or Reassign Blocks command

How should I marry above smartctl output (or any other output from smartd run) with my initial problem.

Also shouldn't such issue be address by HDD software?

BTW. I found following link useful to understand 'debugs -R' output. Maybe that link will be useful for one else.

UPDATE

Doing further research I found that action related to problematic inodes (like above cp command) trigger following line in kernel log:

kernel: cciss: cmd ffff810037e00000 has CHECK CONDITION sense key = 0x3

'sense key' is a 'status' and part of SCSI standard (list here and more description here).

Answer

So, to figure this out I did the following.

Take your block number, multiply by four and add one

(130856866 * 4) + 1 = 523427465

This represents the sector reported as producing an I/O error. The block size being 2k, sectors being 512 bytes. The additional one extra accounts for the starting sector offset for the partition.

To correlate with SMART, we'll need to convert the value we now have into hexadecimal.

$ printf "0x%x\n" 523427465
0x1f32de89

Now, when you correlate that with what SMART shows, a line comes up suspiciously close..

20 34463:43  810000001f32decd  [3,11,0]   Require Write or Reassign Blocks command

How far away?

$ bc -l
bc 1.06.95
Copyright 1991-1994, 1997, 1998, 2000, 2004, 2006 Free Software Foundation, Inc.

This is free software with ABSOLUTELY NO WARRANTY.
For details type `warranty'. 
obase=16
ibase=16
1F32DECD-1F32DE89
44

That works out as being only between 34816 and 32768 bytes away but we cant say which sector is damaged out of the four that comprise the block.

If I had to hazard a guess, I'd say that theres probably a whole slew of blocks around the same address which will report I/O errors (assuming the raid striping is say 32k in size or whatever).

Additionally a read may not pick up the problem if the RAID is fetching the block chunk from a different disk. A write must propagate to all the disks in a RAID1 setup anyhow so this could make writes fail but reads succeed. Additionally, if we assume that the chunk size of the RAID card is 32k we can also assume that the damaged block plus the one reported by SMART are both damaged by whatever happened on that platter. Its just the SMART test read from the good disk for the first 32k and the bad disk for the next 32k.

Modern hard disks keep 'reserve sectors' to replace damaged sectors like this with a new sector location. Seeing as you are now getting this, and the that Reassign by disk failed message from smart I'd say a disk has ran out.

In terms of doing something about it; thats a bit more trickier. The LBA addressing is an abstraction against the real disk underneath. You'd need to identify which disk it is that is causing this issue, fail it in the RAID array and replace it.

In any case, you have a bad disk and you should look to replace it ASAP.

Blog

Monday, July 11, 2016

linux - input/output error at clean ext3 partition - how to check what wrong with data block

No comments:

Post a Comment

linux - How to SSH to ec2 instance in VPC private subnet via NAT server