Friday, January 12, 2018

Ubuntu and ZFS, loses pool on reboot



I'm using a fresh install of Ubuntu 12.04 LTS, with the ZFS PPA.



I'm finding that when I create a pool it mounts and functions fine, but after a reboot it shows as UNAVAIL and I can't find a way to get it back.
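For context, the standard recovery I'd expect to work here is an export followed by a re-import against the stable device names; a rough sketch only (the -d directory is just where my by-id links live, and the options are the stock zpool ones, nothing verified against this setup):

# export the (unavailable) pool, then ask ZFS to re-scan the by-id names
zpool export data
zpool import -d /dev/disk/by-id data

# or simply list whatever pools ZFS can find on disk
zpool import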




Here is a log of a quick test to demonstrate:



root@nas1:~# zpool status
no pools available
root@nas1:~# zpool create data /dev/disk/by-id/scsi-360019b90b24d9300174d28912b1c485d /dev/disk/by-id/scsi-360019b90b24d9300174d28a610419bec
root@nas1:~# zpool status
  pool: data
 state: ONLINE
 scan: none requested
config:

        NAME                                      STATE     READ WRITE CKSUM
        data                                      ONLINE       0     0     0
          scsi-360019b90b24d9300174d28912b1c485d  ONLINE       0     0     0
          scsi-360019b90b24d9300174d28a610419bec  ONLINE       0     0     0

errors: No known data errors
root@nas1:~# shutdown -r now


Broadcast message from root@nas1
(/dev/pts/0) at 10:41 ...

The system is going down for reboot NOW!
root@nas1:~#
login as: root
Server refused our key
root@nas1's password:
Welcome to Ubuntu 12.04 LTS (GNU/Linux 3.2.0-24-generic x86_64)


* Documentation: https://help.ubuntu.com/

  System information as of Wed May 23 10:42:09 BST 2012

  System load:  0.48              Users logged in:     0
  Usage of /:   6.0% of 55.66GB   IP address for eth0: 10.24.0.5
  Memory usage: 1%                IP address for eth1: 192.168.30.51
  Swap usage:   0%                IP address for eth2: 192.168.99.41
  Processes:    142


Graph this data and manage this system at https://landscape.canonical.com/

Last login: Wed May 23 10:40:06 2012 from 192.168.100.35
root@nas1:~# zpool status
  pool: data
 state: UNAVAIL
status: One or more devices could not be used because the label is missing
        or invalid. There are insufficient replicas for the pool to continue
        functioning.
action: Destroy and re-create the pool from a backup source.
   see: http://zfsonlinux.org/msg/ZFS-8000-5E
 scan: none requested
config:

        NAME                                      STATE     READ WRITE CKSUM
        data                                      UNAVAIL      0     0     0  insufficient replicas
          scsi-360019b90b24d9300174d28912b1c485d  UNAVAIL      0     0     0
          scsi-360019b90b24d9300174d28a610419bec  UNAVAIL      0     0     0
root@nas1:~#
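Since the pool still shows up after the reboot (presumably from the boot-time cache) but its devices are UNAVAIL, one thing worth checking is what the cache file thinks the configuration is. A rough sketch, assuming the PPA's default cache path:

# the init scripts import pools from the cache file at boot; zdb with no
# arguments dumps the cached configuration (default path assumed)
ls -l /etc/zfs/zpool.cache
zdb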



EDIT



As requested, output of ls -l /dev/disk/by-id/scsi-*:



root@nas1:~# ls -l /dev/disk/by-id/scsi-*
lrwxrwxrwx 1 root root 9 May 23 12:03 /dev/disk/by-id/scsi-360019b90b24d9300174d28912b1c485d -> ../../sdb
lrwxrwxrwx 1 root root 9 May 23 12:03 /dev/disk/by-id/scsi-360019b90b24d9300174d28a610419bec -> ../../sdc
lrwxrwxrwx 1 root root 9 May 23 12:03 /dev/disk/by-id/scsi-360019b90b24d9300174d28b1031dd786 -> ../../sdd
lrwxrwxrwx 1 root root 9 May 23 12:03 /dev/disk/by-id/scsi-360019b90b24d9300174d28baf7edd45e -> ../../sde
lrwxrwxrwx 1 root root 9 May 23 12:03 /dev/disk/by-id/scsi-360019b90b24d9300174d28c5ea9c6198 -> ../../sdf
lrwxrwxrwx 1 root root 9 May 23 12:03 /dev/disk/by-id/scsi-360019b90b24d9300174d28d1db783151 -> ../../sdg
lrwxrwxrwx 1 root root 9 May 23 12:03 /dev/disk/by-id/scsi-360019b90b24d9300174d28e6c0af4c8e -> ../../sdh
lrwxrwxrwx 1 root root 9 May 23 12:03 /dev/disk/by-id/scsi-360019b90b24d9300174d28eeb7d87669 -> ../../sdi
lrwxrwxrwx 1 root root 9 May 23 12:03 /dev/disk/by-id/scsi-360019b90b24d9300174d28f6ad29d90a -> ../../sdj
lrwxrwxrwx 1 root root 9 May 23 12:03 /dev/disk/by-id/scsi-360019b90b24d9300174d28fca5534028 -> ../../sdk
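So the by-id links survive the reboot and still point at sdb/sdc, which suggests the problem is lower down. A hedged check (assuming the pool was built on whole disks, where ZFS normally lays down its own GPT partitions) is whether those partitions are still present after the reboot; the device names are just from my layout:

# if the partition table ZFS wrote has vanished after a reboot, that
# points at the disks/controller rather than at device naming
parted -s /dev/sdb print
blkid /dev/sdb1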


EDIT




I've just done some further testing. Rather than using the by-id paths, I tried just using sdb, sdc, etc.:



zpool create data sdb sdc sdd sde


Same result. It created the pool but after a reboot it was "UNAVAIL".
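To rule out a module or boot-ordering problem rather than bad labels, a couple of quick checks (nothing ZFS-specific is assumed here beyond the module name):

# confirm the zfs module is actually loaded after boot, and look for
# controller/disk errors in the kernel log
lsmod | grep zfs
dmesg | grep -i -e zfs -e sdb -e sdc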



EDIT




As requested, output of zdb -l /dev/sdb:



~# zdb -l /dev/sdb
--------------------------------------------
LABEL 0
--------------------------------------------
failed to unpack label 0
--------------------------------------------
LABEL 1
--------------------------------------------
failed to unpack label 1
--------------------------------------------
LABEL 2
--------------------------------------------
failed to unpack label 2
--------------------------------------------
LABEL 3
--------------------------------------------
failed to unpack label 3



I did that test after creating a new pool and had the same result.
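One caveat: for whole-disk vdevs, ZFS on Linux puts its labels on the data partition it creates, so zdb -l against the bare device can report "failed to unpack label" even on a healthy pool. Checking the partition would be more conclusive (device names below are from my layout, and the -part1 suffix is an assumption):

# labels live on the first (data) partition for whole-disk vdevs
zdb -l /dev/sdb1
zdb -l /dev/disk/by-id/scsi-360019b90b24d9300174d28912b1c485d-part1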



EDIT



I just tried a completely fresh install of Ubuntu 11.04 (to rule out a bug in 12.04); the steps, with a rough command sketch after the list, were:




  1. Added the PPA repository

  2. Did a dist-upgrade, then installed ubuntu-zfs


  3. Ran 'zpool create data sdb sdc'

  4. Checked with zpool status and the pool showed there

  5. Rebooted the server

  6. Checked again, still there.
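For reference, roughly the commands behind those steps (the PPA name is the one the zfsonlinux project published at the time; treat it as an assumption):

apt-get install python-software-properties   # provides add-apt-repository
add-apt-repository ppa:zfs-native/stable
apt-get update && apt-get dist-upgrade -y
apt-get install ubuntu-zfs
zpool create data sdb sdc
zpool status        # pool shows ONLINE
reboot
# after logging back in
zpool status        # still ONLINE on 11.04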



So it's a problem with my 12.04 instance. Tempted to just reinstall...


Answer



It turned out to be a faulty RAID controller that was handling the disks. Swapped out the controller, everything works fine now!
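In hindsight, a crude way to catch a controller like this (destructive, so only on a scratch disk, and the device name is just from this setup) is to check whether raw writes to the disk survive a reboot at all:

# write a small marker block, reboot, then read it back; if it is gone,
# the controller is not persisting writes and ZFS labels never stood a chance
echo ZFS-MARKER | dd of=/dev/sdb bs=512 seek=2048 count=1
# ...reboot...
dd if=/dev/sdb bs=512 skip=2048 count=1 2>/dev/null | strings | grep ZFS-MARKER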

