Sunday, January 17, 2016

ESXi 4.1 no longer detects iSCSI datastore

Long story short, we had a drive failure on a RAID5 iSCSI target. We replaced the drive and it successfully rebuilt. Unfortunately after this was done, ESXi lost the datastore. It still shows up in the iSCSI Software Adapter, but no correlating datastore. I've searched everywhere trying to fix the issue.



From the ESXi console:




esxcfg-scsidevs -c

Result:
Device UID Device Type Console Device Size Multipath PluginDisplay Name
t10.9454450000000000C5544A6D0391F67FD6598A8260293F4F Direct-Access /vmfs/devices/disks/t10.9454450000000000C5544A6D0391F67FD6598A8260293F4F 1878417MB NMP svndirect


There are two other datastores, but I'm not concerned with those (they are local and work). The one above is the problem.




fdisk -l /vmfs/devices/disks/t10.9454450000000000C5544A6D0391F67FD6598A8260293F4F

Result:
Disk /vmfs/devices/disks/t10.9454450000000000C5544A6D0391F67FD6598A8260293F4F: 1969.6 GB, 1969663770624 bytes
255 heads, 63 sectors/track, 239464 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes


Device Boot Start End Blocks Id System
/vmfs/devices/disks/t10.9454450000000000C5544A6D0391F67FD6598A8260293F4Fp1 1 239464 1923494516 fb VMFS



It says that it's VMFS, so the partition isn't missing. When I attempt to delete and re-add the iSCSI, the following gets logged into /var/log/messages:



May  9 16:35:58 Hostd: [2012-05-09 16:35:58.606 FFD97E90 info 'TaskManager' opID=4047693D-000001C2] Task Created : haTask-ha-host-vim.host.StorageSystem.addInternetScsiStaticTargets-871
May 9 16:35:58 Hostd: [2012-05-09 16:35:58.629 47EF3B90 verbose 'StorageSystem' opID=4047693D-000001C2] SendStorageInfoEvent() called
May 9 16:35:58 Hostd: [2012-05-09 16:35:58.630 47EF3B90 info 'TaskManager' opID=4047693D-000001C2] Task Completed : haTask-ha-host-vim.host.StorageSystem.addInternetScsiStaticTargets-871 Status success
May 9 16:35:58 Hostd: [2012-05-09 16:35:58.643 47CE2B90 info 'App'] CreateISCSIHBA
May 9 16:35:58 vmkernel: 0:04:46:41.850 cpu14:4685)NMP: nmp_CompleteCommandForPath: Command 0x12 (0x41027f9e6040) to NMP device "mpx.vmhba0:C0:T0:L0" failed on physical path "vmhba0:C0:T0:L0" H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x24 0x0.
May 9 16:35:58 vmkernel: 0:04:46:41.850 cpu14:4685)ScsiDeviceIO: 1672: Command 0x12 to device "mpx.vmhba0:C0:T0:L0" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x24 0x0.

May 9 16:35:58 Hostd: [2012-05-09 16:35:58.743 47CE2B90 error 'App'] Unable to build Durable Name dependent properties: Unable to query VPD pages from device, all paths are dead, no I/O possible.
May 9 16:35:58 Hostd: [2012-05-09 16:35:58.794 47CE2B90 verbose 'App'] Looking up object with name = "firewallSystem" failed.
May 9 16:35:58 Hostd: [2012-05-09 16:35:58.857 47CE2B90 verbose 'NetConfigProvider'] FetchFn: List of pnics opted out
May 9 16:35:59 Hostd: [2012-05-09 16:35:59.492 47CE2B90 verbose 'NetConfigProvider'] FetchFn: List of pnics opted out
May 9 16:36:00 Hostd: [2012-05-09 16:36:00.072 47CE2B90 warning 'PropertyCollector'] ComputeGUReq took 1442151 microSec
May 9 16:36:02 Hostd: [2012-05-09 16:36:02.910 47CE2B90 info 'TaskManager' opID=4047693D-000001CC] Task Created : haTask-ha-host-vim.host.StorageSystem.rescanHba-872
May 9 16:36:02 iscsid: Login Target: iqn.2004-08.jp.buffalo:TS-RIXL6F4-0024A525A6F4:svndirect if=default addr=192.168.100.2:3260 (TPGT:65536 ISID:0x1)
May 9 16:36:02 iscsid: Notice: Assigned (H33 T0 C0 session=14, target=1/1)
May 9 16:36:02 iscsid: DISCOVERY: Pending=1 Failed=0
May 9 16:36:03 vmkernel: 0:04:46:46.282 cpu9:4812)iscsi_vmk: iscsivmk_ConnNetRegister: socket 0x4100a4317810 network resource pool netsched.pools.persist.iscsi associated

May 9 16:36:03 iscsid: Portal group tag mismatch, expected 65536, received 1
May 9 16:36:03 vmkernel: 0:04:46:46.554 cpu9:4812)WARNING: iscsi_vmk: iscsivmk_StartConnection: vmhba33:CH:0 T:0 CN:0: iSCSI connection is being marked "ONLINE"
May 9 16:36:03 iscsid: Login Success: iqn.2004-08.jp.buffalo:TS-RIXL6F4-0024A525A6F4:svndirect if=default addr=192.168.100.2:3260 (TPGT:1 ISID:0x1)
May 9 16:36:03 vmkernel: 0:04:46:46.554 cpu9:4812)WARNING: iscsi_vmk: iscsivmk_StartConnection: Sess [ISID: 00023d000001 TARGET: iqn.2004-08.jp.buffalo:TS-RIXL6F4-0024A525A6F4:svndirect TPGT: 1 TSIH: 0]
May 9 16:36:03 vmkernel: 0:04:46:46.554 cpu9:4812)WARNING: iscsi_vmk: iscsivmk_StartConnection: Conn [CID: 0 L: 192.168.100.8:53668 R: 192.168.100.2:3260]
May 9 16:36:03 vmkernel: 0:04:46:46.554 cpu11:4134)WARNING: ScsiCore: 1353: Power-on Reset occurred on vmhba33:C0:T0:L0
May 9 16:36:03 vobd: May 09 16:36:03.447: 17206554467us: [vob.scsi.scsipath.por] Power-on Reset occurred on vmhba33:C0:T0:L0.
May 9 16:36:03 vmkernel: 0:04:46:46.555 cpu11:4134)vmw_psp_fixed: psp_fixedSelectPathToActivateInt: Changing active path from NONE to vmhba33:C0:T0:L0 for device "t10.9454450000000000C5544A6D0391F67FD6598A8260293F4F".
May 9 16:36:03 vobd: May 09 16:36:03.448: 17206555428us: [vob.scsi.scsipath.pathstate.on] scsiPath vmhba33:C0:T0:L0 changed state from dead.
May 9 16:36:03 vobd: May 09 16:36:03.448: 17207216399us: [esx.clear.storage.connectivity.restored] Connectivity to storage device t10.9454450000000000C5544A6D0391F67FD6598A8260293F4F (Datastores: Unknown) restored. Path vmhba33:C0:T0:L0 is active aga

May 9 16:36:03 in..
May 9 16:36:03 iscsid: connection 1:0 (iqn.2004-08.jp.buffalo:TS-RIXL6F4-0024A525A6F4:svndirect if=default addr=192.168.100.2:3260 (TPGT:1 ISID:0x1) (T0 C0)) is operational
May 9 16:36:03 Hostd: [2012-05-09 16:36:03.449 48840B90 info 'ha-eventmgr'] Event 94 : Connectivity to storage device t10.9454450000000000C5544A6D0391F67FD6598A8260293F4F (Datastores: Unknown) restored. Path vmhba33:C0:T0:L0 is active again.
May 9 16:36:03 Hostd: [2012-05-09 16:36:03.449 47C60B90 verbose 'Hostsvc::DatastoreSystem'] VmfsVobUpdate: got Vob message [N11HostdCommon6VobMsgE:0x48d1cbb0] timestamp=17207216885
May 9 16:36:03 Hostd: [2012-05-09 16:36:03.449 47C60B90 verbose 'Hostsvc::DatastoreSystem'] VmfsVobUpdate: Datastore Unknown not found
May 9 16:36:03 iscsid: DISCOVERY: Pending=0 Failed=0
May 9 16:36:03 vmkernel: 0:04:46:47.055 cpu1:8900)ScsiScan: 1059: Path 'vmhba33:C0:T0:L0': Vendor: 'IET ' Model: 'VIRTUAL-DISK ' Rev: '0 '
May 9 16:36:03 vmkernel: 0:04:46:47.055 cpu1:8900)ScsiScan: 1062: Path 'vmhba33:C0:T0:L0': Type: 0x0, ANSI rev: 4, TPGS: 0 (none)
May 9 16:36:03 Hostd: [2012-05-09 16:36:03.949 48918B90 verbose 'StorageSystem' opID=4047693D-000001CC] SendStorageInfoEvent() called
May 9 16:36:03 Hostd: [2012-05-09 16:36:03.949 48881B90 verbose 'StorageSystem'] StoragePathUpdate: Invalidate Path Cache immediately

May 9 16:36:03 Hostd: [2012-05-09 16:36:03.949 48881B90 verbose 'StorageSystem'] SendStorageInfoEvent() called
May 9 16:36:03 Hostd: [2012-05-09 16:36:03.949 48918B90 verbose 'Hostsvc::DatastoreSystem' opID=4047693D-000001CC] ReconcileVMFSDatastores called: refresh = true, rescan = false
May 9 16:36:03 Hostd: [2012-05-09 16:36:03.949 48918B90 verbose 'FSVolumeProvider' opID=4047693D-000001CC] RefreshVMFSVolumes called
May 9 16:36:04 Hostd: [2012-05-09 16:36:04.047 48918B90 verbose 'FSVolumeProvider' opID=4047693D-000001CC] RefreshVMFSVolumes: refreshed volume, id 4c99efca-ac76eb4e-16a1-00219ba3e871, name datastore1
May 9 16:36:04 Hostd: [2012-05-09 16:36:04.128 48918B90 verbose 'FSVolumeProvider' opID=4047693D-000001CC] RefreshVMFSVolumes: refreshed volume, id 4c99efd0-316cd952-d364-00219ba3e871, name datastore2
May 9 16:36:04 Hostd: [2012-05-09 16:36:04.128 48918B90 verbose 'Hostsvc::Datastore' opID=4047693D-000001CC] SetVolume: Datastore 4c99efca-ac76eb4e-16a1-00219ba3e871 has changed provider volume pointer
May 9 16:36:04 Hostd: [2012-05-09 16:36:04.128 48918B90 verbose 'Hostsvc::Datastore' opID=4047693D-000001CC] SetVolume: Datastore 4c99efd0-316cd952-d364-00219ba3e871 has changed provider volume pointer
May 9 16:36:04 Hostd: [2012-05-09 16:36:04.128 48918B90 verbose 'Hostsvc::DatastoreSystem' opID=4047693D-000001CC] ReconcileVMFSDatastores: Done discovering new filesystem volumes.
May 9 16:36:04 Hostd: [2012-05-09 16:36:04.128 48918B90 info 'TaskManager' opID=4047693D-000001CC] Task Completed : haTask-ha-host-vim.host.StorageSystem.rescanHba-872 Status success
May 9 16:36:04 Hostd: [2012-05-09 16:36:04.143 47C60B90 info 'App'] CreateISCSIHBA

May 9 16:36:04 vmkernel: 0:04:46:47.341 cpu12:4685)NMP: nmp_CompleteCommandForPath: Command 0x12 (0x4102bf1f8440) to NMP device "mpx.vmhba0:C0:T0:L0" failed on physical path "vmhba0:C0:T0:L0" H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x24 0x0.
May 9 16:36:04 vmkernel: 0:04:46:47.341 cpu12:4685)ScsiDeviceIO: 1672: Command 0x12 to device "mpx.vmhba0:C0:T0:L0" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x24 0x0.
May 9 16:36:08 Hostd: [2012-05-09 16:36:08.591 48840B90 verbose 'Cimsvc'] Ticket issued for CIMOM version 1.0, user root
May 9 16:36:08 sfcb-vmware_base[6314]: storelib Physical Device Device ID : 0x0
May 9 16:36:08 sfcb-vmware_base[6314]: storelib Physical Device Device ID : 0x0
May 9 16:36:08 sfcb-vmware_base[6314]: storelib Physical Device Device ID : 0x0
May 9 16:36:08 sfcb-vmware_base[6314]: storelib Physical Device Device ID : 0x0
May 9 16:36:09 sfcb-vmware_base[6314]: storelib Physical Device Device ID : 0x0
May 9 16:36:09 sfcb-vmware_base[6314]: storelib Physical Device Device ID : 0x0
May 9 16:36:09 sfcb-vmware_base[6314]: storelib Physical Device Device ID : 0x0

May 9 16:36:09 sfcb-vmware_base[6314]: storelib Physical Device Device ID : 0x0
May 9 16:36:09 sfcb-vmware_base[6314]: storelib Physical Device Device ID : 0x0
May 9 16:36:09 sfcb-vmware_base[6314]: storelib Physical Device Device ID : 0x0
May 9 16:36:09 sfcb-vmware_base[6314]: storelib Physical Device Device ID : 0x0
May 9 16:36:09 sfcb-vmware_base[6314]: storelib Physical Device Device ID : 0x0
May 9 16:36:09 sfcb-vmware_base[6314]: storelib Physical Device Device ID : 0x0
May 9 16:36:09 sfcb-vmware_base[6314]: storelib Physical Device Device ID : 0x0
May 9 16:36:09 sfcb-vmware_base[6314]: storelib Physical Device Device ID : 0x0
May 9 16:36:09 sfcb-vmware_base[6314]: storelib Physical Device Device ID : 0x0
May 9 16:36:09 sfcb-vmware_base[6314]: storelib Physical Device Device ID : 0x0

May 9 16:36:09 sfcb-vmware_base[6314]: storelib Physical Device Device ID : 0x0
May 9 16:36:09 sfcb-vmware_base[6314]: storelib Physical Device Device ID : 0x0
May 9 16:36:09 sfcb-vmware_base[6314]: storelib Physical Device Device ID : 0x0
May 9 16:36:09 sfcb-vmware_base[6314]: storelib Physical Device Device ID : 0x0
May 9 16:36:09 sfcb-vmware_base[6314]: storelib Physical Device Device ID : 0x0
May 9 16:36:09 sfcb-vmware_base[6314]: storelib Physical Device Device ID : 0x0
May 9 16:36:09 sfcb-vmware_base[6314]: storelib Physical Device Device ID : 0x0


I'm particularly concerned with these lines:




May  9 16:36:04 Hostd: [2012-05-09 16:36:04.128 48918B90 verbose 'Hostsvc::Datastore' opID=4047693D-000001CC] SetVolume: Datastore 4c99efca-ac76eb4e-16a1-00219ba3e871 has changed provider volume pointer
May 9 16:36:04 Hostd: [2012-05-09 16:36:04.128 48918B90 verbose 'Hostsvc::Datastore' opID=4047693D-000001CC] SetVolume: Datastore 4c99efd0-316cd952-d364-00219ba3e871 has changed provider volume pointer


This appears to be my iSCSI datastore that was detected, but it doesn't work? I'm not sure.



I even attempted to connect to the iSCSI myself and use the VMFS tools available in Ubuntu to try and look at the data, but it would always crash whenever I even attempted to pull info or list files. I even tried it with the VMFS Java app which also failed. Perhaps the VMFS is corrupted? I couldn't find a way to attempt a repair on the device, because the tools VMware gives you appear to only work on volumes (and it does not show up as a VMFS volume).



The LUN does show up when you add, but it says it wants to format and erase it.

No comments:

Post a Comment

linux - How to SSH to ec2 instance in VPC private subnet via NAT server

I have created a VPC in aws with a public subnet and a private subnet. The private subnet does not have direct access to external network. S...