I run a SmartOS system on a Hetzner EX4S (Intel Core i7-2600, 32 GB RAM, 2x3 TB SATA HDD).
There are six virtual machines on the host:
[root@10-bf-48-7f-e7-03 ~]# vmadm list
UUID TYPE RAM STATE ALIAS
d2223467-bbe5-4b81-a9d1-439e9a66d43f KVM 512 running xxxx1
5f36358f-68fa-4351-b66f-830484b9a6ee KVM 1024 running xxxx2
d570e9ac-9eac-4e4f-8fda-2b1d721c8358 OS 1024 running xxxx3
ef88979e-fb7f-460c-bf56-905755e0a399 KVM 1024 running xxxx4
d8e06def-c9c9-4d17-b975-47dd4836f962 KVM 4096 running xxxx5
4b06fe88-db6e-4cf3-aadd-e1006ada7188 KVM 9216 running xxxx5
[root@10-bf-48-7f-e7-03 ~]#
The host reboots several times a week with no crash dump in /var/crash and no messages in the /var/adm/messages log. Basically, /var/adm/messages looks like there was a hard reset:
2012-11-23T08:54:43.210625+00:00 10-bf-48-7f-e7-03 rsyslogd: -- MARK --
2012-11-23T09:14:43.187589+00:00 10-bf-48-7f-e7-03 rsyslogd: -- MARK --
2012-11-23T09:34:43.165100+00:00 10-bf-48-7f-e7-03 rsyslogd: -- MARK --
2012-11-23T09:54:43.142065+00:00 10-bf-48-7f-e7-03 rsyslogd: -- MARK --
2012-11-23T10:14:43.119365+00:00 10-bf-48-7f-e7-03 rsyslogd: -- MARK --
2012-11-23T10:34:43.096351+00:00 10-bf-48-7f-e7-03 rsyslogd: -- MARK --
2012-11-23T10:54:43.073821+00:00 10-bf-48-7f-e7-03 rsyslogd: -- MARK --
2012-11-23T10:57:55.610954+00:00 10-bf-48-7f-e7-03 genunix: [ID 540533 kern.notice] #015SunOS Release 5.11 Version joyent_20121018T224723Z 64-bit
2012-11-23T10:57:55.610962+00:00 10-bf-48-7f-e7-03 genunix: [ID 299592 kern.notice] Copyright (c) 2010-2012, Joyent Inc. All rights reserved.
2012-11-23T10:57:55.610967+00:00 10-bf-48-7f-e7-03 unix: [ID 223955 kern.info] x86_feature: lgpg
2012-11-23T10:57:55.610971+00:00 10-bf-48-7f-e7-03 unix: [ID 223955 kern.info] x86_feature: tsc
2012-11-23T10:57:55.610974+00:00 10-bf-48-7f-e7-03 unix: [ID 223955 kern.info] x86_feature: msr
2012-11-23T10:57:55.610978+00:00 10-bf-48-7f-e7-03 unix: [ID 223955 kern.info] x86_feature: mtrr
2012-11-23T10:57:55.610981+00:00 10-bf-48-7f-e7-03 unix: [ID 223955 kern.info] x86_feature: pge
2012-11-23T10:57:55.610984+00:00 10-bf-48-7f-e7-03 unix: [ID 223955 kern.info] x86_feature: de
2012-11-23T10:57:55.610987+00:00 10-bf-48-7f-e7-03 unix: [ID 223955 kern.info] x86_feature: cmov
2012-11-23T10:57:55.610995+00:00 10-bf-48-7f-e7-03 unix: [ID 223955 kern.info] x86_feature: mmx
2012-11-23T10:57:55.611000+00:00 10-bf-48-7f-e7-03 unix: [ID 223955 kern.info] x86_feature: mca
2012-11-23T10:57:55.611004+00:00 10-bf-48-7f-e7-03 unix: [ID 223955 kern.info] x86_feature: pae
2012-11-23T10:57:55.611008+00:00 10-bf-48-7f-e7-03 unix: [ID 223955 kern.info] x86_feature: cv8
A further problem is that the host sometimes loses its network interface on reboot, so we need to perform a manual hardware reset to bring it back.
We have no physical or virtual access to the server console: no KVM, no iLO, or anything like that. So the only way to debug is to analyze crash dumps and log files.
I am not a SmartOS/Solaris expert so I am not sure how to proceed. Is there any equivalent of Linux netconsole for SmartOS? Can I just redirect the console output to the network port somehow? Maybe I am missing something obvious and crash information is located somewhere else.
Answer
Run the command dumpadm to check that crash dumps are enabled, and on which device.
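Run with no arguments, dumpadm prints the current dump configuration. As a rough sketch of how to check it from a script, the helper below reads that output and reports the savecore setting. The function name savecore_enabled is made up for illustration, and the "Savecore enabled:" field name is what dumpadm prints on illumos-based systems as far as I know, so compare it against the real output on your host before relying on it.

```shell
#!/bin/sh
# Sketch: report whether savecore is enabled, given dumpadm output on stdin.
# Assumption: the output contains a line like "Savecore enabled: yes";
# verify the exact field name against dumpadm on your own host.
# savecore_enabled is a hypothetical helper name, not a system command.
savecore_enabled() {
    awk -F': *' '
        /Savecore enabled/ { print $2; found = 1 }
        END { if (!found) print "unknown" }
    '
}

# Usage on the host (prints yes, no, or unknown):
#   dumpadm | savecore_enabled
```

If this reports "no", dumps may still be written to the dump device but are not copied into the savecore directory at boot; dumpadm -y turns that on.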
If they are enabled and you still find no crash dumps, suspect a hardware fault and ask your hosting company to move you to a different physical server. (They will also be able to check hardware logs and fault lights, call the vendor, and so on.)
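Before contacting the hosting company, it may be worth collecting whatever the operating system itself has recorded about hardware errors. The sketch below wraps the two Fault Management (FMA) commands I would try first, fmadm faulty and fmdump -e. The wrapper name fma_report is hypothetical, and a reset caused by a power or motherboard problem may well leave nothing in these logs, so empty output does not rule out a hardware fault.

```shell
#!/bin/sh
# Sketch: gather hardware-fault evidence from the illumos Fault Management
# Architecture (FMA). fma_report is a hypothetical helper name; it only
# wraps the existing fmadm and fmdump tools and assumes it is run as root
# in the global zone.
fma_report() {
    for tool in fmadm fmdump; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool not found; is this an illumos/SmartOS global zone?"
            return 1
        fi
    done
    echo "== Resources FMA currently considers faulty =="
    fmadm faulty
    echo "== Error reports (ereports) logged by FMA =="
    fmdump -e
}

# Usage on the host:
#   fma_report > /var/tmp/fma-report.txt
```

Any entries that line up with the reboot times in /var/adm/messages are useful to attach to the support ticket.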