Dear all,<div>One of the hosts in my cluster is reported to be in the ERROR state.</div><div>I looked through its syslog messages and found this:</div><div><br></div><div><div>[ 1.847465] WARNING: at /build/buildd/linux-3.2.0/kernel/watchdog.c:241 watchdog_overflow_callback+0x9a/0xc0()</div>
<div>[ 1.847465] Hardware name: SE1102 </div><div>[ 1.847465] Watchdog detected hard LOCKUP on cpu 7</div><div>[ 1.847465] Modules linked in:</div><div>[ 1.847465] Pid: 1, comm: swapper/0 Not tainted 3.2.0-27-generic #43-Ubuntu</div>
<div>[ 1.847465] Call Trace:</div><div>[ 1.847465] <NMI> [<ffffffff8106729f>] warn_slowpath_common+0x7f/0xc0</div><div>[ 1.847465] [<ffffffff81067396>] warn_slowpath_fmt+0x46/0x50</div><div>
[ 1.847465] [<ffffffff810d837a>] watchdog_overflow_callback+0x9a/0xc0</div>
<div>[ 1.847465] [<ffffffff81111bb6>] __perf_event_overflow+0x96/0x1e0</div><div>[ 1.847465] [<ffffffff8110f141>] ? perf_event_update_userpage+0x11/0xc0</div><div>[ 1.847465] [<ffffffff81023aaa>] ? x86_perf_event_set_period+0xda/0x150</div>
<div>[ 1.847465] [<ffffffff81112104>] perf_event_overflow+0x14/0x20</div><div>[ 1.847465] [<ffffffff810281c3>] intel_pmu_handle_irq+0x163/0x210</div><div>[ 1.847465] [<ffffffff8139c350>] ? ghes_read_estatus+0x90/0x180</div>
<div>[ 1.847465] [<ffffffff8165b7c1>] perf_event_nmi_handler+0x21/0x30</div><div>[ 1.847465] [<ffffffff8165b089>] default_do_nmi+0x69/0x220</div><div>[ 1.847465] [<ffffffff8165b2c0>] do_nmi+0x80/0x90</div>
<div>[ 1.847465] [<ffffffff8165a6b0>] nmi+0x20/0x30</div><div>[ 1.847465] [<ffffffff814c84c2>] ? __i8042_command.part.1+0xd2/0x200</div><div>[ 1.847465] <<EOE>> [<ffffffff814c8630>] __i8042_command+0x40/0x60</div>
<div>[ 1.847465] [<ffffffff814c8689>] i8042_command+0x39/0x60</div><div>[ 1.847465] [<ffffffff81d3447e>] i8042_check_aux+0x24/0x201</div><div>[ 1.847465] [<ffffffff81d34cb1>] i8042_setup_aux+0x16/0x12e</div>
<div>[ 1.847465] [<ffffffff81d34dee>] i8042_probe.part.10+0x25/0xbd</div><div>[ 1.847465] [<ffffffff81d34eb1>] i8042_probe+0x2b/0x2d</div><div>[ 1.847465] [<ffffffff813f6e57>] platform_drv_probe+0x17/0x20</div>
<div>[ 1.847465] [<ffffffff813f56b8>] really_probe+0x68/0x190</div><div>[ 1.847465] [<ffffffff813f5945>] driver_probe_device+0x45/0x70</div><div>[ 1.847465] [<ffffffff813f5a1b>] __driver_attach+0xab/0xb0</div>
<div>[ 1.847465] [<ffffffff813f5970>] ? driver_probe_device+0x70/0x70</div><div>[ 1.847465] [<ffffffff813f5970>] ? driver_probe_device+0x70/0x70</div><div>[ 1.847465] [<ffffffff813f47ac>] bus_for_each_dev+0x5c/0x90</div>
<div>[ 1.847465] [<ffffffff813f547e>] driver_attach+0x1e/0x20</div><div>[ 1.847465] [<ffffffff813f50d0>] bus_add_driver+0x1a0/0x270</div><div>[ 1.847465] [<ffffffff813f5f86>] driver_register+0x76/0x140</div>
<div>[ 1.847465] [<ffffffff813f7416>] platform_driver_register+0x46/0x50</div><div>[ 1.847465] [<ffffffff813f7448>] platform_driver_probe+0x28/0xb0</div><div>[ 1.847465] [<ffffffff813f7be1>] platform_create_bundle+0xc1/0xf0</div>
<div>[ 1.847465] [<ffffffff81d34e86>] ? i8042_probe.part.10+0xbd/0xbd</div><div>[ 1.847465] [<ffffffff81d349eb>] ? i8042_platform_init+0xb1/0xb1</div><div>[ 1.847465] [<ffffffff81d34a47>] i8042_init+0x5c/0x80</div>
<div>[ 1.847465] [<ffffffff81002040>] do_one_initcall+0x40/0x180</div><div>[ 1.847465] [<ffffffff81cfbce9>] kernel_init+0xd9/0x158</div><div>[ 1.847465] [<ffffffff816643f4>] kernel_thread_helper+0x4/0x10</div>
<div>[ 1.847465] [<ffffffff81cfbc10>] ? start_kernel+0x3bd/0x3bd</div><div>[ 1.847465] [<ffffffff816643f0>] ? gs_change+0x13/0x13</div><div>[ 1.847465] ---[ end trace ffc3668eca9f076b ]---</div></div>
<div>...</div><div><div>[ 60.721539] init: failsafe main process (734) killed by TERM signal</div><div>[ 60.756219] type=1400 audit(1343661831.840:8): apparmor="STATUS" operation="profile_replace" name="/sbin/dhclient" pid=877 comm="apparmor_parser"</div>
<div>[ 60.756574] type=1400 audit(1343661831.840:9): apparmor="STATUS" operation="profile_replace" name="/usr/lib/NetworkManager/nm-dhcp-client.action" pid=877 comm="apparmor_parser"</div>
<div>[ 60.756777] type=1400 audit(1343661831.840:10): apparmor="STATUS" operation="profile_replace" name="/usr/lib/connman/scripts/dhclient-script" pid=877 comm="apparmor_parser"</div>
<div>[ 60.762495] type=1400 audit(1343661831.844:11): apparmor="STATUS" operation="profile_load" name="/usr/sbin/libvirtd" pid=879 comm="apparmor_parser"</div><div>[ 60.772471] type=1400 audit(1343661831.856:12): apparmor="STATUS" operation="profile_load" name="/usr/sbin/tcpdump" pid=881 comm="apparmor_parser"</div>
<div>[ 60.772800] type=1400 audit(1343661831.856:13): apparmor="STATUS" operation="profile_load" name="/usr/lib/libvirt/virt-aa-helper" pid=878 comm="apparmor_parser"</div><div>[ 60.837322] init: libvirt-bin main process (957) terminated with status 6</div>
<div>[ 60.837345] init: libvirt-bin main process ended, respawning</div><div>[ 60.850389] init: libvirt-bin main process (973) terminated with status 6</div><div>[ 60.850411] init: libvirt-bin main process ended, respawning</div>
<div>[ 60.863761] init: libvirt-bin main process (989) terminated with status 6</div><div>[ 60.863783] init: libvirt-bin main process ended, respawning</div><div>[ 60.890572] init: libvirt-bin main process (1022) terminated with status 6</div>
<div>[ 60.890597] init: libvirt-bin main process ended, respawning</div><div>[ 60.904526] init: libvirt-bin main process (1042) terminated with status 6</div><div>[ 60.904550] init: libvirt-bin main process ended, respawning</div>
<div>[ 60.917624] init: libvirt-bin main process (1058) terminated with status 6</div><div>[ 60.917647] init: libvirt-bin main process ended, respawning</div><div>[ 60.931100] init: libvirt-bin main process (1074) terminated with status 6</div>
<div>[ 60.931123] init: libvirt-bin main process ended, respawning</div><div>[ 60.944178] init: libvirt-bin main process (1090) terminated with status 6</div><div>[ 60.944200] init: libvirt-bin main process ended, respawning</div>
<div>[ 60.963068] init: libvirt-bin main process (1106) terminated with status 6</div><div>[ 60.963101] init: libvirt-bin main process ended, respawning</div><div>[ 60.975641] init: libvirt-bin main process (1122) terminated with status 6</div>
<div>[ 60.975674] init: libvirt-bin main process ended, respawning</div><div>[ 60.988176] init: libvirt-bin main process (1138) terminated with status 6</div><div>[ 60.988203] init: libvirt-bin respawning too fast, stopped</div>
</div><div><br></div><div><br></div><div>Could the first WARNING message (the hard LOCKUP on cpu 7) somehow be responsible for the libvirt-bin respawn failures?</div><div>The machine runs Ubuntu 12.04 LTS with kernel 3.2.0-27-generic #43-Ubuntu SMP Fri Jul 6 14:25:57 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux</div>
<div><br></div><div>Any suggestions would be highly appreciated.</div><div>Best regards,<br>Valerio</div>