[one-users] Multiple guests running after failed cleanup

Matthew Richardson m.richardson at ed.ac.uk
Thu Jan 9 08:39:35 PST 2014


Hi,

I'm running a ONE 4.2 pool, and had some issues with it earlier today.

Some VM hosts became unreachable due to networking issues: the VM hosts
could still see the rest of the world, but the ONE server could not reach
them.

As a result, the ONE server called a hook script:

VM_HOOK = [
    name      = "on_crash_boot",
    on        = "UNKNOWN",
    command   = "/usr/bin/env onevm boot",
    arguments = "$ID" ]

This resulted in an attempted cleanup (which appears to have failed due to
the ongoing network problems) followed by a restart elsewhere.  However,
the failed cleanup meant that I then had 2 instances of the same guest
running on 2 VM hosts, which led to MAC address conflicts on the network.

Is this a bug in ONE's handling of cleanup failure, or is there
something else I should be doing in my hook script to ensure that it is
safe to call onevm boot?
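
One possible shape for a safer hook is a wrapper that only calls onevm boot
once the old host confirms the guest is stopped. This is a minimal sketch,
not ONE's own mechanism: the function names, the old-host argument (the hook
above only passes $ID), the qemu:///system URI and passwordless SSH from the
ONE server to the VM host are all assumptions. The domain name one-<ID>
follows the cancel command in the guest log below.

```shell
#!/bin/sh
# Hypothetical guard around "onevm boot" (all function names are illustrative).

# Ask the old host for the libvirt state of the guest.
# $1 = old host name (assumed to be known to the caller), $2 = VM ID.
probe_guest() {
    ssh -o BatchMode=yes -o ConnectTimeout=5 "$1" \
        virsh --connect qemu:///system domstate "one-$2"
}

# Turn the probe result into a decision.
# $1 = exit status of the probe, $2 = state string it printed.
decide() {
    if [ "$1" -ne 0 ]; then
        # Host unreachable or probe failed: the guest may still be running.
        echo "refuse"
    elif [ "$2" = "shut off" ]; then
        # The old host confirms the guest is stopped.
        echo "boot"
    else
        # Running, paused, or anything else: do not start a second copy.
        echo "refuse"
    fi
}

# $1 = VM ID, $2 = old host name.
guarded_boot() {
    vm_id=$1
    old_host=$2
    state=$(probe_guest "$old_host" "$vm_id" 2>/dev/null)
    status=$?
    if [ "$(decide "$status" "$state")" = "boot" ]; then
        onevm boot "$vm_id"
    else
        echo "VM $vm_id: cannot confirm guest is stopped on $old_host; not booting" >&2
        return 1
    fi
}

# Example call from a wrapper script: guarded_boot 14 vmhost3
```

In the kind of outage described above the probe itself would fail, so this
wrapper would refuse to boot rather than risk a duplicate guest. Restarting
safely in that case would need some out-of-band way of fencing the host,
which is not shown here.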

Any advice appreciated! (other than to take better care of the network :) )

thanks,

Matthew


oned.log starts as follows:

Thu Jan  9 08:13:07 2014 [InM][I]: Command execution fail: 'if [ -x
"/var/tmp/one/im/run_probes" ]; then /var/tmp/one/im/run_probes kvm 2
vmhost3; else                              exit 42; fi'
Thu Jan  9 08:13:07 2014 [InM][I]: Connection closed by 192.168.12.16
Thu Jan  9 08:13:07 2014 [InM][I]: ExitCode: 255
Thu Jan  9 08:13:07 2014 [ONE][E]: Error monitoring Host vmhost3 (2): -
Thu Jan  9 08:13:07 2014 [ReM][D]: Req:3296 UID:0 VirtualMachineAction
invoked, "boot", 14
Thu Jan  9 08:13:07 2014 [DiM][D]: Restarting VM 14
Thu Jan  9 08:13:07 2014 [ReM][D]: Req:3296 UID:0 VirtualMachineAction
result SUCCESS, 14
Thu Jan  9 08:13:07 2014 [HKM][D]: Message received: EXECUTE SUCCESS 14
on_crash_boot:

Thu Jan  9 08:13:08 2014 [ReM][D]: Req:3328 UID:0 VirtualMachineInfo
invoked, 14
Thu Jan  9 08:13:08 2014 [ReM][D]: Req:3328 UID:0 VirtualMachineInfo
result SUCCESS, "<VM><ID>14</ID><UID>..."

Thu Jan  9 08:13:08 2014 [ReM][D]: Req:9328 UID:0 VirtualMachineAction
invoked, "delete-recreate", 14
Thu Jan  9 08:13:08 2014 [ReM][D]: Req:9328 UID:0 VirtualMachineAction
result SUCCESS, 14

Thu Jan  9 08:13:08 2014 [VMM][D]: Message received: LOG I 14 Driver
command for 14 cancelled



The (slightly redacted) guest log (14.log) is as follows:

Thu Jan  9 07:44:53 2014 [LCM][I]: New VM state is RUNNING
Thu Jan  9 08:13:07 2014 [LCM][I]: New VM state is UNKNOWN
Thu Jan  9 08:13:07 2014 [LCM][I]: New VM state is BOOT_UNKNOWN
Thu Jan  9 08:13:07 2014 [HKM][I]: Success executing Hook: on_crash_boot: .
Thu Jan  9 08:13:07 2014 [VMM][I]: Generating deployment file:
/var/lib/one/vms/14/deployment.4917
Thu Jan  9 08:13:08 2014 [LCM][I]: New VM state is CLEANUP.
Thu Jan  9 08:13:08 2014 [VMM][I]: Driver command for 14 cancelled
Thu Jan  9 08:18:52 2014 [VMM][I]: Command execution fail:
/var/tmp/one/vmm/kvm/cancel 'one-14' 'vmhost3' 14 vmhost3
Thu Jan  9 08:18:52 2014 [VMM][I]: Connection closed by 192.168.12.16
Thu Jan  9 08:18:52 2014 [VMM][I]: ExitSSHCode: 255
Thu Jan  9 08:18:52 2014 [VMM][E]: Error connecting to vmhost3
Thu Jan  9 08:18:52 2014 [VMM][I]: Failed to execute virtualization
driver operation: cancel.
Thu Jan  9 08:18:52 2014 [VMM][I]: Command execution fail:
/var/tmp/one/vnm/dummy/clean <...snip...>
Thu Jan  9 08:18:52 2014 [VMM][I]: Connection closed by 192.168.12.16
Thu Jan  9 08:18:52 2014 [VMM][I]: ExitSSHCode: 255
Thu Jan  9 08:18:52 2014 [VMM][E]: Error connecting to vmhost3
Thu Jan  9 08:18:52 2014 [VMM][I]: Failed to execute network driver
operation: clean.
Thu Jan  9 08:19:01 2014 [VMM][I]: Successfully execute transfer manager
driver operation: tm_delete.
Thu Jan  9 08:19:02 2014 [VMM][I]: Successfully execute transfer manager
driver operation: tm_delete.
Thu Jan  9 08:19:02 2014 [VMM][I]: Host successfully cleaned.
Thu Jan  9 08:19:03 2014 [DiM][I]: New VM state is PENDING
Thu Jan  9 08:20:54 2014 [DiM][I]: New VM state is ACTIVE.
Thu Jan  9 08:20:54 2014 [LCM][I]: New VM state is PROLOG.
Thu Jan  9 08:20:54 2014 [VM][I]: Virtual Machine has no context
Thu Jan  9 08:20:54 2014 [LCM][I]: New VM state is BOOT
Thu Jan  9 08:20:54 2014 [VMM][I]: Generating deployment file:
/var/lib/one/vms/14/deployment.4918
Thu Jan  9 08:20:56 2014 [VMM][I]: ExitCode: 0
Thu Jan  9 08:20:56 2014 [VMM][I]: Successfully execute network driver
operation: pre.
Thu Jan  9 08:20:56 2014 [VMM][I]: ExitCode: 0
Thu Jan  9 08:20:56 2014 [VMM][I]: Successfully execute virtualization
driver operation: deploy.
Thu Jan  9 08:20:56 2014 [VMM][I]: ExitCode: 0
Thu Jan  9 08:20:56 2014 [VMM][I]: Successfully execute network driver
operation: post.
Thu Jan  9 08:20:56 2014 [LCM][I]: New VM state is RUNNING
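
The risky sequence in this log (a failed cancel followed later by a new
RUNNING state) can be searched for in other guest logs. The following is a
rough sketch only: the function name is made up, and it assumes each log
entry sits on a single line in the real file (the entries above are wrapped
by the mail client).

```shell
#!/bin/sh
# Hypothetical check: succeed if the given VM log shows a failed
# "cancel" driver operation that is later followed by a RUNNING state.
# $1 = path to a guest log file.
cancel_failed_then_running() {
    awk '
        /Failed to execute virtualization driver operation: cancel/ { failed = 1 }
        failed && /New VM state is RUNNING/ { found = 1 }
        END { exit !found }
    ' "$1"
}

# Example: cancel_failed_then_running 14.log && echo "check VM 14 for duplicates"
```

A match does not prove that a duplicate guest exists; it only flags VMs
whose old host should be checked by hand.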



-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.

