[one-users] Shutting down a VM from within the VM

Mon Nov 4 09:14:14 PST 2013

On Tue, Oct 29, 2013 at 7:04 PM, Simon Boulet <simon at nostalgeek.com> wrote:
>
> Oh, yes, I get your point. The Core uses "disappear" for setting the
> VM as UNKNOWN. I think we need to keep "disappear" as it is, or at
> least keep the current UNKNOWN behaviour. If the VM can't be monitored
> for some reason (the host is down, network issues, timeout, etc.), it
> enters UNKNOWN state and keeps monitoring the VM every interval until
> it is reported as RUNNING (or STOPPED or whatever other state
> change).

I agree, UNKNOWN should mean "I don't know what happened to it".

> What we need is a way to let the Core know that the VM was
> "successfully" monitored, but that the hypervisor reported the VM is
> not running.
>
> Have you investigated Libvirt's "defined" VMs list? Libvirt maintains
> two different lists of VMs: the "active" VMs and the "defined" VMs. I'm
> thinking a VM that is NOT active but that is defined is a VM that was
> shut down... If OpenNebula finds a VM is "defined" but inactive, and it
> expected the VM to be active, then it knows the VM was unexpectedly
> shut down (by the user from inside the VM, or by some admin accessing
> the hypervisor directly - not through OpenNebula).

You hit the nail on the head here. OpenNebula is currently using
libvirt's transient domain feature:
http://wiki.libvirt.org/page/VM_lifecycle#Transient_guest_domains_vs_Persistent_guest_domains

I think we can modify the vmm_mad to register the deployment.X file in
libvirt before starting the VM, unregister it when the VM enters DONE
state, and also move the domain definition along when the VM is
migrated. This will give us a clear picture of the state a VM is in:
if it was shut down from outside OpenNebula, the domain would remain
in the defined/stopped state. With transient domains, once the VM is
shut down it completely disappears from libvirt.
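
A rough sketch of the check this would enable, in Python. It assumes
domains are persistent and named "one-<id>", and that the two lists
come from "virsh list --name" and "virsh list --inactive --name". The
function names and the state labels other than RUNNING and UNKNOWN are
made up for illustration; they are not what the Core or vmm_mad uses.

```python
import subprocess


def virsh_names(*flags):
    """Return the set of domain names printed by `virsh list --name <flags>`."""
    out = subprocess.check_output(["virsh", "list", "--name"] + list(flags),
                                  universal_newlines=True)
    return {line.strip() for line in out.splitlines() if line.strip()}


def classify_vm(name, active, inactive):
    """Map libvirt's two domain lists onto the cases discussed above.

    `active` and `inactive` are sets of domain names. The returned labels
    are hypothetical, chosen only to keep the three cases apart.
    """
    if name in active:
        return "RUNNING"
    if name in inactive:
        # Monitoring succeeded, but the hypervisor says the VM is not
        # running: it was shut down from inside the guest or by an admin.
        return "SHUT_DOWN_OUTSIDE"
    # A persistent domain missing from both lists is unexpected.
    return "UNKNOWN"


if __name__ == "__main__":
    active = virsh_names()
    inactive = virsh_names("--inactive")
    print(classify_vm("one-42", active, inactive))
```

With transient domains the second branch can never be taken, which is
why a guest-initiated shutdown currently looks the same as a VM that
vanished.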

(I feel I've repeated almost everything Simon said, but I just wanted
to avoid a simple +1 post :)

> One thing to keep in mind as well for implementing this is when a Host
> is rebooted it may take some time for the hypervisor to restart all
> VMs. During that time Libvirt may report a VM as "defined" but not
> "active". I am not sure if that's an issue or not, perhaps it depends
> on your hypervisor, and the order in which services are started at
> boot (are the VMs being restarted before Libvirtd is started, etc.)

There is an init script, in CentOS and Debian at least, that takes
care of restarting guests after a reboot (it honors the autostart
setting of each persistent domain). Our domains should not have the
autostart flag set, since OpenNebula should handle host failures
according to the hooks enabled in oned.conf.
OpenNebula should also check the domains defined on the host the first
time it monitors it after a failure, and undefine the ones that are no
longer needed (domains redeployed on other hosts, for example).
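
The cleanup step could look roughly like this in Python. It is only a
sketch: stale_domains() and undefine() are hypothetical helpers, the
"expected" set stands for whatever OpenNebula believes should still be
on the host, and the only libvirt command used is "virsh undefine".

```python
import subprocess


def stale_domains(inactive_on_host, expected_on_host):
    """Return defined-but-inactive domains OpenNebula no longer expects here.

    Both arguments are iterables of domain names. Only inactive domains
    are considered, so a running guest is never touched by the cleanup.
    """
    return sorted(set(inactive_on_host) - set(expected_on_host))


def undefine(name):
    """Remove a persistent domain definition from libvirt on this host."""
    subprocess.check_call(["virsh", "undefine", name])


if __name__ == "__main__":
    # Example data: one-7 was redeployed elsewhere after the host failed.
    for name in stale_domains({"one-7", "one-9"}, {"one-9"}):
        print("would undefine", name)
```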

Looking forward to your feedback,
Andrei