[one-users] Shutting down a VM from within the VM

Carlos Martín Sánchez cmartin at opennebula.org
Fri Nov 8 02:49:55 PST 2013


Hi Simon,

On Tue, Oct 29, 2013 at 6:04 PM, Simon Boulet <simon at nostalgeek.com> wrote:
>
> > Rubén could not retrieve that 'paused' state from libvirt, no matter how
> > the VM was destroyed, he always got 'stopped'. Are we missing something?
>
> It depends on the Libvirt backend you're using and how it detects the
> state change. The paused state in libvirt is supposed to be reported
> when the VM is paused (and its state, memory, etc. are preserved for
> resuming later). You need to trick the hypervisor into thinking the
> VM has been paused when the shutdown is initiated from inside the VM.
> It's a hack; it won't work out of the box with the stock libvirt
> backends.
>

Oh, ok, thanks for clearing that up.
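
For anyone following along, this is roughly what libvirt itself reports for
a guest that shut itself down; a minimal sketch with the python-libvirt
bindings (the connection URI and domain name are just examples):

    import libvirt

    # Read-only connection to the local qemu/KVM hypervisor (example URI).
    conn = libvirt.openReadOnly("qemu:///system")

    dom = conn.lookupByName("one-42")  # hypothetical OpenNebula domain name

    state, reason = dom.state()

    if state == libvirt.VIR_DOMAIN_PAUSED:
        print("Guest is paused, memory kept; could be mapped to SUSPENDED")
    elif state == libvirt.VIR_DOMAIN_SHUTOFF:
        # A guest-initiated shutdown normally ends up here, not in PAUSED,
        # which matches Ruben always getting 'stopped'.
        print("Guest is shut off")
    else:
        print("Other libvirt state: %d (reason %d)" % (state, reason))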


> Generally I think the Core should be more lightweight and make better
> use of external drivers, hooks, etc., limiting the Core to state
> changes, consistency, scheduling events, etc. Spreading out the
> workflow / drivers as much as possible makes it much easier to
> customize OpenNebula to each environment. Also, keeping the Core
> lightweight makes it a lot easier to maintain and optimize.
> That's why I'm generally in favour of trying to implement as much as
> we can outside of the Core, when it's possible.


I totally agree, that's one of the big advantages of OpenNebula: everything
that interacts with external components is done via drivers. But in this
case I'm not so sure there is any advantage to the hook approach.

When the functionality varies depending on the underlying components,
it's clearly something that must be done with a new driver action.
For this feature, whether it is set up in oned.conf or in a hook, the
behaviour will be the same:
a default transition to one of done, poweroff or undeployed, plus a VM
attribute to override this for each VM.
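
To make the idea concrete, this is a rough sketch of how the core could
resolve the effective behaviour; the SHUTDOWN_BEHAVIOR attribute name and
the default value are hypothetical, not existing OpenNebula settings:

    # Hypothetical resolution of the "guest shut itself down" behaviour.
    # ONED_DEFAULT would come from oned.conf; a per-VM attribute overrides it.

    VALID_BEHAVIORS = {"done", "poweroff", "undeployed"}

    ONED_DEFAULT = "poweroff"  # hypothetical oned.conf default

    def effective_behavior(vm_template):
        """Return the transition to apply when the guest shuts itself down."""
        value = vm_template.get("SHUTDOWN_BEHAVIOR", ONED_DEFAULT).lower()
        if value not in VALID_BEHAVIORS:
            # Fall back to the site-wide default on a bogus per-VM value.
            return ONED_DEFAULT
        return value

    # A VM that asks to be fully terminated when the guest shuts down:
    print(effective_behavior({"SHUTDOWN_BEHAVIOR": "done"}))  # -> done
    print(effective_behavior({}))                             # -> poweroff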

There's another reason I'm not in favor of using hooks for any important
feature. Compared to driver actions, they are executed asynchronously, and
the core cannot know whether the execution failed or not; we cannot put
timeouts or retries in place, etc.
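
For completeness, the hook Simon suggests in the quoted message below would
look roughly like this. It is only a sketch: the INITIATED_SHUTDOWN_BEHAVIOR
attribute name comes from his example, and it assumes the hook is registered
on the POWEROFF state and receives the VM ID as its first argument.

    #!/usr/bin/env python
    # Sketch of the state-change hook Simon describes below: on POWEROFF,
    # look for a (hypothetical) INITIATED_SHUTDOWN_BEHAVIOR user attribute
    # and delete the VM if it asks for termination. Assumes the hook is
    # registered on the POWEROFF state and receives the VM ID as argv[1].

    import subprocess
    import sys
    import xml.etree.ElementTree as ET

    vm_id = sys.argv[1]

    # Fetch the VM document from oned through the CLI.
    xml_doc = subprocess.check_output(["onevm", "show", "-x", vm_id])
    root = ET.fromstring(xml_doc)

    behavior = root.findtext("USER_TEMPLATE/INITIATED_SHUTDOWN_BEHAVIOR", "")

    if behavior.upper() == "TERMINATE":
        # The VM owner asked for the instance to go away on guest shutdown.
        subprocess.check_call(["onevm", "delete", vm_id])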


> What we need is a way to let the Core know that the VM was
> "successfully" monitored, but that the hypervisor reported the VM is
> not running.
>
>
Off the top of my head, I believe we are already doing this. A successful
monitoring run for a VM that is gone should be reported as POLL SUCCESS
with STATE='-'.
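
For illustration, a driver poll result is a list of key=value pairs, and the
state is a one-character code. A small sketch of how it could be parsed;
only '-' (VM gone) is taken from this thread, the other codes in the mapping
are an assumption:

    # Sketch: interpret the STATE field of a VMM poll result.

    POLL_STATES = {
        "a": "running",  # assumed code for an active/running guest
        "p": "paused",   # assumed code for a paused guest
        "-": "gone",     # VM no longer known to the hypervisor
    }

    def parse_poll(line):
        """Parse 'STATE=a USEDMEMORY=1024 ...' style output into a dict."""
        pairs = (field.split("=", 1) for field in line.split())
        return {key: value for key, value in pairs}

    info = parse_poll("STATE=-")
    print(POLL_STATES.get(info["STATE"], "unknown code"))  # -> gone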


> Have you investigated Libvirt's "defined" VMs list? Libvirt maintains
> two different lists of VMs: the "active" VMs and the "defined" VMs. I'm
> thinking a VM that is NOT active but that is defined is a VM that was
> shut down... If OpenNebula finds a VM is "defined" but inactive, and it
> expected the VM to be active, then it knows the VM was unexpectedly
> shut down (by the user from inside the VM, or by some admin accessing
> the hypervisor directly - not through OpenNebula).
>
>
I know there was a reason against this in the first OpenNebula versions;
I'll try to ask other team members about it. My guess is that it would
break the management consistency between KVM and Xen, since we don't use
libvirt for Xen VMs.
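
For reference, checking for domains that libvirt knows about but that are
not running would look roughly like this with the python-libvirt bindings
(a sketch; the connection URI and the 'one-' name filter are just examples):

    import libvirt

    # Read-only connection to the hypervisor (example URI).
    conn = libvirt.openReadOnly("qemu:///system")

    # listAllDomains(0) returns every domain libvirt knows about: the
    # running ("active") ones and the merely "defined" (persistent but
    # not running) ones.
    for dom in conn.listAllDomains(0):
        if dom.name().startswith("one-") and not dom.isActive():
            # Defined but inactive: if OpenNebula expected this VM to be
            # running, it was shut down behind OpenNebula's back.
            print("Defined but not running: %s" % dom.name())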


> One thing to keep in mind as well for implementing this is that when a
> Host is rebooted it may take some time for the hypervisor to restart all
> VMs. During that time Libvirt may report a VM as "defined" but not
> "active". I am not sure if that's an issue or not; perhaps it depends
> on your hypervisor, and the order in which services are started at
> boot (are the VMs being restarted before Libvirtd is started, etc.)
>

One scenario where I see this being problematic is if the fault tolerance
hook has already re-deployed the VM on another host. I guess this should be
something configurable that the admin can disable.


Regards,
Carlos

--
Carlos Martín, MSc
Project Engineer
OpenNebula - Flexible Enterprise Cloud Made Simple
www.OpenNebula.org | cmartin at opennebula.org |
@OpenNebula <http://twitter.com/opennebula>


On Tue, Oct 29, 2013 at 6:04 PM, Simon Boulet <simon at nostalgeek.com> wrote:

> On Tue, Oct 29, 2013 at 12:26 PM, Carlos Martín Sánchez
> <cmartin at opennebula.org> wrote:
> > Hi,
> >
> > On Tue, Oct 29, 2013 at 4:43 PM, Simon Boulet <simon at nostalgeek.com>
> > wrote:
> >>
> >> The libvirt "paused" method I
> >> suggested is a hack that works with OpenNebula and turns the VMs that
> >> are internally shut down into "SUSPENDED" in OpenNebula.
> >
> >
> > Rubén could not retrieve that 'paused' state from libvirt, no matter how
> > the VM was destroyed, he always got 'stopped'. Are we missing something?
>
> It depends on the Libvirt backend you're using and how it detects the
> state change. The paused state in libvirt is supposed to be reported
> when the VM is paused (and its state, memory, etc. are preserved for
> resuming later). You need to trick the hypervisor into thinking the
> VM has been paused when the shutdown is initiated from inside the VM.
> It's a hack; it won't work out of the box with the stock libvirt
> backends.
>
> >
> >> One comment though: perhaps the extra attribute in the VM template
> >> could be managed outside the core, and have this handled by a hook.
> >> E.g. if someone wanted to have the Amazon
> >> "instance-initiated-shutdown-behavior":
> >>
> >> - Set the oned default when a VM disappears to POWEROFF.
> >> - Have a state change hook that picks up the POWEROFF state change and
> >> parses the VM template to see if an INITIATED_SHUTDOWN_BEHAVIOR user
> >> attribute is set. If so, parse the attribute; if it's set to e.g.
> >> TERMINATE, cancel / delete the VM.
> >
> >
> > I don't see any advantage to this, honestly.
>
>
> Generally I think the Core should be more lightweight and make better
> use of external drivers, hooks, etc., limiting the Core to state
> changes, consistency, scheduling events, etc. Spreading out the
> workflow / drivers as much as possible makes it much easier to
> customize OpenNebula to each environment. Also, keeping the Core
> lightweight makes it a lot easier to maintain and optimize.
> That's why I'm generally in favour of trying to implement as much as
> we can outside of the Core, when it's possible.
>
> > If you set the default
> > behaviour to DONE, you can't undo that with a hook and set the VM back to
> > poweroff...
>
>
> Yes, of course, it wouldn't work with a default of DONE because once
> the VM has entered the DONE state it can't be recovered. But it would
> work for other defaults; for example, the POWEROFF state can be resumed
> (although a VM in POWEROFF can't be cancelled, it can only be
> deleted...)
>
>
> > Plus I think it's much safer to do it in the core. For example, when a
> > Host returns a monitor failure, all the VMs are set to UNKNOWN. But this
> > doesn't mean that the VM disappeared from the hypervisor, just that the
> > VM could not be monitored.
> >
>
>
> Oh, yes, I get your point. The Core uses "disappear" for setting the
> VM as UNKNOWN. I think we need to keep "disappear" as it is, or at
> least keep the current UNKNOWN behaviour. If the VM can't be monitored
> for some reason (the host is down, network issues, timeout, etc.), it
> enters the UNKNOWN state and the Core keeps monitoring the VM every
> interval until it is reported as RUNNING (or STOPPED or whatever other
> state change).
>
> What we need is a way to let the Core know that the VM was
> "successfully" monitored, but that the hypervisor reported the VM is
> not running.
>
> Have you investigated Libvirt's "defined" VMs list? Libvirt maintains
> two different lists of VMs: the "active" VMs and the "defined" VMs. I'm
> thinking a VM that is NOT active but that is defined is a VM that was
> shut down... If OpenNebula finds a VM is "defined" but inactive, and it
> expected the VM to be active, then it knows the VM was unexpectedly
> shut down (by the user from inside the VM, or by some admin accessing
> the hypervisor directly - not through OpenNebula).
>
> One thing to keep in mind as well for implementing this is that when a
> Host is rebooted it may take some time for the hypervisor to restart all
> VMs. During that time Libvirt may report a VM as "defined" but not
> "active". I am not sure if that's an issue or not; perhaps it depends
> on your hypervisor, and the order in which services are started at
> boot (are the VMs being restarted before Libvirtd is started, etc.)
>
> Simon
>