[one-users] VM life cycle - error handling

Danny Sternkopf danny.sternkopf at csc.fi
Wed Mar 28 12:33:21 PDT 2012


Hi,

the shutdown issue might be related to the double Vnet IP allocation bug:
http://dev.opennebula.org/issues/1178

As you said, the VM doesn't reach the DONE state and therefore the IP 
doesn't get released. The user has probably assumed it is gone and 
requested the IP again.
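
To make the expected behaviour concrete, here is a small toy model in Python. It is only a sketch of the rule described in the quoted reply below (a lease is released only when the VM reaches DONE), not OpenNebula's actual code. LeasePool and its methods are hypothetical names made up for illustration.

```python
# Toy model, NOT OpenNebula code: a lease stays allocated until the VM is DONE.
class LeasePool:
    def __init__(self, addresses):
        self.free = list(addresses)   # addresses available for new VMs
        self.used = {}                # vm_id -> address currently held

    def allocate(self, vm_id):
        # A VM that already holds a lease keeps the same address.
        if vm_id in self.used:
            return self.used[vm_id]
        ip = self.free.pop(0)
        self.used[vm_id] = ip
        return ip

    def on_state_change(self, vm_id, state):
        # Only the DONE state gives the address back to the pool.
        if state == "DONE" and vm_id in self.used:
            self.free.append(self.used.pop(vm_id))


pool = LeasePool(["10.0.0.1", "10.0.0.2"])
ip_a = pool.allocate("vm-a")
pool.on_state_change("vm-a", "RUNNING")   # shutdown timed out, VM still runs
ip_b = pool.allocate("vm-b")              # must not receive vm-a's address
```

In this model a failed shutdown leaves vm-a's address in use, so vm-b gets a different one. The clash I saw would correspond to the address returning to the free list before DONE.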

Thanks for pointing to the retry argument for IM and VMM drivers! I'll 
give it a try.
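
For anyone wondering what such a retry amounts to, here is a minimal Python sketch of the general idea: rerun a command a few times when it exits with a non-zero code. This is my own illustration, not the driver's implementation. The function name run_with_retries, the delay parameter, and the assumption that N retries means up to N extra attempts after the first one are all mine.

```python
import subprocess
import sys
import time


def run_with_retries(cmd, retries=3, delay=1.0):
    """Run cmd; on a non-zero exit code, retry up to `retries` more times.

    Returns (exit_code_of_last_attempt, number_of_attempts_made).
    """
    attempts = 0
    while True:
        attempts += 1
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode == 0 or attempts > retries:
            return result.returncode, attempts
        time.sleep(delay)  # short pause before the next attempt


# Stand-in for a driver action: a child Python process that exits cleanly.
exit_code, attempts = run_with_retries(
    [sys.executable, "-c", "raise SystemExit(0)"], retries=3, delay=0.1
)
```

A retry only helps with transient failures like the ones I described. If the command keeps failing, the last exit code is still returned and has to be handled.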

Regards,

Danny

On 2012-03-28 12:44, Carlos Martín Sánchez wrote:
> Hi Danny,
>
>
> On Thu, Mar 22, 2012 at 6:48 PM, Danny Sternkopf <danny.sternkopf at csc.fi> wrote:
> 1) onevm shutdown fails:
> [...]
> However, ONE already released the VM's IP and assigned it to another VM, which of course causes a clash. I wonder if this is intended to work like this? Obviously ONE knows that the VM is still running, so it should keep the associated IP allocated.
>
> The network leases and the disk images are released only once the VM reaches the DONE state. If the shutdown timed out and the VM returned to RUNNING, this should not happen. Are you sure the OpenNebula VM is in the RUNNING state? Or did I misunderstand you?
>
> 2) onevm delete fails:
> It is similar to 1). virsh destroy gives an error (ExitCode: 42), but the transfer manager wipes the disks even though the VM is still running (although it might not be fully functional anymore). I also wonder if this makes any sense? In this case neither the user nor the administrator realizes that the VM is still running unless they check the physical host locally or look at the VM's log file.
>
> Yes, in this case OpenNebula assumes that the destroy action always succeeds. Unlike the graceful shutdown action, the VM is not monitored after the delete action.
>
>
> As a workaround for these erratic virsh failures, you can set a retry in the IM and VMM drivers in oned.conf, using the -r argument [1]:
>
> IM_MAD = [
>      name       = "im_kvm",
>      executable = "one_im_ssh",
>      arguments  = "-r 3 -t 15 kvm" ]
>
> VM_MAD = [
>      name       = "vmm_kvm",
>      executable = "one_vmm_exec",
>      arguments  = "-t 15 -r 3 kvm",
>      default    = "vmm_exec/vmm_exec_kvm.conf",
>      type       = "kvm" ]
>
> Regards
>
> [1] http://opennebula.org/documentation:documentation:devel-vmm
> --
> Carlos Martín, MSc
> Project Engineer
> OpenNebula - The Open-source Solution for Data Center Virtualization
> www.OpenNebula.org | cmartin at opennebula.org | @OpenNebula
>
>
>
> On Thu, Mar 22, 2012 at 6:48 PM, Danny Sternkopf <danny.sternkopf at csc.fi> wrote:
> Hi,
>
> I occasionally (very rarely, it seems) encounter problems where VMs are not properly deleted or shut down by onevm commands. I use ONE 3.0 with hosts running Fedora 15, KVM and libvirt.
>
> 1) onevm shutdown fails:
> I can see in the VM log file that the shutdown operation timed out and the VM is still running. Unfortunately I don't see the reason why 'virsh shutdown' failed. There is nothing in the system or libvirt logs. It looks to me as if virsh can't communicate properly with libvirtd. That alone is still harmless. However, ONE already released the VM's IP and assigned it to another VM, which of course causes a clash. I wonder if this is intended to work like this? Obviously ONE knows that the VM is still running, so it should keep the associated IP allocated.
>
> 2) onevm delete fails:
> It is similar to 1). virsh destroy gives an error (ExitCode: 42), but the transfer manager wipes the disks even though the VM is still running (although it might not be fully functional anymore). I also wonder if this makes any sense? In this case neither the user nor the administrator realizes that the VM is still running unless they check the physical host locally or look at the VM's log file.
>
> I was not able to find out why virsh failed and could not reproduce it. The hosts are healthy, but might have had a strange problem at the very moment when users requested a VM shutdown or deletion.
>
> For example, I could later manually run the same command ONE had executed, and it worked. (/var/tmp/one/vmm/kvm/cancel one-20740 n020504 20740 n020504)
>
> Any hints about a possible libvirt issue?
>
> Regards,
>
> Danny


