[one-users] VMM ignores errors when failing to delete a VM

Ruben S. Montero rsmontero at opennebula.org
Mon Feb 25 05:23:16 PST 2013


Hi

Some of the issues in 3.2 have been fixed in the releases up to 3.8 and in
the upcoming 4.0:

1.- We've improved the checks and synchronization of the delete operation,
so it now waits for pending cancel operations.

2.- Delete is an admin operation; you should be using cancel to dispose of
VMs (it performs additional checks). Think of delete as kill -9: the results
of some actions are indeed not checked in this mode (see below).

3.- In my experience libvirt is not very good at doing several things at
the same time... You may want to try the 4.0 drivers, which retry the
operation, e.g.

https://github.com/OpenNebula/one/blob/master/src/vmm_mad/remotes/kvm/cancel

4.- The monitoring system has been improved to discover zombie VMs. You'll
be notified at each monitoring step, so proper automatic or manual
corrective actions can be triggered.
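
As a rough illustration of points 3 and 4, here is a minimal Python sketch. It
is not OpenNebula code (the actual 4.0 driver is the shell script linked
above); the helper names retry and find_zombies are hypothetical, and the
sample data is copied from the 'virsh list' and 'onevm list' output quoted
below. It assumes a zombie is any one-<id> domain running on the host that
OpenNebula does not list there.

```python
import time
from typing import Callable, Iterable, Set


def retry(operation: Callable[[], bool], attempts: int = 3, delay: float = 0.0) -> bool:
    """Call operation until it returns True or the attempts run out.

    Hypothetical helper: 'operation' stands in for something like a
    cancel call against the hypervisor that may fail transiently.
    """
    for attempt in range(attempts):
        if operation():
            return True
        if attempt < attempts - 1:
            time.sleep(delay)  # back off before the next try
    return False


def find_zombies(hypervisor_domains: Iterable[str], tracked_vm_ids: Iterable[int]) -> Set[str]:
    """Return one-<id> domains running on the host that are not tracked there."""
    tracked = {f"one-{vm_id}" for vm_id in tracked_vm_ids}
    running = {name for name in hypervisor_domains if name.startswith("one-")}
    return running - tracked


# Domain names from 'virsh list' and VM IDs from 'onevm list' in the report.
virsh_domains = ["one-192", "one-193", "one-194", "one-198", "one-201", "one-202"]
onevm_ids = [192, 193, 194]

zombies = find_zombies(virsh_domains, onevm_ids)
print(sorted(zombies))  # ['one-198', 'one-201', 'one-202']
```

The three names printed are the VMs from the report that were marked DONE but
are still running on the host.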

Cheers

Ruben




On Fri, Feb 22, 2013 at 6:52 PM, Gerard Bernabeu <gerard1 at fnal.gov> wrote:

>  Hi,
>
> During a bulk VM deletion, 3 out of 9 VMs were not actually deleted from
> the hypervisor host (they're still running). This is obvious when
> comparing 'onevm list' with 'virsh list':
>
> -bash-4.1$ onevm list | grep myhost
>    192 userx  oneadmin one-192      runn   5      2G          myhost 02 17:56:45
>    193 userx  oneadmin one-193      runn   5      2G          myhost 02 17:42:58
>    194 userx  oneadmin one-194      runn   1      2G          myhost 00 20:17:20
>
>
> [root@myhost ~]# virsh list
>  Id    Name                           State
> ----------------------------------------------------
>  5     one-192                        running
>  6     one-193                        running
>  7     one-194                        running
>  11    one-198                        running
>  14    one-201                        running
>  15    one-202                        running
>
> I used Sunstone: from the 'Virtual Machines' tab I selected the 9 VMs and
> pressed the 'Delete' button (top right).
>
> Looking at the logs, we see that the command that was supposed to shut
> down the VM failed, but the failure was ignored by VMM:
>
> Fri Feb 22 10:27:06 2013 [VMM][D]: Monitor Information:
>     CPU   : 3
>     Memory: 2097152
>     Net_TX: 215229
>     Net_RX: 5664872
> Fri Feb 22 10:35:34 2013 [DiM][I]: New VM state is DONE
> Fri Feb 22 10:35:34 2013 [VMM][W]: Ignored: LOG I 201 Driver command for
> 201 cancelled
>
> *Fri Feb 22 10:35:35 2013 [VMM][W]: Ignored: LOG I 201 Command execution
> fail: /var/tmp/one/vmm/kvm/cancel one-201 myhost 201 myhost*
>
> Fri Feb 22 10:35:35 2013 [VMM][W]: Ignored: LOG I 201
> ssh_exchange_identification: Connection closed by remote host
>
> Fri Feb 22 10:35:35 2013 [VMM][W]: Ignored: LOG I 201 ExitSSHCode: 255
>
> Fri Feb 22 10:35:35 2013 [VMM][W]: Ignored: LOG E 201 Error connecting to
> myhost
>
> Fri Feb 22 10:35:35 2013 [VMM][W]: Ignored: LOG I 201 Failed to execute
> virtualization driver operation: cancel.
>
> Fri Feb 22 10:35:35 2013 [VMM][W]: Ignored: CANCEL FAILURE 201 Error
> connecting to myhost
>
> Fri Feb 22 10:35:35 2013 [TM][W]: Ignored: LOG I 201 tm_delete.sh: HK
> Deleting myhost /var/lib/one/local/201/images
>
> Fri Feb 22 10:35:35 2013 [TM][W]: Ignored: LOG I 201 tm_delete.sh:
> Executed "ssh myhost rm -rf /var/lib/one/local/201/images".
>
> Fri Feb 22 10:35:35 2013 [TM][W]: Ignored: LOG I 201 ExitCode: 0
>
> Fri Feb 22 10:35:35 2013 [TM][W]: Ignored: TRANSFER SUCCESS 201 -
>
>
> The delete command actually succeeded:
>
> [root@myhost ~]# ll /var/lib/one/local/201
> total 0
>
> This looks like a bug in the VMM component; it should not be ignoring
> failures...
>
> IMHO in this case the proper ONE behavior would be to consider the
> 'delete' operation as failed, and thus not remove the image. It could then
> check the actual VM status or leave the VM in an ERROR state.
>
> Is there any way to prevent VMM from ignoring errors in the scripts it
> calls? I am using ONE 3.2.
>
> Thanks,
>
> --
> Gerard Bernabeu
> FermiCloud and FermiGrid Services at Fermilab
> Phone (+1) 630-840-6509
>
>
> --
> Ruben S. Montero, PhD
> Project co-Lead and Chief Architect
> OpenNebula - The Open Source Solution for Data Center Virtualization
> <http://lists.opennebula.org/listinfo.cgi/users-opennebula.org>
> www.OpenNebula.org | rsmontero at opennebula.org | @OpenNebula