<div dir="ltr">Hi<div><br></div><div>There are some issues in 3.2 that have been solved through 3.8 and in next 4.0:</div><div><br></div><div>1.- We've improved the checks and synchronization of delete operation, so it will wait for the cancel operations.</div>
2.- Delete is an admin operation; you should be using cancel to dispose of VMs (it performs additional checks). Think of delete as kill -9: the results of some actions are indeed not checked in this mode (see below).
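For reference, a minimal sketch of the CLI equivalents (the Sunstone buttons map to the same operations; VM id 201 is taken from your log below):

    # Preferred: cancel asks the driver to destroy the domain and checks the result
    onevm cancel 201

    # Admin-only "kill -9" path: the VM is disposed of even if driver actions fail
    onevm delete 201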
3.- In my experience libvirt is not very good at doing several things at the same time... You may try the 4.0 drivers, which retry the operation, e.g.:

https://github.com/OpenNebula/one/blob/master/src/vmm_mad/remotes/kvm/cancel
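To illustrate the idea, here is a hand-written sketch of a retry loop around virsh destroy (not the actual 4.0 driver linked above, which may differ in its details):

    #!/bin/bash
    # Sketch: retry "virsh destroy" a few times before giving up.
    # The deploy id (e.g. one-201) is passed as the first argument.
    deploy_id="$1"
    retries=3

    for i in $(seq 1 "$retries"); do
        # Nothing to do if the domain is already gone
        if ! virsh --connect qemu:///system dominfo "$deploy_id" >/dev/null 2>&1; then
            exit 0
        fi
        virsh --connect qemu:///system destroy "$deploy_id" && exit 0
        sleep 5
    done

    echo "Could not destroy $deploy_id after $retries attempts" >&2
    exit 1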
<a href="https://github.com/OpenNebula/one/blob/master/src/vmm_mad/remotes/kvm/cancel" target="_blank">https://github.com/OpenNebula/one/blob/master/src/vmm_mad/remotes/kvm/cancel</a><br></div><div><br></div><div>4.- Monitoring system has been improved to discover zombie vms, you'll be notified each monitoring step, so proper automatic or manual corrective actions could be triggered. </div>
Cheers

Ruben


On Fri, Feb 22, 2013 at 6:52 PM, Gerard Bernabeu <gerard1@fnal.gov> wrote:
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000">
Hi,<br>
<br>
during a bulk VM deletion process 3 out of 9 VMs failed to be
actually deleted from the hypervisor host (they're still running).
This is obvious by comparing 'onevm list' with 'virsh list':
    -bash-4.1$ onevm list | grep myhost
       192 userx    oneadmin one-192  runn  5  2G  myhost  02 17:56:45
       193 userx    oneadmin one-193  runn  5  2G  myhost  02 17:42:58
       194 userx    oneadmin one-194  runn  1  2G  myhost  00 20:17:20

    [root@myhost ~]# virsh list
     Id Name                 State
    ----------------------------------------------------
      5 one-192              running
      6 one-193              running
      7 one-194              running
     11 one-198              running
     14 one-201              running
     15 one-202              running
I used Sunstone: from the 'Virtual Machines' tab I marked the 9 VMs and pressed the 'Delete' button (top right).

Looking at the logs, we see that the command that was supposed to shut down the VM failed, but the failure was ignored by the VMM:
    Fri Feb 22 10:27:06 2013 [VMM][D]: Monitor Information:
        CPU   : 3
        Memory: 2097152
        Net_TX: 215229
        Net_RX: 5664872
    Fri Feb 22 10:35:34 2013 [DiM][I]: New VM state is DONE
    Fri Feb 22 10:35:34 2013 [VMM][W]: Ignored: LOG I 201 Driver command for 201 cancelled
    Fri Feb 22 10:35:35 2013 [VMM][W]: Ignored: LOG I 201 Command execution fail: /var/tmp/one/vmm/kvm/cancel one-201 myhost 201 myhost
    Fri Feb 22 10:35:35 2013 [VMM][W]: Ignored: LOG I 201 ssh_exchange_identification: Connection closed by remote host
    Fri Feb 22 10:35:35 2013 [VMM][W]: Ignored: LOG I 201 ExitSSHCode: 255
    Fri Feb 22 10:35:35 2013 [VMM][W]: Ignored: LOG E 201 Error connecting to myhost
    Fri Feb 22 10:35:35 2013 [VMM][W]: Ignored: LOG I 201 Failed to execute virtualization driver operation: cancel.
    Fri Feb 22 10:35:35 2013 [VMM][W]: Ignored: CANCEL FAILURE 201 Error connecting to myhost
    Fri Feb 22 10:35:35 2013 [TM][W]: Ignored: LOG I 201 tm_delete.sh: HK Deleting myhost /var/lib/one/local/201/images
    Fri Feb 22 10:35:35 2013 [TM][W]: Ignored: LOG I 201 tm_delete.sh: Executed "ssh myhost rm -rf /var/lib/one/local/201/images".
    Fri Feb 22 10:35:35 2013 [TM][W]: Ignored: LOG I 201 ExitCode: 0
    Fri Feb 22 10:35:35 2013 [TM][W]: Ignored: TRANSFER SUCCESS 201 -
The image deletion, however, did actually succeed:

    [root@myhost ~]# ll /var/lib/one/local/201
    total 0
This looks like a bug in the VMM component; it should not be ignoring failures...

IMHO the proper ONE behavior in this case would be to consider the 'delete' operation as failed and therefore not remove the image. It could then check the actual VM status, or leave the VM in an ERROR state.

Is there any way to prevent the VMM from ignoring errors in the scripts it calls? I am using ONE 3.2.
Thanks,

--
Gerard Bernabeu
FermiCloud and FermiGrid Services at Fermilab
Phone (+1) 630-840-6509
<a href="http://lists.opennebula.org/listinfo.cgi/users-opennebula.org" target="_blank">http://lists.opennebula.org/listinfo.cgi/users-opennebula.org<br clear="all"><div><br></div>-- <br>Ruben S. Montero, PhD<br>Project co-Lead and Chief Architect<br>
OpenNebula - The Open Source Solution for Data Center Virtualization<br></a><a href="http://www.OpenNebula.org" target="_blank">www.OpenNebula.org</a> | <a href="mailto:rsmontero@opennebula.org" target="_blank">rsmontero@opennebula.org</a> | @OpenNebula
</blockquote></div></div></div>