[one-users] "onevm delete" deletes extra VM

Shi Jin jinzishuai at gmail.com
Wed Sep 30 09:27:57 PDT 2009


OK. With another designed experiment I am quite sure that the problem is
within KVM/libvirtd rather OpenNebula. Please see the following, which has
nothing to do with OpenNebula.root at node1:/srv/cloud/ImgRep/WinXP# virsh list
Connecting to uri: qemu:///system
 Id Name                 State
----------------------------------
 27 one-80               running
 28 one-81               running
 29 one-83               running
 30 one-82               running
 31 one-79               running

root at node1:/srv/cloud/ImgRep/WinXP# virsh destroy 29
Connecting to uri: qemu:///system
Domain 29 destroyed

root at node1:/srv/cloud/ImgRep/WinXP# virsh list
Connecting to uri: qemu:///system
 Id Name                 State
----------------------------------
 27 one-80               running
 28 one-81               running
 31 one-79               running

The /var/log/message shows
Sep 30 08:58:59 node1 kernel: [238101.304619] br1: port 5(vnet4) entering
disabled state
Sep 30 08:58:59 node1 kernel: [238101.342851] device vnet4 left promiscuous
mode
Sep 30 08:58:59 node1 kernel: [238101.342854] br1: port 5(vnet4) entering
disabled state
Sep 30 08:58:59 node1 kernel: [238101.424851] br1: port 4(vnet3) entering
disabled state
Sep 30 08:59:00 node1 kernel: [238101.463031] device vnet3 left promiscuous
mode
Sep 30 08:59:00 node1 kernel: [238101.463035] br1: port 4(vnet3) entering
disabled state
Sep 30 09:10:53 node1 -- MARK --
Sep 30 09:30:53 node1 -- MARK --
Sep 30 09:50:53 node1 -- MARK --
Sep 30 09:51:31 node1 kernel: [241253.266955] device vnet3 entered
promiscuous mode
Sep 30 09:51:31 node1 kernel: [241253.268857] br1: port 4(vnet3) entering
learning state
Sep 30 09:51:40 node1 kernel: [241262.260147] br1: topology change detected,
propagating
Sep 30 09:51:40 node1 kernel: [241262.260152] br1: port 4(vnet3) entering
forwarding state
Sep 30 09:53:59 node1 kernel: [241401.204570] br1: port 4(vnet3) entering
disabled state
Sep 30 09:53:59 node1 kernel: [241401.232395] device vnet3 left promiscuous
mode
Sep 30 09:53:59 node1 kernel: [241401.232399] br1: port 4(vnet3) entering
disabled state
Sep 30 09:54:20 node1 kernel: [241421.694803] br1: port 2(vnet0) entering
disabled state
Sep 30 09:54:20 node1 kernel: [241421.733095] device vnet0 left promiscuous
mode
Sep 30 09:54:20 node1 kernel: [241421.733099] br1: port 2(vnet0) entering
disabled state
Sep 30 09:54:20 node1 kernel: [241421.815209] br2: port 2(vnet1) entering
disabled state
Sep 30 09:54:20 node1 kernel: [241421.853728] device vnet1 left promiscuous
mode
Sep 30 09:54:20 node1 kernel: [241421.853732] br2: port 2(vnet1) entering
disabled state
Sep 30 09:54:20 node1 kernel: [241421.935254] br1: port 3(vnet2) entering
disabled state
Sep 30 09:54:20 node1 kernel: [241421.963527] device vnet2 left promiscuous
mode
Sep 30 09:54:20 node1 kernel: [241421.963531] br1: port 3(vnet2) entering
disabled state
Sep 30 09:55:29 node1 kernel: [241490.644204] br1: port 6(vnet5) entering
disabled state
Sep 30 09:55:29 node1 kernel: [241490.673195] device vnet5 left promiscuous
mode
Sep 30 09:55:29 node1 kernel: [241490.673199] br1: port 6(vnet5) entering
disabled state
Sep 30 10:10:53 node1 -- MARK --
Sep 30 10:11:49 node1 kernel: [242470.955142] device vnet0 entered
promiscuous mode
Sep 30 10:11:49 node1 kernel: [242470.957051] br1: port 2(vnet0) entering
learning state
Sep 30 10:11:52 node1 kernel: [242474.180036] device vnet1 entered
promiscuous mode
Sep 30 10:11:52 node1 kernel: [242474.181943] br1: port 3(vnet1) entering
learning state
Sep 30 10:11:55 node1 kernel: [242477.431693] device vnet2 entered
promiscuous mode
Sep 30 10:11:55 node1 kernel: [242477.433558] br1: port 4(vnet2) entering
learning state
Sep 30 10:11:58 node1 kernel: [242479.950082] br1: topology change detected,
propagating
Sep 30 10:11:58 node1 kernel: [242479.950086] br1: port 2(vnet0) entering
forwarding state
Sep 30 10:12:00 node1 kernel: [242482.059146] device vnet3 entered
promiscuous mode
Sep 30 10:12:00 node1 kernel: [242482.060330] br1: port 5(vnet3) entering
learning state
Sep 30 10:12:01 node1 kernel: [242483.180170] br1: topology change detected,
propagating
Sep 30 10:12:01 node1 kernel: [242483.180173] br1: port 3(vnet1) entering
forwarding state
Sep 30 10:12:04 node1 kernel: [242485.710948] device vnet4 entered
promiscuous mode
Sep 30 10:12:04 node1 kernel: [242485.712109] br1: port 6(vnet4) entering
learning state
Sep 30 10:12:04 node1 kernel: [242486.430098] br1: topology change detected,
propagating
Sep 30 10:12:04 node1 kernel: [242486.430104] br1: port 4(vnet2) entering
forwarding state
Sep 30 10:12:09 node1 kernel: [242491.051857] br1: topology change detected,
propagating
Sep 30 10:12:09 node1 kernel: [242491.051862] br1: port 5(vnet3) entering
forwarding state
Sep 30 10:12:13 node1 kernel: [242494.711792] br1: topology change detected,
propagating
Sep 30 10:12:13 node1 kernel: [242494.711797] br1: port 6(vnet4) entering
forwarding state
Sep 30 10:22:00 node1 kernel: [243082.274008] br1: port 4(vnet2) entering
disabled state
Sep 30 10:22:00 node1 kernel: [243082.312108] device vnet2 left promiscuous
mode
Sep 30 10:22:00 node1 kernel: [243082.312112] br1: port 4(vnet2) entering
disabled state
Sep 30 10:22:00 node1 kernel: [243082.365331] br1: port 5(vnet3) entering
disabled state
Sep 30 10:22:00 node1 kernel: [243082.393430] device vnet3 left promiscuous
mode
Sep 30 10:22:00 node1 kernel: [243082.393433] br1: port 5(vnet3) entering
disabled state

I have 4 NICs and the ifconfig shows
root at node1:/srv/cloud/ImgRep/WinXP# ifconfig
bond0     Link encap:Ethernet  HWaddr 00:24:e8:6b:0d:57
          inet6 addr: fe80::224:e8ff:fe6b:d57/64 Scope:Link
          UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1
          RX packets:7799931 errors:0 dropped:0 overruns:0 frame:0
          TX packets:6879357 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:4360166425 (4.3 GB)  TX bytes:4350243659 (4.3 GB)

br0       Link encap:Ethernet  HWaddr 00:24:e8:6b:0d:57
          inet addr:192.168.1.128  Bcast:192.168.1.255  Mask:255.255.255.0
          inet6 addr: fe80::224:e8ff:fe6b:d57/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:7641913 errors:0 dropped:0 overruns:0 frame:0
          TX packets:6412260 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:4169876739 (4.1 GB)  TX bytes:4291182908 (4.2 GB)

br1       Link encap:Ethernet  HWaddr 00:24:e8:6b:0d:5b
          inet6 addr: fe80::224:e8ff:fe6b:d5b/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:152606 errors:0 dropped:0 overruns:0 frame:0
          TX packets:6 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:11276026 (11.2 MB)  TX bytes:468 (468.0 B)

br2       Link encap:Ethernet  HWaddr 00:24:e8:6b:0d:5d
          inet6 addr: fe80::224:e8ff:fe6b:d5d/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:152596 errors:0 dropped:0 overruns:0 frame:0
          TX packets:6 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:11286132 (11.2 MB)  TX bytes:468 (468.0 B)

eth0      Link encap:Ethernet  HWaddr 00:24:e8:6b:0d:57
          UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
          RX packets:3622205 errors:0 dropped:0 overruns:0 frame:0
          TX packets:3443103 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:1758147112 (1.7 GB)  TX bytes:2174389369 (2.1 GB)
          Interrupt:36 Memory:d6000000-d6012700

eth1      Link encap:Ethernet  HWaddr 00:24:e8:6b:0d:57
          UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
          RX packets:4177726 errors:0 dropped:0 overruns:0 frame:0
          TX packets:3436254 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:2602019313 (2.6 GB)  TX bytes:2175854290 (2.1 GB)
          Interrupt:48 Memory:d8000000-d8012700

eth2      Link encap:Ethernet  HWaddr 00:24:e8:6b:0d:5b
          inet6 addr: fe80::224:e8ff:fe6b:d5b/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:256014 errors:0 dropped:0 overruns:0 frame:0
          TX packets:91549 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:20680139 (20.6 MB)  TX bytes:17317087 (17.3 MB)
          Interrupt:32 Memory:da000000-da012700

eth3      Link encap:Ethernet  HWaddr 00:24:e8:6b:0d:5d
          inet6 addr: fe80::224:e8ff:fe6b:d5d/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:245189 errors:0 dropped:0 overruns:0 frame:0
          TX packets:32669 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:20866257 (20.8 MB)  TX bytes:8265329 (8.2 MB)
          Interrupt:42 Memory:dc000000-dc012700

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:324787 errors:0 dropped:0 overruns:0 frame:0
          TX packets:324787 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:571707504 (571.7 MB)  TX bytes:571707504 (571.7 MB)

vnet0     Link encap:Ethernet  HWaddr 52:c3:a3:d0:09:84
          inet6 addr: fe80::50c3:a3ff:fed0:984/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:72 errors:0 dropped:0 overruns:0 frame:0
          TX packets:533 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:500
          RX bytes:10410 (10.4 KB)  TX bytes:49306 (49.3 KB)

vnet1     Link encap:Ethernet  HWaddr 02:0d:4c:1e:5c:82
          inet6 addr: fe80::d:4cff:fe1e:5c82/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:36 errors:0 dropped:0 overruns:0 frame:0
          TX packets:563 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:500
          RX bytes:6023 (6.0 KB)  TX bytes:53245 (53.2 KB)

vnet4     Link encap:Ethernet  HWaddr a6:58:67:2b:37:8a
          inet6 addr: fe80::a458:67ff:fe2b:378a/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:42 errors:0 dropped:0 overruns:0 frame:0
          TX packets:552 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:500
          RX bytes:6733 (6.7 KB)  TX bytes:52189 (52.1 KB)
Note that there is no IP address assigned to the br1 device which is used
here. I don't think that should be a problem.

Shi

On Wed, Sep 30, 2009 at 10:09 AM, Shi Jin <jinzishuai at gmail.com> wrote:

> *Here is the vm.log for 76:*
> Wed Sep 30 08:57:50 2009 [VMM][D]: Monitor Information:
>        CPU   : -1
>        Memory: 1048576
>        Net_TX: -1
>        Net_RX: -1
> Wed Sep 30 08:57:56 2009 [DiM][I]: New VM state is DONE
> Wed Sep 30 08:57:56 2009 [VMM][W]: Ignored: LOG - 76 Driver command for 76
> cancelled
>
> Wed Sep 30 08:57:56 2009 [VMM][W]: Ignored: CANCEL SUCCESS 76 -
>
> Wed Sep 30 08:58:00 2009 [TM][W]: Ignored: LOG - 76 tm_delete.sh: Deleting
> /srv/cloud/one/var/76/images
>
> Wed Sep 30 08:58:00 2009 [TM][W]: Ignored: LOG - 76 tm_delete.sh: Executed
> "rm -rf /srv/cloud/one/var/76/images".
>
> Wed Sep 30 08:58:00 2009 [TM][W]: Ignored: TRANSFER SUCCESS 76 -
>
> *And vm.log for 77:*
> Wed Sep 30 08:57:50 2009 [VMM][D]: Monitor Information:
>        CPU   : -1
>        Memory: 1048576
>        Net_TX: -1
>        Net_RX: -1
> Wed Sep 30 08:58:50 2009 [VMM][I]: Command execution fail: virsh dominfo
> one-77
> Wed Sep 30 08:58:50 2009 [VMM][I]: STDERR follows.
> Wed Sep 30 08:58:50 2009 [VMM][I]: Connecting to uri: qemu:///system
> Wed Sep 30 08:58:50 2009 [VMM][I]: error: failed to get domain 'one-77'
> Wed Sep 30 08:58:50 2009 [VMM][I]: error: Domain not found
> Wed Sep 30 08:58:50 2009 [VMM][I]: ExitCode: 1
> Wed Sep 30 08:58:50 2009 [VMM][I]: VM running but it was not found. Restart
> and delete actions available or try to recover it manually
> Wed Sep 30 08:58:50 2009 [LCM][I]: New VM state is UNKNOWN
> Wed Sep 30 08:59:50 2009 [VMM][I]: Command execution fail: virsh dominfo
> one-77
> Wed Sep 30 08:59:50 2009 [VMM][I]: STDERR follows.
> Wed Sep 30 08:59:50 2009 [VMM][I]: Connecting to uri: qemu:///system
> Wed Sep 30 08:59:50 2009 [VMM][I]: error: failed to get domain 'one-77'
> Wed Sep 30 08:59:50 2009 [VMM][I]: error: Domain not found
> Wed Sep 30 08:59:50 2009 [VMM][I]: ExitCode: 1
> Wed Sep 30 08:59:50 2009 [VMM][I]: VM running but it was not found. Restart
> and delete actions available or try to recover it manually
>
> *And oned.log:*
> Wed Sep 30 08:57:55 2009 [ReM][D]: VirtualMachineAction invoked
> Wed Sep 30 08:57:55 2009 [DiM][D]: Finalizing VM 76
> Wed Sep 30 08:57:56 2009 [VMM][D]: Message received: LOG - 76 Driver
> command for 76 cancelled
>
> Wed Sep 30 08:57:56 2009 [VMM][D]: Message received: CANCEL SUCCESS 76 -
>
> Wed Sep 30 08:57:58 2009 [ReM][D]: VirtualMachinePoolInfo method invoked
> Wed Sep 30 08:58:00 2009 [TM][D]: Message received: LOG - 76 tm_delete.sh:
> Deleting /srv/cloud/one/var/76/images
>
> Wed Sep 30 08:58:00 2009 [TM][D]: Message received: LOG - 76 tm_delete.sh:
> Executed "rm -rf /srv/cloud/one/var/76/images".
>
> Wed Sep 30 08:58:00 2009 [TM][D]: Message received: TRANSFER SUCCESS 76 -
>
> Wed Sep 30 08:58:50 2009 [VMM][I]: Monitoring VM 72.
> Wed Sep 30 08:58:50 2009 [VMM][I]: Monitoring VM 73.
> Wed Sep 30 08:58:50 2009 [VMM][I]: Monitoring VM 77.
> Wed Sep 30 08:58:50 2009 [VMM][I]: Monitoring VM 78.
> Wed Sep 30 08:58:50 2009 [VMM][D]: Message received: POLL SUCCESS 72
>  STATE=a USEDMEMORY=1048576
>
> Wed Sep 30 08:58:50 2009 [VMM][D]: Message received: LOG - 77 Command
> execution fail: virsh dominfo one-77
>
> Wed Sep 30 08:58:50 2009 [VMM][D]: Message received: LOG - 77 STDERR
> follows.
>
> Wed Sep 30 08:58:50 2009 [VMM][D]: Message received: LOG - 77 Connecting to
> uri: qemu:///system
>
> Wed Sep 30 08:58:50 2009 [VMM][D]: Message received: LOG - 77 error: failed
> to get domain 'one-77'
>
> Wed Sep 30 08:58:50 2009 [VMM][D]: Message received: LOG - 77 error: Domain
> not found
>
> Wed Sep 30 08:58:50 2009 [VMM][D]: Message received: LOG - 77 ExitCode: 1
>
> Wed Sep 30 08:58:50 2009 [VMM][D]: Message received: POLL SUCCESS 77
> STATE=d
> Wed Sep 30 08:58:50 2009 [VMM][D]: Message received: POLL SUCCESS 73
>  STATE=a USEDMEMORY=1048576
>
> Wed Sep 30 08:58:50 2009 [VMM][D]: Message received: POLL SUCCESS 78
>  STATE=a USEDMEMORY=1048576
>
> Wed Sep 30 08:59:02 2009 [InM][I]: Monitoring host node1 (4)
> Wed Sep 30 08:59:06 2009 [InM][D]: Host 4 successfully monitored.
>
> *In my opinion the above log files don't have much more information than
> VM 77 is no longer available.*
> *
> *
> *Shi*
>
> On Wed, Sep 30, 2009 at 10:02 AM, Ruben S. Montero <rubensm at dacya.ucm.es>
> wrote:
> > Hi,
> >
> > Does the 77 vm.log say something relevant? Can you send us also the
> > vm.log file for 76? OpenNebula should never try to delete VM 77...
> >
> > Thanks
> >
> > Ruben
> >
> > On Wed, Sep 30, 2009 at 5:28 PM, Shi Jin <jinzishuai at gmail.com> wrote:
> >> Hi there,
> >>
> >> I am running one-1.4beta with KVM hypervisor.
> >> Recently I found a very bizarre behavior of "onevm delete".
> >> For example, initially there are 5 VMs running:72,73,76,77,78.
> >> Then I did "onevm delete 76" and found only 3 VMs are left: the VM 77
> >> is also deleted.
> >> "virsh list" on the host node shows VM 77 is gone and "onevm list"
> >> shows its status to be unknown.
> >> I think this is a serious bug and requires a quick fix.
> >> I tried to dig into the OpenNebula log files but found nothing other
> >> than " Command execution fail: virsh dominfo one-77" after the "onevm
> >> delete 76" command.
> >>
> >> On the host node, the syslog shows something interesting:
> >> Sep 30 08:58:59 node1 kernel: [238101.304619] br1: port 5(vnet4)
> >> entering disabled state
> >> Sep 30 08:58:59 node1 kernel: [238101.342851] device vnet4 left
> promiscuous mode
> >> Sep 30 08:58:59 node1 kernel: [238101.342854] br1: port 5(vnet4)
> >> entering disabled state
> >> Sep 30 08:58:59 node1 kernel: [238101.424851] br1: port 4(vnet3)
> >> entering disabled state
> >> Sep 30 08:59:00 node1 kernel: [238101.463031] device vnet3 left
> promiscuous mode
> >> Sep 30 08:59:00 node1 kernel: [238101.463035] br1: port 4(vnet3)
> >> entering disabled state
> >> Sep 30 08:59:53 node1 libvirtd: 08:59:53.480: error : Domain not found
> >>
> >> Please note that the VM 76,77,78 all use the same vnet (bridged on
> >> br1) and their corresponding ports are vnet3,vnet4 and vnet5.  I am
> >> not exactly sure what  the above message implies but I have a feeling
> >> that something went wrong about the network. In this sense, this
> >> problem may not be a OpenNebula bug, possibly a KVM bug too.
> >>
> >> Actually this kind of problem has happened many times for us. This is
> >> my first time to carefully record what has happened.
> >> I am wondering if anyone has encountered this kind of problem before
> >> and is there a fix?
> >>
> >> Thanks a lot.
> >>
> >> Shi
> >> --
> >> Shi Jin, Ph.D.
> >> _______________________________________________
> >> Users mailing list
> >> Users at lists.opennebula.org
> >> http://lists.opennebula.org/listinfo.cgi/users-opennebula.org
> >>
> >
> >
> >
> > --
> > +---------------------------------------------------------------+
> >  Dr. Ruben Santiago Montero
> >  Associate Professor
> >  Distributed System Architecture Group (http://dsa-research.org)
> >
> >  URL:    http://dsa-research.org/doku.php?id=people:ruben
> >  Weblog: http://blog.dsa-research.org/?author=7
> >
> >  GridWay, http://www.gridway.org
> >  OpenNebula, http://www.opennebula.org
> > +---------------------------------------------------------------+
> >
>
>
>
> --
> Shi Jin, Ph.D.
>
>
>


-- 
Shi Jin, Ph.D.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.opennebula.org/pipermail/users-opennebula.org/attachments/20090930/ecd30715/attachment-0003.htm>


More information about the Users mailing list