[one-users] "onevm delete" deletes extra VM
Shi Jin
jinzishuai at gmail.com
Wed Sep 30 09:27:57 PDT 2009
OK. With another designed experiment I am quite sure that the problem is
within KVM/libvirtd rather OpenNebula. Please see the following, which has
nothing to do with OpenNebula.root at node1:/srv/cloud/ImgRep/WinXP# virsh list
Connecting to uri: qemu:///system
Id Name State
----------------------------------
27 one-80 running
28 one-81 running
29 one-83 running
30 one-82 running
31 one-79 running
root at node1:/srv/cloud/ImgRep/WinXP# virsh destroy 29
Connecting to uri: qemu:///system
Domain 29 destroyed
root at node1:/srv/cloud/ImgRep/WinXP# virsh list
Connecting to uri: qemu:///system
Id Name State
----------------------------------
27 one-80 running
28 one-81 running
31 one-79 running
The /var/log/message shows
Sep 30 08:58:59 node1 kernel: [238101.304619] br1: port 5(vnet4) entering
disabled state
Sep 30 08:58:59 node1 kernel: [238101.342851] device vnet4 left promiscuous
mode
Sep 30 08:58:59 node1 kernel: [238101.342854] br1: port 5(vnet4) entering
disabled state
Sep 30 08:58:59 node1 kernel: [238101.424851] br1: port 4(vnet3) entering
disabled state
Sep 30 08:59:00 node1 kernel: [238101.463031] device vnet3 left promiscuous
mode
Sep 30 08:59:00 node1 kernel: [238101.463035] br1: port 4(vnet3) entering
disabled state
Sep 30 09:10:53 node1 -- MARK --
Sep 30 09:30:53 node1 -- MARK --
Sep 30 09:50:53 node1 -- MARK --
Sep 30 09:51:31 node1 kernel: [241253.266955] device vnet3 entered
promiscuous mode
Sep 30 09:51:31 node1 kernel: [241253.268857] br1: port 4(vnet3) entering
learning state
Sep 30 09:51:40 node1 kernel: [241262.260147] br1: topology change detected,
propagating
Sep 30 09:51:40 node1 kernel: [241262.260152] br1: port 4(vnet3) entering
forwarding state
Sep 30 09:53:59 node1 kernel: [241401.204570] br1: port 4(vnet3) entering
disabled state
Sep 30 09:53:59 node1 kernel: [241401.232395] device vnet3 left promiscuous
mode
Sep 30 09:53:59 node1 kernel: [241401.232399] br1: port 4(vnet3) entering
disabled state
Sep 30 09:54:20 node1 kernel: [241421.694803] br1: port 2(vnet0) entering
disabled state
Sep 30 09:54:20 node1 kernel: [241421.733095] device vnet0 left promiscuous
mode
Sep 30 09:54:20 node1 kernel: [241421.733099] br1: port 2(vnet0) entering
disabled state
Sep 30 09:54:20 node1 kernel: [241421.815209] br2: port 2(vnet1) entering
disabled state
Sep 30 09:54:20 node1 kernel: [241421.853728] device vnet1 left promiscuous
mode
Sep 30 09:54:20 node1 kernel: [241421.853732] br2: port 2(vnet1) entering
disabled state
Sep 30 09:54:20 node1 kernel: [241421.935254] br1: port 3(vnet2) entering
disabled state
Sep 30 09:54:20 node1 kernel: [241421.963527] device vnet2 left promiscuous
mode
Sep 30 09:54:20 node1 kernel: [241421.963531] br1: port 3(vnet2) entering
disabled state
Sep 30 09:55:29 node1 kernel: [241490.644204] br1: port 6(vnet5) entering
disabled state
Sep 30 09:55:29 node1 kernel: [241490.673195] device vnet5 left promiscuous
mode
Sep 30 09:55:29 node1 kernel: [241490.673199] br1: port 6(vnet5) entering
disabled state
Sep 30 10:10:53 node1 -- MARK --
Sep 30 10:11:49 node1 kernel: [242470.955142] device vnet0 entered
promiscuous mode
Sep 30 10:11:49 node1 kernel: [242470.957051] br1: port 2(vnet0) entering
learning state
Sep 30 10:11:52 node1 kernel: [242474.180036] device vnet1 entered
promiscuous mode
Sep 30 10:11:52 node1 kernel: [242474.181943] br1: port 3(vnet1) entering
learning state
Sep 30 10:11:55 node1 kernel: [242477.431693] device vnet2 entered
promiscuous mode
Sep 30 10:11:55 node1 kernel: [242477.433558] br1: port 4(vnet2) entering
learning state
Sep 30 10:11:58 node1 kernel: [242479.950082] br1: topology change detected,
propagating
Sep 30 10:11:58 node1 kernel: [242479.950086] br1: port 2(vnet0) entering
forwarding state
Sep 30 10:12:00 node1 kernel: [242482.059146] device vnet3 entered
promiscuous mode
Sep 30 10:12:00 node1 kernel: [242482.060330] br1: port 5(vnet3) entering
learning state
Sep 30 10:12:01 node1 kernel: [242483.180170] br1: topology change detected,
propagating
Sep 30 10:12:01 node1 kernel: [242483.180173] br1: port 3(vnet1) entering
forwarding state
Sep 30 10:12:04 node1 kernel: [242485.710948] device vnet4 entered
promiscuous mode
Sep 30 10:12:04 node1 kernel: [242485.712109] br1: port 6(vnet4) entering
learning state
Sep 30 10:12:04 node1 kernel: [242486.430098] br1: topology change detected,
propagating
Sep 30 10:12:04 node1 kernel: [242486.430104] br1: port 4(vnet2) entering
forwarding state
Sep 30 10:12:09 node1 kernel: [242491.051857] br1: topology change detected,
propagating
Sep 30 10:12:09 node1 kernel: [242491.051862] br1: port 5(vnet3) entering
forwarding state
Sep 30 10:12:13 node1 kernel: [242494.711792] br1: topology change detected,
propagating
Sep 30 10:12:13 node1 kernel: [242494.711797] br1: port 6(vnet4) entering
forwarding state
Sep 30 10:22:00 node1 kernel: [243082.274008] br1: port 4(vnet2) entering
disabled state
Sep 30 10:22:00 node1 kernel: [243082.312108] device vnet2 left promiscuous
mode
Sep 30 10:22:00 node1 kernel: [243082.312112] br1: port 4(vnet2) entering
disabled state
Sep 30 10:22:00 node1 kernel: [243082.365331] br1: port 5(vnet3) entering
disabled state
Sep 30 10:22:00 node1 kernel: [243082.393430] device vnet3 left promiscuous
mode
Sep 30 10:22:00 node1 kernel: [243082.393433] br1: port 5(vnet3) entering
disabled state
I have 4 NICs and the ifconfig shows
root at node1:/srv/cloud/ImgRep/WinXP# ifconfig
bond0 Link encap:Ethernet HWaddr 00:24:e8:6b:0d:57
inet6 addr: fe80::224:e8ff:fe6b:d57/64 Scope:Link
UP BROADCAST RUNNING MASTER MULTICAST MTU:1500 Metric:1
RX packets:7799931 errors:0 dropped:0 overruns:0 frame:0
TX packets:6879357 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:4360166425 (4.3 GB) TX bytes:4350243659 (4.3 GB)
br0 Link encap:Ethernet HWaddr 00:24:e8:6b:0d:57
inet addr:192.168.1.128 Bcast:192.168.1.255 Mask:255.255.255.0
inet6 addr: fe80::224:e8ff:fe6b:d57/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:7641913 errors:0 dropped:0 overruns:0 frame:0
TX packets:6412260 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:4169876739 (4.1 GB) TX bytes:4291182908 (4.2 GB)
br1 Link encap:Ethernet HWaddr 00:24:e8:6b:0d:5b
inet6 addr: fe80::224:e8ff:fe6b:d5b/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:152606 errors:0 dropped:0 overruns:0 frame:0
TX packets:6 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:11276026 (11.2 MB) TX bytes:468 (468.0 B)
br2 Link encap:Ethernet HWaddr 00:24:e8:6b:0d:5d
inet6 addr: fe80::224:e8ff:fe6b:d5d/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:152596 errors:0 dropped:0 overruns:0 frame:0
TX packets:6 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:11286132 (11.2 MB) TX bytes:468 (468.0 B)
eth0 Link encap:Ethernet HWaddr 00:24:e8:6b:0d:57
UP BROADCAST RUNNING SLAVE MULTICAST MTU:1500 Metric:1
RX packets:3622205 errors:0 dropped:0 overruns:0 frame:0
TX packets:3443103 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:1758147112 (1.7 GB) TX bytes:2174389369 (2.1 GB)
Interrupt:36 Memory:d6000000-d6012700
eth1 Link encap:Ethernet HWaddr 00:24:e8:6b:0d:57
UP BROADCAST RUNNING SLAVE MULTICAST MTU:1500 Metric:1
RX packets:4177726 errors:0 dropped:0 overruns:0 frame:0
TX packets:3436254 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:2602019313 (2.6 GB) TX bytes:2175854290 (2.1 GB)
Interrupt:48 Memory:d8000000-d8012700
eth2 Link encap:Ethernet HWaddr 00:24:e8:6b:0d:5b
inet6 addr: fe80::224:e8ff:fe6b:d5b/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:256014 errors:0 dropped:0 overruns:0 frame:0
TX packets:91549 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:20680139 (20.6 MB) TX bytes:17317087 (17.3 MB)
Interrupt:32 Memory:da000000-da012700
eth3 Link encap:Ethernet HWaddr 00:24:e8:6b:0d:5d
inet6 addr: fe80::224:e8ff:fe6b:d5d/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:245189 errors:0 dropped:0 overruns:0 frame:0
TX packets:32669 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:20866257 (20.8 MB) TX bytes:8265329 (8.2 MB)
Interrupt:42 Memory:dc000000-dc012700
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING MTU:16436 Metric:1
RX packets:324787 errors:0 dropped:0 overruns:0 frame:0
TX packets:324787 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:571707504 (571.7 MB) TX bytes:571707504 (571.7 MB)
vnet0 Link encap:Ethernet HWaddr 52:c3:a3:d0:09:84
inet6 addr: fe80::50c3:a3ff:fed0:984/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:72 errors:0 dropped:0 overruns:0 frame:0
TX packets:533 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:500
RX bytes:10410 (10.4 KB) TX bytes:49306 (49.3 KB)
vnet1 Link encap:Ethernet HWaddr 02:0d:4c:1e:5c:82
inet6 addr: fe80::d:4cff:fe1e:5c82/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:36 errors:0 dropped:0 overruns:0 frame:0
TX packets:563 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:500
RX bytes:6023 (6.0 KB) TX bytes:53245 (53.2 KB)
vnet4 Link encap:Ethernet HWaddr a6:58:67:2b:37:8a
inet6 addr: fe80::a458:67ff:fe2b:378a/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:42 errors:0 dropped:0 overruns:0 frame:0
TX packets:552 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:500
RX bytes:6733 (6.7 KB) TX bytes:52189 (52.1 KB)
Note that there is no IP address assigned to the br1 device which is used
here. I don't think that should be a problem.
Shi
On Wed, Sep 30, 2009 at 10:09 AM, Shi Jin <jinzishuai at gmail.com> wrote:
> *Here is the vm.log for 76:*
> Wed Sep 30 08:57:50 2009 [VMM][D]: Monitor Information:
> CPU : -1
> Memory: 1048576
> Net_TX: -1
> Net_RX: -1
> Wed Sep 30 08:57:56 2009 [DiM][I]: New VM state is DONE
> Wed Sep 30 08:57:56 2009 [VMM][W]: Ignored: LOG - 76 Driver command for 76
> cancelled
>
> Wed Sep 30 08:57:56 2009 [VMM][W]: Ignored: CANCEL SUCCESS 76 -
>
> Wed Sep 30 08:58:00 2009 [TM][W]: Ignored: LOG - 76 tm_delete.sh: Deleting
> /srv/cloud/one/var/76/images
>
> Wed Sep 30 08:58:00 2009 [TM][W]: Ignored: LOG - 76 tm_delete.sh: Executed
> "rm -rf /srv/cloud/one/var/76/images".
>
> Wed Sep 30 08:58:00 2009 [TM][W]: Ignored: TRANSFER SUCCESS 76 -
>
> *And vm.log for 77:*
> Wed Sep 30 08:57:50 2009 [VMM][D]: Monitor Information:
> CPU : -1
> Memory: 1048576
> Net_TX: -1
> Net_RX: -1
> Wed Sep 30 08:58:50 2009 [VMM][I]: Command execution fail: virsh dominfo
> one-77
> Wed Sep 30 08:58:50 2009 [VMM][I]: STDERR follows.
> Wed Sep 30 08:58:50 2009 [VMM][I]: Connecting to uri: qemu:///system
> Wed Sep 30 08:58:50 2009 [VMM][I]: error: failed to get domain 'one-77'
> Wed Sep 30 08:58:50 2009 [VMM][I]: error: Domain not found
> Wed Sep 30 08:58:50 2009 [VMM][I]: ExitCode: 1
> Wed Sep 30 08:58:50 2009 [VMM][I]: VM running but it was not found. Restart
> and delete actions available or try to recover it manually
> Wed Sep 30 08:58:50 2009 [LCM][I]: New VM state is UNKNOWN
> Wed Sep 30 08:59:50 2009 [VMM][I]: Command execution fail: virsh dominfo
> one-77
> Wed Sep 30 08:59:50 2009 [VMM][I]: STDERR follows.
> Wed Sep 30 08:59:50 2009 [VMM][I]: Connecting to uri: qemu:///system
> Wed Sep 30 08:59:50 2009 [VMM][I]: error: failed to get domain 'one-77'
> Wed Sep 30 08:59:50 2009 [VMM][I]: error: Domain not found
> Wed Sep 30 08:59:50 2009 [VMM][I]: ExitCode: 1
> Wed Sep 30 08:59:50 2009 [VMM][I]: VM running but it was not found. Restart
> and delete actions available or try to recover it manually
>
> *And oned.log:*
> Wed Sep 30 08:57:55 2009 [ReM][D]: VirtualMachineAction invoked
> Wed Sep 30 08:57:55 2009 [DiM][D]: Finalizing VM 76
> Wed Sep 30 08:57:56 2009 [VMM][D]: Message received: LOG - 76 Driver
> command for 76 cancelled
>
> Wed Sep 30 08:57:56 2009 [VMM][D]: Message received: CANCEL SUCCESS 76 -
>
> Wed Sep 30 08:57:58 2009 [ReM][D]: VirtualMachinePoolInfo method invoked
> Wed Sep 30 08:58:00 2009 [TM][D]: Message received: LOG - 76 tm_delete.sh:
> Deleting /srv/cloud/one/var/76/images
>
> Wed Sep 30 08:58:00 2009 [TM][D]: Message received: LOG - 76 tm_delete.sh:
> Executed "rm -rf /srv/cloud/one/var/76/images".
>
> Wed Sep 30 08:58:00 2009 [TM][D]: Message received: TRANSFER SUCCESS 76 -
>
> Wed Sep 30 08:58:50 2009 [VMM][I]: Monitoring VM 72.
> Wed Sep 30 08:58:50 2009 [VMM][I]: Monitoring VM 73.
> Wed Sep 30 08:58:50 2009 [VMM][I]: Monitoring VM 77.
> Wed Sep 30 08:58:50 2009 [VMM][I]: Monitoring VM 78.
> Wed Sep 30 08:58:50 2009 [VMM][D]: Message received: POLL SUCCESS 72
> STATE=a USEDMEMORY=1048576
>
> Wed Sep 30 08:58:50 2009 [VMM][D]: Message received: LOG - 77 Command
> execution fail: virsh dominfo one-77
>
> Wed Sep 30 08:58:50 2009 [VMM][D]: Message received: LOG - 77 STDERR
> follows.
>
> Wed Sep 30 08:58:50 2009 [VMM][D]: Message received: LOG - 77 Connecting to
> uri: qemu:///system
>
> Wed Sep 30 08:58:50 2009 [VMM][D]: Message received: LOG - 77 error: failed
> to get domain 'one-77'
>
> Wed Sep 30 08:58:50 2009 [VMM][D]: Message received: LOG - 77 error: Domain
> not found
>
> Wed Sep 30 08:58:50 2009 [VMM][D]: Message received: LOG - 77 ExitCode: 1
>
> Wed Sep 30 08:58:50 2009 [VMM][D]: Message received: POLL SUCCESS 77
> STATE=d
> Wed Sep 30 08:58:50 2009 [VMM][D]: Message received: POLL SUCCESS 73
> STATE=a USEDMEMORY=1048576
>
> Wed Sep 30 08:58:50 2009 [VMM][D]: Message received: POLL SUCCESS 78
> STATE=a USEDMEMORY=1048576
>
> Wed Sep 30 08:59:02 2009 [InM][I]: Monitoring host node1 (4)
> Wed Sep 30 08:59:06 2009 [InM][D]: Host 4 successfully monitored.
>
> *In my opinion the above log files don't have much more information than
> VM 77 is no longer available.*
> *
> *
> *Shi*
>
> On Wed, Sep 30, 2009 at 10:02 AM, Ruben S. Montero <rubensm at dacya.ucm.es>
> wrote:
> > Hi,
> >
> > Does the 77 vm.log say something relevant? Can you send us also the
> > vm.log file for 76? OpenNebula should never try to delete VM 77...
> >
> > Thanks
> >
> > Ruben
> >
> > On Wed, Sep 30, 2009 at 5:28 PM, Shi Jin <jinzishuai at gmail.com> wrote:
> >> Hi there,
> >>
> >> I am running one-1.4beta with KVM hypervisor.
> >> Recently I found a very bizarre behavior of "onevm delete".
> >> For example, initially there are 5 VMs running:72,73,76,77,78.
> >> Then I did "onevm delete 76" and found only 3 VMs are left: the VM 77
> >> is also deleted.
> >> "virsh list" on the host node shows VM 77 is gone and "onevm list"
> >> shows its status to be unknown.
> >> I think this is a serious bug and requires a quick fix.
> >> I tried to dig into the OpenNebula log files but found nothing other
> >> than " Command execution fail: virsh dominfo one-77" after the "onevm
> >> delete 76" command.
> >>
> >> On the host node, the syslog shows something interesting:
> >> Sep 30 08:58:59 node1 kernel: [238101.304619] br1: port 5(vnet4)
> >> entering disabled state
> >> Sep 30 08:58:59 node1 kernel: [238101.342851] device vnet4 left
> promiscuous mode
> >> Sep 30 08:58:59 node1 kernel: [238101.342854] br1: port 5(vnet4)
> >> entering disabled state
> >> Sep 30 08:58:59 node1 kernel: [238101.424851] br1: port 4(vnet3)
> >> entering disabled state
> >> Sep 30 08:59:00 node1 kernel: [238101.463031] device vnet3 left
> promiscuous mode
> >> Sep 30 08:59:00 node1 kernel: [238101.463035] br1: port 4(vnet3)
> >> entering disabled state
> >> Sep 30 08:59:53 node1 libvirtd: 08:59:53.480: error : Domain not found
> >>
> >> Please note that the VM 76,77,78 all use the same vnet (bridged on
> >> br1) and their corresponding ports are vnet3,vnet4 and vnet5. I am
> >> not exactly sure what the above message implies but I have a feeling
> >> that something went wrong about the network. In this sense, this
> >> problem may not be a OpenNebula bug, possibly a KVM bug too.
> >>
> >> Actually this kind of problem has happened many times for us. This is
> >> my first time to carefully record what has happened.
> >> I am wondering if anyone has encountered this kind of problem before
> >> and is there a fix?
> >>
> >> Thanks a lot.
> >>
> >> Shi
> >> --
> >> Shi Jin, Ph.D.
> >> _______________________________________________
> >> Users mailing list
> >> Users at lists.opennebula.org
> >> http://lists.opennebula.org/listinfo.cgi/users-opennebula.org
> >>
> >
> >
> >
> > --
> > +---------------------------------------------------------------+
> > Dr. Ruben Santiago Montero
> > Associate Professor
> > Distributed System Architecture Group (http://dsa-research.org)
> >
> > URL: http://dsa-research.org/doku.php?id=people:ruben
> > Weblog: http://blog.dsa-research.org/?author=7
> >
> > GridWay, http://www.gridway.org
> > OpenNebula, http://www.opennebula.org
> > +---------------------------------------------------------------+
> >
>
>
>
> --
> Shi Jin, Ph.D.
>
>
>
--
Shi Jin, Ph.D.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.opennebula.org/pipermail/users-opennebula.org/attachments/20090930/ecd30715/attachment-0001.htm>
More information about the Users
mailing list