[one-users] onevm - zombie vm
Shi Jin
jinzishuai at gmail.com
Tue Mar 2 12:48:19 PST 2010
Hi Tino,
Answers inline again.
On Tue, Mar 2, 2010 at 11:28 AM, Tino Vazquez <tinova at fdi.ucm.es> wrote:
> Hi Shi Jin,
>
> Thanks a lot. We have a few more questions, since we still cannot
> figure out the problem:
>
> * Does the host get its capacity (cpu, memory) released?
I don't think so, but I am not 100% sure. Right now there are no other
VMs running, and I have:
[oneadmin at node2 one-svn-64bit]$ onehost list
  ID NAME    RVM   TCPU   FCPU   ACPU    TMEM    FMEM STAT
  36 node2     1   1600   1600   1600 3294948 3234368   on
[oneadmin at node2 one-svn-64bit]$ onehost show node2
HOST 36 INFORMATION
ID : 36
NAME : node2
STATE : MONITORED
IM_MAD : im_kvm
VM_MAD : vmm_kvm
TM_MAD : tm_cow_ssh
HOST SHARES
MAX MEM : 32949488
USED MEM (REAL) : 5593588
USED MEM (ALLOCATED) : 512000
MAX CPU : 1600
USED CPU (REAL) : 0
USED CPU (ALLOCATED) : 25
RUNNING VMS : 1
MONITORING INFORMATION
ARCH=x86_64
CPUSPEED=1596
FREECPU=1600.0
FREEMEMORY=32343776
HOSTNAME=node2
HYPERVISOR=kvm
MODELNAME=Intel(R) Xeon(R) CPU X5570 @ 2.93GHz
NETRX=115977029
NETTX=653083
TOTALCPU=1600
TOTALMEMORY=32949488
USEDCPU=0.0
USEDMEMORY=5593588
I think the CPU and memory figures are queried from actual system
usage, and since the VM is no longer consuming any resources (there is
no such KVM process running, nor does "virsh list" show anything), it
probably does not really hold any CPU/RAM.
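For the record, here is roughly how I confirmed on node2 that the
hypervisor has no trace of the VM (just a quick sketch; "one-7930" is
the deploy id of the zombie VM, as shown in the XML below):

    # On node2: the zombie domain should not appear anywhere.
    virsh list --all              # one-7930 is not listed
    virsh dominfo one-7930        # errors out, since the domain is gone
    ps aux | grep '[k]vm'         # no qemu-kvm process for this VM
                                  # (the [k] keeps grep from matching itself)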
> * Can you send the output of "onevm show -x"?
[oneadmin at node2 one-svn-64bit]$ onevm show -x 7930
<VM>
<ID>7930</ID>
<UID>0</UID>
<NAME>fc11AOS</NAME>
<LAST_POLL>1267223236</LAST_POLL>
<STATE>3</STATE>
<LCM_STATE>15</LCM_STATE>
<STIME>1267209261</STIME>
<ETIME>0</ETIME>
<DEPLOY_ID>one-7930</DEPLOY_ID>
<MEMORY>512000</MEMORY>
<CPU>0</CPU>
<NET_TX>0</NET_TX>
<NET_RX>0</NET_RX>
<TEMPLATE>
<CONTEXT>
<FILES>/srv/cloud/contextScripts/Redhat/init.sh
/srv/cloud/contextScripts/Redhat/ipcontext.sh</FILES>
<HOSTNAME>fc11AOS-7930</HOSTNAME>
<TARGET>hdc</TARGET>
</CONTEXT>
<CPU>0.25</CPU>
<DISK>
<BUS>virtio</BUS>
<READONLY>no</READONLY>
<SOURCE>/srv/cloud/ImgRep/FC11AOS/Fedora-11-x86_64-AOS-sda.raw</SOURCE>
<TARGET>sda</TARGET>
</DISK>
<GRAPHICS>
<TYPE>vnc</TYPE>
</GRAPHICS>
<HAIZEA>
<DURATION>01:00:00</DURATION>
<PREEMPTIBLE>no</PREEMPTIBLE>
<START>best_effort</START>
</HAIZEA>
<MEMORY>500</MEMORY>
<NAME>fc11AOS</NAME>
<NIC>
<BRIDGE>br0</BRIDGE>
<IP>192.168.1.203</IP>
<MAC>00:00:c0:a8:01:cb</MAC>
<NETWORK>intranet</NETWORK>
<VNID>8</VNID>
</NIC>
<OS>
<BOOT>hd</BOOT>
</OS>
<REQUIREMENTS>HOSTNAME=node2</REQUIREMENTS>
<VCPU>1</VCPU>
<VMID>7930</VMID>
</TEMPLATE>
<HISTORY>
<SEQ>0</SEQ>
<HOSTNAME>node2</HOSTNAME>
<HID>36</HID>
<STIME>1267209265</STIME>
<ETIME>0</ETIME>
<PSTIME>1267209265</PSTIME>
<PETIME>1267209267</PETIME>
<RSTIME>1267209267</RSTIME>
<RETIME>0</RETIME>
<ESTIME>0</ESTIME>
<EETIME>0</EETIME>
<REASON>0</REASON>
</HISTORY>
</VM>
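Side note: STATE 3 with LCM_STATE 15 decodes to ACTIVE/DELETE, which
matches the plain "onevm show 7930" output quoted further down the
thread. A quick way to pull just those two fields out of the XML:

    onevm show -x 7930 | grep -E '<(STATE|LCM_STATE)>'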
> * And last but not least, we fear a deadlock, so we would like to ask
> whether the rest of OpenNebula is working as expected (can you still
> work with VMs: create them, deploy them, stop them?) with no problems,
> or whether something is stuck (the TM, the IM, ...).
Yes, the rest of the system is working fine. I have created and deleted
many VMs without any problem.
The only issue I might have had is with a vnet lease that was still
marked as leased out, but I don't recall the situation exactly.
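As described further down in my earlier message, I could change the
VM's state by hand in a copy of the sqlite database. This is roughly
what I did; a sketch only, and the "oid" column name and the "leases"
table name are my reading of the schema, so please double-check against
your own one.db before touching a live database (and stop oned first):

    # Work on a backup copy first; oned should not be running.
    cp $ONE_LOCATION/var/one.db /tmp/one.db.copy
    # 3/15 (ACTIVE/DELETE) -> 6/0 (DONE), as mentioned below.
    sqlite3 /tmp/one.db.copy \
      "UPDATE vm_pool SET state = 6, lcm_state = 0 WHERE oid = 7930;"
    # The stuck vnet leases could be inspected the same way
    # (the table name here is an assumption on my part):
    sqlite3 /tmp/one.db.copy "SELECT * FROM leases WHERE used = 1;"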
>
> Well, this is very valuable. We appreciate it.
>
> Regards,
>
> -Tino
> --
> Constantino Vázquez, Grid & Virtualization Technology
> Engineer/Researcher: http://www.dsa-research.org/tinova
> DSA Research Group: http://dsa-research.org
> Globus GridWay Metascheduler: http://www.GridWay.org
> OpenNebula Virtual Infrastructure Engine: http://www.OpenNebula.org
>
>
>
> On Tue, Mar 2, 2010 at 6:01 PM, Shi Jin <jinzishuai at gmail.com> wrote:
>> Hi Tino,
>>
>> I will do my best to help hunt down the bug. Please see my answers inline.
>> Note that my zombie VM's id is 7930.
>>
>> On Tue, Mar 2, 2010 at 9:28 AM, Tino Vazquez <tinova at fdi.ucm.es> wrote:
>>> Hi Shi Jin, Nikola,
>>>
>>> This seems like a bug; we will need more info to address the problem.
>>> We would appreciate it if you would be so kind as to provide more
>>> feedback on:
>>>
>>> * Which driver are you using (xen,kvm,vmware)?
>> I am using KVM.
>>
>>>
>>> * In what state was the VM prior to the delete action?
>> I don't recall exactly, but vm.log shows it was RUNNING. After that,
>> vm.log just shows this repetitive output:
>> Fri Feb 26 15:27:16 2010 [VMM][D]: Monitor Information:
>> CPU : -1
>> Memory: 512000
>> Net_TX: 0
>> Net_RX: 0
>>
>>
>>>
>>> * If the machine has a network lease, can you check directly in the
>>> network to see if the lease has been released?
>> "onevnet show" shows that the network lease has not been released:
>> LEASE=[ IP=192.168.1.202, MAC=00:00:c0:a8:01:ca, USED=1, VID=7929 ]
>> LEASE=[ IP=192.168.1.203, MAC=00:00:c0:a8:01:cb, USED=1, VID=7930 ]
>>
>> Strangely, the lease for VM 7929 has not been released either,
>> although that VM is already in the DONE state.
>> I am not sure whether this is a separate bug/problem.
>>
>>>
>>> * Did the VM generate a transfer.prolog and/or transfer.epilog in
>>> $ONE_LOCATION/var/<vmid>/ ?
>> I do have transfer.0.prolog but not the epilog file. The content of
>> transfer.0.prolog is:
>> CLONE onefrontend64:/srv/cloud/ImgRep/FC11AOS/Fedora-11-x86_64-AOS-sda.raw
>> node2:/opt/cloud/VM/7930/images/disk.0
>> CONTEXT /srv/cloud/one/var/7930/context.sh
>> /srv/cloud/contextScripts/Redhat/init.sh
>> /srv/cloud/contextScripts/Redhat/ipcontext.sh
>> node2:/opt/cloud/VM/7930/images/disk.1
>> It seems fine to me. The problem occurs at deletion.
>>
>>>
>>>
>>> * Is the vm.log not showing relevant info or completely empty?
>> The vm.log has no relevant information. The last entry was written at
>> Fri Feb 26 15:27:16 2010 (as shown above).
>> After that I restarted OpenNebula several times and nothing has been
>> written to it since.
>>
>>
>>> We hope we'll be able to hunt down the bug with this feedback.
>>>
>>> Regards,
>>>
>>> -Tino
>> Please let me know if there is anything I can do to help. Since I know
>> it may be hard for you to reproduce this problem in a controlled
>> manner, I am leaving the system as it is. One suggestion I have is to
>> run the deletion code directly for this VM and see what happens; I
>> could use your help doing this. Feel free to contact me by IM: my
>> gtalk is jinzishuai at gmail.com and skype is jinzishuai.
>>
>>
>> Shi
>>
>>
>>>
>>> --
>>> Constantino Vázquez, Grid & Virtualization Technology
>>> Engineer/Researcher: http://www.dsa-research.org/tinova
>>> DSA Research Group: http://dsa-research.org
>>> Globus GridWay Metascheduler: http://www.GridWay.org
>>> OpenNebula Virtual Infrastructure Engine: http://www.OpenNebula.org
>>>
>>>
>>>
>>> On Mon, Mar 1, 2010 at 6:15 PM, Shi Jin <jinzishuai at gmail.com> wrote:
>>>> Hi there,
>>>>
>>>> I am having the same problem here:
>>>> seki at xubuntu:~$ onevm list
>>>>    ID USER     NAME     STAT CPU    MEM HOSTNAME        TIME
>>>>  7930 oneadmin fc11AOS  dele   0 512000    node2 02 22:10:56
>>>> ...
>>>> oneadmin at onefrontend64:/srv/cloud/oneadmin/one/var$ onevm show 7930
>>>> VIRTUAL MACHINE 7930 INFORMATION
>>>> ID : 7930
>>>> NAME : fc11AOS
>>>> STATE : ACTIVE
>>>> LCM_STATE : DELETE
>>>> START TIME : 02/26 11:34:21
>>>> END TIME : -
>>>> DEPLOY ID: : one-7930
>>>> ...
>>>> This particular VM cannot be removed by any means.
>>>>
>>>> When executing "onevm delete 7930", the only relevant messages are
>>>> in oned.log:
>>>> Mon Mar 1 09:59:24 2010 [ReM][D]: VirtualMachineAction invoked
>>>> Mon Mar 1 09:59:24 2010 [DiM][D]: Finalizing VM 7930
>>>> The vm.log for this vm shows nothing.
>>>>
>>>> It looks to me like the sqlite database record for this VM has been
>>>> frozen. However, when I tried on a copy of one.db, I was able to
>>>> change the state/lcm_state in the vm_pool table from 3/15 to 6/0.
>>>> I am not sure what the problem is.
>>>>
>>>> Shi
>>>> On Thu, Feb 25, 2010 at 4:58 AM, Nikola Garafolic
>>>> <nikola.garafolic at srce.hr> wrote:
>>>>> The vm.log for the stuck VMs is empty, and oned.log does not look
>>>>> extremely useful. The VM ids in question are 57 and 58.
>>>>>
>>>>> Regards,
>>>>> Nikola
>>>>>
>>>>> Javier Fontan wrote:
>>>>>>
>>>>>> That's very strange. Can you send me $ONE_LOCATION/var/oned.log and
>>>>>> $ONE_LOCATION/var/<vmid>/vm.log to check what is happening?
>>>>>>
>>>>>> On Thu, Feb 25, 2010 at 12:46 PM, Nikola Garafolic
>>>>>> <nikola.garafolic at srce.hr> wrote:
>>>>>>>
>>>>>>> I am using version 1.4
>>>>>>>
>>>>>>> Regards,
>>>>>>> Nikola
>>>>>>>
>>>>>>> Javier Fontan wrote:
>>>>>>>>
>>>>>>>> Hello,
>>>>>>>>
>>>>>>>> Can you tell us what version of OpenNebula you are using?
>>>>>>>> Version 1.4 should be able to delete those stuck VMs.
>>>>>>>>
>>>>>>>> Bye
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, Feb 24, 2010 at 2:03 PM, Nikola Garafolic
>>>>>>>> <nikola.garafolic at srce.hr> wrote:
>>>>>>>>>
>>>>>>>>> How can I get rid of zombie VMs that show up in onevm? I issued
>>>>>>>>> onevm delete "vm_id", but I can still see two VMs with status
>>>>>>>>> dele, and they will not disappear even after a few hours.
>>>>>>>>> Stopping and starting one does not help.
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> Nikola
--
Shi Jin, Ph.D.