[one-users] onevm - zombie vm

Tino Vazquez tinova at fdi.ucm.es
Tue Mar 2 10:28:48 PST 2010


Hi Shi Jin,

Thanks a lot. We have a few more questions, since we still cannot figure out the problem:

* Does the host get its capacity (CPU, memory) released?
* Can you send the output of "onevm show -x"?
* And last but not least, we fear a deadlock, so we would like to ask
whether the rest of OpenNebula is working as expected (can you still
work with VMs: create, deploy and stop them without problems?), or
whether something is stuck (the TM, the IM, ...); see the command
sketch right below.
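
For reference, this is roughly the set of checks we have in mind (only
a sketch; adjust the VM ID to your case and the log path to your
$ONE_LOCATION):

  onevm show -x 7930                  # full record of the stuck VM, as XML
  onehost list                        # is the VM's CPU/memory still accounted on its host?
  onevm list                          # do other VMs still change state normally?
  tail -f $ONE_LOCATION/var/oned.log  # any TM/IM/VMM activity while you run the above?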

Well, this is very valuable. We appreciate it.

Regards,

-Tino
--
Constantino Vázquez, Grid & Virtualization Technology
Engineer/Researcher: http://www.dsa-research.org/tinova
DSA Research Group: http://dsa-research.org
Globus GridWay Metascheduler: http://www.GridWay.org
OpenNebula Virtual Infrastructure Engine: http://www.OpenNebula.org



On Tue, Mar 2, 2010 at 6:01 PM, Shi Jin <jinzishuai at gmail.com> wrote:
> Hi Tino,
>
> I will do my best to help hunt down the bug. Please see my answers inline.
> Note that my zombie VM has ID 7930.
>
> On Tue, Mar 2, 2010 at 9:28 AM, Tino Vazquez <tinova at fdi.ucm.es> wrote:
>> Hi Shi Jin, Nikola,
>>
>>  This seems like a bug, we will need more info to address the problem.
>> We would appreciate it if you would be so kind as to provide more
>> feedback on:
>>
>> * Which driver are you using (xen, kvm, vmware)?
> I am using KVM.
>
>>
>> * In what state was the VM prior to the delete action?
> I don't recall exactly, but vm.log shows it was RUNNING. After that,
> vm.log just shows repetitive entries like:
> Fri Feb 26 15:27:16 2010 [VMM][D]: Monitor Information:
>        CPU   : -1
>        Memory: 512000
>        Net_TX: 0
>        Net_RX: 0
>
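> A rough way to pull the last recorded state changes out of that log
> (a sketch only; I am assuming the relevant messages contain the word
> "state", and the exact wording may differ between versions):
>
>   grep -i state $ONE_LOCATION/var/7930/vm.log | tail -n 5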
>
>>
>> * If the machine has a network lease, can you check directly in the
>> network to see if the lease has been released?
> "onevnet show" shows that the network lease has not been released:
> LEASE=[ IP=192.168.1.202, MAC=00:00:c0:a8:01:ca, USED=1, VID=7929 ]
> LEASE=[ IP=192.168.1.203, MAC=00:00:c0:a8:01:cb, USED=1, VID=7930 ]
>
> Strangely, the lease for VM 7929 is not released either although it is
> already in the DONE state.
> I am not sure whether this is a separate bug/problem.
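>
> A possible way to double-check this at the database level (a sketch
> only: I am assuming the leases end up in a table whose name appears
> in ".tables", e.g. "leases", with the VM ID in a column like the VID
> field above; adjust to whatever the schema actually says):
>
>   sqlite3 $ONE_LOCATION/var/one.db ".tables"
>   sqlite3 $ONE_LOCATION/var/one.db "SELECT * FROM leases WHERE vid IN (7929, 7930);"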
>
>>
>> * Did the VM generate a transfer.prolog and/or transfer.epilog in
>> $ONE_LOCATION/var/<vmid>/ ?
> I do have transfer.0.prolog but not the epilog file. The content of
> transfer.0.prolog is:
> CLONE onefrontend64:/srv/cloud/ImgRep/FC11AOS/Fedora-11-x86_64-AOS-sda.raw
> node2:/opt/cloud/VM/7930/images/disk.0
> CONTEXT /srv/cloud/one/var/7930/context.sh
> /srv/cloud/contextScripts/Redhat/init.sh
> /srv/cloud/contextScripts/Redhat/ipcontext.sh
> node2:/opt/cloud/VM/7930/images/disk.1
> It seems fine to me. The problem is at deletion.
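>
> One quick related check: whether the files the prolog copied are
> still sitting on the node (assuming oneadmin can ssh to it; the host
> and paths are taken from the prolog above):
>
>   ssh node2 ls -l /opt/cloud/VM/7930/images/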
>
>>
>>
>> * Is the vm.log not showing relevant info or completely empty?
> The vm.log has no relevant information. The last info written was at
> Fri Feb 26 15:27:16 2010 (as shown above).
> After that I restarted OpenNebula several times and nothing has been
> written to it since.
>
>
>> We hope we'll be able to hunt down the bug with this feedback.
>>
>> Regards,
>>
>> -Tino
> Please let me know if there is anything I can do. Since I know it
> may be hard for you to reproduce this problem in a controlled manner,
> I am leaving the system as it is. One suggestion I have is to run the
> deletion code directly for this VM and see what happens. I could use
> your help in doing this. Feel free to contact me by IM. My gtalk is
> jinzishuai at gmail.com and skype is jinzishuai.
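>
> In case it helps, the kind of manual cleanup I have in mind is along
> these lines (only a sketch, to be confirmed by you first; the deploy
> ID one-7930 and the paths come from the outputs above):
>
>   ssh node2 virsh destroy one-7930
>   ssh node2 rm -rf /opt/cloud/VM/7930
>
> followed by fixing the database record, as described further down in
> this thread.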
>
>
> Shi
>
>
>>
>> On Mon, Mar 1, 2010 at 6:15 PM, Shi Jin <jinzishuai at gmail.com> wrote:
>>> Hi there,
>>>
>>> I am having the same problem here:
>>> seki at xubuntu:~$ onevm list
>>>  ID     USER     NAME STAT CPU     MEM        HOSTNAME        TIME
>>> 7930 oneadmin  fc11AOS dele   0  512000           node2 02 22:10:56
>>> ...
>>> oneadmin at onefrontend64:/srv/cloud/oneadmin/one/var$ onevm show 7930
>>> VIRTUAL MACHINE 7930 INFORMATION
>>> ID             : 7930
>>> NAME           : fc11AOS
>>> STATE          : ACTIVE
>>> LCM_STATE      : DELETE
>>> START TIME     : 02/26 11:34:21
>>> END TIME       : -
>>> DEPLOY ID:     : one-7930
>>> ...
>>> This particular VM cannot be removed by any means.
>>>
>>> When executing "onevm delete 7930", the only relevant messages are
>>> in oned.log:
>>> Mon Mar  1 09:59:24 2010 [ReM][D]: VirtualMachineAction invoked
>>> Mon Mar  1 09:59:24 2010 [DiM][D]: Finalizing VM 7930
>>> The vm.log for this vm shows nothing.
>>>
>>> It looks to me like the SQLite record for this VM is frozen.
>>> However, on a copy of one.db I was able to change the VM's
>>> state/lcm_state in the vm_pool table from 3/15 to 6/0.
>>> Not sure what the problem is.
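>>>
>>> For reference, the change on the copy was along these lines (a
>>> sketch only; the filename of the copy is illustrative, and the
>>> "oid" column name should be checked with ".schema vm_pool" first):
>>>
>>>   sqlite3 one.db.copy "UPDATE vm_pool SET state=6, lcm_state=0 WHERE oid=7930;"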
>>>
>>> Shi
>>> On Thu, Feb 25, 2010 at 4:58 AM, Nikola Garafolic
>>> <nikola.garafolic at srce.hr> wrote:
>>>> vm.log for the stuck VMs is empty, and oned.log does not look very useful.
>>>> The VM IDs in question are 57 and 58.
>>>>
>>>> Regards,
>>>> Nikola
>>>>
>>>> Javier Fontan wrote:
>>>>>
>>>>> That's very strange. Can you send me $ONE_LOCATION/var/oned.log and
>>>>> $ONE_LOCATION/var/<vmid>/vm.log to check what is happening?
>>>>>
>>>>> On Thu, Feb 25, 2010 at 12:46 PM, Nikola Garafolic
>>>>> <nikola.garafolic at srce.hr> wrote:
>>>>>>
>>>>>> I am using version 1.4
>>>>>>
>>>>>> Regards,
>>>>>> Nikola
>>>>>>
>>>>>> Javier Fontan wrote:
>>>>>>>
>>>>>>> Hello,
>>>>>>>
>>>>>>> Can you tell us which version of OpenNebula you are using? Version 1.4
>>>>>>> should be able to delete those stuck VMs.
>>>>>>>
>>>>>>> Bye
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Feb 24, 2010 at 2:03 PM, Nikola Garafolic
>>>>>>> <nikola.garafolic at srce.hr> wrote:
>>>>>>>>
>>>>>>>> How can I get rid of zombie VMs that show up in onevm? I issued
>>>>>>>> onevm delete "vm_id", but I can still see two VMs with status dele,
>>>>>>>> and they will not disappear even after a few hours. Stopping and
>>>>>>>> starting OpenNebula does not help.
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Nikola
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>> --
>>> Shi Jin, Ph.D.
>>
>
>
>
> --
> Shi Jin, Ph.D.
>


