[one-users] onevm - zombie vm

Shi Jin jinzishuai at gmail.com
Tue Mar 2 09:01:16 PST 2010


Hi Tino,

I will do my best to help hunt down the bug. Please see my answers inline.
Note that the ID of my zombie VM is 7930.

On Tue, Mar 2, 2010 at 9:28 AM, Tino Vazquez <tinova at fdi.ucm.es> wrote:
> Hi Shi Jin, Nikola,
>
>  This seems like a bug, we will need more info to address the problem.
> We would appreciate it if you'll be so kind to provide more feedback
> on:
>
> * Which driver are you using (xen,kvm,vmware)?
I am using KVM.

>
> * In what state was the VM prior to the delete action?
I don't recall exactly, but vm.log shows it was RUNNING. After that,
vm.log just repeats the following:
Fri Feb 26 15:27:16 2010 [VMM][D]: Monitor Information:
	CPU   : -1
	Memory: 512000
	Net_TX: 0
	Net_RX: 0


>
> * If the machine has a network lease, can you check directly in the
> network to see if the lease has been released?
"onevnet show" shows that the network lease has not been released:
LEASE=[ IP=192.168.1.202, MAC=00:00:c0:a8:01:ca, USED=1, VID=7929 ]
LEASE=[ IP=192.168.1.203, MAC=00:00:c0:a8:01:cb, USED=1, VID=7930 ]

Strangely, the lease for VM 7929 has not been released either, although
that VM is already in the DONE state.
I am not sure whether this is a separate bug.
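
A minimal Python sketch for spotting leases like the one for VM 7929, which
is still marked USED=1 although the VM is DONE. It assumes the lease lines
look exactly like the two quoted above; the function names parse_leases and
stale_leases are hypothetical and not part of OpenNebula.

```python
import re

# Matches lines in the form quoted above (format assumed from this message):
# LEASE=[ IP=..., MAC=..., USED=..., VID=... ]
LEASE_RE = re.compile(
    r"LEASE=\[\s*IP=(?P<ip>[\d.]+),\s*MAC=(?P<mac>[0-9a-fA-F:]+),"
    r"\s*USED=(?P<used>\d+),\s*VID=(?P<vid>-?\d+)\s*\]"
)

def parse_leases(text):
    """Return one dict per LEASE line found in the given text."""
    leases = []
    for m in LEASE_RE.finditer(text):
        leases.append({
            "ip": m.group("ip"),
            "mac": m.group("mac"),
            "used": m.group("used") == "1",
            "vid": int(m.group("vid")),
        })
    return leases

def stale_leases(leases, finished_vids):
    """Leases still marked as used whose VM is known to be finished."""
    return [l for l in leases if l["used"] and l["vid"] in finished_vids]

sample = """\
LEASE=[ IP=192.168.1.202, MAC=00:00:c0:a8:01:ca, USED=1, VID=7929 ]
LEASE=[ IP=192.168.1.203, MAC=00:00:c0:a8:01:cb, USED=1, VID=7930 ]
"""

leases = parse_leases(sample)
# VM 7929 is reported as DONE in this thread, so its lease should be free.
stale = stale_leases(leases, finished_vids={7929})
for lease in stale:
    print("stale lease:", lease["ip"], "held by VM", lease["vid"])
```

The set of finished VM IDs has to be supplied by hand here (for example from
"onevm list"); the sketch does not query OpenNebula itself.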

>
> * Did the VM generate a transfer.prolog and/or transfer.epilog in
> $ONE_LOCATION/var/<vmid>/ ?
I do have transfer.0.prolog but no epilog file. The content of
transfer.0.prolog is:
CLONE onefrontend64:/srv/cloud/ImgRep/FC11AOS/Fedora-11-x86_64-AOS-sda.raw node2:/opt/cloud/VM/7930/images/disk.0
CONTEXT /srv/cloud/one/var/7930/context.sh /srv/cloud/contextScripts/Redhat/init.sh /srv/cloud/contextScripts/Redhat/ipcontext.sh node2:/opt/cloud/VM/7930/images/disk.1
It seems fine to me. The problem is at deletion.
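
A rough Python sketch of how a transfer script like this one could be
checked by hand. It assumes one command per line, with the command name
first and the destination as the last argument; that layout is inferred
from the content quoted above, not from the OpenNebula sources, and the
function name parse_transfer_script is hypothetical.

```python
def parse_transfer_script(text):
    """Split each non-empty line into command, sources and destination.

    Assumption: the last whitespace-separated argument is the destination
    and everything between the command and it is a source.
    """
    actions = []
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        command, args = parts[0], parts[1:]
        actions.append({
            "command": command,
            "sources": args[:-1],
            "destination": args[-1] if args else None,
        })
    return actions

prolog = (
    "CLONE onefrontend64:/srv/cloud/ImgRep/FC11AOS/Fedora-11-x86_64-AOS-sda.raw "
    "node2:/opt/cloud/VM/7930/images/disk.0\n"
    "CONTEXT /srv/cloud/one/var/7930/context.sh "
    "/srv/cloud/contextScripts/Redhat/init.sh "
    "/srv/cloud/contextScripts/Redhat/ipcontext.sh "
    "node2:/opt/cloud/VM/7930/images/disk.1\n"
)

actions = parse_transfer_script(prolog)
for action in actions:
    print(action["command"], "->", action["destination"])
```

Both destinations point at node2:/opt/cloud/VM/7930/images/, which matches
the host shown for this VM in "onevm list".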

>
>
> * Is the vm.log not showing relevant info or completely empty?
The vm.log has no relevant information. The last info written was at
Fri Feb 26 15:27:16 2010 (as shown above).
After that I restarted OpenNebula several times, and nothing has been
written to it since.


> We hope we'll be able to hunt down the bug with this feedback.
>
> Regards,
>
> -Tino
Please let me know if there is anything else I can do. Since I know it
may be hard for you to reproduce this problem in a controlled manner,
I am leaving the system as it is. One suggestion I have is to run the
deletion code directly for this VM and see what happens. I could use
your help in doing this. Feel free to contact me by IM. My Google Talk ID
is jinzishuai at gmail.com and my Skype ID is jinzishuai.
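
For reference, the state change on a copy of one.db described in the
earlier message quoted below (state/lcm_state from 3/15 to 6/0 in the
vm_pool table) could be sketched in Python roughly as follows. The table
and the two state columns are named in that message; the key column name
"oid" and the function set_vm_state are assumptions. To keep the sketch
self-contained it builds a tiny stand-in table in memory rather than
opening a real one.db.

```python
import sqlite3

def set_vm_state(conn, vm_id, state, lcm_state):
    """Update state and lcm_state for one VM; return the number of rows changed.

    Assumption: the VM ID is stored in a column called "oid".
    """
    cur = conn.execute(
        "UPDATE vm_pool SET state = ?, lcm_state = ? WHERE oid = ?",
        (state, lcm_state, vm_id),
    )
    conn.commit()
    return cur.rowcount

# Stand-in for a copy of one.db: only the columns needed for the example.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE vm_pool (oid INTEGER PRIMARY KEY, state INTEGER, lcm_state INTEGER)"
)
# 3/15 is the stuck combination reported for VM 7930 (shown as ACTIVE/DELETE).
conn.execute("INSERT INTO vm_pool VALUES (7930, 3, 15)")

changed = set_vm_state(conn, 7930, state=6, lcm_state=0)
row = conn.execute(
    "SELECT state, lcm_state FROM vm_pool WHERE oid = 7930"
).fetchone()
print("rows changed:", changed, "new state/lcm_state:", row)
```

This only shows that the row itself can be written. It does not release
the network lease or clean up anything on the host, and whether such a
change is safe on the live database is not established in this thread.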


Shi


>
> --
> Constantino Vázquez, Grid & Virtualization Technology
> Engineer/Researcher: http://www.dsa-research.org/tinova
> DSA Research Group: http://dsa-research.org
> Globus GridWay Metascheduler: http://www.GridWay.org
> OpenNebula Virtual Infrastructure Engine: http://www.OpenNebula.org
>
>
>
> On Mon, Mar 1, 2010 at 6:15 PM, Shi Jin <jinzishuai at gmail.com> wrote:
>> Hi there,
>>
>> I am having the same problem here:
>> seki at xubuntu:~$ onevm list
>>  ID     USER     NAME STAT CPU     MEM        HOSTNAME        TIME
>> 7930 oneadmin  fc11AOS dele   0  512000           node2 02 22:10:56
>> ...
>> oneadmin at onefrontend64:/srv/cloud/oneadmin/one/var$ onevm show 7930
>> VIRTUAL MACHINE 7930 INFORMATION
>> ID             : 7930
>> NAME           : fc11AOS
>> STATE          : ACTIVE
>> LCM_STATE      : DELETE
>> START TIME     : 02/26 11:34:21
>> END TIME       : -
>> DEPLOY ID:     : one-7930
>> ...
>> This particular VM cannot be removed by any means.
>>
>> When executing "onevm delete 7930", the only relevant messages are in
>> oned.log:
>> Mon Mar  1 09:59:24 2010 [ReM][D]: VirtualMachineAction invoked
>> Mon Mar  1 09:59:24 2010 [DiM][D]: Finalizing VM 7930
>> The vm.log for this vm shows nothing.
>>
>> It looks to me as if the SQLite database for this VM has been frozen.
>> But I tried to make a copy of the one.db and was able to change the
>> state/lcm_state for the vm_pool table from 3/15 to 6/0.
>> Not sure what the problem is.
>>
>> Shi
>> On Thu, Feb 25, 2010 at 4:58 AM, Nikola Garafolic
>> <nikola.garafolic at srce.hr> wrote:
>>> vm.log for the stuck VMs is empty, and oned.log does not look very useful.
>>> Vm ids in question are 57 and 58.
>>>
>>> Regards,
>>> Nikola
>>>
>>> Javier Fontan wrote:
>>>>
>>>> That's very strange. Can you send me $ONE_LOCATION/var/oned.log and
>>>> $ONE_LOCATION/var/<vmid>/vm.log to check what is happening?
>>>>
>>>> On Thu, Feb 25, 2010 at 12:46 PM, Nikola Garafolic
>>>> <nikola.garafolic at srce.hr> wrote:
>>>>>
>>>>> I am using version 1.4
>>>>>
>>>>> Regards,
>>>>> Nikola
>>>>>
>>>>> Javier Fontan wrote:
>>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> Can you tell us which version of OpenNebula you are using? Version 1.4
>>>>>> should be able to delete those stuck VMs.
>>>>>>
>>>>>> Bye
>>>>>>
>>>>>>
>>>>>> On Wed, Feb 24, 2010 at 2:03 PM, Nikola Garafolic
>>>>>> <nikola.garafolic at srce.hr> wrote:
>>>>>>>
>>>>>>> How can I get rid of zombie VMs that show up in onevm? I issued onevm
>>>>>>> delete "vm_id", but I can still see two VMs with status dele, and they
>>>>>>> will not disappear even after a few hours. Stopping and starting one
>>>>>>> does not help.
>>>>>>>
>>>>>>> Regards,
>>>>>>> Nikola
>>>>>>> _______________________________________________
>>>>>>> Users mailing list
>>>>>>> Users at lists.opennebula.org
>>>>>>> http://lists.opennebula.org/listinfo.cgi/users-opennebula.org
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>>
>>
>>
>>
>> --
>> Shi Jin, Ph.D.
>>
>



-- 
Shi Jin, Ph.D.


