[one-users] how to recover failed vms

Jhon Masschelein Jhon.Masschelein at Sara.Nl
Mon Mar 12 01:13:29 PDT 2012


Hi list,

In your mail, you mix up FAILED and UNKNOWN.

When a VM goes to FAILED, it pretty much always means that it could not 
be deployed due to some error. The VM log file will give more 
information. Look for things like inaccessible disks or networks, bad 
template variables, etc.
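
To pull just the error lines out of a VM log, something like the following 
may help. It is only a sketch: it assumes the log lives at 
/var/log/one/$VMID.log and that error-level lines carry an [E] tag after 
the module name (for example [VMM][E]). The names vm_errors and 
ONE_LOG_DIR are made up for illustration.

```shell
# Print error-level lines from a VM's log file.
# Assumption: error lines look like "... [VMM][E]: message".
# ONE_LOG_DIR is a hypothetical override, defaulting to /var/log/one.
vm_errors() {
    vmid=$1
    log="${ONE_LOG_DIR:-/var/log/one}/${vmid}.log"
    if [ ! -r "$log" ]; then
        echo "cannot read $log" >&2
        return 1
    fi
    # -F treats the pattern as a fixed string, so the brackets are literal.
    grep -F '][E]' "$log"
}
```

Calling "vm_errors 42" would then show only the error lines for VM 42, if 
there are any.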

As far as I know, a FAILED VM should never go to READY state without 
resubmission. Anybody, please correct me if I am wrong.

UNKNOWN state is different; this happens when oned has not received any 
monitoring info from the VM for a while. This could be a result of the 
system and/or libvirt being very busy, or maybe network problems.
Once monitoring resumes, this usually results in the VM going from UNKNOWN 
back to READY. Of course, if for some reason the KVM or XEN domain 
process died, monitoring will never resume.

(Not sure if you are using KVM or XEN, the following is based on KVM but 
I think XEN is relatively similar.)
For example, if you have a node crash, the KVM process will of course 
have died, the monitoring will stop and the VM will end up in UNKNOWN state.
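
A quick way to tell the two cases apart (monitoring delayed vs. domain 
gone) is to ask libvirt on the node whether the domain still exists. A 
rough sketch, assuming OpenNebula named the libvirt domain one-$VMID 
(check with "virsh list --all" if unsure); domain_alive is just an 
illustrative name.

```shell
# Report whether the libvirt domain behind a VM is still running on this node.
# Assumption: the domain is called "one-<VMID>".
domain_alive() {
    vmid=$1
    # "virsh domstate" prints e.g. "running" or "shut off"; it fails when
    # libvirt does not know the domain at all.
    state=$(virsh domstate "one-${vmid}" 2>/dev/null) || state="missing"
    echo "one-${vmid}: ${state}"
    # Exit status 0 only when the domain is running.
    [ "$state" = "running" ]
}
```

If this reports "running", waiting for monitoring to catch up is probably 
enough; if it reports "missing", the domain has to be started again.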

When the crashed node is rebooted, you can "recover" the VM by booting 
it again. In the /var/lib/one/$VMID/images directory for the VM, you 
will find deployment.X files and the image files. You can simply run 
"virsh create deployment.X" (replace X with the highest number you find 
in the directory). This will restart the VM.
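
Picking the highest X by hand is easy to get wrong once there are ten or 
more files, since deployment.10 sorts before deployment.2 alphabetically. 
The sketch below compares the suffixes as numbers instead. The function 
names and the ONE_VM_DIR variable are hypothetical; the directory layout 
is the one described above.

```shell
# Print the deployment file with the highest numeric suffix in a directory.
latest_deployment() {
    dir=$1
    best=""
    best_n=-1
    for f in "$dir"/deployment.*; do
        [ -e "$f" ] || continue          # glob matched nothing
        n=${f##*.}                       # text after the last dot
        case $n in
            ''|*[!0-9]*) continue ;;     # skip non-numeric suffixes
        esac
        if [ "$n" -gt "$best_n" ]; then
            best=$f
            best_n=$n
        fi
    done
    # Fails (status 1) when no numbered deployment file was found.
    [ -n "$best" ] && echo "$best"
}

# Restart a VM from its latest deployment file.
# ONE_VM_DIR is a hypothetical override, defaulting to /var/lib/one.
restart_vm() {
    vmid=$1
    file=$(latest_deployment "${ONE_VM_DIR:-/var/lib/one}/${vmid}/images") || {
        echo "no deployment file found for VM ${vmid}" >&2
        return 1
    }
    virsh create "$file"
}
```

Run on the node that hosts the VM, "restart_vm 42" would then do the same 
as the manual "virsh create" for VM 42.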

After a little while, OpenNebula will start receiving monitoring info 
from the restarted VM again and the VM will turn READY.

For a FAILED VM, this is mostly not possible, because the VM is FAILED 
either because the deployment file could not be created or is faulty, or 
because the disk images could not be copied.
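
Before trying a manual restart, it can be worth checking whether those 
files are there at all. A rough pre-check, assuming the disk images in 
the VM directory are named disk.0, disk.1, and so on (that naming is my 
assumption, as is the function name). It cannot tell whether a deployment 
file is faulty, only whether it exists and is not empty.

```shell
# Rough pre-check of a VM's images directory before a manual restart.
# Assumption: the first disk image is called disk.0.
check_vm_files() {
    dir=$1
    ok=0
    found=""
    for f in "$dir"/deployment.*; do
        [ -s "$f" ] && found=yes         # exists and is not empty
    done
    if [ -z "$found" ]; then
        echo "no usable deployment file in $dir"
        ok=1
    fi
    if [ ! -s "$dir/disk.0" ]; then
        echo "disk.0 missing or empty in $dir"
        ok=1
    fi
    # Status 0 only when both checks passed.
    return $ok
}
```

If this prints anything, resubmitting the VM is most likely the only way 
forward.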

All this is based on my experience with OpenNebula. Please correct me if 
I am wrong.

Wkr,

Jhon


On 03/11/2012 10:08 PM, Łukasz Oleś wrote:
> On Thursday 08 March 2012 06:45:54 Siva Prasad wrote:
>> Hi All,
>>
>> I have a peculiar issues. For some reason if vm is heavily loaded it
>> goes to unknown state. To recover from unknown state I use "restart".
>> some times the vm gets recovered and sometimes it goes to failed state (
>> in both cases all the vm files exists on the disk).Below are my queries.
>>
>> 1) How to debug why some times vm goes to failed state and why it
>> recovers sometimes
> Check /var/log/one/{vm_id}.log file
>
>> 2) Is there a way to  recover failed vms.
>
> I'm also interested in this question. Anyone?
>
>>
>> Thanks,
>> Siva
>> _______________________________________________
>> Users mailing list
>> Users at lists.opennebula.org
>> http://lists.opennebula.org/listinfo.cgi/users-opennebula.org


