[one-users] how to recover failed vms
Jhon Masschelein
Jhon.Masschelein at Sara.Nl
Mon Mar 12 01:13:29 PDT 2012
Hi list,
In your mail, you mix up FAILED and UNKNOWN.
When a VM goes to FAILED, it pretty much always means that it was not
able to deploy due to some error. The log file would give more
information. Look for things like inaccessible disks or networks, bad
template variables, etc.
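
A rough sketch of how one could pull just the error lines out of the VM log (the /var/log/one/{vm_id}.log file mentioned further down in this thread). The function name vm_log_errors is made up for illustration, and the "[E]" severity tag is my assumption about the log format; adjust the pattern if your log lines look different.

```shell
# Print the error-level lines from an OpenNebula VM log.
# Assumption: error lines carry an "[E]" severity tag, e.g. "[VMM][E]: ...".
vm_log_errors() {
    logfile=$1
    # -F treats the pattern as a fixed string, so the brackets are literal.
    grep -F '][E]' "$logfile"
}

# Example (42 is a made-up VM id):
#   vm_log_errors /var/log/one/42.log
```

The function returns a non-zero status when no matching line is found, which is just grep's normal behaviour.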
As far as I know, a FAILED VM should never go to READY state without
resubmission. Anybody, please correct me if I am wrong.
UNKNOWN state is different; this happens when oned does not get any
monitoring info from the VM for a while. This could be the result of the
system and/or libvirt being very busy, or maybe of network problems.
Once monitoring resumes, the VM usually goes from UNKNOWN back to
READY. Of course, if for some reason the KVM or XEN domain
process died, monitoring will never resume.
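
To tell the two cases apart, one could ask libvirt on the host whether the domain is still there. This is only a sketch: domain_alive is a hypothetical helper name, and it assumes the libvirt domain is named "one-<VMID>", which is not stated in this thread, so check the name with "virsh list --all" first.

```shell
# Succeed (exit 0) if the libvirt domain for the given VM id is running.
# Assumption: the domain is named "one-<VMID>"; verify with "virsh list --all".
domain_alive() {
    vmid=$1
    # virsh domstate prints the domain state; it fails if the domain is gone.
    state=$(virsh domstate "one-$vmid" 2>/dev/null) || return 1
    [ "$state" = "running" ]
}

# Example (42 is a made-up VM id), run on the host where the VM was deployed:
#   if domain_alive 42; then echo "domain is up, wait for monitoring"; fi
```

If the helper fails, the domain is either missing or not running, and monitoring will not come back by itself.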
(Not sure if you are using KVM or XEN; the following is based on KVM, but
I think XEN is relatively similar.)
For example, if you have a node crash, the KVM process will of course
have died, the monitoring will stop and the VM will end up in UNKNOWN state.
When the crashed node is rebooted, you can "recover" the VM by booting
it again. In the /var/lib/one/$VMID/images directory for the VM, you
will find a deployment.X file and the image files. You can simply run
"virsh create deployment.X" (replace X with the highest number you find
in the directory). This will restart the VM.
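
As a sketch of the "highest number" step, something like the following could pick the file for you. The helper name latest_deployment is made up for illustration; the directory layout is the one described above.

```shell
# Print the path of the highest-numbered deployment.X file in a directory.
latest_deployment() {
    dir=$1
    best=
    best_n=-1
    for f in "$dir"/deployment.*; do
        [ -e "$f" ] || continue          # glob matched nothing
        n=${f##*.}                       # text after the last dot
        case $n in
            ''|*[!0-9]*) continue ;;     # skip non-numeric suffixes
        esac
        if [ "$n" -gt "$best_n" ]; then
            best_n=$n
            best=$f
        fi
    done
    [ -n "$best" ] && printf '%s\n' "$best"
}

# Example, run on the rebooted node with VMID set to the id of your VM:
#   virsh create "$(latest_deployment "/var/lib/one/$VMID/images")"
```

The comparison is numeric, so deployment.10 is chosen over deployment.2, which a plain alphabetical sort would get wrong.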
After a little while, OpenNebula will start receiving monitoring info
from the restarted VM again and the VM will turn READY.
For a FAILED VM, this is mostly not possible, because the VM usually
FAILED either because the deployment file could not be created or is
faulty, or because the disk images could not be copied.
All this is based on my experience with OpenNebula. Please correct me if
I am wrong.
Wkr,
Jhon
On 03/11/2012 10:08 PM, Łukasz Oleś wrote:
> On Thursday 08 March 2012 06:45:54 Siva Prasad wrote:
>> Hi All,
>>
>> I have a peculiar issue. For some reason, if a VM is heavily loaded it
>> goes to unknown state. To recover from unknown state I use "restart".
>> Sometimes the VM gets recovered and sometimes it goes to failed state
>> (in both cases all the VM files exist on the disk). Below are my queries.
>>
>> 1) How to debug why the VM sometimes goes to failed state and why it
>> sometimes recovers
> Check the /var/log/one/{vm_id}.log file
>
>> 2) Is there a way to recover failed VMs?
>
> I'm also interested in this question. Anyone?
>
>>
>> Thanks,
>> Siva
>> _______________________________________________
>> Users mailing list
>> Users at lists.opennebula.org
>> http://lists.opennebula.org/listinfo.cgi/users-opennebula.org