[one-users] how to recover failed vms
    Carlos Martín Sánchez 
    cmartin at opennebula.org
       
    Thu Mar 15 03:26:01 PDT 2012
    
    
  
Hi Jhon,
Your description is very accurate, and I have just one comment:
Instead of manually executing the 'virsh create' command, you can execute
'onevm restart' [1]. This has the same effect: the hypervisor deployment
command is executed without going through the prolog state. The files
currently on the host are used instead of being copied again from the image
repository, so the disk changes are preserved.
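
As a rough sketch of the above, assuming the onevm CLI described in [1] is on
the PATH of the OpenNebula admin user. The helper name restart_vm and the
ONEVM override variable are made up for illustration and are not part of
OpenNebula:

```shell
# Hypothetical helper: re-run the hypervisor deployment for a VM
# through OpenNebula instead of calling 'virsh create' by hand.
# ONEVM can be set to swap out the command; by default 'onevm' is used.
restart_vm() {
    vm_id="$1"
    if [ -z "$vm_id" ]; then
        echo "usage: restart_vm <vm_id>" >&2
        return 1
    fi
    "${ONEVM:-onevm}" restart "$vm_id"
}

# Example (42 is a made-up VM ID):
#   restart_vm 42
#   onevm show 42    # check the VM state afterwards
```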
Cheers
[1] http://opennebula.org/documentation:rel3.2:vm_guide_2#onevm_command
--
Carlos Martín, MSc
Project Engineer
OpenNebula - The Open-source Solution for Data Center Virtualization
www.OpenNebula.org | cmartin at opennebula.org | @OpenNebula <http://twitter.com/opennebula>
2012/3/12 Jhon Masschelein <Jhon.Masschelein at sara.nl>
> Hi list,
>
> In your mail, you mix up FAILED and UNKNOWN.
>
> When a VM goes to FAILED, it pretty much always means that it was not able
> to deploy due to some error. The log file would give more information. Look
> for things like inaccessible disks or networks, bad template variables,
> etc.
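
A minimal sketch for scanning the per-VM log for such problems, assuming the
log lives at /var/log/one/<vm_id>.log as mentioned further down in this
thread. The helper name vm_log_errors, the ONE_LOG_DIR variable and the
search keywords are assumptions made for illustration:

```shell
# Hypothetical helper: print the lines of a VM log that mention errors.
# ONE_LOG_DIR defaults to /var/log/one, the location given in this thread.
vm_log_errors() {
    log_file="${ONE_LOG_DIR:-/var/log/one}/$1.log"
    if [ ! -r "$log_file" ]; then
        echo "cannot read $log_file" >&2
        return 1
    fi
    # Case-insensitive match; the keywords are a guess, adjust as needed.
    grep -iE 'error|fail' "$log_file"
}

# Example (42 is a made-up VM ID):
#   vm_log_errors 42
```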
>
> As far as I know, a FAILED VM should never go to READY state without
> resubmission. Anybody, please correct me if I am wrong.
>
> UNKNOWN state is different; this happens when oned does not get any
> monitoring info from the VM for a while. This could be a result of the
> system and/or libvirt being very busy, or maybe of network problems.
> Once monitoring resumes, this usually results in the VM going from UNKNOWN
> back to READY. Of course, if for some reason the KVM or XEN domain process
> died, monitoring will never resume.
>
> (Not sure if you are using KVM or XEN, the following is based on KVM but I
> think XEN is relatively similar.)
> For example, if you have a node crash, the KVM process will of course have
> died, the monitoring will stop and the VM will end up in UNKNOWN state.
>
> When the crashed node is rebooted, you can "recover" the VM by booting it
> again. In the /var/lib/one/$VMID/images directory for the VM, you will find
> a deployment.X file and the image files. You can simply use "virsh create
> deployment.X" (replace X with the highest number you find in the
> directory). This will restart the VM.
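
A minimal sketch of picking the highest-numbered deployment file described
above. The helper name latest_deployment is made up for illustration; the
comparison is numeric so that deployment.10 sorts after deployment.9:

```shell
# Hypothetical helper: print the deployment.X file with the highest X
# in the given directory (prints nothing if there is none).
latest_deployment() {
    dir="$1"
    best=""
    best_n=-1
    for f in "$dir"/deployment.*; do
        [ -e "$f" ] || continue
        n="${f##*.}"
        case "$n" in
            ''|*[!0-9]*) continue ;;   # skip non-numeric suffixes
        esac
        if [ "$n" -gt "$best_n" ]; then
            best="$f"
            best_n="$n"
        fi
    done
    [ -n "$best" ] && echo "$best"
}

# Example (42 is a made-up VM ID; run on the host where the VM files live):
#   virsh create "$(latest_deployment /var/lib/one/42/images)"
```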
>
> After a little while, opennebula will start receiving monitoring info from
> the restarted VM again and the VM will turn READY.
>
> For a FAILED VM, this is mostly not possible, because the VM is FAILED
> either because the deployment file could not be created or is faulty, or
> because the disk images could not be copied.
>
> All this is based on my experience with opennebula. Please correct me if I
> am wrong.
>
> Wkr,
>
> Jhon
>
>
>
> On 03/11/2012 10:08 PM, Łukasz Oleś wrote:
>
>> On Thursday 08 March 2012 06:45:54 Siva Prasad wrote:
>>
>>> Hi All,
>>>
>>> I have a peculiar issue. For some reason, if a VM is heavily loaded it
>>> goes to the unknown state. To recover from the unknown state I use
>>> "restart". Sometimes the VM recovers and sometimes it goes to the failed
>>> state (in both cases all the VM files exist on the disk). Below are my
>>> queries.
>>>
>>> 1) How can I debug why the VM sometimes goes to the failed state and
>>> sometimes recovers?
>>>
>> Check the /var/log/one/{vm_id}.log file.
>>
>>> 2) Is there a way to recover failed VMs?
>>>
>>
>> I'm also interested in this question. Anyone?
>>
>>
>>> Thanks,
>>> Siva
>>> _______________________________________________
>>> Users mailing list
>>> Users at lists.opennebula.org
>>> http://lists.opennebula.org/listinfo.cgi/users-opennebula.org
>>>