[one-users] VMs stuck in UNKNOWN State

Ruben S. Montero rsmontero at opennebula.org
Wed Apr 3 06:15:21 PDT 2013


Could you run the VMM probe on the host:

/var/tmp/one/vmm/kvm/poll one-294

and check for errors, or try to debug the script (for example by running it
with ruby -wd)?
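
A minimal sketch of that manual check, assuming a POSIX shell on the host. The helper name run_probe and the PROBE variable are made up for illustration; only the probe path and the one-294 domain name come from this thread.

```shell
# Path of the KVM poll probe as given in this thread; override PROBE if needed.
PROBE=${PROBE:-/var/tmp/one/vmm/kvm/poll}

# Run the probe for one domain and report its exit status and output.
# run_probe is a hypothetical helper, not part of OpenNebula.
run_probe() {
    domain=$1
    # Capture stdout and stderr together so error messages are not lost.
    output=$("$PROBE" "$domain" 2>&1) && status=0 || status=$?
    printf 'exit=%s\n' "$status"
    printf 'output=%s\n' "$output"
    return "$status"
}
```

Calling run_probe one-294 on the host prints the probe's exit status followed by whatever it wrote. If the probe is a Ruby script, as the suggestion above implies, running ruby -wd /var/tmp/one/vmm/kvm/poll one-294 directly should add warnings and debug output.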

Ruben


On Wed, Apr 3, 2013 at 10:42 AM, Duverne, Cyrille <
cyrille.duverne at euranova.eu> wrote:

> Hello,
>
> Indeed, the state is still "d", as you can see here:
>
>
>    1. Wed Apr  3 10:34:13 2013 [VMM][I]: Monitoring VM 294.
>    2. Wed Apr  3 10:34:13 2013 [VMM][D]: Message received: LOG I 294 ExitCode: 0
>    3. Wed Apr  3 10:34:13 2013 [VMM][D]: Message received: POLL SUCCESS 294 STATE=d
>
> Any thoughts?
> To be thorough, I verified that the users etc. are still correct on all
> machines, and that oneadmin is able to ssh directly.
>
> Thanks in advance
> Cyrille
>
>
>
> At Tuesday, 02/04/2013 on 22:31 Ruben S. Montero wrote:
>
> So the VMs are now running and correctly reported by libvirt, but
> OpenNebula does not move them from UNKNOWN to RUNNING? Are the messages
> in oned.log still reporting STATE=d for these VMs?
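
One way to answer that question quickly is to pull the latest POLL line for a VM out of oned.log. This is only a sketch: last_poll_state is a hypothetical helper, and it assumes each message sits on a single line in the real log, in the "POLL SUCCESS <id> STATE=<x>" shape quoted in this thread.

```shell
# Print the most recently logged poll state for one VM ID.
# Usage: last_poll_state VM_ID LOG_FILE
# last_poll_state is a hypothetical helper, not an OpenNebula command.
last_poll_state() {
    vm_id=$1
    log_file=$2
    # Keep only POLL SUCCESS lines for this VM, take the newest one,
    # and print the value that follows STATE=.
    grep -F "POLL SUCCESS ${vm_id} " "$log_file" \
        | tail -n 1 \
        | sed -n 's/.*STATE=\([^ ]*\).*/\1/p'
}
```

For example, last_poll_state 294 /var/log/one/oned.log (the log path is an assumption, adjust it to your installation). An empty result means no POLL SUCCESS line was found for that ID.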
>
> Ruben
>
>
> On Tue, Apr 2, 2013 at 3:57 PM, Duverne, Cyrille <
> cyrille.duverne at euranova.eu> wrote:
>
>> Hello,
>>
>> Anything new on this?
>>
>> Seems really weird to me...
>>
>> Thanks in advance
>> Cyrille
>>
>>
>>
>>
>> At Friday, 29/03/2013 on 10:06 Duverne, Cyrille wrote:
>>
>> Hello Ruben !
>>
>> Thanks for this feedback.
>>
>> I tried to restart libvirt, which succeeded (WOW! :p)
>>
>>
>> But the VMs are still stuck in the UNKNOWN state.
>>
>> 'virsh list' correctly shows the domains, which are running:
>>
>> virsh list
>>  Id Name                 State
>> ----------------------------------
>>   1 one-294              running
>>   2 one-304              running
>>
>> Any other thoughts? I'm a bit confused by this behaviour and by the
>> workflow for monitoring the VMs. It would be useful to have a 'refresh
>> monitoring' button or similar in Sunstone to fetch fresh monitoring
>> information.
>>
>> Thanks in advance
>> Cyrille
>>
>> "Always do right. This will gratify some people and astonish the rest."
>> Mark Twain
>>
>>
>>
>> At Thursday, 28/03/2013 on 0:56 Ruben S. Montero wrote:
>>
>> Ok
>>
>> So this is strange...
>>
>> On one hand you try to restart the VM and virsh says it is already
>> defined (vm.log: domain 'one-294' already exists). On the other hand,
>> when you monitor the VM, virsh list does not show it (oned.log: POLL SUCCESS
>> 294 STATE=d).
>>
>> Is the domain really defined on the host (virsh list)? Could this be a
>> libvirt issue? Any chance you could restart libvirt and try again?
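
To check a single domain without reading the whole table, the virsh list output can be filtered. A rough sketch, assuming the three-column layout (Id, Name, State) that virsh prints; domain_state is a hypothetical helper, not a libvirt or OpenNebula command.

```shell
# Read `virsh list --all` style output on stdin and print the state column
# for the named domain. Prints nothing if the domain is not listed.
domain_state() {
    awk -v name="$1" '
        $2 == name {
            # Blank the Id and Name fields, then trim the leading spaces,
            # so multi-word states such as "shut off" survive intact.
            $1 = ""; $2 = ""
            sub(/^ +/, "")
            print
        }'
}
```

On the host this would be run as virsh list --all | domain_state one-294. It may be worth running it as the same account the drivers use, since a different user or connection could show a different set of domains.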
>>
>>
>> Cheers
>>
>> Ruben
>>
>>
>>
>> On Tue, Mar 26, 2013 at 10:37 PM, Duverne, Cyrille <
>> cyrille.duverne at euranova.eu> wrote:
>>
>>> Hello Ruben,
>>>
>>> Indeed this happens for some of them, but some others are still
>>> in the UNKNOWN state.
>>> Here is an extract of the VM log:
>>>
>>> "Thu Mar 21 11:55:56 2013 [LCM][I]: New VM state is SAVE_SUSPEND
>>>
>>> Thu Mar 21 11:57:49 2013 [VMM][I]: ExitCode: 0
>>> Thu Mar 21 11:57:49 2013 [VMM][I]: Successfully execute virtualization driver operation: save.
>>> Thu Mar 21 11:57:50 2013 [VMM][I]: ExitCode: 0
>>> Thu Mar 21 11:57:50 2013 [VMM][I]: Successfully execute network driver operation: clean.
>>> Thu Mar 21 11:57:50 2013 [DiM][I]: New VM state is SUSPENDED
>>> Tue Mar 26 17:27:48 2013 [DiM][I]: New VM state is ACTIVE.
>>> Tue Mar 26 17:27:48 2013 [LCM][I]: Restoring VM
>>> Tue Mar 26 17:27:48 2013 [LCM][I]: New state is BOOT_SUSPENDED
>>> Tue Mar 26 17:27:49 2013 [VMM][I]: ExitCode: 0
>>> Tue Mar 26 17:27:49 2013 [VMM][I]: Successfully execute network driver operation: pre.
>>> Tue Mar 26 17:28:37 2013 [VMM][I]: ExitCode: 0
>>> Tue Mar 26 17:28:37 2013 [VMM][I]: Successfully execute virtualization driver operation: restore.
>>> Tue Mar 26 17:28:37 2013 [VMM][I]: ExitCode: 0
>>> Tue Mar 26 17:28:37 2013 [VMM][I]: Successfully execute network driver operation: post.
>>> Tue Mar 26 17:28:38 2013 [LCM][I]: New VM state is RUNNING
>>> Tue Mar 26 17:28:38 2013 [VMM][I]: ExitCode: 0
>>> Tue Mar 26 17:28:39 2013 [VMM][I]: VM running but it was not found. Restart and delete actions available or try to recover it manually
>>> Tue Mar 26 17:28:39 2013 [LCM][I]: New VM state is UNKNOWN
>>> Tue Mar 26 17:36:48 2013 [LCM][I]: New VM state is BOOT_UNKNOWN
>>> Tue Mar 26 17:36:48 2013 [VMM][I]: Generating deployment file: /var/lib/one/294/deployment.1
>>> Tue Mar 26 17:36:52 2013 [VMM][I]: ExitCode: 0
>>> Tue Mar 26 17:36:52 2013 [VMM][I]: Successfully execute network driver operation: pre.
>>> Tue Mar 26 17:36:52 2013 [VMM][I]: Command execution fail: cat << EOT | /var/tmp/one/vmm/kvm/deploy /var/lib/one/datastores/0/294/deployment.1 whitefall.local 294 whitefall.local
>>> Tue Mar 26 17:36:52 2013 [VMM][I]: error: Failed to create domain from /var/lib/one/datastores/0/294/deployment.1
>>> Tue Mar 26 17:36:52 2013 [VMM][I]: error: operation failed: domain 'one-294' already exists with uuid 326bc42b-1f8a-8984-e610-4c35f0bdd56f
>>> Tue Mar 26 17:36:52 2013 [VMM][E]: Could not create domain from /var/lib/one/datastores/0/294/deployment.1
>>> Tue Mar 26 17:36:52 2013 [VMM][I]: ExitCode: 255
>>> Tue Mar 26 17:36:52 2013 [VMM][I]: Failed to execute virtualization driver operation: deploy.
>>> Tue Mar 26 17:36:52 2013 [VMM][E]: Error deploying virtual machine: Could not create domain from /var/lib/one/datastores/0/294/deployment.1
>>> Tue Mar 26 17:36:52 2013 [LCM][I]: Fail to boot VM. New VM state is UNKNOWN
>>> Tue Mar 26 17:37:21 2013 [LCM][I]: New VM state is BOOT_UNKNOWN
>>> Tue Mar 26 17:37:21 2013 [VMM][I]: Generating deployment file: /var/lib/one/294/deployment.1
>>> Tue Mar 26 17:37:22 2013 [VMM][I]: ExitCode: 0
>>> Tue Mar 26 17:37:22 2013 [VMM][I]: Successfully execute network driver operation: pre.
>>> Tue Mar 26 17:37:22 2013 [VMM][I]: Command execution fail: cat << EOT | /var/tmp/one/vmm/kvm/deploy /var/lib/one/datastores/0/294/deployment.1 whitefall.local 294 whitefall.local
>>> Tue Mar 26 17:37:22 2013 [VMM][I]: error: Failed to create domain from /var/lib/one/datastores/0/294/deployment.1
>>> Tue Mar 26 17:37:22 2013 [VMM][I]: error: operation failed: domain 'one-294' already exists with uuid 326bc42b-1f8a-8984-e610-4c35f0bdd56f
>>> Tue Mar 26 17:37:22 2013 [VMM][E]: Could not create domain from /var/lib/one/datastores/0/294/deployment.1
>>> Tue Mar 26 17:37:22 2013 [VMM][I]: ExitCode: 255
>>> Tue Mar 26 17:37:22 2013 [VMM][I]: Failed to execute virtualization driver operation: deploy.
>>> Tue Mar 26 17:37:22 2013 [VMM][E]: Error deploying virtual machine: Could not create domain from /var/lib/one/datastores/0/294/deployment.1
>>> Tue Mar 26 17:37:23 2013 [LCM][I]: Fail to boot VM. New VM state is UNKNOWN
>>> Tue Mar 26 17:38:39 2013 [VMM][I]: ExitCode: 0
>>> Tue Mar 26 17:38:41 2013 [VMM][I]: VM running but it was not found. Restart and delete actions available or try to recover it manually
>>> Tue Mar 26 17:48:45 2013 [VMM][I]: ExitCode: 0
>>> Tue Mar 26 17:48:45 2013 [VMM][I]: VM running but it was not found. Restart and delete actions available or try to recover it manually
>>> Tue Mar 26 17:58:45 2013 [VMM][I]: ExitCode: 0
>>> Tue Mar 26 17:58:45 2013 [VMM][I]: VM running but it was not found. Restart and delete actions available or try to recover it manually
>>>
>>> Tue Mar 26 18:08:45 2013 [VMM][I]: ExitCode: 0"
>>>
>>> The RESTART didn't do anything.
>>>
>>> Here is the oned.log extract for the same VM:
>>>
>>> "Tue Mar 26 22:18:45 2013 [VMM][I]: Monitoring VM 294.
>>> Tue Mar 26 22:18:45 2013 [VMM][D]: Message received: LOG I 294 ExitCode: 0
>>> Tue Mar 26 22:18:45 2013 [VMM][D]: Message received: POLL SUCCESS 294 STATE=d"
>>>
>>> The VMs that are in the UNKNOWN state are located on 2 different hosts.
>>> All hosts are configured in the same way.
>>>
>>> Thanks in advance
>>> Cyrille
>>>
>>>
>>> At Tuesday, 26/03/2013 on 18:53 Ruben S. Montero wrote:
>>>
>>> They should appear after a while, when the VM is monitored. Look for
>>> "Monitoring VM..." messages in oned.log.
>>>
>>> Cheers
>>>
>>> Ruben
>>>
>>>
>>> On Tue, Mar 26, 2013 at 5:39 PM, Duverne, Cyrille <
>>> cyrille.duverne at euranova.eu> wrote:
>>>
>>>> Hello,
>>>>
>>>> I just finished rebooting our lab after an electrical shutdown;
>>>> everything went fine.
>>>>
>>>> But some of the VMs are stuck in the UNKNOWN state after resuming them.
>>>> I tried to restart them, but they are actually running on the
>>>> hypervisors; it's just that Sunstone is displaying UNKNOWN.
>>>>
>>>> Any thoughts on how to solve this?
>>>>
>>>> Thanks in advance
>>>> Cyrille
>>>>
>>>>
>>>> _______________________________________________
>>>> Users mailing list
>>>> Users at lists.opennebula.org
>>>> http://lists.opennebula.org/listinfo.cgi/users-opennebula.org
>>>>
>>>>
>>>
>>>
>>> --
>>> Ruben S. Montero, PhD
>>> Project co-Lead and Chief Architect
>>> OpenNebula - The Open Source Solution for Data Center Virtualization
>>> www.OpenNebula.org | rsmontero at opennebula.org | @OpenNebula
>>>
>>>
>>
>>
>>
>>
>
>
>
>


-- 
Ruben S. Montero, PhD
Project co-Lead and Chief Architect
OpenNebula - The Open Source Solution for Data Center Virtualization
www.OpenNebula.org | rsmontero at opennebula.org | @OpenNebula