[one-users] VMs stuck in UNKNOWN State

Ruben S. Montero rsmontero at opennebula.org
Thu Apr 4 14:42:08 PDT 2013


I've been thinking about this and I can't see any point where this
information is cached. The probe is executed and its output is sent right
away to the core to be processed. In fact, you should see the same line
"... STATE=a" in the logs.
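
One way to check this is to pull the POLL messages for a given VM out of oned.log and look at the most recent STATE value. The following is only a minimal sketch: the helper names (parse_poll_attrs, last_polled_state) are hypothetical and not part of OpenNebula, and the message layout is assumed from the log lines and poll output quoted in this thread.

```python
import re

# Assumed layout, taken from the oned.log excerpt in this thread:
#   "... Message received: POLL SUCCESS <vm_id> STATE=<x> [KEY=VALUE ...]"
POLL_RE = re.compile(r"POLL SUCCESS\s+(\d+)\s+(.*)")


def parse_poll_attrs(text):
    """Split probe output such as 'STATE=a NETTX=1' into a dict of strings."""
    return dict(pair.split("=", 1) for pair in text.split() if "=" in pair)


def last_polled_state(log_text, vm_id):
    """Return the most recent STATE reported for vm_id, or None if absent."""
    state = None
    for match in POLL_RE.finditer(log_text):
        if int(match.group(1)) == vm_id:
            attrs = parse_poll_attrs(match.group(2))
            state = attrs.get("STATE", state)
    return state


# Sample input copied from this thread (not freshly captured output).
SAMPLE_LOG = (
    "Wed Apr  3 10:34:13 2013 [VMM][I]: Monitoring VM 294.\n"
    "Wed Apr  3 10:34:13 2013 [VMM][D]: Message received: "
    "POLL SUCCESS 294 STATE=d\n"
)
SAMPLE_POLL = "STATE=a NETTX=19039830 USEDCPU=0.1 USEDMEMORY=1121828 NETRX=416126660"

if __name__ == "__main__":
    print("state in oned.log:", last_polled_state(SAMPLE_LOG, 294))
    print("state from probe: ", parse_poll_attrs(SAMPLE_POLL)["STATE"])
```

To use it on a real installation, read the contents of oned.log (commonly /var/log/one/oned.log, though that path is an assumption about your setup) and pass the text to last_polled_state. If the probe run by hand prints STATE=a while the log keeps showing STATE=d, the two are not seeing the same thing on the host.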

Cheers


On Wed, Apr 3, 2013 at 4:18 PM, Duverne, Cyrille <
cyrille.duverne at euranova.eu> wrote:

> Ok ok, that's indeed fun :
>
> ruby -wd /var/tmp/one/vmm/kvm/poll one-294
> STATE=a NETTX=19039830 USEDCPU=0.1 USEDMEMORY=1121828 NETRX=416126660
>
> It seems that polling is working correctly.
> Is it possible that the state is still cached, or stored in the DB and
> not updated?
>
> Cheers
> Cyrille
>
>
>
> At Wednesday, 03/04/2013 on 15:15 Ruben S. Montero wrote:
>
> Could you execute the vmm probe on the host
>
> /var/tmp/one/vmm/kvm/poll one-294
>
> and check for errors, or try to debug the script... (maybe running it with
> ruby -wd)
>
> Ruben
>
>
> On Wed, Apr 3, 2013 at 10:42 AM, Duverne, Cyrille <
> cyrille.duverne at euranova.eu> wrote:
>
>> Hello,
>>
>> Indeed, state is still "d" , as you can see here :
>>
>>
>>    Wed Apr  3 10:34:13 2013 [VMM][I]: Monitoring VM 294.
>>    Wed Apr  3 10:34:13 2013 [VMM][D]: Message received: LOG I 294 ExitCode: 0
>>    Wed Apr  3 10:34:13 2013 [VMM][D]: Message received: POLL SUCCESS 294 STATE=d
>>
>> Any thoughts?
>> To be thorough, I verified that all users etc. are still correct on
>> all machines, and that oneadmin is able to ssh directly, etc.
>>
>> Thanks in advance
>> Cyrille
>>
>>
>>
>> At Tuesday, 02/04/2013 on 22:31 Ruben S. Montero wrote:
>>
>> So the VMs are now running, and correctly reported by libvirt, but
>> OpenNebula does not move them from UNKNOWN to RUNNING? Are the messages
>> still reporting STATE=d for these VMs in oned.log?
>>
>>  Ruben
>>
>>
>> On Tue, Apr 2, 2013 at 3:57 PM, Duverne, Cyrille <
>> cyrille.duverne at euranova.eu> wrote:
>>
>>> Hello,
>>>
>>> Anything new on this ?
>>>
>>> Seems really weird to me...
>>>
>>> Thanks in advance
>>> Cyrille
>>>
>>>
>>>
>>>
>>> At Friday, 29/03/2013 on 10:06 Duverne, Cyrille wrote:
>>>
>>> Hello Ruben !
>>>
>>> Thanks for this feedback.
>>>
>>> I tried to restart libvirt, which succeeded (WOW! :p)
>>>
>>>
>>> But the VMs are still stuck in the UNKNOWN state.
>>>
>>> 'virsh list' correctly shows the domains, which are running:
>>>
>>> virsh list
>>>  Id Name                 State
>>> ----------------------------------
>>>   1 one-294              running
>>>   2 one-304              running
>>>
>>> Any other thoughts? I'm a bit confused by this behaviour and by the
>>> workflow used to monitor the VMs. It could be useful to have a 'refresh
>>> monitoring' button or similar in Sunstone to fetch fresh monitoring
>>> information.
>>>
>>> Thanks in advance
>>> Cyrille
>>>
>>> "Always do right. This will gratify some people and astonish the rest."
>>> Mark Twain
>>>
>>>
>>>
>>> At Thursday, 28/03/2013 on 0:56 Ruben S. Montero wrote:
>>>
>>> Ok
>>>
>>> So this is strange...
>>>
>>> On one hand, you try to restart the VM and virsh says it is already
>>> defined (vm.log: domain 'one-294' already exists). On the other hand,
>>> when you monitor the VM, virsh list does not show it (oned.log: POLL
>>> SUCCESS 294 STATE=d).
>>>
>>> Is the domain really defined at the host (virsh list)? Could this be a
>>> libvirt issue? Any chance you could restart libvirt and try again?
>>>
>>>
>>> Cheers
>>>
>>> Ruben
>>>
>>>
>>>
>>> On Tue, Mar 26, 2013 at 10:37 PM, Duverne, Cyrille <
>>> cyrille.duverne at euranova.eu> wrote:
>>>
>>>> Hello Ruben,
>>>>
>>>> Indeed this happens for some of them, but some others are
>>>> still in UNKNOWN state.
>>>> Here is an extract of the VM log:
>>>>
>>>> "Thu Mar 21 11:55:56 2013 [LCM][I]: New VM state is SAVE_SUSPEND
>>>>
>>>> Thu Mar 21 11:57:49 2013 [VMM][I]: ExitCode: 0
>>>> Thu Mar 21 11:57:49 2013 [VMM][I]: Successfully execute virtualization driver operation: save.
>>>> Thu Mar 21 11:57:50 2013 [VMM][I]: ExitCode: 0
>>>> Thu Mar 21 11:57:50 2013 [VMM][I]: Successfully execute network driver operation: clean.
>>>> Thu Mar 21 11:57:50 2013 [DiM][I]: New VM state is SUSPENDED
>>>> Tue Mar 26 17:27:48 2013 [DiM][I]: New VM state is ACTIVE.
>>>> Tue Mar 26 17:27:48 2013 [LCM][I]: Restoring VM
>>>> Tue Mar 26 17:27:48 2013 [LCM][I]: New state is BOOT_SUSPENDED
>>>> Tue Mar 26 17:27:49 2013 [VMM][I]: ExitCode: 0
>>>> Tue Mar 26 17:27:49 2013 [VMM][I]: Successfully execute network driver operation: pre.
>>>> Tue Mar 26 17:28:37 2013 [VMM][I]: ExitCode: 0
>>>> Tue Mar 26 17:28:37 2013 [VMM][I]: Successfully execute virtualization driver operation: restore.
>>>> Tue Mar 26 17:28:37 2013 [VMM][I]: ExitCode: 0
>>>> Tue Mar 26 17:28:37 2013 [VMM][I]: Successfully execute network driver operation: post.
>>>> Tue Mar 26 17:28:38 2013 [LCM][I]: New VM state is RUNNING
>>>> Tue Mar 26 17:28:38 2013 [VMM][I]: ExitCode: 0
>>>> Tue Mar 26 17:28:39 2013 [VMM][I]: VM running but it was not found. Restart and delete actions available or try to recover it manually
>>>> Tue Mar 26 17:28:39 2013 [LCM][I]: New VM state is UNKNOWN
>>>> Tue Mar 26 17:36:48 2013 [LCM][I]: New VM state is BOOT_UNKNOWN
>>>> Tue Mar 26 17:36:48 2013 [VMM][I]: Generating deployment file: /var/lib/one/294/deployment.1
>>>> Tue Mar 26 17:36:52 2013 [VMM][I]: ExitCode: 0
>>>> Tue Mar 26 17:36:52 2013 [VMM][I]: Successfully execute network driver operation: pre.
>>>> Tue Mar 26 17:36:52 2013 [VMM][I]: Command execution fail: cat << EOT | /var/tmp/one/vmm/kvm/deploy /var/lib/one/datastores/0/294/deployment.1 whitefall.local 294 whitefall.local
>>>> Tue Mar 26 17:36:52 2013 [VMM][I]: error: Failed to create domain from /var/lib/one/datastores/0/294/deployment.1
>>>> Tue Mar 26 17:36:52 2013 [VMM][I]: error: operation failed: domain 'one-294' already exists with uuid 326bc42b-1f8a-8984-e610-4c35f0bdd56f
>>>> Tue Mar 26 17:36:52 2013 [VMM][E]: Could not create domain from /var/lib/one/datastores/0/294/deployment.1
>>>> Tue Mar 26 17:36:52 2013 [VMM][I]: ExitCode: 255
>>>> Tue Mar 26 17:36:52 2013 [VMM][I]: Failed to execute virtualization driver operation: deploy.
>>>> Tue Mar 26 17:36:52 2013 [VMM][E]: Error deploying virtual machine: Could not create domain from /var/lib/one/datastores/0/294/deployment.1
>>>> Tue Mar 26 17:36:52 2013 [LCM][I]: Fail to boot VM. New VM state is UNKNOWN
>>>> Tue Mar 26 17:37:21 2013 [LCM][I]: New VM state is BOOT_UNKNOWN
>>>> Tue Mar 26 17:37:21 2013 [VMM][I]: Generating deployment file: /var/lib/one/294/deployment.1
>>>> Tue Mar 26 17:37:22 2013 [VMM][I]: ExitCode: 0
>>>> Tue Mar 26 17:37:22 2013 [VMM][I]: Successfully execute network driver operation: pre.
>>>> Tue Mar 26 17:37:22 2013 [VMM][I]: Command execution fail: cat << EOT | /var/tmp/one/vmm/kvm/deploy /var/lib/one/datastores/0/294/deployment.1 whitefall.local 294 whitefall.local
>>>> Tue Mar 26 17:37:22 2013 [VMM][I]: error: Failed to create domain from /var/lib/one/datastores/0/294/deployment.1
>>>> Tue Mar 26 17:37:22 2013 [VMM][I]: error: operation failed: domain 'one-294' already exists with uuid 326bc42b-1f8a-8984-e610-4c35f0bdd56f
>>>> Tue Mar 26 17:37:22 2013 [VMM][E]: Could not create domain from /var/lib/one/datastores/0/294/deployment.1
>>>> Tue Mar 26 17:37:22 2013 [VMM][I]: ExitCode: 255
>>>> Tue Mar 26 17:37:22 2013 [VMM][I]: Failed to execute virtualization driver operation: deploy.
>>>> Tue Mar 26 17:37:22 2013 [VMM][E]: Error deploying virtual machine: Could not create domain from /var/lib/one/datastores/0/294/deployment.1
>>>> Tue Mar 26 17:37:23 2013 [LCM][I]: Fail to boot VM. New VM state is UNKNOWN
>>>> Tue Mar 26 17:38:39 2013 [VMM][I]: ExitCode: 0
>>>> Tue Mar 26 17:38:41 2013 [VMM][I]: VM running but it was not found. Restart and delete actions available or try to recover it manually
>>>> Tue Mar 26 17:48:45 2013 [VMM][I]: ExitCode: 0
>>>> Tue Mar 26 17:48:45 2013 [VMM][I]: VM running but it was not found. Restart and delete actions available or try to recover it manually
>>>> Tue Mar 26 17:58:45 2013 [VMM][I]: ExitCode: 0
>>>> Tue Mar 26 17:58:45 2013 [VMM][I]: VM running but it was not found. Restart and delete actions available or try to recover it manually
>>>>
>>>> Tue Mar 26 18:08:45 2013 [VMM][I]: ExitCode: 0"
>>>>
>>>> The RESTART didn't do anything.
>>>>
>>>> Here is the oned.log's extract for the same VM :
>>>>
>>>> "Tue Mar 26 22:18:45 2013 [VMM][I]: Monitoring VM 294.
>>>> Tue Mar 26 22:18:45 2013 [VMM][D]: Message received: LOG I 294
>>>> ExitCode: 0
>>>> Tue Mar 26 22:18:45 2013 [VMM][D]: Message received: POLL SUCCESS 294
>>>> STATE=d"
>>>>
>>>> The VMs that are in UNKNOWN state are located on 2 different hosts.
>>>> All hosts are configured in the same way.
>>>>
>>>> Thanks in advance
>>>> Cyrille
>>>>
>>>>
>>>> At Tuesday, 26/03/2013 on 18:53 Ruben S. Montero wrote:
>>>>
>>>> They should appear after a while, when the VM is monitored... Look for
>>>> messages Monitoring VM... in oned.log.
>>>>
>>>> Cheers
>>>>
>>>> Ruben
>>>>
>>>>
>>>> On Tue, Mar 26, 2013 at 5:39 PM, Duverne, Cyrille <
>>>> cyrille.duverne at euranova.eu> wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> I just finished rebooting our lab after an electrical shutdown;
>>>>> everything went fine.
>>>>>
>>>>> But some of the VMs are stuck in UNKNOWN state after resuming them.
>>>>> I tried to restart them, but they are actually running on the
>>>>> hypervisors; it's just that Sunstone is displaying UNKNOWN.
>>>>>
>>>>> Any thoughts on how to solve this?
>>>>>
>>>>> Thanks in advance
>>>>> Cyrille
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Ruben S. Montero, PhD
>>>> Project co-Lead and Chief Architect
>>>> OpenNebula - The Open Source Solution for Data Center Virtualization
>>>> www.OpenNebula.org | rsmontero at opennebula.org | @OpenNebula


-- 
Ruben S. Montero, PhD
Project co-Lead and Chief Architect
OpenNebula - The Open Source Solution for Data Center Virtualization
www.OpenNebula.org | rsmontero at opennebula.org | @OpenNebula