[one-users] VMs stuck in UNKNOWN State

Carlos Martín Sánchez cmartin at opennebula.org
Fri Apr 12 02:14:38 PDT 2013


Just a wild guess... The 'one-294' argument for the poll script is taken
from VM/DEPLOYMENT_ID. Maybe a bug caused the core to lose that string?

Can you please check that attribute in the onevm show -x output? If it
looks good, edit /var/tmp/one/vmm/kvm/poll and write the arguments
somewhere, just to double check.
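
A minimal sketch of what "write the arguments somewhere" could look like in Ruby, the language of the poll script. The helper name log_poll_args and the default path /tmp/poll_args.log are hypothetical and not part of OpenNebula; the idea is only to append the received arguments with a timestamp so they can be compared with the expected 'one-294'.

```ruby
require 'time'

# Hypothetical helper (not part of OpenNebula): append the arguments the
# script received to a log file, one timestamped line per invocation.
# The default path is an assumption chosen for illustration.
def log_poll_args(args, path = '/tmp/poll_args.log')
  File.open(path, 'a') do |f|
    f.puts "#{Time.now.utc.iso8601} args=#{args.inspect}"
  end
end

# Near the top of the poll script one might then call:
#   log_poll_args(ARGV)
```

If the logged line shows an empty array or something other than the deployment ID, the core is probably not passing the string the probe expects.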

Cheers

--
Carlos Martín, MSc
Project Engineer
OpenNebula - The Open-source Solution for Data Center Virtualization
www.OpenNebula.org | cmartin at opennebula.org | @OpenNebula


On Thu, Apr 4, 2013 at 11:42 PM, Ruben S. Montero
<rsmontero at opennebula.org> wrote:

> I've been thinking about this and I can't see any point where this
> information is cached. The probe is executed and its output is sent right
> away to the core for processing. In fact you should see the same line
> "... STATE=a" in the logs.
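
A sketch of how that comparison could be automated, assuming each driver message occupies a single line of oned.log in the "POLL SUCCESS <id> STATE=<x>" format quoted in this thread. The names POLL_RE and last_polled_state are made up for illustration, and the log path in the usage comment is an assumption.

```ruby
# Matches the driver message format seen in this thread, e.g.
#   "... [VMM][D]: Message received: POLL SUCCESS 294 STATE=d"
POLL_RE = /POLL SUCCESS (\d+) .*?\bSTATE=(\w)/

# Return the last STATE letter reported for vm_id in the given log lines,
# or nil if no matching POLL SUCCESS message is found.
def last_polled_state(lines, vm_id)
  state = nil
  lines.each do |line|
    m = POLL_RE.match(line)
    state = m[2] if m && m[1].to_i == vm_id
  end
  state
end

# Possible usage (log location is an assumption):
#   last_polled_state(File.foreach('/var/log/one/oned.log'), 294)
```

If this still returns "d" while running the probe by hand prints STATE=a, the difference most likely lies in how the probe is invoked rather than in any caching.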
>
> Cheers
>
>
> On Wed, Apr 3, 2013 at 4:18 PM, Duverne, Cyrille <
> cyrille.duverne at euranova.eu> wrote:
>
>> Ok ok, that's indeed fun :
>>
>> ruby -wd /var/tmp/one/vmm/kvm/poll one-294
>> STATE=a NETTX=19039830 USEDCPU=0.1 USEDMEMORY=1121828 NETRX=416126660
>>
>> It seems that the polling is working correctly.
>> Is it possible that the state is still cached, or stored in the DB and not
>> being updated?
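
For scripted checks, the probe output shown above can be split into key/value pairs. This is only a sketch: parse_poll_output is a hypothetical helper, and it assumes the output is a single line of space-separated KEY=VALUE tokens, as in the sample quoted in this message.

```ruby
# Split a poll result line such as
#   "STATE=a NETTX=19039830 USEDCPU=0.1 USEDMEMORY=1121828 NETRX=416126660"
# into a Hash with string keys and string values.
# Tokens without an "=" are ignored.
def parse_poll_output(line)
  line.split.each_with_object({}) do |pair, result|
    key, value = pair.split('=', 2)
    result[key] = value unless value.nil?
  end
end

# Possible usage:
#   parse_poll_output(`/var/tmp/one/vmm/kvm/poll one-294`)['STATE']
```

Values are kept as strings so that nothing is lost in conversion; callers can convert USEDCPU or USEDMEMORY to numbers where needed.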
>>
>> Cheers
>> Cyrille
>>
>>
>>
>> On Wednesday, 03/04/2013 at 15:15, Ruben S. Montero wrote:
>>
>> Could you execute the vmm probe in the host
>>
>> /var/tmp/one/vmm/kvm/poll one-294
>>
>> and check for errors, or try to debug the script... (maybe running it
>> with ruby -wd)
>>
>> Ruben
>>
>>
>> On Wed, Apr 3, 2013 at 10:42 AM, Duverne, Cyrille <
>> cyrille.duverne at euranova.eu> wrote:
>>
>>> Hello,
>>>
>>> Indeed, the state is still "d", as you can see here:
>>>
>>>
>>> Wed Apr  3 10:34:13 2013 [VMM][I]: Monitoring VM 294.
>>> Wed Apr  3 10:34:13 2013 [VMM][D]: Message received: LOG I 294 ExitCode: 0
>>> Wed Apr  3 10:34:13 2013 [VMM][D]: Message received: POLL SUCCESS 294 STATE=d
>>>
>>> Any thoughts?
>>> To be thorough, I verified that all users etc. were still correct on
>>> all machines, and that oneadmin is able to ssh directly.
>>>
>>> Thanks in advance
>>> Cyrille
>>>
>>>
>>>
>>> On Tuesday, 02/04/2013 at 22:31, Ruben S. Montero wrote:
>>>
>>> So the VMs are now running, and correctly reported by libvirt, but
>>> OpenNebula does not move them from UNKNOWN to RUNNING? Are the messages
>>> in oned.log still reporting STATE=d for these VMs?
>>>
>>>  Ruben
>>>
>>>
>>> On Tue, Apr 2, 2013 at 3:57 PM, Duverne, Cyrille <
>>> cyrille.duverne at euranova.eu> wrote:
>>>
>>>> Hello,
>>>>
>>>> Anything new on this ?
>>>>
>>>> Seems really weird to me...
>>>>
>>>> Thanks in advance
>>>> Cyrille
>>>>
>>>>
>>>>
>>>>
>>>> On Friday, 29/03/2013 at 10:06, Duverne, Cyrille wrote:
>>>>
>>>> Hello Ruben !
>>>>
>>>> Thanks for this feedback.
>>>>
>>>> I tried to restart libvirt, which succeeded (WOW! :p)
>>>>
>>>>
>>>> But the VMs are still stuck in the UNKNOWN state.
>>>>
>>>> 'virsh list' correctly shows the domains, which are running:
>>>>
>>>> virsh list
>>>>  Id Name                 State
>>>> ----------------------------------
>>>>   1 one-294              running
>>>>   2 one-304              running
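
To compare the hypervisor's view with what OpenNebula reports, the table above could be parsed by a small script. This is a sketch under the assumption that `virsh list` prints the three-column layout quoted in this message; parse_virsh_list is a hypothetical helper name.

```ruby
# Parse the table printed by `virsh list` (layout as quoted in this thread)
# into a Hash mapping domain name to state.
def parse_virsh_list(output)
  domains = {}
  output.each_line do |line|
    # Data rows look like: "  1 one-294              running"
    # The header and the dashed separator do not match and are skipped.
    m = /^\s*(\d+|-)\s+(\S+)\s+(.+?)\s*$/.match(line)
    domains[m[2]] = m[3] if m
  end
  domains
end

# Possible usage on the host:
#   parse_virsh_list(`virsh list`)['one-294']
```

A domain that shows as running here while oned.log keeps reporting STATE=d suggests the probe is looking up a different name than the one libvirt knows.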
>>>>
>>>> Any other thoughts? I'm a bit confused by this behaviour and by the
>>>> workflow used to monitor the VMs. It would be useful to have a 'refresh
>>>> monitoring' button or similar in Sunstone to fetch fresh monitoring
>>>> information.
>>>>
>>>> Thanks in advance
>>>> Cyrille
>>>>
>>>> "Always do right. This will gratify some people and astonish the rest."
>>>> Mark Twain
>>>>
>>>>
>>>>
>>>> On Thursday, 28/03/2013 at 0:56, Ruben S. Montero wrote:
>>>>
>>>> Ok
>>>>
>>>> So this is strange...
>>>>
>>>> On one hand, you try to restart the VM and virsh says it is already
>>>> defined (vm.log: domain 'one-294' already exists). On the other hand,
>>>> when you monitor the VM, virsh list does not show it (oned.log: POLL SUCCESS
>>>> 294 STATE=d).
>>>>
>>>> Is the domain really defined on the host (virsh list)? Could this be a
>>>> libvirt issue? Any chance you could restart libvirt and try again?
>>>>
>>>>
>>>> Cheers
>>>>
>>>> Ruben
>>>>
>>>>
>>>>
>>>> On Tue, Mar 26, 2013 at 10:37 PM, Duverne, Cyrille <
>>>> cyrille.duverne at euranova.eu> wrote:
>>>>
>>>>> Hello Ruben,
>>>>>
>>>>> Indeed this happens for some of them, but some others are
>>>>> still in UNKNOWN state.
>>>>> Here is an extract of the VM log:
>>>>>
>>>>> "Thu Mar 21 11:55:56 2013 [LCM][I]: New VM state is SAVE_SUSPEND
>>>>>
>>>>> Thu Mar 21 11:57:49 2013 [VMM][I]: ExitCode: 0
>>>>> Thu Mar 21 11:57:49 2013 [VMM][I]: Successfully execute virtualization driver operation: save.
>>>>> Thu Mar 21 11:57:50 2013 [VMM][I]: ExitCode: 0
>>>>> Thu Mar 21 11:57:50 2013 [VMM][I]: Successfully execute network driver operation: clean.
>>>>> Thu Mar 21 11:57:50 2013 [DiM][I]: New VM state is SUSPENDED
>>>>> Tue Mar 26 17:27:48 2013 [DiM][I]: New VM state is ACTIVE.
>>>>> Tue Mar 26 17:27:48 2013 [LCM][I]: Restoring VM
>>>>> Tue Mar 26 17:27:48 2013 [LCM][I]: New state is BOOT_SUSPENDED
>>>>> Tue Mar 26 17:27:49 2013 [VMM][I]: ExitCode: 0
>>>>> Tue Mar 26 17:27:49 2013 [VMM][I]: Successfully execute network driver operation: pre.
>>>>> Tue Mar 26 17:28:37 2013 [VMM][I]: ExitCode: 0
>>>>> Tue Mar 26 17:28:37 2013 [VMM][I]: Successfully execute virtualization driver operation: restore.
>>>>> Tue Mar 26 17:28:37 2013 [VMM][I]: ExitCode: 0
>>>>> Tue Mar 26 17:28:37 2013 [VMM][I]: Successfully execute network driver operation: post.
>>>>> Tue Mar 26 17:28:38 2013 [LCM][I]: New VM state is RUNNING
>>>>> Tue Mar 26 17:28:38 2013 [VMM][I]: ExitCode: 0
>>>>> Tue Mar 26 17:28:39 2013 [VMM][I]: VM running but it was not found. Restart and delete actions available or try to recover it manually
>>>>> Tue Mar 26 17:28:39 2013 [LCM][I]: New VM state is UNKNOWN
>>>>> Tue Mar 26 17:36:48 2013 [LCM][I]: New VM state is BOOT_UNKNOWN
>>>>> Tue Mar 26 17:36:48 2013 [VMM][I]: Generating deployment file: /var/lib/one/294/deployment.1
>>>>> Tue Mar 26 17:36:52 2013 [VMM][I]: ExitCode: 0
>>>>> Tue Mar 26 17:36:52 2013 [VMM][I]: Successfully execute network driver operation: pre.
>>>>> Tue Mar 26 17:36:52 2013 [VMM][I]: Command execution fail: cat << EOT | /var/tmp/one/vmm/kvm/deploy /var/lib/one/datastores/0/294/deployment.1 whitefall.local 294 whitefall.local
>>>>> Tue Mar 26 17:36:52 2013 [VMM][I]: error: Failed to create domain from /var/lib/one/datastores/0/294/deployment.1
>>>>> Tue Mar 26 17:36:52 2013 [VMM][I]: error: operation failed: domain 'one-294' already exists with uuid 326bc42b-1f8a-8984-e610-4c35f0bdd56f
>>>>> Tue Mar 26 17:36:52 2013 [VMM][E]: Could not create domain from /var/lib/one/datastores/0/294/deployment.1
>>>>> Tue Mar 26 17:36:52 2013 [VMM][I]: ExitCode: 255
>>>>> Tue Mar 26 17:36:52 2013 [VMM][I]: Failed to execute virtualization driver operation: deploy.
>>>>> Tue Mar 26 17:36:52 2013 [VMM][E]: Error deploying virtual machine: Could not create domain from /var/lib/one/datastores/0/294/deployment.1
>>>>> Tue Mar 26 17:36:52 2013 [LCM][I]: Fail to boot VM. New VM state is UNKNOWN
>>>>> Tue Mar 26 17:37:21 2013 [LCM][I]: New VM state is BOOT_UNKNOWN
>>>>> Tue Mar 26 17:37:21 2013 [VMM][I]: Generating deployment file: /var/lib/one/294/deployment.1
>>>>> Tue Mar 26 17:37:22 2013 [VMM][I]: ExitCode: 0
>>>>> Tue Mar 26 17:37:22 2013 [VMM][I]: Successfully execute network driver operation: pre.
>>>>> Tue Mar 26 17:37:22 2013 [VMM][I]: Command execution fail: cat << EOT | /var/tmp/one/vmm/kvm/deploy /var/lib/one/datastores/0/294/deployment.1 whitefall.local 294 whitefall.local
>>>>> Tue Mar 26 17:37:22 2013 [VMM][I]: error: Failed to create domain from /var/lib/one/datastores/0/294/deployment.1
>>>>> Tue Mar 26 17:37:22 2013 [VMM][I]: error: operation failed: domain 'one-294' already exists with uuid 326bc42b-1f8a-8984-e610-4c35f0bdd56f
>>>>> Tue Mar 26 17:37:22 2013 [VMM][E]: Could not create domain from /var/lib/one/datastores/0/294/deployment.1
>>>>> Tue Mar 26 17:37:22 2013 [VMM][I]: ExitCode: 255
>>>>> Tue Mar 26 17:37:22 2013 [VMM][I]: Failed to execute virtualization driver operation: deploy.
>>>>> Tue Mar 26 17:37:22 2013 [VMM][E]: Error deploying virtual machine: Could not create domain from /var/lib/one/datastores/0/294/deployment.1
>>>>> Tue Mar 26 17:37:23 2013 [LCM][I]: Fail to boot VM. New VM state is UNKNOWN
>>>>> Tue Mar 26 17:38:39 2013 [VMM][I]: ExitCode: 0
>>>>> Tue Mar 26 17:38:41 2013 [VMM][I]: VM running but it was not found. Restart and delete actions available or try to recover it manually
>>>>> Tue Mar 26 17:48:45 2013 [VMM][I]: ExitCode: 0
>>>>> Tue Mar 26 17:48:45 2013 [VMM][I]: VM running but it was not found. Restart and delete actions available or try to recover it manually
>>>>> Tue Mar 26 17:58:45 2013 [VMM][I]: ExitCode: 0
>>>>> Tue Mar 26 17:58:45 2013 [VMM][I]: VM running but it was not found. Restart and delete actions available or try to recover it manually
>>>>>
>>>>> Tue Mar 26 18:08:45 2013 [VMM][I]: ExitCode: 0"
>>>>>
>>>>> The RESTART didn't do anything.
>>>>>
>>>>> Here is the oned.log's extract for the same VM :
>>>>>
>>>>> "Tue Mar 26 22:18:45 2013 [VMM][I]: Monitoring VM 294.
>>>>> Tue Mar 26 22:18:45 2013 [VMM][D]: Message received: LOG I 294 ExitCode: 0
>>>>> Tue Mar 26 22:18:45 2013 [VMM][D]: Message received: POLL SUCCESS 294 STATE=d"
>>>>>
>>>>> The VMs that are in UNKNOWN state are located on 2 different hosts.
>>>>> All hosts are configured in the same way.
>>>>>
>>>>> Thanks in advance
>>>>> Cyrille
>>>>>
>>>>>
>>>>> On Tuesday, 26/03/2013 at 18:53, Ruben S. Montero wrote:
>>>>>
>>>>> They should appear after a while, when the VM is monitored... Look for
>>>>> "Monitoring VM..." messages in oned.log.
>>>>>
>>>>> Cheers
>>>>>
>>>>> Ruben
>>>>>
>>>>>
>>>>> On Tue, Mar 26, 2013 at 5:39 PM, Duverne, Cyrille <
>>>>> cyrille.duverne at euranova.eu> wrote:
>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> I just finished rebooting our lab after a power shutdown;
>>>>>> everything went fine.
>>>>>>
>>>>>> But some of the VMs are stuck in UNKNOWN state after resuming them.
>>>>>> I tried to restart them, but they are actually running on the
>>>>>> hypervisors; it's just that Sunstone displays UNKNOWN.
>>>>>>
>>>>>> Any thoughts on how to solve this?
>>>>>>
>>>>>> Thanks in advance
>>>>>> Cyrille
>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> Users mailing list
>>>>>> Users at lists.opennebula.org
>>>>>> http://lists.opennebula.org/listinfo.cgi/users-opennebula.org
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Ruben S. Montero, PhD
>>>>> Project co-Lead and Chief Architect
>>>>> OpenNebula - The Open Source Solution for Data Center Virtualization
>>>>> www.OpenNebula.org | rsmontero at opennebula.org | @OpenNebula
>>>>>
>>>>>