[one-users] VMs stuck in BOOT, probes consumed all processes

Hamada, Ondrej ondrej.hamada at acision.com
Wed Jun 18 06:44:49 PDT 2014


Hi Javier,

Yes, it was libvirt, and restarting it really helped. In the meantime I've encountered a similar problem with datastore monitoring: one of the connections to my SAN storage died and the vgdisplay commands issued by the probe hung, so every monitoring request got stuck as well. According to your description this should also be fixed in 4.6, so I'm really looking forward to upgrading.

Thank you

Ondra

-----Original Message-----
From: Javier Fontan [mailto:jfontan at opennebula.org]
Sent: Wednesday, June 18, 2014 3:18 PM
To: Hamada, Ondrej
Cc: users (users at lists.opennebula.org)
Subject: Re: [one-users] VMs stuck in BOOT, probes consumed all processes

When you say KVM monitor, do you mean libvirtd?

OpenNebula probes use libvirt to talk to qemu/kvm, so the VMs must be visible with the virsh -c qemu:///system command. If they are not, a libvirtd restart may fix the problem.
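
As a rough illustration of that check (a sketch only, not OpenNebula code), the Python below runs a command under a time limit so that a hung libvirtd is reported instead of blocking the caller. The helper names run_with_timeout and check_libvirt and the 10-second limit are assumptions made for this example; the only real command in it is virsh -c qemu:///system list --all.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_s=10):
    """Run cmd and return (status, stdout).

    status is one of 'ok', 'failed', 'timeout' or 'missing'.
    """
    try:
        proc = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising this exception.
        return "timeout", ""
    except FileNotFoundError:
        # The binary is not installed or not on PATH.
        return "missing", ""
    return ("ok" if proc.returncode == 0 else "failed"), proc.stdout


def check_libvirt(timeout_s=10):
    """Hypothetical helper: True if libvirtd answers a domain listing in time."""
    status, _ = run_with_timeout(
        ["virsh", "-c", "qemu:///system", "list", "--all"], timeout_s
    )
    if status != "ok":
        print(f"libvirt check: {status}", file=sys.stderr)
    return status == "ok"
```

Calling check_libvirt() on a node returns False when virsh hangs, fails, or is missing. Note that a process blocked in uninterruptible I/O (as can happen with a dead storage path) may not die when killed, so a wrapper like this is probably not a complete cure for that case.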

In ONE 4.6 we have changed this a bit. In 4.4, if there was no information about a VM within double the monitoring time (20 seconds, I think), an explicit poll was sent to the host to check that single VM. This poll used ssh, so the sequence was something like this:

* collectd in the node got stuck
* 40 seconds passed
* an ssh connection plus a poll was executed for each VM, and got stuck
* new collectd started with its probes, got stuck again ...

In 4.6, single-VM polling is disabled by default. There is also a special "probe" that kills any wild collectd, that is, one whose pid is not in the pid file. This leaves only one collectd with one set of probes running at any time. Hopefully these changes alleviate the problem.
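
To make the "wild collectd" idea concrete, here is a minimal Python sketch of the logic as described above. It is not the actual OpenNebula probe: the function names, the "collectd" command-line marker, and the way the process list is supplied are all assumptions for illustration.

```python
import os
import signal


def read_pidfile(path):
    """Return the PID stored in path, or None if it is missing or malformed."""
    try:
        with open(path) as fh:
            return int(fh.read().strip())
    except (OSError, ValueError):
        return None


def find_wild_pids(processes, legit_pid, marker="collectd"):
    """Return PIDs whose command line contains marker but are not legit_pid.

    processes is an iterable of (pid, cmdline) pairs, e.g. parsed from ps.
    """
    return [
        pid for pid, cmdline in processes
        if marker in cmdline and pid != legit_pid
    ]


def kill_wild(processes, pidfile_path, marker="collectd", kill=os.kill):
    """Kill every matching process that is not the one named in the pid file.

    If the pid file cannot be read, every matching process counts as wild.
    The kill argument is injectable so the logic can be tried without
    signalling real processes.
    """
    wild = find_wild_pids(processes, read_pidfile(pidfile_path), marker)
    for pid in wild:
        try:
            kill(pid, signal.SIGKILL)
        except ProcessLookupError:
            pass  # the process exited on its own in the meantime
    return wild
```

Run periodically, something like kill_wild(process_list, pidfile_path) would leave at most the one collectd recorded in the pid file, which matches the behaviour described for 4.6.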



On Wed, Jun 11, 2014 at 12:33 PM, Hamada, Ondrej <ondrej.hamada at acision.com> wrote:
> Hi everyone,
>
> I’ve recently experienced a problem with ONE 4.4: my VMs were getting
> stuck in the BOOT stage. I discovered that the KVM monitor had died on
> one node. The VMs were still running, but the probes were unable to get
> any information from them. As a result, the monitoring of this host’s VMs
> consumed all processes and it became impossible to deploy VMs.
>
>
>
> Is this a bug or just my misconfiguration?
>
>
>
> Is there a way to set a timeout for the probes? I’ve found
> VM_MONITORING_EXPIRATION_TIME, but according to its description it
> affects only the collected information – not the process of collecting it.
>
>
>
> O.Hamada



--
Javier Fontán Muiños
Developer
OpenNebula - Flexible Enterprise Cloud Made Simple www.OpenNebula.org | @OpenNebula | github.com/jfontan
