[one-users] oned hang

Tino Vazquez tinova at fdi.ucm.es
Mon Jul 19 10:21:30 PDT 2010


Hi Florian, Neil, DuDu,

You are quite right: one possible reason for this error message is OpenNebula
attempting two simultaneous monitoring actions on the same host. One possible
solution is to increase HOST_MONITORING_INTERVAL (in our latest development
revision of OpenNebula we have already increased it to 10 minutes, i.e.
600 seconds). And, of course, using the snmp driver has also proved to be a
great solution for this scalability issue.
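
As a rough sketch, the change would look something like this in oned.conf.
The parameter name and the 600-second value are the ones mentioned above; the
comment lines and the exact file location are assumptions and depend on the
installation.

```
# oned.conf (location varies by install; assumed here)
# Time in seconds between host monitoring runs
HOST_MONITORING_INTERVAL = 600
```

oned would normally need a restart to pick up the new value.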

Hope it helps,

-Tino

--
Constantino Vázquez Blanco | dsa-research.org/tinova
Virtualization Technology Engineer / Researcher
OpenNebula Toolkit | opennebula.org


On Mon, Jul 19, 2010 at 6:18 PM, Floris Sluiter <Floris.Sluiter at sara.nl> wrote:

>  Hi Dudu, Tino and all,
>
>
>
> We saw the exact same messages ("Command execution fail" and "bad
> interpreter: Text file busy") on our cluster last week, when we expanded it
> from 12 to 16 hosts (with add host) and deployed 10 virtual machines at the
> same time. We did not have multiple instances of OpenNebula running; we only
> added hosts to a running one, so it is unlikely that was the issue (the
> cluster had already been running stably for a while). We investigated and
> thought it was a timing issue, with the monitoring (ssh) driver set to 60
> seconds and many hosts and many VMs.
>
> We had started using the ssh monitoring driver again after the latest update
> to OpenNebula; before that we used our in-house developed snmp monitoring
> driver.
>
> When we deployed our snmp driver, the error messages stopped, and for the
> last week we have had a stable cloud again, now with 16 hosts…
>
> For people who think they are seeing the same timing issues as we did, the
> snmp_driver is available in the ecosystem (but make sure you know what snmp
> is before you try ;-)): http://opennebula.org/software:ecosystem:snmp_im_driver
>
> Regards,
>
>
>
> Floris
>
> HPC project leader
>
> Sara
>
>
>
>
>
> *From:* users-bounces at lists.opennebula.org [mailto:
> users-bounces at lists.opennebula.org] *On Behalf Of *Tino Vazquez
> *Sent:* Monday, 19 July 2010 16:15
> *To:* DuDu
> *Cc:* users at lists.opennebula.org
> *Subject:* Re: [one-users] oned hang
>
>
>
> Dear DuDu,
>
>
>
> This happens when two monitoring actions take place at the same time.
>
>
>
> First thing, which OpenNebula version are you using?
>
>
>
> Are you by any chance running two OpenNebula instances? Did you change the
> host polling time?
>
>
>
> Regards,
>
>
>
> -Tino
>
>
> --
> Constantino Vázquez Blanco | dsa-research.org/tinova
> Virtualization Technology Engineer / Researcher
> OpenNebula Toolkit | opennebula.org
>
>  On Wed, Jul 14, 2010 at 3:13 PM, DuDu <blackass at gmail.com> wrote:
>
>
>
> Hi,
>
>
>
> We deployed a small OpenNebula cluster with 8 hosts. It is the default
> OpenNebula installation; however, we found that after several days of
> running, oned hung. All CLI commands hang too. No new logs are generated in
> one_xmlrpc.log, and there are quite a few error messages like the following
> in oned.log:
>
>
>
> [root at vm-container-31-0 logdir]# tail oned.log
> Wed Jul 14 14:51:02 2010 [InM][I]: Warning: untrusted X11 forwarding setup
> failed: xauth key data not generated
> Wed Jul 14 14:51:02 2010 [InM][I]: Warning: No xauth data; using fake
> authentication data for X11 forwarding.
> Wed Jul 14 14:51:02 2010 [InM][I]: bash:
> /tmp/one-im//one_im-c4718299a313d89398ea693104dcce5f: /bin/sh: bad
> interpreter: Text file busy
> Wed Jul 14 14:51:02 2010 [InM][I]: ExitCode: 126
> Wed Jul 14 14:51:02 2010 [InM][I]: Command execution fail: 'mkdir -p
> /tmp/one-im/; cat > /tmp/one-im//one_im-f3817715aa24450225bafb4c19b23822; if
> [ "x$?" != "x0" ]; then exit -1; fi; chmod +x
> /tmp/one-im//one_im-f3817715aa24450225bafb4c19b23822;
> /tmp/one-im//one_im-f3817715aa24450225bafb4c19b23822'
> Wed Jul 14 14:51:02 2010 [InM][I]: STDERR follows.
> Wed Jul 14 14:51:02 2010 [InM][I]: Warning: untrusted X11 forwarding setup
> failed: xauth key data not generated
> Wed Jul 14 14:51:02 2010 [InM][I]: Warning: No xauth data; using fake
> authentication data for X11 forwarding.
> Wed Jul 14 14:51:02 2010 [InM][I]: bash:
> /tmp/one-im//one_im-f3817715aa24450225bafb4c19b23822: /bin/sh: bad
> interpreter: Text file busy
> Wed Jul 14 14:51:02 2010 [InM][I]: ExitCode: 126
>
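
The failing command in the log above writes a script with `cat > file`, marks
it executable, and runs it straight away. If a second monitoring action is
still writing the same file at that moment, the kernel can refuse to execute
it, and bash reports this as "bad interpreter: Text file busy" with exit code
126. The following is only a minimal sketch of that mechanism, not the
OpenNebula driver code: the function name `run_script` and the file name
`one_im-demo` are made up for illustration, and the behaviour is assumed to be
that of a typical Linux system.

```python
import errno
import os
import subprocess
import tempfile


def run_script(keep_writer_open):
    """Write a tiny shell script and execute it.

    Returns "exit <code>" if the script ran, or the symbolic errno name
    (for example "ETXTBSY") if the kernel refused to execute it.
    """
    with tempfile.TemporaryDirectory() as workdir:
        # Hypothetical file name, loosely modelled on the log output.
        path = os.path.join(workdir, "one_im-demo")
        writer = open(path, "w")
        try:
            writer.write("#!/bin/sh\nexit 0\n")
            writer.flush()
            os.chmod(path, 0o755)
            if not keep_writer_open:
                # Normal case: the writer has finished before execution.
                writer.close()
            try:
                # With the writer still open, this mimics a concurrent
                # `cat > file` from a second monitoring action.
                result = subprocess.run([path])
                return "exit %d" % result.returncode
            except OSError as exc:
                return errno.errorcode.get(exc.errno, str(exc.errno))
        finally:
            writer.close()


if __name__ == "__main__":
    print("writer closed:", run_script(keep_writer_open=False))
    print("writer open:  ", run_script(keep_writer_open=True))
```

With the writer closed the script runs normally. With the writer still open,
a typical Linux kernel is expected to reject the execution with ETXTBSY
("Text file busy"), although this depends on the kernel version.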
>
>
> We have to SIGKILL oned and restart it, which solves all the problems.
>
>
>
> Any idea what causes this?
>
>
>
> Thanks!
>
>