[one-users] oned hang

Floris Sluiter Floris.Sluiter at sara.nl
Mon Jul 19 09:18:20 PDT 2010


Hi Dudu, Tino and all,

We saw the exact same messages ("Command execution fail" and "bad interpreter: Text file busy") on our cluster last week, when we expanded it from 12 to 16 hosts (with add host) while deploying 10 VMs at the same time. We did not have multiple instances of OpenNebula running; we only added hosts to a running one, so it is unlikely that was the issue (the cluster had already been running stably for a while). We investigated and thought it was a timing issue, with the (ssh) monitoring driver set to 60 seconds and many hosts and many VMs to poll.
We had started using the ssh monitoring driver again after the latest update to OpenNebula; before that we used our in-house developed SNMP monitoring driver.
When we deployed our SNMP driver again, the error messages stopped, and for the last week we have had a stable cloud again, now with 16 hosts.

For people who think they are seeing the same timing issues as we did, the snmp_driver is available in the ecosystem (but make sure you know what SNMP is before you try ;-) ): http://opennebula.org/software:ecosystem:snmp_im_driver

Regards,

Floris
HPC project leader
Sara


From: users-bounces at lists.opennebula.org [mailto:users-bounces at lists.opennebula.org] On Behalf Of Tino Vazquez
Sent: Monday 19 July 2010 16:15
To: DuDu
Cc: users at lists.opennebula.org
Subject: Re: [one-users] oned hang

Dear DuDu,

This happens when two monitoring actions take place at the same time.
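
To make the collision concrete: the command in the log writes the probe script with `cat > <path>` and then executes that same path. On Linux, executing a file that some other process still holds open for writing fails with ETXTBSY ("Text file busy"), which would match two overlapping monitoring actions using the same file name. The sketch below is not OpenNebula's driver code; it is a minimal illustration, assuming a hypothetical helper `install_script`, of one common way to avoid that window: write the script to a uniquely named temporary file in the same directory and atomically rename it into place, so the final path is never open for writing.

```python
import os
import stat
import tempfile


def install_script(target_path, script_text):
    """Install script_text at target_path without ever opening target_path for writing.

    Hypothetical helper for illustration only; not part of OpenNebula.
    """
    directory = os.path.dirname(target_path) or "."
    os.makedirs(directory, exist_ok=True)

    # Each caller gets its own temporary file, so concurrent writers never share one.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "w") as handle:
            handle.write(script_text)
        # The file is closed here; make it executable for the owner.
        os.chmod(tmp_path, stat.S_IRWXU)
        # Atomic on POSIX when source and destination are on the same filesystem.
        os.replace(tmp_path, target_path)
    except BaseException:
        # Do not leave a stray temporary file behind on failure.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    return target_path
```

A caller would then run the returned path, for example with `subprocess.run([path])`. A process that already started the old copy keeps running it, because the rename replaces the directory entry rather than rewriting the file in place.
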

First thing, which OpenNebula version are you using?

Are you by any chance running two OpenNebula instances? Did you change the host polling time?

Regards,

-Tino

--
Constantino Vázquez Blanco | dsa-research.org/tinova<http://dsa-research.org/tinova>
Virtualization Technology Engineer / Researcher
OpenNebula Toolkit | opennebula.org<http://opennebula.org>

On Wed, Jul 14, 2010 at 3:13 PM, DuDu <blackass at gmail.com> wrote:

Hi,

We deployed a small OpenNebula cluster with 8 hosts. It is the default OpenNebula installation. However, we found that after several days of running, oned hung. All CLI commands hang too, and no new entries are written to one_xmlrpc.log. There are also quite a few error messages like the following in oned.log:

[root at vm-container-31-0 logdir]# tail oned.log
Wed Jul 14 14:51:02 2010 [InM][I]: Warning: untrusted X11 forwarding setup failed: xauth key data not generated
Wed Jul 14 14:51:02 2010 [InM][I]: Warning: No xauth data; using fake authentication data for X11 forwarding.
Wed Jul 14 14:51:02 2010 [InM][I]: bash: /tmp/one-im//one_im-c4718299a313d89398ea693104dcce5f: /bin/sh: bad interpreter: Text file busy
Wed Jul 14 14:51:02 2010 [InM][I]: ExitCode: 126
Wed Jul 14 14:51:02 2010 [InM][I]: Command execution fail: 'mkdir -p /tmp/one-im/; cat > /tmp/one-im//one_im-f3817715aa24450225bafb4c19b23822; if [ "x$?" != "x0" ]; then exit -1; fi; chmod +x /tmp/one-im//one_im-f3817715aa24450225bafb4c19b23822; /tmp/one-im//one_im-f3817715aa24450225bafb4c19b23822'
Wed Jul 14 14:51:02 2010 [InM][I]: STDERR follows.
Wed Jul 14 14:51:02 2010 [InM][I]: Warning: untrusted X11 forwarding setup failed: xauth key data not generated
Wed Jul 14 14:51:02 2010 [InM][I]: Warning: No xauth data; using fake authentication data for X11 forwarding.
Wed Jul 14 14:51:02 2010 [InM][I]: bash: /tmp/one-im//one_im-f3817715aa24450225bafb4c19b23822: /bin/sh: bad interpreter: Text file busy
Wed Jul 14 14:51:02 2010 [InM][I]: ExitCode: 126

We have to kill oned with SIGKILL and restart it, and that solves all the problems.

Any idea what causes this?

Thanks!

_______________________________________________
Users mailing list
Users at lists.opennebula.org
http://lists.opennebula.org/listinfo.cgi/users-opennebula.org
