[one-users] oned keeps dying without a trace (2.0.1)

Carsten.Friedrich at csiro.au Carsten.Friedrich at csiro.au
Sun May 29 16:39:26 PDT 2011


Found:

kern.log.1:May 27 15:53:21 nebnode01 kernel: [10006493.732909] oned[581]: segfault at 7f02c3048820 ip 00007f02c110174d sp 00007f02c41ce890 error 4 in libc-2.11.1.so[7f02c108a000+17a000]

(same message in "messages.1")
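
The kernel line above can be decoded a little further. Below is a minimal Python sketch, not anything shipped with OpenNebula, that pulls the fields out of such a line and computes the offset of the faulting instruction inside the library (instruction pointer minus the mapping base). The function name `decode_segfault` and the dictionary keys are made up for illustration. The error code is read as the usual x86 page-fault bits (1 = page was present, 2 = write access, 4 = user mode), which assumes an x86 host.

```python
import re

# Matches the kernel's "segfault at ... ip ... sp ... error N in lib[base+size]" line.
SEGFAULT_RE = re.compile(
    r"(?P<proc>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>\d+) "
    r"in (?P<lib>\S+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def decode_segfault(line):
    """Return the decoded fields of a kernel segfault line, or None if it does not match."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    ip = int(m.group("ip"), 16)
    base = int(m.group("base"), 16)
    err = int(m.group("err"))
    return {
        "process": m.group("proc"),
        "pid": int(m.group("pid")),
        "library": m.group("lib"),
        # Offset of the faulting instruction within the mapped library.
        "offset": ip - base,
        # x86 page-fault error code bits (assumption: x86 host).
        "page_present": bool(err & 1),
        "write": bool(err & 2),
        "user_mode": bool(err & 4),
    }
```

For the line quoted above, the offset comes out as 0x7774d into libc-2.11.1.so, and error 4 reads as a user-mode read of an unmapped page. With matching debug symbols installed, that offset could be handed to a tool such as addr2line to find the libc function involved.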

Carsten Friedrich
Research Team leader
ICT Centre, GPO Box 664,Canberra, ACT 2601
Phone: +61 2 6216 7019 
Email: Carsten.Friedrich at csiro.au
Web:   http://www.csiro.au/org/ICT.html


-----Original Message-----
From: florian.feldhaus at tu-dortmund.de [mailto:florian.feldhaus at tu-dortmund.de] 
Sent: Thursday, 26 May 2011 17:22
To: Friedrich, Carsten (ICT Centre, Acton); users at lists.opennebula.org
Subject: RE: oned keeps dying without a trace (2.0.1)

Hi Carsten,

Have you had a look at your system logs (e.g. /var/log/messages)? Is there anything suspicious in them? Behaviour like this is really hard to debug and is often caused by some other problem (e.g. stale NFS, SSH problems, high load due to nightly backup tasks, etc.).

It might help to find the time at which OpenNebula dies and then search for events around that time in the system logfiles and in the cron jobs (e.g. /etc/crontab, /etc/cron.d, /etc/cron.daily).
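
As a rough illustration of that search, here is a small Python sketch that keeps only the syslog lines within a few minutes of a known crash time. It assumes the classic syslog timestamp prefix ("May 27 15:53:21"), which carries no year, so the year is passed in. It also assumes English month names. The function name `lines_near` and its parameters are invented for this example.

```python
from datetime import datetime, timedelta


def lines_near(lines, when, window_minutes=10, year=2011):
    """Return the syslog lines whose timestamp lies within the window around `when`."""
    window = timedelta(minutes=window_minutes)
    hits = []
    for line in lines:
        try:
            # The first 15 characters hold the "Mon DD HH:MM:SS" syslog timestamp.
            stamp = datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
        except ValueError:
            continue  # not a syslog-style line, skip it
        if abs(stamp - when) <= window:
            hits.append(line.rstrip("\n"))
    return hits
```

It could be used by opening a log file such as /var/log/messages, passing the file object as `lines`, and giving the time oned was last seen alive as `when`. Cron entries that fire inside the same window are then easy to compare by hand.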

Cheers,
Florian
________________________________________
From: users-bounces at lists.opennebula.org [users-bounces at lists.opennebula.org] on behalf of Carsten.Friedrich at csiro.au [Carsten.Friedrich at csiro.au]
Sent: Thursday, 26 May 2011 08:36
To: users at lists.opennebula.org
Subject: [one-users] oned keeps dying without a trace (2.0.1)

I have an installation of OpenNebula 2.0.1 and oned keeps dying every couple of days without a trace in the log files. Any idea what could cause this, or how I can get more information from OpenNebula to narrow down the problem (I currently have debug level 3 in the config file)?

This often happens in the middle of the night, so I don't think it is due to high load or especially demanding requests. After doing a manual 'one stop' and then 'one start' it deletes a stale lock and then works fine again for a couple of days.

Thanks,
Carsten







