[one-users] OpenNebula 4.6.0 monitoring question

Steven Timm timm at fnal.gov
Mon Jul 28 09:32:07 PDT 2014


I am currently dealing with an unexplained monitoring question
in OpenNebula 4.6 on my development cloud.

I frequently see OpenNebula report that the status of a ONe
host is "ON" even when a system misconfiguration makes it
impossible, with the given credentials, for OpenNebula to
even ssh into the node as oneadmin.


I've fixed all those instances and restarted OpenNebula,
but OpenNebula still reports a number of VMs
in state "running" even though the node they were running
on was rebooted three days ago and is running no
virtual machines whatsoever.

I think I could be dealing with some kind of database corruption
(introduced during the one4.4 -> one4.6 upgrade), or there could
be some problem with the remote scripts on the nodes.
I saw, and I think I fixed, the database corruption problems
(namely, one of the hosts and one of the datastores were
dropped from the database for unknown reasons, and I
re-inserted them). But in any case, some of the error
handling in the monitoring is not working,
and something is exiting with status 0 that shouldn't be.

Any ideas? Has anyone else seen something like this?

Steve Timm



------------------------------------------------------------------
Steven C. Timm, Ph.D  (630) 840-8525
timm at fnal.gov  http://home.fnal.gov/~timm/
Fermilab Scientific Computing Division, Scientific Computing Services Quad.
Grid and Cloud Services Dept., Associate Dept. Head for Cloud Computing
