[one-users] OpenNebula 4.6.0 monitoring question
Steven Timm
timm at fnal.gov
Mon Jul 28 09:32:07 PDT 2014
I am currently dealing with an unexplained monitoring problem
in OpenNebula 4.6 on my development cloud.
I frequently see OpenNebula report the status of a ONE
host as "ON" even when the system is misconfigured such that,
given the credentials, it is impossible for OpenNebula to
even ssh into the node as oneadmin.
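
To check this symptom independently of the monitoring, one option is
to test directly whether a non-interactive ssh as oneadmin succeeds.
The following is only a minimal sketch: the function names, the
host name "node1", and the timeouts are assumptions for illustration
and are not part of OpenNebula. It relies only on the standard
OpenSSH options BatchMode and ConnectTimeout.

```python
import subprocess


def ssh_probe_command(host, user="oneadmin", timeout=5):
    """Build a non-interactive ssh command that only runs `true` remotely."""
    return [
        "ssh",
        "-o", "BatchMode=yes",              # never prompt for a password
        "-o", f"ConnectTimeout={timeout}",  # give up quickly on dead hosts
        f"{user}@{host}",
        "true",
    ]


def probe(host, run=subprocess.run):
    """Return (reachable, stderr_text) for one host.

    `run` is injectable so the logic can be exercised without a real host.
    """
    try:
        result = run(
            ssh_probe_command(host),
            capture_output=True,
            text=True,
            timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired) as exc:
        # ssh binary missing, or the whole call hung past the timeout
        return False, str(exc)
    # ssh exits non-zero (255 for its own errors) when login fails
    return result.returncode == 0, result.stderr.strip()


if __name__ == "__main__":
    # "node1" is a hypothetical host name; substitute real node names.
    reachable, detail = probe("node1")
    print("node1", "reachable" if reachable else f"NOT reachable: {detail}")
```

A host that fails this probe but still shows as "ON" would point at
the monitoring path rather than at the node itself.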
I've fixed all those instances and restarted OpenNebula,
but OpenNebula still reports a number of VMs
in state "running" even though the node they are supposedly
running on was rebooted three days ago and is running no
virtual machines whatsoever.
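
The mismatch described here can be stated as a simple cross-check:
for each host, which VMs does the front end list as running that the
host itself does not have? The sketch below assumes the two lists
have already been collected by whatever means fit the setup; the
function name, host names, and VM names are hypothetical.

```python
def stale_vms(reported_running, actually_running):
    """Map each host to the VMs listed as running that the host lacks.

    reported_running: {host: [vm names the front end shows as running]}
    actually_running: {host: [vm names actually present on that host]}
    """
    stale = {}
    for host, vms in reported_running.items():
        # A host absent from actually_running is treated as running nothing.
        missing = sorted(set(vms) - set(actually_running.get(host, ())))
        if missing:
            stale[host] = missing
    return stale


# Hypothetical data: node1 was rebooted and now runs nothing.
reported = {"node1": ["one-101", "one-102"], "node2": ["one-103"]}
actual = {"node1": [], "node2": ["one-103"]}
print(stale_vms(reported, actual))
# prints {'node1': ['one-101', 'one-102']}
```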
I think I could be dealing with database corruption of some type
(generated on the one4.4->one4.6 update), or there could
be some problem with the remote scripts on the nodes.
I saw, and I think I fixed, the problems with the database
corruption (namely one of the hosts and one of the datastores
got knocked out of the database for reasons unknown, and I
re-inserted them). But in any case, some error handling
in the monitoring is not working, and something is
exiting with status 0 when it shouldn't be.
Any ideas? Has anyone else seen something like this?
Steve Timm
------------------------------------------------------------------
Steven C. Timm, Ph.D (630) 840-8525
timm at fnal.gov http://home.fnal.gov/~timm/
Fermilab Scientific Computing Division, Scientific Computing Services Quad.
Grid and Cloud Services Dept., Associate Dept. Head for Cloud Computing