[one-users] Hung sshd processes on VM hosts
Steven Timm
timm at fnal.gov
Fri Mar 14 10:55:15 PDT 2014
We recently deployed several new, larger hosts on our OpenNebula 3.2
cloud and are seeing some problems. At this point we are not sure whether
we are dealing with an OS problem in sshd or with something else.
The symptom is that an OpenNebula monitoring process logs in to the VM
host as oneadmin and does its work, but then the sshd process (owned by
root) that spawned it starts using up to 100% of system CPU and cannot be
killed at all. Running strace on that sshd process simply hangs.
Eventually many of these build up on the VM host and it becomes almost
impossible to do anything. The only way we have found so far to kill them
is to restart the parent sshd; after that we can kill all the child sshd
processes.
The problem tends to appear when there are more than 20 virtual machines
on the same host. These are new Ivy Bridge-based hosts that should be
good for at least 40 VMs apiece.
Has anyone seen anything like this before? And yes, I know the 4.x series
of OpenNebula is much more efficient in its monitoring, and we are trying
to get there as fast as we can.
Steve Timm
------------------------------------------------------------------
Steven C. Timm, Ph.D (630) 840-8525
timm at fnal.gov http://home.fnal.gov/~timm/
Fermilab Scientific Computing Division, Scientific Computing Services Quad.
Grid and Cloud Services Dept., Associate Dept. Head for Cloud Computing