[one-users] a new failure mode

Dan Yocum yocum at fnal.gov
Mon Aug 15 14:08:33 PDT 2011


We've encountered a new "failure" mode which we don't know how to 
recover from.  Help!

libvirtd dies on a host node.  oned can no longer query libvirtd for the 
state of the VMs, so all VMs enter the "unknown" state.  The user doesn't 
realize that libvirtd is the problem, so they attempt 
'onevm restart <vmid>', which leaves the VM in the 'fail' state.  A 
sysadmin comes along and restarts libvirtd on the host machine; all VMs 
are visible again and oned can successfully query their state.  However, 
oned still thinks that the VMs the user tried to restart are in the 
'fail' state, so the user can't restart, stop, or do anything else to 
those VMs (which are still running).

Is there a way to force oned to rescan all VMs, even if they're in a 
'fail' state?
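
Since the trouble starts when a restart is attempted while libvirtd is 
down, a pre-check on the host might at least keep VMs from landing in 
'fail' in the first place.  The following is only a rough sketch, not 
anything OpenNebula provides: the function name libvirtd_reachable is 
made up, and the connection URI qemu:///system is an assumption (it 
depends on the hypervisor in use).  It relies only on 
'virsh -c <uri> list --all' exiting non-zero when it cannot connect.

```python
import subprocess


def libvirtd_reachable(uri="qemu:///system", run=subprocess.run):
    """Return True if `virsh list --all` succeeds against the given URI.

    `uri` is an assumed default; adjust it for the local hypervisor.
    `run` is injectable so the check can be exercised without virsh.
    """
    try:
        result = run(
            ["virsh", "-c", uri, "list", "--all"],
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            timeout=10,
        )
    except (OSError, subprocess.TimeoutExpired):
        # virsh is missing or hung: treat this as "cannot reach libvirtd".
        return False
    # virsh exits non-zero when it cannot connect to the daemon.
    return result.returncode == 0
```

If this returns False on the host, restarting libvirtd first and only 
then touching the VMs would avoid the sequence described above.  It does 
not help with VMs that oned already considers failed.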


Dan Yocum
Fermilab  630.840.6509
yocum at fnal.gov, http://fermigrid.fnal.gov
"I fly because it releases my mind from the tyranny of petty things."
