[one-users] Fencing and/or STONITH in case of an host error (ft/host_error.rb)
Sebastian Mangelkramer
opennebula at mangelkramer.com
Fri Aug 1 08:48:06 PDT 2014
Hi,
In my OpenNebula environments I used a combination of Pacemaker and Corosync
to monitor the VMM hosts of a cluster, where proper checking of
"libvirt" was crucial, and to perform fencing and/or STONITH actions in case of
a host failure. OpenNebula (oned) then triggers a failover of the VMs through the
HOST_HOOK on ERROR (ft/host_error.rb).
After several problems with Corosync/Pacemaker (e.g. monitoring timeouts of the fencing device, an IPMI/iLO module),
I decided to implement fencing/STONITH directly in the host_error.rb hook, which triggers the failover (--delete --recreate).
I think this is the "right" place to add those functions?
Therefore I added some attributes to the host templates (ILO_IP, ILO_USER, ILO_PASS; we use HP servers with iLO modules):
MONITORING INFORMATION
ARCH="x86_64"
CPUSPEED="1999"
HOSTNAME="lab-cloud-staging-node-03"
HYPERVISOR="kvm"
ILO_IP="IP.IP.IP.IP"
ILO_PASS="PASSWORD"
ILO_USER="USERNAME"
MODELNAME="Intel(R) Xeon(R) CPU E5335 @ 2.00GHz"
...
To access these attributes I changed the configuration of the hook in oned.conf so that it also receives $TEMPLATE:
HOST_HOOK = [
name = "error",
on = "ERROR",
command = "/var/lib/one/remotes/hooks/ft/host_error.rb",
arguments = "$ID $TEMPLATE -d -r",
remote = "no" ]
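With this configuration the hook receives the host ID first, the Base64-encoded host template second, and the option flags after that. A minimal sketch of how those arguments could be split up inside the hook is shown here; parse_hook_args is a hypothetical helper, not part of the stock host_error.rb, and the stock hook may handle its options differently:

```ruby
# Hypothetical helper: split the hook's ARGV into named fields.
# Assumes the argument order from oned.conf: $ID $TEMPLATE, then flags.
def parse_hook_args(argv)
  host_id, template_b64, *flags = argv
  # Both positional arguments are required.
  return nil if host_id.nil? || template_b64.nil?

  {
    host_id: host_id,
    template_b64: template_b64,
    delete: flags.include?('-d'),
    recreate: flags.include?('-r')
  }
end

# 'BASE64_XML' is a placeholder for the encoded template passed by oned.
args = parse_hook_args(['12', 'BASE64_XML', '-d', '-r'])
puts args.inspect
```

Inside the hook this would be called as parse_hook_args(ARGV), exiting with an error when it returns nil.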
In the next step I modified the host_error.rb hook to trigger the STONITH action in case of a host error.
For that I included the "rubyipmi", "base64" and "nokogiri" gems in the hook and added some (primitive, I'm not a programmer :) lines of code:
<start>
require 'base64'
require 'nokogiri'
require 'rubyipmi'

# iLO/BMC address and credentials are taken from $TEMPLATE (Base64-encoded XML)
if !(host_template = ARGV[1])
  exit -1
end
host_template_decoded = Base64.decode64(host_template)
xml = Nokogiri::Slop(host_template_decoded)
ilo_ip = xml.HOST.TEMPLATE.ILO_IP.content
ilo_user = xml.HOST.TEMPLATE.ILO_USER.content
ilo_pass = xml.HOST.TEMPLATE.ILO_PASS.content

# Switch on the UID LED so the failed host is easy to find
def uidled(ilo_ip, ilo_pass, ilo_user)
  conn = Rubyipmi.connect(ilo_user, ilo_pass, ilo_ip, "ipmitool")
  # 86400 seconds = 1 day
  value = conn.chassis.identify(true, 86400)
  puts value
  sleep(2)
end

# Hard reset (power cycle) through the iLO/BMC
def stonith(ilo_ip, ilo_pass, ilo_user)
  conn = Rubyipmi.connect(ilo_user, ilo_pass, ilo_ip, "ipmitool")
  value = conn.chassis.power.cycle
  puts value
  sleep(10)
end

# Trigger uidled and stonith
uidled(ilo_ip, ilo_pass, ilo_user)
stonith(ilo_ip, ilo_pass, ilo_user)
</stop>
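One weak spot of the lines above is a host whose template lacks one of the ILO_* attributes: the lookup would then fail in the middle of the hook. A rough sketch of a more defensive lookup follows. It assumes the template is Base64-encoded XML with the attributes under HOST/TEMPLATE, as in my code; ilo_settings is a hypothetical helper, and it uses REXML and String#unpack1 instead of nokogiri and base64 only to keep the example free of extra gems:

```ruby
require 'rexml/document'

# Hypothetical helper: decode the Base64 host template and return the iLO
# settings as a hash, or nil when any of them is missing or empty.
def ilo_settings(template_b64)
  xml = template_b64.unpack1('m') # Base64 decode with core Ruby
  doc = REXML::Document.new(xml)
  settings = {}
  %w[ILO_IP ILO_USER ILO_PASS].each do |name|
    node = doc.elements["HOST/TEMPLATE/#{name}"]
    value = node && node.text.to_s.strip
    return nil if value.nil? || value.empty?
    settings[name] = value
  end
  settings
end

# Sample template with made-up values, encoded the way the hook receives it.
sample = ['<HOST><TEMPLATE><ILO_IP>192.0.2.10</ILO_IP>' \
          '<ILO_USER>admin</ILO_USER><ILO_PASS>secret</ILO_PASS>' \
          '</TEMPLATE></HOST>'].pack('m0')

settings = ilo_settings(sample)
if settings
  puts "would fence via #{settings['ILO_IP']}"
else
  warn 'iLO attributes missing, skipping STONITH'
end
```

Whether the failover should continue when fencing has to be skipped is a separate decision; the sketch only makes the missing-attribute case visible. Malformed XML is not handled here and would still raise an exception.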
Is this the "right" way to trigger fencing actions with OpenNebula, or are there better ways to implement fencing/STONITH? How do you implement it?
Perhaps virtual machine disk locking (e.g. sanlock) could be a solution for some environments?
I think OpenNebula currently lacks a proper fencing mechanism, doesn't it?
Best regards,
Sebastian.