[one-users] Fencing and/or STONITH in case of an host error (ft/host_error.rb)

Sebastian Mangelkramer opennebula at mangelkramer.com
Fri Aug 1 08:48:06 PDT 2014


Hi,

In my OpenNebula environments I have used a combination of Pacemaker and Corosync
to monitor the VMM hosts of a cluster (where proper checking of "libvirt" was
crucial) and to perform fencing and/or STONITH actions in case of a host failure.
OpenNebula (oned) then triggers a failover of the VMs through the HOST_HOOK on
ERROR (ft/host_error.rb).

Because of several problems with Corosync/Pacemaker (e.g. monitoring timeouts of the fencing device, an IPMI/iLO module),
I decided to implement fencing/STONITH directly in the host_error.rb hook, which triggers the failover (--delete --recreate).

Is this the "right" place to add such functionality?

Therefore I added some attributes to the host templates (ILO_IP, ILO_USER, ILO_PASS; we use HP servers with iLO modules):

MONITORING INFORMATION                                                          
ARCH="x86_64"
CPUSPEED="1999"
HOSTNAME="lab-cloud-staging-node-03"
HYPERVISOR="kvm"
ILO_IP="IP.IP.IP.IP"
ILO_PASS="PASSWORD"
ILO_USER="USERNAME"
MODELNAME="Intel(R) Xeon(R) CPU           E5335  @ 2.00GHz"
...

To access these attributes from the hook, I changed the hook configuration in oned.conf so that $TEMPLATE is passed as an argument:

HOST_HOOK = [
    name      = "error",
    on        = "ERROR",
    command   = "/var/lib/one/remotes/hooks/ft/host_error.rb",
    arguments = "$ID $TEMPLATE -d -r",
    remote    = "no" ]
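
A minimal sketch of how a hook could pick the iLO attributes out of these arguments. It assumes that $TEMPLATE arrives as Base64-encoded XML with a HOST/TEMPLATE element (which is what my hook code relies on). The helper name ilo_credentials is made up for this example, and it uses REXML from the standard Ruby distribution instead of nokogiri to stay self-contained:

```ruby
require 'base64'
require 'rexml/document'

# Hypothetical helper (not part of OpenNebula): extract the iLO attributes
# from the Base64-encoded host template that oned passes as $TEMPLATE.
# Assumption: the decoded XML looks like <HOST><TEMPLATE>...</TEMPLATE></HOST>.
def ilo_credentials(encoded_template)
  doc = REXML::Document.new(Base64.decode64(encoded_template))
  creds = {}
  %w[ILO_IP ILO_USER ILO_PASS].each do |name|
    node  = doc.elements["HOST/TEMPLATE/#{name}"]
    value = node ? node.text.to_s.strip : ''
    # Fail early instead of trying to fence with incomplete data
    raise ArgumentError, "host template has no #{name}" if value.empty?
    creds[name] = value
  end
  creds
end

# With arguments = "$ID $TEMPLATE -d -r" the hook would see:
#   ARGV[0] -> host ID, ARGV[1] -> Base64 template, ARGV[2..] -> flags
#   creds = ilo_credentials(ARGV[1])
```

Raising on a missing attribute keeps the hook from attempting a fencing action against a host that has no iLO data in its template.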





In the next step I modified the host_error.rb hook to trigger the STONITH action in case of a host error.
For that I included "rubyipmi", "base64" and "nokogiri" in the hook and added a few (primitive, I'm not a programmer :) lines of code:

<start>
# iLO/BMC connection data comes from $TEMPLATE (second hook argument)
require 'base64'
require 'nokogiri'
require 'rubyipmi'

unless (host_template = ARGV[1])
    exit(-1)
end

# $TEMPLATE is passed Base64-encoded; decode it and parse the host XML
host_template_decoded = Base64.decode64(host_template)
xml = Nokogiri::Slop(host_template_decoded)

ilo_ip   = xml.HOST.TEMPLATE.ILO_IP.content
ilo_user = xml.HOST.TEMPLATE.ILO_USER.content
ilo_pass = xml.HOST.TEMPLATE.ILO_PASS.content

# Method: activate the UID LED
def uidled(ilo_ip, ilo_pass, ilo_user)
    conn = Rubyipmi.connect(ilo_user, ilo_pass, ilo_ip, "ipmitool")
    # 86400 seconds = 1 day
    value = conn.chassis.identify(true, 86400)
    puts value
    sleep(2)
end

# Method: hard reset (power cycle) through the iLO/BMC
def stonith(ilo_ip, ilo_pass, ilo_user)
    conn = Rubyipmi.connect(ilo_user, ilo_pass, ilo_ip, "ipmitool")
    value = conn.chassis.power.cycle
    puts value
    sleep(10)
end

# Trigger uidled and stonith
uidled(ilo_ip, ilo_pass, ilo_user)
stonith(ilo_ip, ilo_pass, ilo_user)

</stop>
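
One possible refinement, only as a sketch: instead of power-cycling and sleeping for a fixed time, the hook could power the host off and continue with the failover only once the BMC reports the "off" state. The helper name fence_confirmed? and its attempts/wait parameters are made up for this example. It assumes the connection object answers conn.chassis.power.off and conn.chassis.power.off? the way a rubyipmi connection does:

```ruby
# Hypothetical helper: power the host off and report success only after the
# BMC confirms the "off" state. `conn` is assumed to behave like a rubyipmi
# connection (conn.chassis.power.off and conn.chassis.power.off?).
def fence_confirmed?(conn, attempts: 5, wait: 2)
  conn.chassis.power.off
  attempts.times do
    return true if conn.chassis.power.off?
    sleep(wait)
  end
  false
rescue StandardError => e
  # An unreachable BMC counts as "not fenced"
  warn "fencing failed: #{e.message}"
  false
end

# Possible use inside the hook, before the VMs are recreated:
#   conn = Rubyipmi.connect(ilo_user, ilo_pass, ilo_ip, "ipmitool")
#   exit(-1) unless fence_confirmed?(conn)
```

The idea is that the hook exits before the --delete --recreate step whenever fencing cannot be confirmed, so a VM is not started a second time while the old host might still be running it.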

Is this the "right" way to trigger fencing actions with OpenNebula, or are there better ways to implement fencing/STONITH? How do you implement it?
Perhaps virtual machine disk locking (e.g. SANLOCK) could be a solution for some environments?

I think OpenNebula currently lacks a proper fencing mechanism, doesn't it?


Best regards,

Sebastian.






