[one-users] Shared File System HA

Wed Mar 14 10:49:53 PDT 2012

In my design I'm looking at attaching the shared storage to the front-end server and providing full redundancy for both the front-end and the image repository.  The storage would then be shared to each compute node via NFS.

StorageArray1 ---DAS--->FrontEnd1---10gb Eth---->BladeChassis1
|
|
DRBD/Heartbeat/Pacemaker (between FrontEnd nodes)
|
|
StorageArray2 ---DAS--->FrontEnd2---10gb Eth----> BladeChassis1

I planned on setting up an active/passive cluster of two front-end servers.  These would have completely separate storage arrays (potentially in separate data centers).  Using DRBD (I'm open to other solutions if they provide faster failover), the image repository would be mirrored between the storage devices.  In the event of any hardware failure (NIC, controller, power, etc.), a full failover would occur from FrontEnd1 to FrontEnd2, moving the cluster IP address to FrontEnd2.

With this setup, there would be a lag while heartbeat/pacemaker detects the failure and performs the failover (possibly upwards of 30 seconds).  What will happen to the running VMs while the failover is performed?  Is the compute node hypervisor "smart" enough to handle an NFS outage of several seconds?
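
One way to put a number on that outage during a test failover is a small probe run on a compute node. The following is only a minimal sketch in Python, not something I have run against this cluster: the function names `probe` and `longest_stall` are made up for illustration, and it assumes the export is mounted with the `hard` option, so a write issued during the outage blocks until the server answers again rather than returning an error.

```python
import os
import time


def longest_stall(timestamps):
    """Return the largest gap in seconds between consecutive timestamps."""
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    return max(gaps, default=0.0)


def probe(path, duration=60.0, interval=0.5):
    """Append and fsync one line to `path` every `interval` seconds.

    Returns the completion time of each write. Assumption: `path` is on
    a hard-mounted NFS export, so a blocked write shows up as one long
    gap between two consecutive completion times.
    """
    stamps = []
    deadline = time.monotonic() + duration
    with open(path, "a") as f:
        while time.monotonic() < deadline:
            f.write("probe\n")
            f.flush()
            os.fsync(f.fileno())  # push the write through to the server
            stamps.append(time.monotonic())
            time.sleep(interval)
    return stamps
```

Calling `longest_stall(probe(path))` with `path` pointing at a scratch file on the NFS mount, and triggering the failover while it runs, should give a rough upper bound on how long I/O was stalled. On a `soft` mount the write may instead raise `OSError`, which this sketch does not handle.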

I'm definitely open to other solutions (GlusterFS, etc.) if they provide a smoother failover transition given my existing hardware configuration.

Thanks,
Marshall

From: Ranga Chakravarthula [mailto:rbabu at hexagrid.com]
Sent: Wednesday, March 14, 2012 10:57 AM
To: Marshall Grillos
Cc: users at lists.opennebula.org
Subject: Re: [one-users] Shared File System HA

If you are looking at HA at the storage level, it would be better to have Heartbeat/failover on the NFS resource than to fail over to a secondary front-end server. Your NFS export is mounted on the compute nodes anyway, and if one storage device goes down, Heartbeat will fail over to the other. Your front-end doesn't have to be part of this.
On Wed, Mar 14, 2012 at 10:26 AM, Marshall Grillos <mgrillos at optimalpath.com> wrote:
I am debating the differences between Shared and Non-shared file systems for an OpenNebula deployment.

One concern with the shared file system is High Availability.  I am setting up the OpenNebula front-end with connectivity to a storage device.  To guard against a storage device failure (RAID controller, power, etc.), I am looking into setting up a secondary front-end server with attached storage.  I would use NFS to share the storage to each VM host and set up DRBD for block-level replication between the cluster nodes.  In the event of a storage failure, heartbeat/pacemaker would fail over to the secondary front-end server.

If anyone has tested a similar setup, how do the VMs handle the brief outage required for the failover to occur (the several seconds needed to fail over to the secondary front-end)?  For that duration, wouldn't the NFS mount be unavailable due to the failover mechanism?

Thanks,
Marshall

_______________________________________________
Users mailing list
Users at lists.opennebula.org
http://lists.opennebula.org/listinfo.cgi/users-opennebula.org
