[one-users] Shared File System HA

Ranga Chakravarthula rbabu at hexagrid.com
Wed Mar 14 22:24:37 PDT 2012


There was an earlier post with similar setup details:

http://www.mail-archive.com/users@lists.opennebula.org/msg05546.html

Regarding performance, I think their FAQ has some answers:
http://www.moosefs.org/moosefs-faq.html

The problem with MooseFS is that the mds (metadata server) is a single point of failure.

HTH
Ranga

On Wed, Mar 14, 2012 at 6:34 PM, Marshall Grillos
<mgrillos at optimalpath.com> wrote:

> Thanks for the update and information.
>
> What about the possibility of using a distributed file system (say MooseFS
> and ucarp for HA) and designating each front-end/controller host as a chunk
> server?
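>
> For illustration, the ucarp pair on the two front-ends might look roughly
> like this (interface, VHID, password and addresses are placeholders; the
> up/down scripts would simply add or remove the floating service IP):
>
>   # on FrontEnd1 (real IP 10.0.0.1); FrontEnd2 runs the same with -s 10.0.0.2
>   ucarp -i eth0 -v 42 -p secret -a 10.0.0.100 -s 10.0.0.1 \
>         --upscript=/etc/vip-up.sh --downscript=/etc/vip-down.sh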
>
> In that setup, there would only be 2 chunk servers, each with a large
> array attached.  Would the file system remain intact if one chunk server
> failed (given the “goal” value was set at 2 or greater) or do you really
> need additional chunk servers for fault-tolerance to function?
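>
> (For what it's worth, I would set the goal recursively on the image
> directory with the standard MooseFS tools, e.g., assuming the MooseFS
> mount lives at /mnt/mfs:
>
>   mfssetgoal -r 2 /mnt/mfs    # keep two copies of every chunk
>   mfsgetgoal -r /mnt/mfs      # verify the effective goal
>
> so with exactly two chunk servers, each chunk should end up with one copy
> on each box.)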
>
> What about the performance of MooseFS across 2 chunk servers with large
> disk arrays (attached as outlined below utilizing 10GbE)?  Does it perform
> well (we will be using SATA drives in a RAID 10 configuration)?
>
> Thanks,
>
> Marshall
>
> From: Ranga Chakravarthula [mailto:rbabu at hexagrid.com]
> Sent: Wednesday, March 14, 2012 4:08 PM
> To: Marshall Grillos
> Cc: users at lists.opennebula.org
> Subject: Re: [one-users] Shared File System HA
>
> It is plain NFS client to NFS server behavior; the hypervisor is just
> acting as an NFS client. The guest OS of the VM caches writes in memory
> and periodically flushes them to disk. During the failover the NFS client
> will keep retrying its writes, and they will only fail if it cannot
> reconnect to the NFS server before the timeout expires. If the connection
> is re-established in time, all the writes will go through.
>
> You need to look at these NFS mount options:
>
>   timeo
>   retrans
>   retry
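>
> For example, a hard mount on the compute nodes along these lines (the
> export path, server name and values are only an illustration):
>
>   frontend-vip:/var/lib/one  /var/lib/one  nfs  hard,intr,timeo=600,retrans=5,retry=2  0 0
>
> With a hard mount the client blocks and keeps retrying, so a failover of
> roughly 30 seconds should just look like a pause in I/O to the VMs,
> provided the same file system state comes back on the surviving node.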
>
> On Wed, Mar 14, 2012 at 12:49 PM, Marshall Grillos <
> mgrillos at optimalpath.com> wrote:
>
> In my design I am looking at having the shared storage attached to the
> front-end server and providing full redundancy for both the front-end and
> the image repository. This would then be shared to each compute node via
> NFS.
>
> StorageArray1 ---DAS---> FrontEnd1 ---10Gb Eth---> BladeChassis1
>                              |
>                              |
>               DRBD/Heartbeat/Pacemaker (between FrontEnd nodes)
>                              |
>                              |
> StorageArray2 ---DAS---> FrontEnd2 ---10Gb Eth---> BladeChassis1
>
> I planned on setting up an active/passive cluster for the two front-end
> servers. These would have completely separate storage arrays (potentially
> in separate data centers). Using DRBD (I'm open to other solutions if they
> provide faster failover), the image repository would be mirrored between
> the storage devices. In the event of any hardware failure
> (NIC/controller/power, etc.) a full failover would occur from FrontEnd1 to
> FrontEnd2, taking the cluster IP address with it.
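>
> For reference, the DRBD resource backing the image repository would be
> something along these lines (hostnames, devices and addresses are
> placeholders):
>
>   resource r0 {
>       protocol C;                  # synchronous replication
>       on frontend1 {
>           device    /dev/drbd0;
>           disk      /dev/sdb1;     # LUN from StorageArray1
>           address   10.0.0.1:7788;
>           meta-disk internal;
>       }
>       on frontend2 {
>           device    /dev/drbd0;
>           disk      /dev/sdb1;     # LUN from StorageArray2
>           address   10.0.0.2:7788;
>           meta-disk internal;
>       }
>   }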
>
> With this setup, there would be a lag while heartbeat/pacemaker detects
> the failure and the failover occurs (possibly upwards of 30 seconds). What
> will happen to the running VMs while the failover is performed? Is the
> compute node hypervisor "smart" enough to handle an NFS outage of several
> seconds?
>
> I'm definitely open to other solutions (GlusterFS, etc.) if they provide a
> smoother failover transition given my existing hardware configuration.
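>
> (If GlusterFS does handle this more gracefully, I imagine it would be a
> two-brick replicated volume across the front-ends, mounted natively on
> the compute nodes. Volume and path names below are just placeholders:
>
>   gluster volume create images replica 2 \
>       frontend1:/export/brick frontend2:/export/brick
>   gluster volume start images
>   # on each compute node:
>   mount -t glusterfs frontend1:/images /var/lib/one
>
> With the native client talking to both bricks, losing one front-end
> should not require an NFS-style failover.)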
>
> Thanks,
>
> Marshall
>
> From: Ranga Chakravarthula [mailto:rbabu at hexagrid.com]
> Sent: Wednesday, March 14, 2012 10:57 AM
> To: Marshall Grillos
> Cc: users at lists.opennebula.org
> Subject: Re: [one-users] Shared File System HA
>
> If you are looking at HA at the storage level, it would be better to put
> the Heartbeat failover on the NFS resource itself rather than failing over
> to a secondary front-end server. Your NFS export is mounted on the compute
> nodes anyway, so if one storage node goes down, Heartbeat fails the NFS
> resource over to the other one. Your front-end doesn't have to be part of
> this.
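>
> A rough crm sketch of what that NFS resource group could look like under
> Pacemaker (IPs, devices and paths are placeholders, and the DRBD
> master/slave constraints are omitted for brevity):
>
>   primitive p_vip ocf:heartbeat:IPaddr2 \
>       params ip=10.0.0.100 cidr_netmask=24
>   primitive p_fs ocf:heartbeat:Filesystem \
>       params device=/dev/drbd0 directory=/var/lib/one fstype=ext4
>   primitive p_nfs ocf:heartbeat:nfsserver \
>       params nfs_shared_infodir=/var/lib/nfs
>   group g_storage p_fs p_nfs p_vip
>
> The compute nodes mount the floating IP (10.0.0.100 here), so which
> physical box currently owns the export is invisible to them.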
>
> On Wed, Mar 14, 2012 at 10:26 AM, Marshall Grillos <
> mgrillos at optimalpath.com> wrote:
>
> I am debating the differences between shared and non-shared file systems
> for an OpenNebula deployment.
>
> One concern with the shared file system is high availability. I am setting
> up the OpenNebula front-end with connectivity to a storage device. To
> guard against a storage device failure (RAID controller, power, etc.) I am
> looking into setting up a secondary front-end server with attached
> storage. I would use NFS to share the storage to each VM host and set up
> DRBD for block-level replication between the cluster nodes. In the event
> of a storage failure, a failover to the secondary front-end server would
> occur using heartbeat/pacemaker.
>
> If anyone has tested a similar setup, how do the VMs handle the brief
> outage required for the failover to occur (the several seconds needed to
> fail over to the secondary front-end)? For that duration, wouldn't the NFS
> mount be unavailable due to the failover mechanism?
>
> Thanks,
>
> Marshall