[one-users] Highly available ONE?

Ruben S. Montero rubensm at dacya.ucm.es
Thu Mar 25 13:59:07 PDT 2010


Hi

I think that the key modification is to abstract the DB engine in the
OpenNebula core. This has been previously proposed in the list and
there is now an open issue in the dev portal [1].

In this way, we can have a master oned process and several shadow
daemons. These daemons can listen on different hosts/ports, so requests
are sent to the master oned and client tools can fall back to a different
URL when the master oned does not respond. If you are using the OCCI
or EC2 interfaces, then any load balancer or HTTP proxy could
redirect the connections to one of the shadows in case of a timeout.

In this scenario we would need:

1. shared $ONE_LOCATION/var among the master and shadow oned's

2. DB following a client/server model like MySQL

3. OpenNebula Cloud API needs to handle a list of server URLs
(OCCI/EC2 interface could just work with an http proxy or load
balancer like nginx)
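
To illustrate the client-side fallback in item 3, here is a minimal
sketch in Python, not part of any OpenNebula client. The host names, the
URL list and the call_with_failover helper are hypothetical; it only
assumes that oned is reachable over plain-http XML-RPC and that the
standard xmlrpc.client module is used.

```python
import xmlrpc.client

# Hypothetical endpoints: the master oned first, then the shadows.
ONED_URLS = [
    "http://one-master:2633/RPC2",
    "http://one-shadow1:2633/RPC2",
]


class TimeoutTransport(xmlrpc.client.Transport):
    """Plain-http transport that gives up after `timeout` seconds."""

    def __init__(self, timeout):
        super().__init__()
        self._timeout = timeout

    def make_connection(self, host):
        conn = super().make_connection(host)
        conn.timeout = self._timeout  # used when the socket is opened
        return conn


def call_with_failover(urls, method, *args, timeout=5.0, proxy_factory=None):
    """Try each URL in order; return (url, result) from the first that answers."""
    if proxy_factory is None:
        def proxy_factory(url):
            return xmlrpc.client.ServerProxy(
                url, transport=TimeoutTransport(timeout))
    last_error = None
    for url in urls:
        try:
            proxy = proxy_factory(url)
            # getattr resolves dotted XML-RPC method names as a whole
            return url, getattr(proxy, method)(*args)
        except (OSError, xmlrpc.client.ProtocolError) as exc:
            # connection refused, timeout or HTTP error: try the next URL
            last_error = exc
    raise ConnectionError(f"no oned endpoint responded: {last_error}")
```

A caller would pass ONED_URLS, the XML-RPC method name and its
arguments. XML-RPC faults returned by a live oned are deliberately not
treated as a reason to fail over, since they mean the daemon did answer.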

There are other issues (I do not have a clear solution for these ones):

1. Monitoring (hosts, VMs) should be disabled for the shadows (maybe
the daemons can start in a stand-by mode and switch to fully
operational after a given number of requests).

2. Scheduler. The same considerations apply to the scheduler: only one
scheduler should be operational.

3. Missing callbacks. If the oned dies we are going to miss any
pending notifications from the drivers (e.g. if you start a BOOT
operation and oned crashes, the shadow is not going to receive the
result of that BOOT operation. The VM will be stuck in the boot state
for OpenNebula and probably running on the target host).
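
The stand-by idea from items 1 and 2 could look roughly like the sketch
below. It is purely illustrative: ShadowState, promote_after and the
method names are hypothetical and do not exist in oned. The assumption
is that a shadow only starts receiving client requests once the master
has stopped answering, so a few requests in a row are taken as the
signal to become operational.

```python
class ShadowState:
    """Tracks whether a shadow daemon is in stand-by or fully operational."""

    def __init__(self, promote_after=3):
        self.promote_after = promote_after  # requests needed before promotion
        self.requests_seen = 0
        self.active = False                 # shadows start in stand-by

    def on_request(self):
        """Count one client request; promote once the threshold is reached."""
        self.requests_seen += 1
        if not self.active and self.requests_seen >= self.promote_after:
            self.active = True
        return self.active

    def monitoring_enabled(self):
        # host/VM monitoring only runs on the operational daemon
        return self.active

    def scheduler_enabled(self):
        # likewise, only the operational daemon drives the scheduler
        return self.active
```

This does nothing about the missing callbacks in item 3, and it does not
by itself prevent two daemons from being operational at once if the
master is still alive but slow to respond.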

Cheers

Ruben

[1] http://dev.opennebula.org/issues/206

On Thu, Mar 25, 2010 at 8:30 PM, Claude Noshpitz
<cnoshpitz at attinteractive.com> wrote:
> Hello Nebulans,
>
> Wondering if anyone has been thinking about how to make ONE highly available
> by running multiple masters cooperatively.
>
> Among other things, this might mean abstracting out the current Sqlite
> dependencies to rely on a more "distributable" DB solution (one which could
> itself be independently master-slaved or otherwise made HA).  It seems
> unlikely that one could finesse HA for Sqlite by e.g. storing its data in a
> distributed/redundant filesystem (fancy NFS, Gluster) unless there's really
> strong record locking.
>
> Another consideration would involve the consequences of multiple masters
> sharing a pool of worker nodes -- perhaps this would "just work" if the
> underlying DB was shared, or not.  There's the question of how to manage the
> shared VM state in $ONE_LOCATION/var too, but that could be handled with a
> robust shared filesystem.
>
> Ideas?  Opinions?
>
> Thanks!
>
> --Claude
>
> _______________________________________________
> Users mailing list
> Users at lists.opennebula.org
> http://lists.opennebula.org/listinfo.cgi/users-opennebula.org
>



-- 
Dr. Ruben Santiago Montero
Associate Professor (Profesor Titular), Complutense University of Madrid

URL: http://dsa-research.org/doku.php?id=people:ruben
Weblog: http://blog.dsa-research.org/?author=7

