[one-users] Highly available ONE?

Ruben S. Montero rubensm at dacya.ucm.es
Thu May 6 14:49:19 PDT 2010


Hi,

After discussing this for a while, we have come up with an
alternative proposal for the driver communication back-end (instead of
using a message queue or the DB as a spool directory).

Our requirement was to decouple the OpenNebula requests from the
answers from the worker nodes, so that we could implement an HA
solution without losing callbacks.

The solution is to use HTTP as the transport back-end:

* Simple software stack without any additional dependencies, so we
keep OpenNebula slim.

* HTTP scales easily, and several architectures can be deployed to
scale the cluster to thousands of physical servers (proxies, load
balancers, hierarchical distribution of HTTP requests...)

* Requests are totally decoupled, so you can easily have shadow daemons

* Simple to configure and deploy

Some more info can be found in the following issue:

http://dev.opennebula.org/issues/237
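
To make this concrete, here is a minimal sketch in Python of how a
driver could push its result over HTTP, falling back to a shadow
daemon if the master is down (the endpoint path, port and payload
format are made up for illustration, not a final API):

import json
import urllib.request

# Hypothetical oned HTTP endpoints: master first, then the shadows.
ONED_ENDPOINTS = [
    "http://oned-master:2634/driver/callback",
    "http://oned-shadow:2634/driver/callback",
]

def deliver_result(action, vm_id, status, info):
    """POST a driver result; try the next endpoint on failure."""
    payload = json.dumps({"action": action, "vm_id": vm_id,
                          "status": status, "info": info}).encode()
    for url in ONED_ENDPOINTS:
        req = urllib.request.Request(
            url, data=payload,
            headers={"Content-Type": "application/json"})
        try:
            with urllib.request.urlopen(req, timeout=5):
                return True  # some oned acknowledged the result
        except OSError:
            continue  # endpoint down or slow, try the next one
    return False  # nobody reachable; the driver can retry later

Since the answer is pushed (and can be re-pushed) independently of the
request, a shadow daemon can receive callbacks for operations started
by the master.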

Any thoughts on this are more than welcome...

Cheers

Ruben

On Sat, Mar 27, 2010 at 12:38 AM, Ruben S. Montero <rubensm at dacya.ucm.es> wrote:
> Hi
>
> The message queue as an alternate back-end for the drivers has also
> been around for a while. As Michael has pointed out, the pros are
> probably better scalability and the cons a more complex software
> stack. However, note that the queue system must have the same HA
> support that we are planning for the OpenNebula daemons, i.e. what
> happens if the message server fails? See [1] for example. I do not
> feel comfortable with the idea of using the DB as the communication
> hub either.
>
> As an alternative, when a shadow takes control after a failure of
> the master oned, it can poll the VMs in transition states (e.g.
> booting) to find out their real state... This could probably solve
> the callback issue for HA; see the sketch below.
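>
> A rough sketch of that reconciliation (just an illustration; the VM
> listing and the host probe are placeholders, not the real OpenNebula
> interfaces):
>
> TRANSITION_STATES = {"BOOT", "MIGRATE", "SAVE", "SHUTDOWN"}
>
> def reconcile(vms, probe_host):
>     """On takeover, re-check every VM stuck in a transition state.
>
>     vms: iterable of (vm_id, state, hostname) rows from the shared DB
>     probe_host: callable(hostname, vm_id) -> actual state string,
>                 e.g. implemented with 'virsh domstate' over SSH
>     """
>     for vm_id, state, hostname in vms:
>         if state not in TRANSITION_STATES:
>             continue  # nothing pending for this VM
>         actual = probe_host(hostname, vm_id)
>         if actual != state:
>             print(f"VM {vm_id}: recorded {state}, found {actual}")
>             # here the shadow would update the DB / fire the callback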
>
> We are talking about big changes here. I think we could schedule the
> DB back-end and the support for multiple server URLs for 1.6. This
> (plus shared storage & some configuration adjustments) will give us a
> sort of "soft" HA. Plus other nice features, of course.
>
> Thanks for sharing your thoughts...
>
> Cheers
>
> Ruben
>
> [1]http://www.rabbitmq.com/faq.html#failover-ha
>
>
> On Fri, Mar 26, 2010 at 4:35 PM, Michael Fairchild
> <mfairchild at attinteractive.com> wrote:
>> A few ideas/thoughts:
>>
>>  I think the messaging queue for callbacks is a great idea. Not
>> only would this allow for a more resilient design, it would
>> potentially ease scaling to a larger cloud. I'm not sure that the DB
>> is the best thing to use for a queue, as opposed to rabbitmq, redis,
>> or similar, but the value of not adding another piece to the stack is
>> not to be underestimated.
>>  An issue with using the DB as the queue is that it suggests the
>> hook scripts are writing directly to the DB, which feels wrong to me.
>> Perhaps the hook scripts could automatically go to the next URL after
>> a timeout, fallback style? If ONE_URL were a list of URLs to try, we
>> could achieve an arbitrary level of decoupled redundancy without
>> requiring a central proxy of any sort. Perhaps one more config
>> variable should be added, ONE_TIMEOUT.
>>  If ONE_URL were a list of oned URLs, perhaps they could even be
>> used round-robin style? In that case the value of (and need for) a
>> queue decreases. A sketch of the fallback idea follows.
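>>
>> Something like this (ONE_URL as a comma-separated list and
>> ONE_TIMEOUT are the variables proposed here, not existing ones; oned
>> does listen for XML-RPC on port 2633):
>>
>> import os
>> import socket
>> import xmlrpc.client
>>
>> URLS = os.environ.get("ONE_URL", "http://localhost:2633/RPC2").split(",")
>> socket.setdefaulttimeout(float(os.environ.get("ONE_TIMEOUT", "10")))
>>
>> def one_call(method, *args):
>>     """Try each oned URL in order; fall through on connection errors."""
>>     for url in URLS:
>>         proxy = xmlrpc.client.ServerProxy(url)
>>         try:
>>             # method is an oned XML-RPC method name, e.g. "one.vmpool.info"
>>             return getattr(proxy, method)(*args)
>>         except OSError:
>>             continue  # this server is down or timed out, try the next
>>     raise RuntimeError("no oned server reachable")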
>>
>> ~Michael Fairchild
>>
>>
>>
>> On Mar 25, 2010, at 2:39 PM, Keith Hudgins wrote:
>>
>>> I think the shared $ONE_LOCATION in a high-availability setup should
>>> be some type of network share (NFS, etc.) from a separate system.
>>> (Getting a little hardware geeky here, I apologize.) Gluster, ZFS, or
>>> similar can be used on top of a drive array for both reliability and
>>> scaling. Even this should have some HA capability. Nexenta can do it
>>> if you're using closed source. I'm not sure of an open source method
>>> for an HA NFS share. I'd be very interested to know of a way.
>>>
>>> Likewise, the scheduler should have some HA capacity. You would have a
>>> master node in which the scheduler was operational, and a slave node
>>> or two which would be promoted to master in the event of failure.
>>>
>>> Callbacks in this case should be handled in a queue-like fashion. The
>>> central database can be used for this type of messaging, to reduce
>>> extra components. The driver messaging should write to the queue or
>>> messaging table. Oned can periodically read from this table and run
>>> callbacks based upon status notices in an asynchronous manner.
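>>>
>>> Roughly like this (a sketch only, with sqlite3 standing in for the
>>> central DB; the table and column names are invented):
>>>
>>> import sqlite3
>>>
>>> db = sqlite3.connect("one_messages.db")
>>> db.execute("""CREATE TABLE IF NOT EXISTS driver_messages (
>>>     id INTEGER PRIMARY KEY AUTOINCREMENT,
>>>     vm_id INTEGER, action TEXT, status TEXT,
>>>     processed INTEGER DEFAULT 0)""")
>>>
>>> def enqueue(vm_id, action, status):
>>>     """Driver side: append a result row to the messaging table."""
>>>     db.execute("INSERT INTO driver_messages (vm_id, action, status)"
>>>                " VALUES (?, ?, ?)", (vm_id, action, status))
>>>     db.commit()
>>>
>>> def poll_once():
>>>     """Oned side: consume unprocessed messages and run callbacks."""
>>>     rows = db.execute("SELECT id, vm_id, action, status"
>>>                       " FROM driver_messages WHERE processed = 0")
>>>     for mid, vm_id, action, status in rows.fetchall():
>>>         print(f"callback: VM {vm_id} {action} -> {status}")
>>>         db.execute("UPDATE driver_messages SET processed = 1"
>>>                    " WHERE id = ?", (mid,))
>>>     db.commit()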
>>>
>>> On Thu, Mar 25, 2010 at 4:59 PM, Ruben S. Montero <rubensm at dacya.ucm.es> wrote:
>>>> Hi
>>>>
>>>> I think that the key modification is to abstract the DB engine in
>>>> the OpenNebula core. This has been previously proposed on the list,
>>>> and there is now an open issue in the dev portal [1].
>>>>
>>>> In this way, we can have a master oned process and several shadow
>>>> daemons. These daemons can listen on different hosts/ports; requests
>>>> are sent to the master oned, and the client tools can fall back to a
>>>> different URL when the master oned does not respond. If you are using
>>>> the OCCI or EC2 interfaces, then any load balancer or HTTP proxy
>>>> could redirect the connections to one of the shadows in case of a
>>>> timeout.
>>>>
>>>> In this scenario we would need:
>>>>
>>>> 1. Shared $ONE_LOCATION/var among the master and shadow oneds
>>>>
>>>> 2. A DB following a client/server model, like MySQL
>>>>
>>>> 3. The OpenNebula Cloud API needs to handle a list of server URLs
>>>> (the OCCI/EC2 interfaces could just work with an HTTP proxy or load
>>>> balancer like nginx)
>>>>
>>>> There are other issues (I do not have a clear solution for these):
>>>>
>>>> 1. Monitoring (hosts, VMs) should be disabled for the shadows (maybe
>>>> the daemons can start in a stand-by mode and switch to fully
>>>> operational after a given number of requests).
>>>>
>>>> 2. Scheduler: the same considerations apply to the scheduler. Only
>>>> one scheduler should be operational (see the sketch after this
>>>> list).
>>>>
>>>> 3. Missing callbacks. If oned dies we are going to miss any pending
>>>> notification from the drivers (e.g. if you start a BOOT operation
>>>> and oned crashes, the shadow is not going to receive the result of
>>>> that BOOT operation; the VM will be stuck in the BOOT state as far
>>>> as OpenNebula is concerned, while probably still running on the
>>>> target host).
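>>>>
>>>> For issues 1 and 2, one possible shape is a lease in the shared DB
>>>> that only one daemon holds at a time (just a sketch; the lease table
>>>> is invented and sqlite3 stands in for the shared DB):
>>>>
>>>> import time
>>>> import sqlite3
>>>>
>>>> LEASE = 30  # seconds a master lease stays valid without renewal
>>>>
>>>> db = sqlite3.connect("one_lease.db")
>>>> db.execute("""CREATE TABLE IF NOT EXISTS master_lease (
>>>>     id INTEGER PRIMARY KEY CHECK (id = 1),
>>>>     owner TEXT, renewed REAL)""")
>>>>
>>>> def i_am_master(me):
>>>>     """Acquire or renew the lease; return True if 'me' is master."""
>>>>     now = time.time()
>>>>     row = db.execute("SELECT owner, renewed FROM master_lease"
>>>>                      " WHERE id = 1").fetchone()
>>>>     if row is None:
>>>>         db.execute("INSERT INTO master_lease VALUES (1, ?, ?)",
>>>>                    (me, now))
>>>>     elif row[0] == me or now - row[1] > LEASE:
>>>>         db.execute("UPDATE master_lease SET owner = ?, renewed = ?"
>>>>                    " WHERE id = 1", (me, now))
>>>>     db.commit()
>>>>     owner = db.execute("SELECT owner FROM master_lease"
>>>>                        " WHERE id = 1").fetchone()[0]
>>>>     return owner == me
>>>>
>>>> # Only the lease holder monitors hosts/VMs and runs the scheduler;
>>>> # the shadows call i_am_master() periodically and stay in stand-by.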
>>>>
>>>> Cheers
>>>>
>>>> Ruben
>>>>
>>>> [1] http://dev.opennebula.org/issues/206
>>>>
>>>> On Thu, Mar 25, 2010 at 8:30 PM, Claude Noshpitz
>>>> <cnoshpitz at attinteractive.com> wrote:
>>>>> Hello Nebulans,
>>>>>
>>>>> Wondering if anyone has been thinking about how to make ONE highly available
>>>>> by running multiple masters cooperatively.
>>>>>
>>>>> Among other things, this might mean abstracting out the current SQLite
>>>>> dependencies to rely on a more "distributable" DB solution (one which could
>>>>> itself be independently master-slaved or otherwise made HA).  It seems
>>>>> unlikely that one could finesse HA for SQLite by e.g. storing its data in a
>>>>> distributed/redundant filesystem (fancy NFS, Gluster) unless there's really
>>>>> strong record locking.
>>>>>
>>>>> Another consideration would involve the consequences of multiple masters
>>>>> sharing a pool of worker nodes -- perhaps this would "just work" if the
>>>>> underlying DB was shared, or not.  There's the question of how to manage the
>>>>> shared VM state in $ONE_LOCATION/var too, but that could be handled with a
>>>>> robust shared filesystem.
>>>>>
>>>>> Ideas?  Opinions?
>>>>>
>>>>> Thanks!
>>>>>
>>>>> --Claude
>>>>>



-- 
Dr. Ruben Santiago Montero
Associate Professor (Profesor Titular), Complutense University of Madrid

URL: http://dsa-research.org/doku.php?id=people:ruben
Weblog: http://blog.dsa-research.org/?author=7


